HA testing: node up brings down all Trafodion nodes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Trafodion | Fix Committed | Critical | Gonzalo Correa |
Bug Description
After a node was rebooted, bringing it back up with node up brought down all Trafodion nodes. This might be related to bug 1411520. Details are as follows:
=======
sea-dev7 cluster: 5 node cluster (without sea-dev8)
idle system
rebooted sea-dev9, nid 1
Trafodion stayed up and was usable/stable
[$Z000LKJ] %node info
[$Z000LKJ] Logical Nodes = 5
[$Z000LKJ] Physical Nodes = 5
[$Z000LKJ] Spare Nodes = 0
[$Z000LKJ] Available Spares = 0
[$Z000LKJ] NID Type State Processors #Procs
[$Z000LKJ] PNID State #Cores MemFree SwapFree CacheFree Name
[$Z000LKJ] --- ----------- -------- ---------- -------- -------- --------- --------
[$Z000LKJ] 000 Any Up 2 9
[$Z000LKJ] 000 Up 8 6661488 4095992 13259788 sea-dev7
[$Z000LKJ] 001 Any Down
[$Z000LKJ] 001 Down sea-dev9
[$Z000LKJ] 002 Any Up 2 9
[$Z000LKJ] 002 Up 8 831660 4065120 13607004 sea-dev10
[$Z000LKJ] 003 Any Up 2 9
[$Z000LKJ] 003 Up 8 431156 4066044 12088964 sea-dev11
[$Z000LKJ] 004 Any Up 2 8
[$Z000LKJ] 004 Up 8 154276 4065288 11674060 sea-dev12
<waited about 30 minutes to make sure node sea-dev9 had completed rebooting>
sqshell up sea-dev9
[$Z000LKJ] %up sea-dev9
[$Z000LKJ] 01/16/2015-11:40:28 - Node sea-dev9 is merging to existing cluster.
[$Z000LKJ] 01/16/2015-11:40:30 - Node sea-dev9 join phase starting.
All Trafodion nodes came down at this point, during the join phase.
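For anyone repeating this test, the `node info` listing shown earlier can be checked mechanically before and after the `up` command. The following is a minimal sketch, not part of Trafodion: the helper names (`strip_prompt`, `physical_node_states`, `down_nodes`) are hypothetical, and the row layout is assumed only from the listing pasted in this report, so other sqshell versions may format rows differently.

```python
import re

# Physical-node rows, as seen in this report, look like either
#   "001 Down sea-dev9"  or  "000 Up 8 6661488 4095992 13259788 sea-dev7".
# Logical-node rows ("000 Any Up 2 9") do not match, because their second
# field is the node type ("Any"), not the state.
PNODE_ROW = re.compile(r"^(\d{3})\s+(Up|Down)\b(?:\s+\d+){0,4}\s+(\S+)$")


def strip_prompt(line):
    """Drop a leading "[$Z000LKJ] " style shell prefix, if present."""
    return re.sub(r"^\[\$\w+\]\s*", "", line).strip()


def physical_node_states(listing):
    """Map each physical node name to "Up" or "Down"."""
    states = {}
    for raw in listing.splitlines():
        match = PNODE_ROW.match(strip_prompt(raw))
        if match:
            _pnid, state, name = match.groups()
            states[name] = state
    return states


def down_nodes(listing):
    """Return the sorted names of physical nodes reported as Down."""
    states = physical_node_states(listing)
    return sorted(name for name, state in states.items() if state == "Down")


# Excerpt of the listing from this report, used as sample input.
SAMPLE = """\
[$Z000LKJ] 000 Any Up 2 9
[$Z000LKJ] 000 Up 8 6661488 4095992 13259788 sea-dev7
[$Z000LKJ] 001 Any Down
[$Z000LKJ] 001 Down sea-dev9
[$Z000LKJ] 002 Any Up 2 9
[$Z000LKJ] 002 Up 8 831660 4065120 13607004 sea-dev10
"""

if __name__ == "__main__":
    print(down_nodes(SAMPLE))
```

Capturing the listing once before `up` and once after would make it possible to confirm that only sea-dev9 was down beforehand, and to record which nodes are still reported afterwards.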
Changed in trafodion:
status: New → In Progress
assignee: nobody → Gonzalo Correa (gonzalo.correa)
Changed in trafodion:
status: In Progress → Fix Committed