Hi,
All three nodes of our Percona Cluster hit exceptions within the last few hours, which caused every node in the cluster to shut down.
dbw0 was running percona-xtradb-cluster-server-5.5 5.5.33-23.7.6-495.raring
dbc0 was running percona-xtradb-cluster-server-5.5 5.5.33-23.7.6-496.raring
dbe0 was running percona-xtradb-cluster-server-5.5 5.5.31-23.7.5-438.raring
We were planning to upgrade all nodes to the 496 build of 5.5.33, but this happened before we could schedule a time to do that.
dbw0 received the exception in the summary.
dbc0 and dbe0 received the wsrep exceptions below.
dbw0:
terminate called after throwing an instance of 'std::out_of_range'
what(): vector::_M_range_check
04:14:30 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/
key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=140
max_threads=153
thread_count=136
connection_count=136
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 343054 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7dc9ae]
/usr/sbin/mysqld(handle_fatal_signal+0x491)[0x6beed1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfbd0)[0x7f8f9fcc8bd0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f8f9f2f0037]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f8f9f2f3698]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x11d)[0x7f8f9d585e8d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5ef76)[0x7f8f9d583f76]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5efa3)[0x7f8f9d583fa3]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5f1de)[0x7f8f9d5841de]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZSt20__throw_out_of_rangePKc+0x5d)[0x7f8f9d5d69ad]
/usr/lib/libgalera_smm.so(+0xd90f6)[0x7f8f9db090f6]
/usr/lib/libgalera_smm.so(_ZN5gcomm3evs5Proto19update_im_safe_seqsERKNS0_15MessageNodeListE+0x121)[0x7f8f9db09b41]
/usr/lib/libgalera_smm.so(_ZN5gcomm3evs5Proto11handle_joinERKNS0_11JoinMessageESt17_Rb_tree_iteratorISt4pairIKNS_4UUIDENS0_4NodeEEE+0xb60)[0x7f8f9db199a0]
/usr/lib/libgalera_smm.so(_ZN5gcomm3evs5Proto10handle_msgERKNS0_7MessageERKNS_8DatagramE+0x387)[0x7f8f9db22407]
/usr/lib/libgalera_smm.so(_ZN5gcomm3evs5Proto9handle_upEPKvRKNS_8DatagramERKNS_11ProtoUpMetaE+0x27b)[0x7f8f9db22dcb]
/usr/lib/libgalera_smm.so(_ZN5gcomm8Protolay7send_upERKNS_8DatagramERKNS_11ProtoUpMetaE+0x36)[0x7f8f9db24746]
/usr/lib/libgalera_smm.so(_ZN5gcomm6GMCast9handle_upEPKvRKNS_8DatagramERKNS_11ProtoUpMetaE+0x22a)[0x7f8f9db37e4a]
/usr/lib/libgalera_smm.so(_ZN5gcomm10Protostack8dispatchEPKvRKNS_8DatagramERKNS_11ProtoUpMetaE+0x58)[0x7f8f9db5ddd8]
/usr/lib/libgalera_smm.so(_ZN5gcomm12AsioProtonet8dispatchERKPKvRKNS_8DatagramERKNS_11ProtoUpMetaE+0x4b)[0x7f8f9db85e8b]
/usr/lib/libgalera_smm.so(_ZN5gcomm13AsioTcpSocket12read_handlerERKN4asio10error_codeEm+0x7a6)[0x7f8f9db688e6]
/usr/lib/libgalera_smm.so(_ZN4asio6detail7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceIS4_EEEEN5boost5arrayINS_14mutable_bufferELm1EEENS8_3_bi6bind_tImNS8_4_mfi3mf2ImN5gcomm13AsioTcpSocketERKNS_10error_codeEmEENSC_5list3INSC_5valueINS8_10shared_ptrISH_EEEEPFNS8_3argILi1EEEvEPFNSR_ILi2EEEvEEEEENSD_IvNSF_IvSH_SK_mEESY_EEEclESK_mi+0x94)[0x7f8f9db771c4]
/usr/lib/libgalera_smm.so(_ZN4asio6detail23reactive_socket_recv_opINS0_17consuming_buffersINS_14mutable_bufferEN5boost5arrayIS3_Lm1EEEEENS0_7read_opINS_19basic_stream_socketINS_2ip3tcpENS_21stream_socket_serviceISB_EEEES6_NS4_3_bi6bind_tImNS4_4_mfi3mf2ImN5gcomm13AsioTcpSocketERKNS_10error_codeEmEENSF_5list3INSF_5valueINS4_10shared_ptrISK_EEEEPFNS4_3argILi1EEEvEPFNSU_ILi2EEEvEEEEENSG_IvNSI_IvSK_SN_mEES11_EEEEE11do_completeEPNS0_15task_io_serviceEPNS0_25task_io_service_operationESL_m+0xdd)[0x7f8f9db7750d]
/usr/lib/libgalera_smm.so(_ZN4asio6detail15task_io_service3runERNS_10error_codeE+0x407)[0x7f8f9db89517]
/usr/lib/libgalera_smm.so(_ZN5gcomm12AsioProtonet10event_loopERKN2gu8datetime6PeriodE+0x1b0)[0x7f8f9db872f0]
/usr/lib/libgalera_smm.so(_ZN9GCommConn3runEv+0x5e)[0x7f8f9db9fe5e]
/usr/lib/libgalera_smm.so(_ZN9GCommConn6run_fnEPv+0x9)[0x7f8f9dba3139]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7f8f9fcc0f8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f8f9f3b2e1d]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
130929 04:14:30 mysqld_safe Number of processes running now: 0
130929 04:14:30 mysqld_safe WSREP: not restarting wsrep node automatically
130929 04:14:30 mysqld_safe mysqld from pid file /var/lib/mysql/dbw0.connected.cc.pid ended
dbc0:
130929 4:14:30 [ERROR] WSREP: exception caused by message: evs::msg{version=0,type=4,user_type=255,order=1,seq=3,seq_range=-1,aru_seq=3,flags=4,source=447a086c-2456-11e3-b1d8-bfcbd6dd35cd,source_view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=10620186,node_list=( 447a086c-2456-11e3-b1d8-bfcbd6dd35cd,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=3,im_range=[4,3],}
e10fe36e-2559-11e3-8f39-661947538c85,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,00000000-0000-0000-0000-000000000000,0),safe_seq=-1,im_range=[-1,-1],}
)
}
state after handling message: evs::proto(evs::proto(e10fe36e-2559-11e3-8f39-661947538c85, GATHER, view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94)), GATHER) {
current_view=view(view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94) memb {
239e4407-0ce7-11e3-9bee-3be82ca03086,
447a086c-2456-11e3-b1d8-bfcbd6dd35cd,
e10fe36e-2559-11e3-8f39-661947538c85,
} joined {
} left {
} partitioned {
}),
input_map=evs::input_map: {aru_seq=-1,safe_seq=-1,node_index=node: {idx=0,range=[6,5],safe_seq=-1} node: {idx=1,range=[0,3],safe_seq=3} node: {idx=2,range=[6,5],safe_seq=-1} },
fifo_seq=7949526,
last_sent=5,
known={
239e4407-0ce7-11e3-9bee-3be82ca03086,evs::node{operational=1,suspected=0,installed=0,fifo_seq=66278528,join_message=
evs::msg{version=0,type=4,user_type=255,order=1,seq=-1,seq_range=-1,aru_seq=-1,flags=4,source=239e4407-0ce7-11e3-9bee-3be82ca03086,source_view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=66278527,node_list=( 239e4407-0ce7-11e3-9bee-3be82ca03086,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[6,5],}
447a086c-2456-11e3-b1d8-bfcbd6dd35cd,node: {operational=1,suspected=1,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[0,-1],}
e10fe36e-2559-11e3-8f39-661947538c85,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[6,5],}
)
},
}
447a086c-2456-11e3-b1d8-bfcbd6dd35cd,evs::node{operational=1,suspected=0,installed=0,fifo_seq=10620186,join_message=
evs::msg{version=0,type=4,user_type=255,order=1,seq=3,seq_range=-1,aru_seq=3,flags=4,source=447a086c-2456-11e3-b1d8-bfcbd6dd35cd,source_view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=10620186,node_list=( 447a086c-2456-11e3-b1d8-bfcbd6dd35cd,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=3,im_range=[4,3],}
e10fe36e-2559-11e3-8f39-661947538c85,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,00000000-0000-0000-0000-000000000000,0),safe_seq=-1,im_range=[-1,-1],}
)
},
}
e10fe36e-2559-11e3-8f39-661947538c85,evs::node{operational=1,suspected=0,installed=0,fifo_seq=-1,join_message=
evs::msg{version=0,type=4,user_type=255,order=1,seq=-1,seq_range=-1,aru_seq=-1,flags=0,source=e10fe36e-2559-11e3-8f39-661947538c85,source_view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=7949526,node_list=( 239e4407-0ce7-11e3-9bee-3be82ca03086,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[6,5],}
447a086c-2456-11e3-b1d8-bfcbd6dd35cd,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=3,im_range=[0,3],}
e10fe36e-2559-11e3-8f39-661947538c85,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[6,5],}
)
},
}
}
}
130929 4:14:30 [ERROR] WSREP: exception from gcomm, backend must be restarted:nlself_i != same_view.end(): (FATAL)
at gcomm/src/evs_proto.cpp:handle_join():3530
130929 4:14:30 [Note] WSREP: Received self-leave message.
130929 4:14:30 [Note] WSREP: Flow-control interval: [0, 0]
130929 4:14:30 [Note] WSREP: Received SELF-LEAVE. Closing connection.
130929 4:14:30 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 27410502)
130929 4:14:30 [Note] WSREP: RECV thread exiting 0: Success
130929 4:14:30 [Note] WSREP: New cluster view: global state: 73a23f83-0c81-11e3-9fc4-a2a855d3a912:27410502, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 2
130929 4:14:30 [Note] WSREP: Setting wsrep_ready to 0
130929 4:14:30 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130929 4:14:30 [Note] WSREP: applier thread exiting (code:0)
130929 4:14:30 [Note] WSREP: closing applier 6
dbe0:
130928 23:14:30 [Note] WSREP: evs::proto(239e4407-0ce7-11e3-9bee-3be82ca03086, GATHER, view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94)) suspecting node: 447a086c-2456-11e3-b1d8-bfcbd6dd35cd
130928 23:14:30 [ERROR] WSREP: exception caused by message: evs::msg{version=0,type=4,user_type=255,order=1,seq=3,seq_range=-1,aru_seq=3,flags=4,source=447a086c-2456-11e3-b1d8-bfcbd6dd35cd,source_view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=10620186,node_list=( 447a086c-2456-11e3-b1d8-bfcbd6dd35cd,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=3,im_range=[4,3],}
e10fe36e-2559-11e3-8f39-661947538c85,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,00000000-0000-0000-0000-000000000000,0),safe_seq=-1,im_range=[-1,-1],}
)
}
state after handling message: evs::proto(evs::proto(239e4407-0ce7-11e3-9bee-3be82ca03086, GATHER, view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94)), GATHER) {
current_view=view(view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94) memb {
239e4407-0ce7-11e3-9bee-3be82ca03086,
447a086c-2456-11e3-b1d8-bfcbd6dd35cd,
e10fe36e-2559-11e3-8f39-661947538c85,
} joined {
} left {
} partitioned {
}),
input_map=evs::input_map: {aru_seq=-1,safe_seq=-1,node_index=node: {idx=0,range=[6,5],safe_seq=-1} node: {idx=1,range=[0,3],safe_seq=3} node: {idx=2,range=[6,5],safe_seq=-1} },
fifo_seq=66278530,
last_sent=5,
known={
239e4407-0ce7-11e3-9bee-3be82ca03086,evs::node{operational=1,suspected=0,installed=0,fifo_seq=-1,join_message=
evs::msg{version=0,type=4,user_type=255,order=1,seq=-1,seq_range=-1,aru_seq=-1,flags=0,source=239e4407-0ce7-11e3-9bee-3be82ca03086,source_view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=66278530,node_list=( 239e4407-0ce7-11e3-9bee-3be82ca03086,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[6,5],}
447a086c-2456-11e3-b1d8-bfcbd6dd35cd,node: {operational=1,suspected=1,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=3,im_range=[0,3],}
e10fe36e-2559-11e3-8f39-661947538c85,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[6,5],}
)
},
}
447a086c-2456-11e3-b1d8-bfcbd6dd35cd,evs::node{operational=1,suspected=1,installed=0,fifo_seq=10620186,join_message=
evs::msg{version=0,type=4,user_type=255,order=1,seq=3,seq_range=-1,aru_seq=3,flags=4,source=447a086c-2456-11e3-b1d8-bfcbd6dd35cd,source_view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=10620186,node_list=( 447a086c-2456-11e3-b1d8-bfcbd6dd35cd,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=3,im_range=[4,3],}
e10fe36e-2559-11e3-8f39-661947538c85,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,00000000-0000-0000-0000-000000000000,0),safe_seq=-1,im_range=[-1,-1],}
)
},
}
e10fe36e-2559-11e3-8f39-661947538c85,evs::node{operational=1,suspected=0,installed=0,fifo_seq=7949526,join_message=
evs::msg{version=0,type=4,user_type=255,order=1,seq=-1,seq_range=-1,aru_seq=-1,flags=4,source=e10fe36e-2559-11e3-8f39-661947538c85,source_view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),range_uuid=00000000-0000-0000-0000-000000000000,range=[-1,-1],fifo_seq=7949526,node_list=( 239e4407-0ce7-11e3-9bee-3be82ca03086,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[6,5],}
447a086c-2456-11e3-b1d8-bfcbd6dd35cd,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=3,im_range=[0,3],}
e10fe36e-2559-11e3-8f39-661947538c85,node: {operational=1,suspected=0,leave_seq=-1,view_id=view_id(REG,239e4407-0ce7-11e3-9bee-3be82ca03086,94),safe_seq=-1,im_range=[6,5],}
)
},
}
}
}
130928 23:14:30 [ERROR] WSREP: exception from gcomm, backend must be restarted:nlself_i != same_view.end(): (FATAL)
at gcomm/src/evs_proto.cpp:handle_join():3530
130928 23:14:30 [Note] WSREP: Received self-leave message.
130928 23:14:30 [Note] WSREP: Flow-control interval: [0, 0]
130928 23:14:30 [Note] WSREP: Received SELF-LEAVE. Closing connection.
130928 23:14:30 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 27410502)
130928 23:14:30 [Note] WSREP: RECV thread exiting 0: Success
130928 23:14:30 [Note] WSREP: New cluster view: global state: 73a23f83-0c81-11e3-9fc4-a2a855d3a912:27410502, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 2
130928 23:14:30 [Note] WSREP: Setting wsrep_ready to 0
130928 23:14:30 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130928 23:14:30 [Note] WSREP: applier thread exiting (code:0)
130928 23:14:30 [Note] WSREP: closing applier 10
Hi Jolan,
Could you provide a bit more context about what happened just before the crashes? How long had the nodes been running, were any of them restarted recently, and were there any signs of network issues?
If there was any activity in the error logs just before the crashes (within a minute or so), it would be interesting to see that too.
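In case it helps with pulling those lines out, here is a minimal Python sketch for collecting error-log entries from the minute before a crash. It assumes each entry starts with a `YYMMDD H:MM:SS` timestamp, as in the excerpts above, and that lines without a timestamp belong to the entry before them. The names `parse_stamp` and `lines_before` are made up for this illustration, and the sample lines are placeholders rather than real log output.

```python
import re
from datetime import datetime, timedelta

# Entries in the excerpts above start with "YYMMDD H:MM:SS";
# the hour may have one or two digits.
STAMP = re.compile(r"^(\d{6})\s+(\d{1,2}):(\d{2}):(\d{2})\b")


def parse_stamp(line):
    """Return the timestamp at the start of a line, or None if there is none."""
    m = STAMP.match(line)
    if m is None:
        return None
    date, hour, minute, second = m.groups()
    text = f"{date} {int(hour):02d}:{minute}:{second}"
    return datetime.strptime(text, "%y%m%d %H:%M:%S")


def lines_before(lines, crash_time, window_seconds=60):
    """Collect lines stamped within window_seconds before crash_time (inclusive)."""
    start = crash_time - timedelta(seconds=window_seconds)
    selected = []
    keep = False  # lines without a timestamp follow the previous stamped line
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None:
            keep = start <= stamp <= crash_time
        if keep:
            selected.append(line.rstrip("\n"))
    return selected


# Placeholder lines, only here to show the call; not real log output.
sample = [
    "130929 4:12:00 placeholder entry outside the window",
    "130929 4:13:45 placeholder entry inside the window",
    "    placeholder continuation line",
    "130929 4:14:30 placeholder entry at the crash time",
]
for entry in lines_before(sample, datetime(2013, 9, 29, 4, 14, 30)):
    print(entry)
```

For a real log, pass the open file object as `lines`. The crash time has to be given in each node's own log time: the dbc0 excerpt is stamped 130929 4:14:30 while the dbe0 excerpt is stamped 130928 23:14:30 for what appears to be the same event.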