Joining node connects and crashed under load | Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Galera | Status tracked in 3.x | | |
2.x | Invalid | Undecided | Unassigned |
3.x | Fix Released | High | Teemu Ollakka |
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | | |
5.5 | Invalid | Undecided | Unassigned |
5.6 | Fix Released | Undecided | Unassigned |
Bug Description
We stopped one node in a clean cluster, started an import, and after a while restarted the node while the import was still running. The node joined the cluster and crashed after a short time.
We are trying to reproduce the crash with core dumps enabled.
} partitioned {
})
140221 13:29:45 [Note] WSREP: gcomm: connected
140221 13:29:45 [Note] WSREP: Changing maximum packet size to 64500,
resulting msg size: 32636
140221 13:29:45 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
140221 13:29:45 [Note] WSREP: Opened channel 'Galera Dev Cluster'
140221 13:29:45 [Note] WSREP: New COMPONENT: primary = yes,
bootstrap = no, my_idx = 0, memb_num = 3
140221 13:29:45 [Note] WSREP: Waiting for SST to complete.
140221 13:29:45 [Note] WSREP: STATE_EXCHANGE: sent state UUID:
3a734d78-
140221 13:29:45 [Note] WSREP: STATE EXCHANGE: sent state msg:
3a734d78-
140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg:
3a734d78-
140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg:
3a734d78-
140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg:
3a734d78-
140221 13:29:45 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 10,
members = 2/3 (joined/total),
act_id = 175711,
last_appl. = -1,
protocols = 0/5/2 (gcs/repl/appl),
group UUID = 646d078b-
140221 13:29:45 [Note] WSREP: Flow-control interval: [28, 28]
140221 13:29:45 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 175711)
140221 13:29:45 [Note] WSREP: State transfer required:
Group state: 646d078b-
Local state: 646d078b-
140221 13:29:45 [Note] WSREP: New cluster view: global state:
646d078b-
number of nodes: 3, my index: 0, protocol version 2
140221 13:29:45 [Warning] WSREP: Gap in state sequence. Need state
transfer.
140221 13:29:47 [Note] WSREP: Running: 'wsrep_sst_rsync --role
'joiner' --address '10.10.17.151' --auth 'sst:secret' --datadir
'/var/lib/
parent '5892''
140221 13:29:47 [Note] WSREP: Prepared SST request:
rsync|10.
140221 13:29:47 [Note] WSREP: wsrep_notify_cmd is not defined,
skipping notification.
140221 13:29:47 [Note] WSREP: REPL Protocols: 5 (3, 1)
140221 13:29:47 [Note] WSREP: Assign initial position for
certification: 175711, protocol version: 3
140221 13:29:47 [Note] WSREP: Service thread queue flushed.
140221 13:29:47 [Note] WSREP: Prepared IST receiver, listening at:
tcp://10.
140221 13:29:47 [Note] WSREP: Node 0.0 (Node C) requested state
transfer from '*any*'. Selected 1.0 (Node B)(SYNCED) as donor.
140221 13:29:47 [Note] WSREP: Shifting PRIMARY -> JOINER (TO:
175711)
140221 13:29:47 [Note] WSREP: Requesting state transfer: success,
donor: 1
WSREP_SST: [INFO] Joiner cleanup. (20140221 13:29:48.482)
WSREP_SST: [INFO] Joiner cleanup done. (20140221 13:29:48.988)
140221 13:29:48 [Note] WSREP: SST complete, seqno: 156159
140221 13:29:48 [Note] Plugin 'FEDERATED' is disabled.
140221 13:29:48 InnoDB: The InnoDB memory heap is disabled
140221 13:29:48 InnoDB: Mutexes and rw_locks use GCC atomic builtins
140221 13:29:48 InnoDB: Compressed tables use zlib 1.2.3.3
140221 13:29:48 InnoDB: Using Linux native AIO
140221 13:29:49 InnoDB: Initializing buffer pool, size = 48.0G
140221 13:29:51 InnoDB: Completed initialization of buffer pool
140221 13:29:51 InnoDB: highest supported file format is Barracuda.
140221 13:29:54 InnoDB: Waiting for the background threads to start
140221 13:29:55 InnoDB: 5.5.34 started; log sequence number
90436392703
140221 13:29:55 [Note] Server hostname (bind-address): '0.0.0.0';
port: 3306
140221 13:29:55 [Note] - '0.0.0.0' resolves to '0.0.0.0';
140221 13:29:55 [Note] Server socket created on IP: '0.0.0.0'.
140221 13:29:55 [Note] Event Scheduler: Loaded 0 events
140221 13:29:55 [Note] WSREP: Signalling provider to continue.
140221 13:29:55 [Note] WSREP: SST received: 646d078b-
93cfe86c89f3:156159
140221 13:29:55 [Note] WSREP: Receiving IST: 19552 writesets, seqnos
156159-175711
140221 13:29:55 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.34-log' socket: '/var/lib/
port: 3306 MySQL Community Server (GPL), wsrep_25.9.r3928
140221 13:29:55 [Warning] IP address '10.10.0.203' could not be
resolved: Name or service not known
140221 13:29:58 [Warning] IP address '10.10.17.24' could not be
resolved: Name or service not known
mysqld: /tmp/mysql-
wsrep_cb_status_t wsrep_commit_
wsrep_trx_meta_t*, wsrep_bool_t*, bool): Assertion `meta->gtid.seqno
== wsrep_thd_
13:30:03 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this
binary
or one of the libraries it was linked against is corrupt, improperly
built,
or misconfigured. This error can also be caused by malfunctioning
hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_
read_buffer_
max_used_
max_threads=1024
thread_count=66
connection_count=66
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_
= 2248680 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f442000d9f0
Attempting backtrace. You can use the following information to find
out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f44f4d8be68 thread_stack 0x40000
/usr/sbin/
/usr/sbin/
/lib/x86_
/lib/x86_
/lib/x86_
/lib/x86_
/lib/x86_
/usr/sbin/
[0x67050c]
/usr/lib/
EPvPNS_
/usr/lib/
Pv+0x24e)
/usr/lib/
cvEPv+0x308)
/usr/lib/
/usr/sbin/
/usr/sbin/
/lib/x86_
/lib/x86_
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 65
Status: NOT_KILLED
The manual page at http://
contains
information that should help you find out what is causing the crash.
140221 13:30:03 mysqld_safe Number of processes running now: 0
140221 13:30:03 mysqld_safe WSREP: not restarting wsrep node
automatically
140221 13:30:03 mysqld_safe mysqld from pid file
/var/lib/
show global variables like '%version%';
+------
| Variable_name | Value |
+------
| innodb_version | 5.5.34 |
| version | 5.5.34-log |
| version_comment | MySQL Community Server (GPL),wsrep_
| version_
| version_compile_os | Linux |
+------
show global status like '%version%';
+------
| Variable_name | Value |
+------
| wsrep_protocol_
| wsrep_provider_
+------
show global variables like '%wsrep_pro%';
wsrep_provider,
wsrep_provider_
cert.log_conflicts = no; evs.causal_
evs.debug_log_mask = 0x1; evs.inactive_
evs.inactive_
evs.install_timeout = PT15S; evs.join_
evs.keepalive_
evs.send_window = 4; evs.stats_
evs.suspect_timeout = PT5S; evs.use_aggregate = true;
evs.user_
PT5M; gcache.dir = /var/lib/
0; gcache.mem_size = 0; gcache.name =
/var/lib/
gcache.size = 8G; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit
= 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500;
gcs.max_throttle = 0.25; gcs.recv_
922337203685477
NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ;
gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment =
0; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr =
10.10.9.76; pc.checksum = false; pc.ignore_quorum = false;
pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version
= 0; pc.weight = 1; protonet.backend = asio; protonet.version = 0;
repl.causal_
repl.key_format = FLAT8; repl.proto_max = 5; socket.checksum = 2"
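For triage, a small script along the following lines can pull the IST range out of an error log and check that the reported writeset count matches the width of the seqno range. This is only a sketch: the function names and the regular expression are illustrative assumptions based solely on the "Receiving IST" log line quoted above, and are not part of Galera or the wsrep patch.

```python
import re

# Pattern for the Galera log line quoted above, e.g.
# "WSREP: Receiving IST: 19552 writesets, seqnos 156159-175711".
# \s+ tolerates the hard line wrap between "seqnos" and the range.
IST_RE = re.compile(r"Receiving IST: (\d+) writesets, seqnos\s+(\d+)-(\d+)")


def parse_ist_range(log_text):
    """Return (count, first, last) from the first IST line, or None."""
    match = IST_RE.search(log_text)
    if match is None:
        return None
    count, first, last = (int(group) for group in match.groups())
    return count, first, last


def ist_range_is_consistent(count, first, last):
    """True when the writeset count equals the width of the seqno range."""
    return last - first == count


sample = (
    "140221 13:29:55 [Note] WSREP: Receiving IST: 19552 writesets, seqnos\n"
    "156159-175711\n"
)
result = parse_ist_range(sample)
print(result, ist_range_is_consistent(*result))
```

For the log above, the range 156159-175711 is consistent with the 19552 writesets reported, and it starts at the seqno the SST completed with (156159).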
summary:
- Joining node connects and crashed under load
+ Joining node connects and crashed under load | Assertion
+ `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.
affects: codership-mysql → galera
I am also experiencing a crash during IST. Below are the relevant parts of the log:
2014-02-23 12:49:05 11154 [Note] WSREP: Receiving IST: 134022 writesets, seqnos 175819169-175953191
2014-02-23 12:49:05 11154 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.15-56-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), Release 25.4, Revision 731, wsrep_25.4.r4043
mysqld: /mnt/workspace/percona-xtradb-cluster-5.6-rpms/label_exp/centos6-64/target/BUILD/Percona-XtraDB-Cluster-5.6.15/sql/wsrep_applier.cc:321: wsrep_cb_status_t wsrep_commit_cb(void*, uint32_t, const wsrep_trx_meta_t*, wsrep_bool_t*, bool): Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.
17:49:31 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster
key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=67
max_threads=1002
thread_count=65
connection_count=65
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 407967 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fc090000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc098944d88 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x902445]
/usr/sbin/mysqld(handle_fatal_signal+0x4c4)[0x680114]
/lib64/libpthread.so.0(+0xfae0)[0x7fd5ac5caae0]
/lib64/libc.so.6(gsignal+0x35)[0x7fd5aaa3cba5]
/lib64/libc.so.6(abort+0x17b)[0x7fd5aaa3e4bb]
/lib64/libc.so.6(+0x2d5ce)[0x7fd5aaa355ce]
/lib64/libc.so.6(+0x2d672)[0x7fd5aaa35672]
/usr/sbin/mysqld[0x5bcd4c]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x552)[0x7fd5a8676052]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTEPv+0x322)[0x7fd5a867e8d2]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x33f)[0x7fd5a8670c2f]
/usr/lib64/libgalera_smm.so(galera_recv+0x23)[0x7fd5a8685673]
/usr/sbin/mysqld[0x5be0af]
/usr/sbin/mysqld(start_wsrep_THD+0x480)[0x5ae4d0]
/lib64/libpthread.so.0(+0x7ddb)[0x7fd5ac5c2ddb]
/lib64/libc.so.6(clone+0x6d)[0x7fd5aaaeaa1d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 10
Status: NOT_KILLED
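To confirm that the abort happens on the IST apply path, the backtrace frames can be split into module, symbol, and address, and the simple C++ names decoded. The following is a rough sketch, not a general demangler: `FRAME_RE` and `demangle_nested_name` are hypothetical helpers written for this report, and the decoder only understands plain `_ZN<length><name>...E` nested names (no templates, substitutions, or argument types). The sample frames are the libgalera_smm.so and mysqld entries from the backtrace above, written without embedded spaces.

```python
import re

# One glibc-style backtrace frame: module(symbol+offset)[address].
# The parenthesised part is optional and the symbol may be empty,
# as in "/lib64/libc.so.6(+0x2d5ce)[0x7fd5aaa355ce]".
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\))?"
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)


def demangle_nested_name(symbol):
    """Decode only the '_ZN<len><id>...E' part of an Itanium C++ name.

    Returns None for anything that is not a plain nested name.
    """
    if not symbol.startswith("_ZN"):
        return None
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        parts.append(symbol[end:end + length])
        pos = end + length
    # A well-formed nested name is closed by 'E' before the argument types.
    if not parts or pos >= len(symbol) or symbol[pos] != "E":
        return None
    return "::".join(parts)


frames = [
    "/usr/sbin/mysqld[0x5bcd4c]",
    "/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x552)[0x7fd5a8676052]",
    "/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTEPv+0x322)[0x7fd5a867e8d2]",
    "/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x33f)[0x7fd5a8670c2f]",
]
for frame in frames:
    match = FRAME_RE.match(frame)
    symbol = match.group("symbol") or ""
    name = demangle_nested_name(symbol) or symbol or "?"
    print(match.group("module"), name, match.group("address"))
```

Read bottom-up, the decoded frames show `galera::ReplicatorSMM::async_recv` calling `recv_IST`, which calls `apply_trx`, which then calls back into mysqld where the assertion fires.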