Galera

Joining node connects and crashed under load | Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.

Bug #1284803 reported by shinguz on 2014-02-25

This bug affects 4 people

	Status	Importance	Assigned to	Milestone
Galera	Status tracked in 3.x
2.x	Invalid	Undecided	Unassigned
3.x	Fix Released	High	Teemu Ollakka	Galera 25.3.5
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
5.5	Invalid	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC future-5.5
5.6	Fix Released	Undecided	Unassigned	Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC galera-3.5

Bug Description

We stopped one node in a clean cluster, started an import, and after a while restarted the node while the import was still running. The node joined the cluster and crashed shortly afterwards.

We will try to reproduce this with core dumps enabled.

} partitioned {
})
140221 13:29:45 [Note] WSREP: gcomm: connected
140221 13:29:45 [Note] WSREP: Changing maximum packet size to 64500,
resulting msg size: 32636
140221 13:29:45 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
140221 13:29:45 [Note] WSREP: Opened channel 'Galera Dev Cluster'
140221 13:29:45 [Note] WSREP: New COMPONENT: primary = yes,
bootstrap = no, my_idx = 0, memb_num = 3
140221 13:29:45 [Note] WSREP: Waiting for SST to complete.
140221 13:29:45 [Note] WSREP: STATE_EXCHANGE: sent state UUID:
3a734d78-9afc-11e3-975c-c7405de7027b
140221 13:29:45 [Note] WSREP: STATE EXCHANGE: sent state msg:
3a734d78-9afc-11e3-975c-c7405de7027b
140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg:
3a734d78-9afc-11e3-975c-c7405de7027b from 0 (Node C)
140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg:
3a734d78-9afc-11e3-975c-c7405de7027b from 1 (Node B)
140221 13:29:45 [Note] WSREP: STATE EXCHANGE: got state msg:
3a734d78-9afc-11e3-975c-c7405de7027b from 2 (Node A)
140221 13:29:45 [Note] WSREP: Quorum results:
version = 3,
component = PRIMARY,
conf_id = 10,
members = 2/3 (joined/total),
act_id = 175711,
last_appl. = -1,
protocols = 0/5/2 (gcs/repl/appl),
group UUID = 646d078b-98a2-11e3-b7a6-93cfe86c89f3
140221 13:29:45 [Note] WSREP: Flow-control interval: [28, 28]
140221 13:29:45 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 175711)
140221 13:29:45 [Note] WSREP: State transfer required:
Group state: 646d078b-98a2-11e3-b7a6-93cfe86c89f3:175711
Local state: 646d078b-98a2-11e3-b7a6-93cfe86c89f3:156159
140221 13:29:45 [Note] WSREP: New cluster view: global state:
646d078b-98a2-11e3-b7a6-93cfe86c89f3:175711, view# 11: Primary,
number of nodes: 3, my index: 0, protocol version 2
140221 13:29:45 [Warning] WSREP: Gap in state sequence. Need state
transfer.
140221 13:29:47 [Note] WSREP: Running: 'wsrep_sst_rsync --role
'joiner' --address '10.10.17.151' --auth 'sst:secret' --datadir
'/var/lib/mysql/datadir/' --defaults-file '/etc/mysql/my.cnf' --
parent '5892''
140221 13:29:47 [Note] WSREP: Prepared SST request:
rsync|10.10.17.151:4444/rsync_sst
140221 13:29:47 [Note] WSREP: wsrep_notify_cmd is not defined,
skipping notification.
140221 13:29:47 [Note] WSREP: REPL Protocols: 5 (3, 1)
140221 13:29:47 [Note] WSREP: Assign initial position for
certification: 175711, protocol version: 3
140221 13:29:47 [Note] WSREP: Service thread queue flushed.
140221 13:29:47 [Note] WSREP: Prepared IST receiver, listening at:
tcp://10.10.17.151:4568
140221 13:29:47 [Note] WSREP: Node 0.0 (Node C) requested state
transfer from '*any*'. Selected 1.0 (Node B)(SYNCED) as donor.
140221 13:29:47 [Note] WSREP: Shifting PRIMARY -> JOINER (TO:
175711)
140221 13:29:47 [Note] WSREP: Requesting state transfer: success,
donor: 1
WSREP_SST: [INFO] Joiner cleanup. (20140221 13:29:48.482)
WSREP_SST: [INFO] Joiner cleanup done. (20140221 13:29:48.988)
140221 13:29:48 [Note] WSREP: SST complete, seqno: 156159
140221 13:29:48 [Note] Plugin 'FEDERATED' is disabled.
140221 13:29:48 InnoDB: The InnoDB memory heap is disabled
140221 13:29:48 InnoDB: Mutexes and rw_locks use GCC atomic builtins
140221 13:29:48 InnoDB: Compressed tables use zlib 1.2.3.3
140221 13:29:48 InnoDB: Using Linux native AIO
140221 13:29:49 InnoDB: Initializing buffer pool, size = 48.0G
140221 13:29:51 InnoDB: Completed initialization of buffer pool
140221 13:29:51 InnoDB: highest supported file format is Barracuda.
140221 13:29:54 InnoDB: Waiting for the background threads to start
140221 13:29:55 InnoDB: 5.5.34 started; log sequence number
90436392703
140221 13:29:55 [Note] Server hostname (bind-address): '0.0.0.0';
port: 3306
140221 13:29:55 [Note] - '0.0.0.0' resolves to '0.0.0.0';
140221 13:29:55 [Note] Server socket created on IP: '0.0.0.0'.
140221 13:29:55 [Note] Event Scheduler: Loaded 0 events
140221 13:29:55 [Note] WSREP: Signalling provider to continue.
140221 13:29:55 [Note] WSREP: SST received: 646d078b-98a2-11e3-b7a6-
93cfe86c89f3:156159
140221 13:29:55 [Note] WSREP: Receiving IST: 19552 writesets, seqnos
156159-175711
140221 13:29:55 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.34-log' socket: '/var/lib/mysql/datadir/mysql.sock'
port: 3306 MySQL Community Server (GPL), wsrep_25.9.r3928
140221 13:29:55 [Warning] IP address '10.10.0.203' could not be
resolved: Name or service not known
140221 13:29:58 [Warning] IP address '10.10.17.24' could not be
resolved: Name or service not known
mysqld: /tmp/mysql-5.5.34/sql/wsrep_applier.cc:309:
wsrep_cb_status_t wsrep_commit_cb(void*, uint32_t, const
wsrep_trx_meta_t*, wsrep_bool_t*, bool): Assertion `meta->gtid.seqno
== wsrep_thd_trx_seqno(thd)' failed.
13:30:03 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this
binary
or one of the libraries it was linked against is corrupt, improperly
built,
or misconfigured. This error can also be caused by malfunctioning
hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=67
max_threads=1024
thread_count=66
connection_count=66
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads
= 2248680 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f442000d9f0
Attempting backtrace. You can use the following information to find
out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f44f4d8be68 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x828ce5]
/usr/sbin/mysqld(handle_fatal_signal+0x403)[0x6ae413]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f53b8201cb0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35)[0x7f53b682f425]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b)[0x7f53b6832b8b]
/lib/x86_64-linux-gnu/libc.so.6(+0x2f0ee)[0x7f53b68280ee]
/lib/x86_64-linux-gnu/libc.so.6(+0x2f192)[0x7f53b6828192]
/usr/sbin/mysqld(_Z15wsrep_commit_cbPvjPK14wsrep_trx_metaPbb+0x1ec)
[0x67050c]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trx
EPvPNS_9TrxHandleE+0x110)[0x7f53b532def0]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTE
Pv+0x24e)[0x7f53b533c0ee]
/usr/lib/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_re
cvEPv+0x308)[0x7f53b5330fa8]
/usr/lib/galera/libgalera_smm.so(galera_recv+0x23)[0x7f53b5340eb3]
/usr/sbin/mysqld[0x671542]
/usr/sbin/mysqld(start_wsrep_THD+0x3a9)[0x524ad9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f53b81f9e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f53b68ecccd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 65
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html
contains
information that should help you find out what is causing the crash.
140221 13:30:03 mysqld_safe Number of processes running now: 0
140221 13:30:03 mysqld_safe WSREP: not restarting wsrep node
automatically
140221 13:30:03 mysqld_safe mysqld from pid file
/var/lib/mysql/datadir/ip-10-10-17-151.pid ended
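
For readers following the log above: the joiner's local state is at seqno 156159 and the group state is at 175711, and the number of writesets reported for IST (19552) is exactly that difference. The following is a minimal sketch in Python, not part of the original report, that pulls those numbers out of a log excerpt and checks the arithmetic. The function name `ist_summary` and the regular expressions are illustrative assumptions, and the excerpt has its wrapped lines rejoined by hand.

```python
import re

# Excerpt from the joiner log above, with the wrapped lines rejoined.
LOG = """\
140221 13:29:45 [Note] WSREP: State transfer required:
Group state: 646d078b-98a2-11e3-b7a6-93cfe86c89f3:175711
Local state: 646d078b-98a2-11e3-b7a6-93cfe86c89f3:156159
140221 13:29:55 [Note] WSREP: Receiving IST: 19552 writesets, seqnos 156159-175711
"""

# Hypothetical patterns written for this excerpt only.
STATE_RE = re.compile(r"^(Group|Local) state: ([0-9a-f-]{36}):(-?\d+)$", re.M)
IST_RE = re.compile(r"Receiving IST: (\d+) writesets, seqnos (\d+)-(\d+)")


def ist_summary(log):
    """Return the group/local seqno gap and the IST range the joiner reported."""
    states = {kind: (uuid, int(seqno)) for kind, uuid, seqno in STATE_RE.findall(log)}
    group_uuid, group_seqno = states["Group"]
    local_uuid, local_seqno = states["Local"]
    ist = IST_RE.search(log)
    return {
        # IST is only possible when both states share the same cluster UUID.
        "same_history": group_uuid == local_uuid,
        "gap": group_seqno - local_seqno,
        "ist_writesets": int(ist.group(1)) if ist else None,
        "ist_range": (int(ist.group(2)), int(ist.group(3))) if ist else None,
    }


summary = ist_summary(LOG)
print(summary)
```

The crash at 13:30:03 happens a few seconds after "Receiving IST", that is, while these writesets are still being applied.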

show global status like '%version%';
+------------------------+--------------+
| Variable_name | Value |
+------------------------+--------------+
| wsrep_protocol_version | 5 |
| wsrep_provider_version | 25.3.2(r170) |
+------------------------+--------------+

show global variables like '%wsrep_pro%';
wsrep_provider,/usr/lib/galera/libgalera_smm.so
wsrep_provider_options,"base_host = 10.10.9.76; base_port = 4567;
cert.log_conflicts = no; evs.causal_keepalive_period = PT1S;
evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S;
evs.inactive_timeout = PT15S; evs.info_log_mask = 0;
evs.install_timeout = PT15S; evs.join_retrans_period = PT1S;
evs.keepalive_period = PT1S; evs.max_install_timeouts = 1;
evs.send_window = 4; evs.stats_report_period = PT1M;
evs.suspect_timeout = PT5S; evs.use_aggregate = true;
evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout =
PT5M; gcache.dir = /var/lib/mysql/datadir/; gcache.keep_pages_size =
0; gcache.mem_size = 0; gcache.name =
/var/lib/mysql/datadir//galera.cache; gcache.page_size = 128M;
gcache.size = 8G; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit
= 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500;
gcs.max_throttle = 0.25; gcs.recv_q_hard_limit =
9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor =
NO; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ;
gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment =
0; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr =
10.10.9.76; pc.checksum = false; pc.ignore_quorum = false;
pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version
= 0; pc.weight = 1; protonet.backend = asio; protonet.version = 0;
repl.causal_read_timeout = PT30S; repl.commit_order = 3;
repl.key_format = FLAT8; repl.proto_max = 5; socket.checksum = 2"
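
The `wsrep_provider_options` value above is a single string of `name = value` pairs separated by semicolons, which is hard to scan by eye. As a rough aid, assuming only that format, here is a small Python sketch that splits such a string into a dictionary. The helper name `parse_provider_options` is made up for illustration, and only a few of the options from the dump are included in the sample.

```python
def parse_provider_options(raw):
    """Split a wsrep_provider_options string into a {name: value} dict."""
    options = {}
    for item in raw.split(";"):
        if "=" not in item:
            continue  # skip empty trailing fragments
        name, _, value = item.partition("=")
        # Collapse whitespace left over from line wrapping in pasted output.
        options["".join(name.split())] = " ".join(value.split())
    return options


# A subset of the options shown above; the line break is deliberate to
# mimic the wrapping in the pasted output.
RAW = (
    "base_host = 10.10.9.76; base_port = 4567; gcache.page_size = 128M;\n"
    "gcache.size = 8G; gmcast.mcast_addr = ; ist.recv_addr =\n"
    "10.10.9.76; repl.proto_max = 5; socket.checksum = 2"
)

opts = parse_provider_options(RAW)
print(opts["gcache.size"], opts["ist.recv_addr"], opts["repl.proto_max"])
```

With the dictionary in hand it is easy to compare settings such as `gcache.size` or `ist.recv_addr` between nodes.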

Tags:

Related branches

lp://staging/galera

Ready for review for merging into lp://staging/~dbpercona/galera/Bug1348714

David Bennett: Pending requested 2014-07-25

Raghavendra D Prabhu (raghavendra-prabhu) on 2014-03-15

summary:

- Joining node connects and crashed under load
+ Joining node connects and crashed under load | Assertion
+ `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.

Revision history for this message

Ovais Tariq (ovais-tariq) wrote on 2014-03-17:

I am also experiencing a crash during IST, below are the relevant parts from the log:

2014-02-23 12:49:05 11154 [Note] WSREP: Receiving IST: 134022 writesets, seqnos 175819169-175953191
2014-02-23 12:49:05 11154 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.15-56-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), Release 25.4, Revision 731, wsrep_25.4.r4043
mysqld: /mnt/workspace/percona-xtradb-cluster-5.6-rpms/label_exp/centos6-64/target/BUILD/Percona-XtraDB-Cluster-5.6.15/sql/wsrep_applier.cc:321: wsrep_cb_status_t wsrep_commit_cb(void*, uint32_t, const wsrep_trx_meta_t*, wsrep_bool_t*, bool): Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.
17:49:31 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=67
max_threads=1002
thread_count=65
connection_count=65
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 407967 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fc090000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc098944d88 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x902445]
/usr/sbin/mysqld(handle_fatal_signal+0x4c4)[0x680114]
/lib64/libpthread.so.0(+0xfae0)[0x7fd5ac5caae0]
/lib64/libc.so.6(gsignal+0x35)[0x7fd5aaa3cba5]
/lib64/libc.so.6(abort+0x17b)[0x7fd5aaa3e4bb]
/lib64/libc.so.6(+0x2d5ce)[0x7fd5aaa355ce]
/lib64/libc.so.6(+0x2d672)[0x7fd5aaa35672]
/usr/sbin/mysqld[0x5bcd4c]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x552)[0x7fd5a8676052]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM8recv_ISTEPv+0x322)[0x7fd5a867e8d2]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x33f)[0x7fd5a8670c2f]
/usr/lib64/libgalera_smm.so(galera_recv+0x23)[0x7fd5a8685673]
/usr/sbin/mysqld[0x5be0af]
/usr/sbin/mysqld(start_wsrep_THD+0x480)[0x5ae4d0]
/lib64/libpthread.so.0(+0x7ddb)[0x7fd5ac5c2ddb]
/lib64/libc.so.6(clone+0x6d)[0x7fd5aaaeaa1d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 10
Status: NOT_KILLED
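
Both crash logs in this report (MySQL 5.5.34 with wsrep and PXC 5.6.15) fail on the same assertion in `wsrep_commit_cb`, only at different source lines. A minimal Python sketch, assuming the usual glibc `assert()` message layout of `program: file:line: function: Assertion 'expr' failed.`, can confirm that mechanically. The regular expression and the helper `parse_assertion` are illustrative, not part of any MySQL or Galera tooling, and the input lines are copied from the logs above with wrapped lines rejoined.

```python
import re

# Pattern for a glibc-style assert() message (an assumption about the format).
ASSERT_RE = re.compile(
    r"^(?P<prog>\S+): (?P<file>\S+):(?P<line>\d+): (?P<func>.+): "
    r"Assertion `(?P<expr>.+)' failed\.$"
)

# Assertion lines from the two logs above, rejoined where the paste split them.
LINES = [
    "mysqld: /tmp/mysql-5.5.34/sql/wsrep_applier.cc:309: wsrep_cb_status_t "
    "wsrep_commit_cb(void*, uint32_t, const wsrep_trx_meta_t*, wsrep_bool_t*, "
    "bool): Assertion `meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.",
    "mysqld: /mnt/workspace/percona-xtradb-cluster-5.6-rpms/label_exp/"
    "centos6-64/target/BUILD/Percona-XtraDB-Cluster-5.6.15/sql/"
    "wsrep_applier.cc:321: wsrep_cb_status_t wsrep_commit_cb(void*, uint32_t, "
    "const wsrep_trx_meta_t*, wsrep_bool_t*, bool): Assertion "
    "`meta->gtid.seqno == wsrep_thd_trx_seqno(thd)' failed.",
]


def parse_assertion(line):
    """Return program, file, line, function and expression, or None if no match."""
    match = ASSERT_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    return info


parsed = [parse_assertion(line) for line in LINES]
for info in parsed:
    # Print only the file's base name to keep the output short.
    print(info["file"].rsplit("/", 1)[-1], info["line"], info["expr"])
```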

tags:	added: i40190
tags:	added: issue-40190 removed: i40190

Ovais Tariq (ovais-tariq) wrote on 2014-03-17:

gdb_std.txt (25.2 KiB, text/plain)

Attaching backtrace generated with thread apply all bt

Ovais Tariq (ovais-tariq) wrote on 2014-03-17:

gdb_full.txt (77.9 KiB, text/plain)

Attaching backtrace generated with thread apply all bt full

Ovais Tariq (ovais-tariq) wrote on 2014-03-17:

The packages installed are as follows:
Percona-XtraDB-Cluster-shared-56-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-client-56-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-galera-3-3.3-1.207.rhel6.x86_64
Percona-XtraDB-Cluster-56-debuginfo-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-server-56-5.6.15-25.4.731.rhel6.x86_64
Percona-XtraDB-Cluster-galera-3-debuginfo-3.3-1.207.rhel6.x86_64

Ovais Tariq (ovais-tariq) wrote on 2014-03-17:

This crashing bug is really painful, because once a node crashes during an IST, there is no way to bring it back up using an IST and you have to resort to doing an SST.

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-03-17:

I tested with >10k writesets and didn't get a crash:

2014-03-17 20:08:22 29098 [Note] InnoDB: The InnoDB memory heap is disabled
2014-03-17 20:08:22 29098 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2014-03-17 20:08:22 29098 [Note] InnoDB: Compressed tables use zlib 1.2.8
2014-03-17 20:08:22 29098 [Note] InnoDB: Using Linux native AIO
2014-03-17 20:08:22 29098 [Note] InnoDB: Not using CPU crc32 instructions
2014-03-17 20:08:22 29098 [Note] InnoDB: Initializing buffer pool, size = 500.0M
2014-03-17 20:08:22 29098 [Note] InnoDB: Completed initialization of buffer pool
2014-03-17 20:08:22 29098 [Note] InnoDB: Highest supported file format is Barracuda.
2014-03-17 20:08:22 29098 [Note] InnoDB: 128 rollback segment(s) are active.
2014-03-17 20:08:22 29098 [Note] InnoDB: Waiting for purge to start
2014-03-17 20:08:22 29098 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.15-rel63.0 started; log sequence number 5485060234
2014-03-17 20:08:22 29098 [Note] WSREP: Initial TC log open: dummy
2014-03-17 20:08:22 29098 [Note] RSA private key file not found: /pxc56/datadir2//private_key.pem. Some authentication plugins will not work.
2014-03-17 20:08:22 29098 [Note] RSA public key file not found: /pxc56/datadir2//public_key.pem. Some authentication plugins will not work.
2014-03-17 20:08:22 29098 [Note] Server hostname (bind-address): '*'; port: 5000
2014-03-17 20:08:22 29098 [Note] IPv6 is not available.
2014-03-17 20:08:22 29098 [Note] - '0.0.0.0' resolves to '0.0.0.0';
2014-03-17 20:08:22 29098 [Note] Server socket created on IP: '0.0.0.0'.
2014-03-17 20:08:22 29098 [Note] Event Scheduler: Loaded 0 events
2014-03-17 20:08:22 29098 [Note] WSREP: Signalling provider to continue.
2014-03-17 20:08:22 29098 [Note] WSREP: inited wsrep sidno 1
2014-03-17 20:08:22 29098 [Note] WSREP: SST received: f7e31510-9958-11e3-82f8-abba51ecd1d8:496459
2014-03-17 20:08:22 29098 [Note] WSREP: Receiving IST: 10416 writesets, seqnos 496459-506875
2014-03-17 20:08:22 29098 [Note] /pxc56/bin/mysqld: ready for connections.
Version: '5.6.15-25.5' socket: '/pxc56/datadir2/pxc.sock' port: 5000 Percona XtraDB Cluster (GPL) 5.6.15-25.5, Revision 750, wsrep_25.5.r750
2014-03-17 20:08:53 29098 [Note] WSREP: IST received: f7e31510-9958-11e3-82f8-abba51ecd1d8:506875
2014-03-17 20:08:53 29098 [Note] WSREP: 0.0 (Arch1): State transfer to 1.0 (Arch2) complete.
2014-03-17 20:08:53 29098 [Note] WSREP: 1.0 (Arch2): State transfer from 0.0 (Arch1) complete.
2014-03-17 20:08:53 29098 [Note] WSREP: Shifting JOINER -> JOINED (TO: 511918)
2014-03-17 20:08:53 29098 [Note] WSREP: Member 0 (Arch1) synced with group.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Can you test with latest PXC56/Galera3?

You can get them here:

http://www.percona.com/downloads/TESTING/Percona-XtraDB-Cluster-galera-56/galera-3.x/211/RPM/rhel6/x86_64/

http://www.percona.com/downloads/TESTING/Percona-XtraDB-Cluster-56/5.6.15-25.5/5.6/748/RPM/rhel6/x86_64/

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-03-17:

Also, even though the above traces come from a production system, it would be good if the crash could be reproduced in a clean environment (with sysbench tables etc.).

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-03-17:

Also, I used

sysbench --test=./oltp_update.lua --db-driver=mysql --mysql-engine-trx=yes --mysql-table-engine=innodb --mysql-socket=/pxc56/datadir/pxc.sock --mysql-user=root --mysql-password=test --num-threads=10 --init-rng=on --max-time=30000 --max-requests=300000 --oltp_index_updates=10 --oltp_non_index_updates=10 --oltp_tables_count=2 run

along with parallel_prepare earlier.

Gcache is default at 128M.

Ovais Tariq (ovais-tariq) wrote on 2014-03-17:

Maybe Codership can look at the backtraces first, and then I can try to reproduce it in my test environment. This problem is very much reproducible on the production cluster I am working on.

Alex Yurchenko (ayurchen) wrote on 2014-03-18:

#10

Hi Ovais, without the contents of the THD object this stack trace is of little use. Do you still have the core, and could you print the contents of the THD and meta involved in this assert?

thr 1
f 7
set print pretty
p *thd
p *meta

Ovais Tariq (ovais-tariq) wrote on 2014-03-18:

#11

gdb_thr1.txt (66.8 KiB, text/plain)

Hi Alex, I have attached the information you requested, except for print *meta, which gives me:
(gdb) print *meta
value has been optimized out

For meta, does the attached file gdb_full.txt at frame 8 have enough information for you?

Ovais Tariq (ovais-tariq) wrote on 2014-03-18:

#12

Actually this worked:

(gdb) thr 1
[Switching to thread 1 (Thread 0x7fa2c431c700 (LWP 4877))]#8 0x00007fa2c5dcd052 in galera::ReplicatorSMM::apply_trx (this=0x2f87ab0, recv_ctx=0x7f984c000990, trx=0x7f983800ab60) at galera/src/replicator_smm.cpp:436
436 true));
(gdb) fr 8
#8 0x00007fa2c5dcd052 in galera::ReplicatorSMM::apply_trx (this=0x2f87ab0, recv_ctx=0x7f984c000990, trx=0x7f983800ab60) at galera/src/replicator_smm.cpp:436
436 true));
(gdb) p meta
$6 = {
  gtid = {
    uuid = {
      data = "\203\340zo\226\370\021\343\270\364g\234\264\035\226", <incomplete sequence \371>
    },
    seqno = 728526526
  },
  depends_on = 728526438
}

Ovais Tariq (ovais-tariq) wrote on 2014-03-18:

#13

gdb_thr1_fr8.txt (6.8 KiB, text/plain)

Attached is the output of the following:
thr 1
f 8
p *trx

Teemu Ollakka (teemu-ollakka) on 2014-03-20

affects:

codership-mysql → galera

Teemu Ollakka (teemu-ollakka) wrote on 2014-03-21:

#14

Fix pushed in http://bazaar.launchpad.net/~codership/galera/3.x/revision/177

Ovais Tariq (ovais-tariq) wrote on 2014-03-21:

#15

Raghu, if you could get me the RPMs for galera 3 with the fix, then I can test :)

Ovais Tariq (ovais-tariq) wrote on 2014-03-24:

#16

I just wanted to confirm that the fix here fixes the crashing issue. So far there have been no IST crashes.

Matthew B (utdrmac) wrote on 2014-05-28:

#17

mysql error log (36.7 KiB, text/plain)

This seems to affect a client of mine. See attached log file. No option for core dump. Running in production.

Ovais Tariq (ovais-tariq) wrote on 2014-06-23:

#18

@Matthew,

Can you try this Galera library with the fix:
http://www.percona.com/redir/downloads/TESTING/Percona-XtraDB-Cluster-galera-56/galera-3.x/214/RPM/rhel6/x86_64/Percona-XtraDB-Cluster-galera-3-3.4-1.214.rhel6.x86_64.rpm

Raghavendra D Prabhu (raghavendra-prabhu) wrote on 2014-06-24:

#19

Galera 3.5 has already been released. No need for testing rpms/debs.

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

#20

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1630
