applier fails with lock wait timeout exceeded on rsync SST donor
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Galera (status tracked in 3.x) | | | |
Galera 2.x | Fix Released | High | Teemu Ollakka |
Galera 3.x | Fix Released | High | Teemu Ollakka |
MySQL patches by Codership (status tracked in 5.6) | | | |
MySQL patches by Codership 5.6 | Fix Released | High | Teemu Ollakka |
Percona XtraDB Cluster, moved to https://jira.percona.com/projects/PXC (status tracked in 5.6) | | | |
Percona XtraDB Cluster 5.5 | Fix Released | Undecided | Unassigned |
Percona XtraDB Cluster 5.6 | Fix Released | Undecided | Unassigned |
Bug Description
Test: Standard seesaw test with rsync SST.
During one full SST, the donor got stuck in "Flushing tables for SST...". Normally this stage resolves when some client transaction's lock wait times out and FTWRL (FLUSH TABLES WITH READ LOCK) can proceed. This time, however, the applier (or replayer) could not apply events. The question is why a BF (brute-force) transaction would ever have to wait for a lock until it times out.
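To make the question concrete, here is a simplified sketch of the conflict rule that makes a BF wait unexpected. It assumes a model in which a BF (applier or replayer) transaction that requests a lock held by a local client transaction brute-force aborts the holder instead of queuing behind it. The names `Trx`, `Resolution` and `resolve_lock_conflict` are hypothetical and do not correspond to Galera or InnoDB functions.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    WAIT = "wait"                  # requester queues behind the holder
    ABORT_HOLDER = "abort_holder"  # holder is brute-force aborted


@dataclass
class Trx:
    trx_id: int
    is_bf: bool  # True for applier/replayer (brute-force) transactions


def resolve_lock_conflict(requester: Trx, holder: Trx) -> Resolution:
    """Hypothetical, simplified rule: a BF requester never queues behind a local holder."""
    if requester.is_bf and not holder.is_bf:
        # The local transaction loses; the replicated write set must apply.
        return Resolution.ABORT_HOLDER
    # Every other combination falls back to an ordinary lock wait.
    return Resolution.WAIT


applier = Trx(trx_id=1, is_bf=True)   # hypothetical ids
client = Trx(trx_id=2, is_bf=False)
print(resolve_lock_conflict(applier, client).value)  # abort_holder
print(resolve_lock_conflict(client, applier).value)  # wait
```

Under this model only non-BF transactions ever sit in a lock wait long enough to time out, which is why the applier's 1205 errors in the log below are surprising.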
Log snippet:
2013-11-27 02:08:26 32076 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 16875835)
2013-11-27 02:08:26 32076 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2013-11-27 02:08:26 32076 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address 'gw:10023/
2013-11-27 02:08:26 32076 [Note] WSREP: sst_donor_thread signaled with 0
2013-11-27 02:08:26 32076 [Note] WSREP: Flushing tables for SST...
2013-11-27 02:09:17 32076 [Warning] Slave SQL: Could not execute Update_rows event on table test.comm01; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_
2013-11-27 02:09:17 32076 [Warning] WSREP: RBR event 3 Update_rows apply warning: 146, 16875911
2013-11-27 02:09:17 32076 [Warning] WSREP: Failed to apply app buffer: seqno: 16875911, status: 1
at galera/
Retrying 2th time
2013-11-27 02:10:08 32076 [Warning] Slave SQL: Could not execute Update_rows event on table test.comm01; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_
2013-11-27 02:10:08 32076 [Warning] WSREP: RBR event 3 Update_rows apply warning: 146, 16875911
2013-11-27 02:10:08 32076 [Warning] WSREP: Failed to apply app buffer: seqno: 16875911, status: 1
at galera/
Retrying 3th time
2013-11-27 02:10:59 32076 [Warning] Slave SQL: Could not execute Update_rows event on table test.comm01; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_
2013-11-27 02:10:59 32076 [Warning] WSREP: RBR event 3 Update_rows apply warning: 146, 16875911
2013-11-27 02:10:59 32076 [Warning] WSREP: Failed to apply app buffer: seqno: 16875911, status: 1
at galera/
Retrying 4th time
2013-11-27 02:11:50 32076 [Warning] Slave SQL: Could not execute Update_rows event on table test.comm01; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_
2013-11-27 02:11:50 32076 [Warning] WSREP: RBR event 3 Update_rows apply warning: 146, 16875911
2013-11-27 02:11:50 32076 [ERROR] WSREP: Failed to apply trx: source: fabb5d8b-
2013-11-27 02:11:50 32076 [ERROR] WSREP: Failed to apply trx 16875911 4 times
2013-11-27 02:11:50 32076 [ERROR] WSREP: Node consistency compromized, aborting...
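The failures above land at 02:09:17, 02:10:08, 02:10:59 and 02:11:50, each 51 s apart and 51 s after flushing started at 02:08:26. That spacing would be consistent with the InnoDB default `innodb_lock_wait_timeout` of 50 s, although the test's actual setting is not stated. The sketch below replays the sequence under assumptions read off the log (four attempts, 51 s per blocked attempt, first attempt starting at 02:08:26). `apply_with_retries` and the exception classes are hypothetical, not Galera's code.

```python
from datetime import datetime, timedelta

# Both constants are assumptions read off the log above, not Galera settings.
MAX_APPLY_ATTEMPTS = 4                    # "Failed to apply trx ... 4 times"
ATTEMPT_DURATION = timedelta(seconds=51)  # spacing between the 1205 warnings


class LockWaitTimeout(Exception):
    """Stand-in for MySQL error 1205 (lock wait timeout exceeded)."""


class NodeInconsistent(Exception):
    """Stand-in for the final 'Node consistency ... aborting' error."""

    def __init__(self, seqno, failures):
        super().__init__(f"Failed to apply trx {seqno} {len(failures)} times")
        self.failures = failures


def apply_with_retries(seqno, apply_once, start):
    """Hypothetical applier loop: retry a write set, give up after N failures.

    Returns the simulated completion time on success. Each failed attempt is
    assumed to block for ATTEMPT_DURATION before raising LockWaitTimeout.
    """
    now = start
    failures = []
    for _ in range(MAX_APPLY_ATTEMPTS):
        try:
            apply_once(seqno)
            return now
        except LockWaitTimeout:
            now += ATTEMPT_DURATION
            failures.append(now)
    raise NodeInconsistent(seqno, failures)


def always_blocked(seqno):
    # Models this incident: every attempt hits the lock wait timeout.
    raise LockWaitTimeout(seqno)


try:
    apply_with_retries(16875911, always_blocked, datetime(2013, 11, 27, 2, 8, 26))
except NodeInconsistent as err:
    print(err)
    print([t.strftime("%H:%M:%S") for t in err.failures])
```

Running it prints `Failed to apply trx 16875911 4 times` followed by the four failure times listed above. Because every attempt blocks for the full timeout, retrying cannot help while the lock holder stays put; the node simply aborts about three and a half minutes later.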
It appeared that the 5.6 branch was missing the protection that exempts BF threads from the InnoDB lock wait timeout.
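A minimal sketch of the kind of check that was missing, assuming the lock-wait timer consults a per-waiter BF flag. Local transactions are cancelled once they exceed the timeout, while a BF waiter is never cancelled by the timer. `LockWaiter` and `lock_wait_timed_out` are hypothetical names, and the 50 s value in the usage lines assumes the InnoDB default for `innodb_lock_wait_timeout`; this does not reproduce the actual 5.6 patch.

```python
from dataclasses import dataclass


@dataclass
class LockWaiter:
    is_bf: bool      # applier/replayer (brute-force) thread?
    waited_s: float  # seconds spent in the current lock wait


def lock_wait_timed_out(waiter: LockWaiter, lock_wait_timeout_s: float) -> bool:
    """Hypothetical timeout check with a BF exemption.

    Only non-BF waiters are ever reported as timed out (error 1205);
    a BF waiter is never cancelled by the timer, however long it waits.
    """
    if waiter.is_bf:
        return False
    return waiter.waited_s > lock_wait_timeout_s


print(lock_wait_timed_out(LockWaiter(is_bf=False, waited_s=51), 50))  # True
print(lock_wait_timed_out(LockWaiter(is_bf=True, waited_s=51), 50))   # False
```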