trx replaying failed due to interrupted BF applier

Bug #928150 reported by Teemu Ollakka
This bug affects 1 person
Affects                      Status        Importance  Assigned to     Milestone
MySQL patches by Codership   Fix Released  Medium      Seppo Jaakola
  5.5                        Fix Released  Medium      Seppo Jaakola

Bug Description

This happened during a seesaw test with sqlgen --rollbacks 0.1 --ac-frac 100 (other parameters at their defaults). The server had wsrep_causal_reads=1, wsrep_slave_appliers=4 and wsrep_sst_method=rsync (otherwise the default demo config).

Log from the crashed node:

120207 10:33:02 [Note] WSREP: New cluster view: global state: 90c349b1-5164-11e1-0800-825bd4d7ea75:137146, view# 16: Primary, number of nodes: 3, my index: 1, protocol version 1
120207 10:33:02 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120207 10:33:02 [Note] WSREP: Assign initial position for certification: 137146, protocol version: 2
120207 10:33:32 [Note] WSREP: Member 0 (test2) synced with group.
120207 10:34:36 [Note] WSREP, BF applier interrupted in log_event.cc
120207 10:34:36 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 1053, 187540
120207 10:34:36 [Warning] WSREP: failed to replay trx: source: 320d20f9-5166-11e1-0800-99a5c68a70ee version: 2 local: 1 state: REPLAYING flags: 1 conn_id: 24 trx_id: 32507752 seqnos (l: 71268, g: 187540, s: 187519, d: 187539, ts: 1328603676461567159)
120207 10:34:36 [Warning] WSREP: Failed to apply app buffer: ��0O, seqno: 187540, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():51
         at galera/src/replicator_smm.cpp:apply_trx_ws():83
         at galera/src/replicator_smm.cpp:apply_trx_ws():122
         at galera/src/replicator_smm.cpp:replay_trx():789
120207 10:34:36 [ERROR] WSREP: trx_replay failed for: 5, query: REPLACE INTO comm00 SELECT * FROM comm00 WHERE p = 3
120207 10:34:36 [ERROR] Aborting

Changed in codership-mysql:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Seppo Jaakola (seppo-jaakola)
milestone: none → 5.5.20-23.4
Teemu Ollakka (teemu-ollakka) wrote:

A similar thing happened during an sqlgen test run with parameters --rows 1000 --ac-frac 10 --rollbacks 0.1 --alters 0.001. Although this is not a replaying error, it looks quite similar:

120210 12:42:58 [Note] WSREP, BF applier interrupted in log_event.cc
120210 12:42:58 [ERROR] Slave SQL: Error executing row event: 'Table 'test.comm03' doesn't exist', Error_code: 1053
120210 12:42:58 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 1053, 2651566
120210 12:42:58 [ERROR] WSREP: Failed to apply trx: source: f9892e24-53d2-11e1-0800-303af54123ea version: 2 local: 0 state: CERTIFYING flags: 1 conn_id: 1182 trx_id: 61418743 seqnos (l: 15087, g: 2651566, s: 2651552, d: 2651557, ts: 1328870577185841502)
120210 12:42:58 [ERROR] WSREP: Failed to apply app buffer: <B1><F4>4O^S, seqno: 2651566, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():51
         at galera/src/replicator_smm.cpp:apply_trx_ws():83
         at galera/src/replicator_smm.cpp:apply_trx_ws():122
         at galera/src/replicator_smm.cpp:apply_trx():443
         at galera/src/replicator_smm.cpp:process_trx():992

Teemu Ollakka (teemu-ollakka) wrote:

It seems that the applying failure above was the result of having "replicator.commit_order=1" in wsrep_provider_options; the problem goes away with strict commit order.
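To illustrate what "strict commit order" means here, the following is a minimal Python sketch of a commit-order gate. It assumes only that every replicated transaction carries a global seqno (the "g:" value in the logs above). The CommitOrderMonitor class, its strict flag and its commit() method are hypothetical names for illustration; this is not Galera's internal monitor, and it does not model what the non-strict commit_order values actually permit. With strict ordering, seqno N may commit only after N-1 has committed. Without it, commits land in whatever order they arrive.

```python
import threading


class CommitOrderMonitor:
    """Hypothetical commit-order gate (illustration only, not Galera code).

    strict=True:  a transaction with seqno N commits only after seqno N-1
                  has committed, giving a strict total commit order.
    strict=False: a transaction commits as soon as it asks to, so commits
                  may land out of seqno order.
    """

    def __init__(self, last_committed: int, strict: bool = True):
        self._cond = threading.Condition()
        self._last_committed = last_committed  # highest contiguous committed seqno
        self._ahead = set()  # seqnos committed before their predecessors
        self._strict = strict
        self.commit_log = []  # seqnos in the order they actually committed

    @property
    def last_committed(self) -> int:
        with self._cond:
            return self._last_committed

    def commit(self, seqno: int, timeout=None) -> bool:
        """Record a commit for seqno; return False if a strict wait timed out."""
        with self._cond:
            if self._strict and not self._cond.wait_for(
                lambda: seqno == self._last_committed + 1, timeout
            ):
                return False
            self.commit_log.append(seqno)
            self._ahead.add(seqno)
            # Advance the contiguous committed prefix as far as possible.
            while self._last_committed + 1 in self._ahead:
                self._last_committed += 1
                self._ahead.discard(self._last_committed)
            self._cond.notify_all()
            return True


if __name__ == "__main__":
    # Out-of-order: commits land in request order, here 3, 2, 1.
    loose = CommitOrderMonitor(last_committed=0, strict=False)
    for s in (3, 2, 1):
        loose.commit(s)
    print(loose.commit_log)  # [3, 2, 1]

    # Strict: threads are started as 3, 2, 1 but always commit as 1, 2, 3.
    strict = CommitOrderMonitor(last_committed=0, strict=True)
    workers = [threading.Thread(target=strict.commit, args=(s,)) for s in (3, 2, 1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(strict.commit_log)  # [1, 2, 3]
```

The strict gate serializes commits at the cost of making later seqnos wait. In the non-strict case, a later transaction can commit while an earlier one is still being applied or replayed.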

Seppo Jaakola (seppo-jaakola) wrote: