All nodes but one close down
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MySQL patches by Codership (status tracked in 5.6) | | | |
5.5 | Confirmed | Low | Unassigned |
5.6 | Confirmed | Low | Unassigned |
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6) | | | |
5.5 | Fix Committed | Medium | Unassigned |
5.6 | Fix Committed | Medium | Unassigned |
Bug Description
A few times a week our cluster ends up in something like a "last man standing" situation, where every node except the one the delete query originates from shuts down.
This only seems to happen during delete queries. Here's the error:
131010 1:56:55 [ERROR] Slave SQL: Could not execute Delete_rows event on table mytaste_
131010 1:56:55 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 120, 1408063004
131010 1:56:55 [ERROR] WSREP: Failed to apply trx: source: d2d40980-
131010 1:56:55 [ERROR] WSREP: Failed to apply app buffer: seqno: 1408063004, status: WSREP_FATAL
at galera/
at galera/
131010 1:56:55 [ERROR] WSREP: Node consistency compromized, aborting...
131010 1:56:55 [Note] WSREP: Closing send monitor...
131010 1:56:55 [Note] WSREP: Closed send monitor.
131010 1:56:55 [Note] WSREP: gcomm: terminating thread
131010 1:56:55 [Note] WSREP: gcomm: joining thread
131010 1:56:55 [Note] WSREP: gcomm: closing backend
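For anyone trying to confirm that their own nodes are hitting the same failure, here is a rough sketch that scans error-log lines for the two signatures quoted above and groups them by seqno. The regular expressions are modelled only on the lines in this report, so other versions may word the messages differently. The function name `summarize_apply_failures` is made up for illustration. As far as I know, 120 is the storage-engine code for "key not found" (`HA_ERR_KEY_NOT_FOUND`), which would fit a row that is already missing on the applying node.

```python
import re

# Patterns modelled on the log lines quoted in this report (an assumption:
# other server/Galera versions may format these messages differently).
APPLY_WARNING = re.compile(
    r"WSREP: RBR event (?P<event_no>\d+) (?P<event>\w+) apply warning: "
    r"(?P<code>\d+), (?P<seqno>\d+)"
)
FATAL = re.compile(
    r"WSREP: Failed to apply app buffer: seqno: (?P<seqno>\d+), "
    r"status: (?P<status>\w+)"
)


def summarize_apply_failures(lines):
    """Return one dict per failed write set, keyed by seqno (as a string)."""
    failures = {}
    for line in lines:
        m = APPLY_WARNING.search(line)
        if m:
            entry = failures.setdefault(m.group("seqno"), {})
            entry["event"] = m.group("event")
            entry["handler_error"] = int(m.group("code"))
            continue
        m = FATAL.search(line)
        if m:
            failures.setdefault(m.group("seqno"), {})["status"] = m.group("status")
    return failures


sample = [
    "131010 1:56:55 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 120, 1408063004",
    "131010 1:56:55 [ERROR] WSREP: Failed to apply app buffer: seqno: 1408063004, status: WSREP_FATAL",
]
print(summarize_apply_failures(sample))
```

Running it over the error log of each node that went down should show whether every crash involves a `Delete_rows` event with the same handler error.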
I don't see why a node should go offline over an inconsistency during a delete. If the row doesn't exist, why not just continue? The row is supposed to be gone anyway. On update queries the node should obviously go offline, but on delete queries?
I also noted that when this happens, the wsrep_notify_cmd isn't executed on the node that goes offline.
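To check whether the notification command is really skipped, one option is to point wsrep_notify_cmd at a small script that records every call. The sketch below only parses and formats the arguments. It assumes the command is invoked with `--status`, `--uuid`, `--primary`, `--members` and `--index` options, as in the Galera example notification script, so verify that against your version. The function names are hypothetical.

```python
import argparse
from datetime import datetime, timezone


def parse_notification(argv):
    """Parse the options assumed to be passed to the notification command."""
    parser = argparse.ArgumentParser(add_help=False)
    for option in ("--status", "--uuid", "--primary", "--members", "--index"):
        parser.add_argument(option)
    # parse_known_args: tolerate options this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args


def format_notification(argv, now=None):
    """Build one log line describing a single notification call."""
    args = parse_notification(argv)
    now = now or datetime.now(timezone.utc)
    return (
        f"{now.isoformat()} status={args.status} "
        f"primary={args.primary} index={args.index}"
    )


# Illustrative call; a real script would pass sys.argv[1:] and append the
# line to a file of its own choosing.
example = ["--status", "Synced", "--primary", "yes", "--index", "1"]
print(format_notification(example))
```

If the node aborts without a line being written for the final state change, that confirms the command was never run on that node.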
I would agree about DELETE, and in general the inconsistency policy should be configurable.
Having said that, such inconsistency is likely to be a result of a bug and not isolated, so ignoring this error on DELETE may not buy you much operational time. Upgrading to the latest release is more likely to solve your problem.