We ran into this bug (or something similar) in production this morning. The net effect was to bring the entire cluster down: as you'll see, two nodes shut themselves down and the third got a Signal 11 2-3 seconds later. There were no signs of performance problems on any of the nodes at the time, and the cluster had been stable for quite some time before this.
My only thought here is that Jira could be setting foreign_key_checks=0 before doing some internal maintenance, but that doesn't seem likely. Jira was configured at the time to point to an haproxy listener that spreads writes between all three servers. It no longer is.
Here's the last delete from AO_E8B6CC_MESSAGE_TAG that appears in the binary logs before the crashes:
# at 279619602
#140712 7:40:28 server id 3298096439 end_log_pos 279619650 CRC32 0xf39f5ed5 GTID [commit=yes]
SET @@SESSION.GTID_NEXT= 'b45609b2-0839-ee1c-76b7-60c776c75110:115691438'/*!*/;
# at 279619650
#140712 7:40:28 server id 3298096439 end_log_pos 279619727 CRC32 0x6b3f9354 Query thread_id=24117450 exec_time=0 error_code=0
SET TIMESTAMP=1405168828/*!*/;
SET @@session.sql_mode=539099136/*!*/;
SET @@session.auto_increment_increment=3, @@session.auto_increment_offset=2/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=192/*!*/;
BEGIN
/*!*/;
# at 279619727
#140712 7:40:28 server id 3298096439 end_log_pos 279619795 CRC32 0x2af4292a Table_map: `jira`.`AO_E8B6CC_MESSAGE_TAG` mapped to number 169
# at 279619795
#140712 7:40:28 server id 3298096439 end_log_pos 279619854 CRC32 0x13b86571 Delete_rows: table id 169 flags: STMT_END_F
### DELETE FROM `jira`.`AO_E8B6CC_MESSAGE_TAG`
### WHERE
### @1=70097
### @2=32804
### @3='audit-id-47893'
# at 279619854
#140712 7:40:28 server id 3298096439 end_log_pos 279619885 CRC32 0x73cd7fe6 Xid = 115701629
COMMIT/*!*/;
# at 279619885
#140712 7:40:28 server id 3298096439 end_log_pos 279619933 CRC32 0xb2a3ecec GTID [commit=yes]
SET @@SESSION.GTID_NEXT= 'b45609b2-0839-ee1c-76b7-60c776c75110:115691439'/*!*/;
Two nodes threw this same error, or very similar:
2014-07-12 07:40:30 13362 [ERROR] Slave SQL: Could not execute Delete_rows event on table jira.AO_E8B6CC_MESSAGE; Cannot delete or update a parent row: a foreign key constraint fails (`jira`.`AO_E8B6CC_MESSAGE_TAG`, CONSTRAINT `fk_ao_e8b6cc_message_tag_message_id` FOREIGN KEY (`MESSAGE_ID`) REFERENCES `AO_E8B6CC_MESSAGE` (`ID`)), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 541, Error_code: 1451
2014-07-12 07:40:30 13362 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 115701779
2014-07-12 07:40:30 13362 [ERROR] WSREP: Failed to apply trx: source: ca04d333-fa74-11e3-94e2-7f5202c0e10c version: 3 local: 0 state: APPLYING flags: 1 conn_id: 24117450 trx_id: 77362570549 seqnos (l: 116300250, g: 115701779, s: 115701751, d: 115701766, ts: 2272383580846541)
One gave me: https://gist.github.com/gleamicus/820f7b195c049720ea71
Another gave me: https://gist.github.com/gleamicus/4f56bc7938e6b872ebf1
The third node, however, saw things happen like this: https://gist.github.com/gleamicus/87cccd694312ce1286c5
-----------
Table structures:
CREATE TABLE `AO_E8B6CC_MESSAGE_TAG` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`MESSAGE_ID` int(11) DEFAULT NULL,
`TAG` varchar(255) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `index_ao_e8b6cc_mes1391090780` (`MESSAGE_ID`),
CONSTRAINT `fk_ao_e8b6cc_message_tag_message_id` FOREIGN KEY (`MESSAGE_ID`) REFERENCES `AO_E8B6CC_MESSAGE` (`ID`)
) TYPE=InnoDB

CREATE TABLE `AO_E8B6CC_MESSAGE` (
`ADDRESS` varchar(255) NOT NULL,
`ID` int(11) NOT NULL AUTO_INCREMENT,
`PAYLOAD` longtext NOT NULL,
`PAYLOAD_TYPE` varchar(255) NOT NULL,
`PRIORITY` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
) TYPE=InnoDB
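
For anyone reading along who wants to see what error 1451 means for this pair of tables, here is a minimal sketch. It is only an illustration, not a reproduction of the cluster failure: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table definitions are cut down to the columns that matter, and the ADDRESS value and the function name delete_parent_first are made up for the example. The child row values (70097, 32804, 'audit-id-47893') are the ones from the binary log. It shows that deleting the AO_E8B6CC_MESSAGE row while a AO_E8B6CC_MESSAGE_TAG row still references it is rejected, while deleting the child first and then the parent goes through. Whether the appliers on our nodes actually saw the events in the wrong order is a guess on my part.

```python
import sqlite3


def delete_parent_first():
    """Try deleting the parent row before its child, then in the safe order.

    Returns (error_text_or_None, parent_rows_remaining).
    """
    db = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    db.execute("PRAGMA foreign_keys = ON")

    # Cut-down stand-ins for the two Jira tables (SQLite syntax, not MySQL).
    db.execute(
        "CREATE TABLE AO_E8B6CC_MESSAGE ("
        " ID INTEGER PRIMARY KEY,"
        " ADDRESS TEXT NOT NULL)"
    )
    db.execute(
        "CREATE TABLE AO_E8B6CC_MESSAGE_TAG ("
        " ID INTEGER PRIMARY KEY,"
        " MESSAGE_ID INTEGER REFERENCES AO_E8B6CC_MESSAGE (ID),"
        " TAG TEXT)"
    )

    # Parent row: the ADDRESS value is hypothetical.
    db.execute(
        "INSERT INTO AO_E8B6CC_MESSAGE (ID, ADDRESS) VALUES (32804, 'example')"
    )
    # Child row: values taken from the Delete_rows event in the binary log.
    db.execute(
        "INSERT INTO AO_E8B6CC_MESSAGE_TAG (ID, MESSAGE_ID, TAG)"
        " VALUES (70097, 32804, 'audit-id-47893')"
    )
    db.commit()

    # Wrong order: parent first, while the child still references it.
    try:
        db.execute("DELETE FROM AO_E8B6CC_MESSAGE WHERE ID = 32804")
        error = None
    except sqlite3.IntegrityError as exc:
        error = str(exc)

    # Safe order: child first, then parent.
    db.execute("DELETE FROM AO_E8B6CC_MESSAGE_TAG WHERE ID = 70097")
    db.execute("DELETE FROM AO_E8B6CC_MESSAGE WHERE ID = 32804")
    remaining = db.execute(
        "SELECT COUNT(*) FROM AO_E8B6CC_MESSAGE"
    ).fetchone()[0]
    db.close()
    return error, remaining


if __name__ == "__main__":
    print(delete_parent_first())
```

The first delete is refused with a foreign key error, which is SQLite's counterpart of MySQL's "Cannot delete or update a parent row"; the second pair of deletes leaves no parent rows behind.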