MSR replication - Galera clash
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | New | Undecided | Unassigned |
5.7 | Fix Committed | Undecided | Unassigned |
Bug Description
Setting up multi-source replication on a PXC node results in a permanent 'System lock' state for the second channel's SQL thread.
Test case:
* set up 2 standalone MySQL 5.7 instances
* set up 1 standalone PXC 5.7 node, with wsrep provider enabled
* set up replication from both standalone MySQL instances, using two channels, to the PXC node
* restart the slave PXC node
The slave node gets permanently blocked: it is not possible to stop it gracefully or to kill the locked SQL thread.
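The replication step of the test case could look roughly like the sketch below, run on the PXC node. The channel names 'c3-c1' and 'c3-c2' are taken from the session output in this report. The host names, the replication user and its password are hypothetical placeholders. The use of GTID auto-positioning is also an assumption, since the report does not say whether GTID or file/position coordinates were used.

```sql
-- Run on the PXC node while no replication threads are running.
-- MySQL 5.7 multi-source replication requires table-based repositories.
SET GLOBAL master_info_repository = 'TABLE';
SET GLOBAL relay_log_info_repository = 'TABLE';

-- First channel: hypothetical host and credentials.
-- MASTER_AUTO_POSITION = 1 assumes gtid_mode = ON on all servers.
CHANGE MASTER TO
  MASTER_HOST = 'mysql1.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_AUTO_POSITION = 1
  FOR CHANNEL 'c3-c1';

-- Second channel: hypothetical host and credentials.
CHANGE MASTER TO
  MASTER_HOST = 'mysql2.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_AUTO_POSITION = 1
  FOR CHANNEL 'c3-c2';

START SLAVE FOR CHANNEL 'c3-c1';
START SLAVE FOR CHANNEL 'c3-c2';
```

After both channels are running, restarting the PXC node is the step that is reported to trigger the lock.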
Example results:
mysql> pager egrep "Running|
PAGER set to 'egrep "Running|
mysql> show slave status\G
Slave_
Last_
Last_
Slave_
Last_
Last_
2 rows in set (0.00 sec)
mysql> show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-
| 1 | system user | | NULL | Sleep | 617 | wsrep: applier idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 617 | wsrep: aborter idle | NULL | 0 | 0 |
| 3 | system user | | NULL | Connect | 616 | Waiting for master to send event | NULL | 0 | 0 |
| 4 | system user | | NULL | Connect | 616 | Slave has read all relay log; waiting for more updates | NULL | 0 | 0 |
| 5 | system user | | NULL | Connect | 616 | Waiting for master to send event | NULL | 0 | 0 |
| 6 | system user | | NULL | Connect | 616 | System lock | NULL | 0 | 0 |
| 9 | root | localhost | NULL | Query | 0 | starting | show processlist | 0 | 0 |
+----+-
7 rows in set (0.00 sec)
mysql> SELECT * FROM performance_
*******
COUNT_RECEIVED_
LAST_HEARTBEAT
RECEIVED_
LAST_
*******
COUNT_RECEIVED_
LAST_HEARTBEAT
RECEIVED_
LAST_
2 rows in set (0.00 sec)
mysql> SELECT * FROM performance_
*******
PROCESSLIS
PROCESSLIST_
PROCESSLIST_
PROCESSLIS
PROCESSLIST_
PROCESSLIST_
PROCESSLIST_
PROCESSLIST_
PARENT_
CONNECTION_
*******
PROCESSLIS
PROCESSLIST_
PROCESSLIST_
PROCESSLIS
PROCESSLIST_
PROCESSLIST_
PROCESSLIST_
PROCESSLIST_
PARENT_
CONNECTION_
2 rows in set (0.00 sec)
mysql> show status like 'ws%';
...
| wsrep_local_
...
| wsrep_cluster_
...
| wsrep_provider_
| wsrep_ready | ON |
+------
60 rows in set (0.00 sec)
mysql> stop slave for channel 'c3-c1';
Query OK, 0 rows affected (0.00 sec)
mysql> stop slave for channel 'c3-c2';
... hangs
Tried to repro this on 5.7.16 and it looks ok. Are the repl users on the channels the same or different?
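One way to answer the question about the replication users might be to read the per-channel connection settings from Performance Schema on the PXC node. This is only a sketch: it assumes Performance Schema is enabled and that the query still returns while the SQL thread is stuck.

```sql
-- Lists the source host, port and replication user configured for each channel.
SELECT CHANNEL_NAME, HOST, PORT, USER
FROM performance_schema.replication_connection_configuration;
```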