Bad network on one node takes a whole cluster down
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Galera | Status tracked in 3.x | | |
Galera 2.x | Fix Committed | Undecided | Yan Zhang |
Galera 3.x | Fix Committed | Undecided | Yan Zhang |
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Status tracked in 5.6 | | |
PXC 5.5 | Confirmed | Undecided | Unassigned |
PXC 5.6 | Fix Released | Undecided | Unassigned |
Bug Description
I've made a test cluster of 10 VM nodes, all running within the same VM host.
During the tests I was not putting any load on any node. All nodes run on Percona XtraDB Cluster 5.5.34-55, wsrep_25.9.r3928, Galera 2.8(r165).
All evs.* variables are at their defaults. Error logs for both tests are attached.
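For reference, the evs.* values actually in effect can be read back from the wsrep_provider_options string; a minimal check (assuming a local mysql client on the node), for example:

# Dump only the evs.* entries from the semicolon-separated provider options string
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G" | tr ';' '\n' | grep 'evs\.'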
When I introduce network latency and packet loss on *just one* of the nodes, it causes serious problems for the whole cluster: long periods of non-primary state on all nodes, sometimes a permanent non-primary state on all nodes, and sometimes nodes stuck in the false belief that the cluster is complete and operational when it is not.
This is how I introduced the bad network on the first node:
[root@percona1 ~]# tc qdisc change dev eth1 root netem loss 20% delay 150ms 20ms distribution normal
[root@percona1 ~]# tc qdisc show
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc netem 8003: dev eth1 root refcnt 2 limit 1000 delay 150.0ms 20.0ms loss 20%
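To reproduce (or to clean up after the test), the same impairment can be added and removed like this - assuming eth1 is the interface carrying the cluster traffic, as in the output above; note the 'change' form used earlier requires a netem qdisc to already be attached:

# Attach the impairment if no netem qdisc is installed on eth1 yet
tc qdisc add dev eth1 root netem loss 20% delay 150ms 20ms distribution normal
# Remove it to restore normal networking on the node
tc qdisc del dev eth1 root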
In the first test I simply try to restart another node while the first node has an unstable network:
Trying percona2 node restart at:
140129 16:10:02 [Note] /usr/sbin/mysqld: Normal shutdown
It failed to start again - see its error log.
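Nothing unusual was done for the restart - just an ordinary service restart on the node, e.g. (assuming the stock init script shipped with this PXC 5.5 install):

service mysql restart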
A while later, the view from node6:
LefredPXC / percona6 / Galera 2.8(r165)
Wsrep Cluster Node Queue Ops Bytes Flow Conflct PApply Commit
time P cnf # cmt sta Up Dn Up Dn Up Dn pau snt lcf bfa dst oooe oool wind
16:10:35 P 38 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:10:38 P 38 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:10:42 P 38 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:10:45 N 615 3 Init F/T 0 0 0 1 0 256 0.0 0 0 0 0 0 0 0
16:10:48 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:10:51 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:10:54 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:10:57 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:00 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:03 N 615 1 Init F/T 0 0 0 2 0 256 0.0 0 0 0 0 0 0 0
16:11:06 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:09 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:12 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:15 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:18 N 615 1 Init F/T 0 0 0 1 0 128 0.0 0 0 0 0 0 0 0
LefredPXC / percona6 / Galera 2.8(r165)
Wsrep Cluster Node Queue Ops Bytes Flow Conflct PApply Commit
time P cnf # cmt sta Up Dn Up Dn Up Dn pau snt lcf bfa dst oooe oool wind
16:11:21 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:24 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:27 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:30 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:33 N 615 1 Init F/T 0 0 0 1 0 128 0.0 0 0 0 0 0 0 0
16:11:36 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:39 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:42 N 615 9 Init F/T 0 0 0 1 0 616 0.0 0 0 0 0 0 0 0
16:11:45 N 615 9 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:48 N 615 9 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:51 N 615 9 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:54 N 615 9 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:11:57 N 615 9 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:12:00 N 615 9 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:12:03 N 615 9 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
percona3 mysql> show status like 'ws%';
+------
| Variable_name | Value |
+------
| wsrep_local_
| wsrep_protocol_
| wsrep_last_
| wsrep_replicated | 0 |
| wsrep_replicate
| wsrep_received | 11 |
| wsrep_received_
| wsrep_local_commits | 0 |
| wsrep_local_
| wsrep_local_replays | 0 |
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_local_
| wsrep_flow_
| wsrep_flow_
| wsrep_flow_
| wsrep_cert_
| wsrep_apply_oooe | 0.000000 |
| wsrep_apply_oool | 0.000000 |
| wsrep_apply_window | 0.000000 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.000000 |
| wsrep_commit_window | 0.000000 |
| wsrep_local_state | 0 |
| wsrep_local_
| wsrep_cert_
| wsrep_causal_reads | 0 |
| wsrep_incoming_
| wsrep_cluster_
| wsrep_cluster_size | 9 |
| wsrep_cluster_
| wsrep_cluster_
| wsrep_connected | ON |
| wsrep_local_
| wsrep_local_index | 6 |
| wsrep_provider_name | Galera |
| wsrep_provider_
| wsrep_provider_
| wsrep_ready | OFF |
+------
40 rows in set (0.01 sec)
After 10 minutes the cluster is still down, all nodes non-primary:
percona1 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
percona2 :
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/
Connection to 127.0.0.1 closed.
percona3 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
percona4 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
percona5 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
percona6 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
percona7 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
percona8 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
percona9 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
percona10 :
wsrep_cluster_
Connection to 127.0.0.1 closed.
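The per-node snapshot above comes from a simple loop over all the nodes, roughly like the following sketch (assuming ssh access to percona1..percona10 and a local mysql client on each node):

# Ask every node for its own view of the cluster; "Primary" is the healthy value
for host in percona{1..10}; do
    echo "$host :"
    ssh "$host" "mysql -h 127.0.0.1 -NBe \"SHOW STATUS LIKE 'wsrep_cluster_status'\"" || echo " (mysqld not reachable)"
done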
I was able to bring the cluster back up with SET GLOBAL wsrep_provider_
16:30:04 N 615 9 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:30:07 P 39 9 Sync T/T 0 0 0 1 0 643 0.0 0 0 0 0 0 0 0
(...)
16:31:01 P 39 9 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:31:04 P 40 10 Sync T/T 0 0 0 1 0 707 0.0 0 0 0 0 0 0 0
but just a moment later, without any action or load:
16:31:53 P 40 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:31:56 P 40 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:31:59 P 40 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:02 P 40 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:05 P 40 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:08 P 40 10 Sync T/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:11 N 615 3 Init F/T 0 0 0 2 0 512 0.0 0 0 0 0 0 0 0
16:32:14 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:17 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:20 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:23 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:26 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:29 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:32 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:35 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
LefredPXC / percona6 / Galera 2.8(r165)
Wsrep Cluster Node Queue Ops Bytes Flow Conflct PApply Commit
time P cnf # cmt sta Up Dn Up Dn Up Dn pau snt lcf bfa dst oooe oool wind
16:32:38 N 615 3 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:41 N 615 1 Init F/T 0 0 0 2 0 256 0.0 0 0 0 0 0 0 0
16:32:44 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:47 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:50 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:53 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:32:56 N 615 1 Init F/T 0 0 0 1 0 128 0.0 0 0 0 0 0 0 0
The cluster stayed down in this state until it self-recovered a few minutes later:
16:36:36 N 615 1 Init F/T 0 0 0 0 0 0 0.0 0 0 0 0 0 0 0
16:36:39 P 41 10 Sync T/T 0 0 0 1 0 707 0.0 0 0 0 0 0 0 0
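For reference, the usual way to force a stuck non-primary component back to primary is to bootstrap it from one of the surviving nodes - a sketch, assuming the truncated SET GLOBAL wsrep_provider_ command above was the standard pc.bootstrap override:

# Run on exactly one node of the non-primary component; the others will rejoin it
mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=true'"
# Confirm the component is primary again
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_status'"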
In the second test I simply joined the percona1 node with the bad network already in place. After a while the node joined, and the cluster went into an inconsistent state: some nodes got stuck reporting Primary state and wsrep_cluster_size = 10, even though this was no longer true, as 2 nodes had gone down and another was in non-primary state. SQL queries hung anyway on the nodes that claimed Primary state.
Logs for the second test are attached.