mysql-wss can potentially stop mysqld during SST
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Fuel for OpenStack | Fix Committed | High | Bogdan Dobrelya | |
6.1.x | Won't Fix | High | MOS Maintenance | |
7.0.x | Won't Fix | High | MOS Maintenance | |
8.0.x | Won't Fix | High | MOS Maintenance | |
Mitaka | Fix Released | High | Bogdan Dobrelya | |
Newton | Fix Committed | High | Bogdan Dobrelya | |
Bug Description
During the deployment of a Galera cluster, the p_mysql Pacemaker resource starts before the database is actually configured. mysql_status() and mysql_monitor() interact in such a way that mysql_monitor() can consider a MySQL server unresponsive when it is really still in the process of syncing with the master server (SST, state snapshot transfer).
We are occasionally experiencing this in a CI environment.
There are 3 servers in the MySQL/Galera cluster. The sequence of events (as seen in the logs):
- mysql 1 (primary) has been deployed
- mysql 2 deploying
- mysql 2 syncing with 1
- the OCF script on 2 runs mysql_monitor(), which tries to connect to the DB and execute certain queries. This fails, because Puppet waits for the sync to finish before it configures the "clustercheck" user.
- failure count for p_mysql reaches a threshold
- mysql 2 in JOINED state
- pacemaker/corosync on 2 issues a restart command for p_mysql
- mysql 3 deploying
- mysql 3 syncing with 2
- mysql 2 stops (due to a stop command for p_mysql)
- mysql 3 crashes
- mysql 2 starts
The galera cluster is not able to recover from this, which leads to a failed deployment.
Possible solution
mysql_status() has a check if "/var/lib/
https:/
mysql_monitor() calls mysql_status(), but it does not distinguish whether an SST is in progress.
https:/
So the solution could be to move the SST check into mysql_monitor(), so that it does not try to connect to the MySQL server while an SST is in progress.
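A minimal sketch of that idea, not the actual mysql-wss code. The function names `mysql_monitor_sketch`, `sst_in_progress` and `check_db_connection`, the `SST_MARKER` and `DB_CHECK_CMD` variables, and the default marker path are all hypothetical; the real marker path is truncated in this report, so it is left configurable here. Returning success during SST is also an assumption: the real fix may report the state differently.

```shell
#!/bin/sh
# Sketch only: every name below is hypothetical, not taken from mysql-wss.

# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical marker file that exists while an SST is running.
# The real path is not known from the report, so it can be overridden.
SST_MARKER="${SST_MARKER:-/tmp/sst_in_progress.example}"

# Hypothetical stand-in for the real health query against the database.
# Runs the command named in DB_CHECK_CMD, or `false` if it is unset.
check_db_connection() {
    ${DB_CHECK_CMD:-false}
}

# True while the marker file exists.
sst_in_progress() {
    [ -e "$SST_MARKER" ]
}

mysql_monitor_sketch() {
    if sst_in_progress; then
        # The node is still receiving a state snapshot: do not connect
        # to the server and do not count this run as a failure.
        echo "SST in progress, skipping DB check"
        return $OCF_SUCCESS
    fi
    if check_db_connection; then
        echo "DB check passed"
        return $OCF_SUCCESS
    fi
    echo "DB check failed"
    return $OCF_ERR_GENERIC
}
```

With the check placed first, monitor runs that happen during SST no longer add to the failure count for p_mysql, so Pacemaker has no reason to restart a node that another node may be syncing from.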
@Dmitry, can you please provide more data:
1) Which version of Fuel is affected?
2) Can you please collect a diagnostic snapshot? It would help a lot in troubleshooting the issue.