MySQL OCF RA may not always recover all of the cluster members
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Fuel for OpenStack | Confirmed | High | Bogdan Dobrelya | |
Mitaka | Confirmed | High | Bogdan Dobrelya | |
Bug Description
There is a rare corner case when some of the DB cluster members refuse to join with an error like:
[ERROR] WSREP: Local state seqno (14127) is greater than group seqno (14126): states diverged. Aborting to avoid potential data loss. Remove '/var/lib/
We have to decide how to handle cases where the most frequently seen GTID does not carry the latest SEQNO, while a minority of nodes with other GTID(s) may hold the most recent SEQNO.
So we have to either:
* change how we evaluate the master (use just max(SEQNO) and ignore the most seen GTID)
* remove grastate.dat from the OCF RA as recommended (DATA LOSS risk, bad idea). This is not an option: the OCF RA may end up removing it on 3/5 nodes, and data would be lost.
* leave recovery to an admin's decision and touch nothing (nodes stay stopped, so there is no fully automated recovery). This is not really a fix: the behavior stays as is and is documented as a known issue that requires manual recovery steps.
The last option seems to be the only feasible one.
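To make the difference between the first option and the current behavior concrete, here is a minimal Python sketch that compares "most seen GTID" with "max(SEQNO)" election. It is not the OCF RA code (which is not shown in this report); the node names, the `cluster-uuid` placeholder, and the sequence numbers are all hypothetical, and the only assumption about the format is that a GTID looks like `<uuid>:<seqno>`.

```python
from collections import Counter


def parse_gtid(gtid):
    """Split a GTID of the assumed form '<uuid>:<seqno>' into (uuid, seqno)."""
    uuid, _, seqno = gtid.rpartition(":")
    return uuid, int(seqno)


def most_seen_gtid(gtids):
    """Return the GTID value reported by the largest number of nodes."""
    counts = Counter(gtids.values())
    return counts.most_common(1)[0][0]


def max_seqno_nodes(gtids):
    """Return the nodes holding the highest seqno, however rare that value is."""
    seqnos = {node: parse_gtid(gtid)[1] for node, gtid in gtids.items()}
    top = max(seqnos.values())
    return sorted(node for node, seqno in seqnos.items() if seqno == top)


# Hypothetical cluster state: the majority agrees on an older position,
# while a single node has committed further.
example = {
    "node-a": "cluster-uuid:100",
    "node-b": "cluster-uuid:100",
    "node-c": "cluster-uuid:100",
    "node-d": "cluster-uuid:105",
    "node-e": "cluster-uuid:103",
}

print(most_seen_gtid(example))   # cluster-uuid:100
print(max_seqno_nodes(example))  # ['node-d']
```

In this made-up state the two strategies disagree: electing by the most seen GTID would bootstrap from an older position, and the nodes that are ahead of it would then hit the "states diverged" error when joining.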
Example output of the crm_mon -fotAW -1 command:
Online: [ n1 n2 n3 n4 n5 ]
Clone Set: p_mysql-clone [p_mysql]
Started: [ n1 n2 ]
Stopped: [ n3 n4 n5 ]
Node Attributes:
* Node n1:
+ gtid : dc7a6c0c-
* Node n2:
+ gtid : dc7a6c0c-
* Node n3:
+ gtid : dc7a6c0c-
* Node n4:
+ gtid : dc7a6c0c-
* Node n5:
+ gtid : dc7a6c0c-
As you can see, 2/5 nodes have 22692, 1/5 has a greater 23121, and 2/5 have
23785. Note that n5's GTID value stored in the CIB is stale; the actual one can be seen with:
ssh n5 /usr/bin/
160425 07:46:36 mysqld_safe WSREP: Recovered position dc7a6c0c-
So the question is how to recover from this state.
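For manual recovery, an admin would typically compare the recovered positions across nodes. The following Python sketch extracts the position from a `WSREP: Recovered position` log line like the one shown above. It assumes the value has the form `<uuid>:<seqno>`; the sample line, its UUID, and its seqno are placeholders rather than data from this cluster, and `recovered_position` is a hypothetical helper name.

```python
import re

# Matches the '<uuid>:<seqno>' part of a "Recovered position" log line.
RECOVERED_RE = re.compile(r"WSREP: Recovered position ([0-9a-fA-F-]+):(-?\d+)")


def recovered_position(log_text):
    """Return (uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED_RE.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


# Placeholder log line; the UUID and seqno are made up for illustration.
sample = (
    "160101 00:00:00 mysqld_safe WSREP: Recovered position "
    "11111111-2222-3333-4444-555555555555:42\n"
)

print(recovered_position(sample))
```

Collecting this tuple from every node gives the input needed to decide by hand which node holds the most recent state.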
Changed in mos: | |
importance: | Undecided → Medium |
milestone: | none → 10.0 |
assignee: | nobody → Fuel Library Team (fuel-library) |
no longer affects: | mos |
tags: | added: area-library galera |
Changed in fuel: | |
importance: | Undecided → Medium |
milestone: | none → 10.0 |
assignee: | nobody → Fuel Library Team (fuel-library) |
summary: |
- MySQL OCF RA may not always recover not all of the cluster members
+ MySQL OCF RA may not always recover all of the cluster members |
Changed in fuel: | |
status: | New → Confirmed |
tags: |
added: area-docs docs removed: area-library |
description: | updated |
Changed in fuel: | |
assignee: | Fuel Library Team (fuel-library) → Fuel Documentation Team (fuel-docs) |
Changed in fuel: | |
assignee: | Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando) |
Changed in fuel: | |
status: | Triaged → In Progress |
no longer affects: | fuel/newton |
Changed in fuel: | |
assignee: | Fuel Library Team (fuel-library) → Fuel Sustaining (fuel-sustaining-team) |
Changed in fuel: | |
assignee: | Fuel Sustaining (fuel-sustaining-team) → Bogdan Dobrelya (bogdando) |
Raising to High due to the UX impact on DB cluster recovery.