MySQL OCF RA action monitor must check if a seed node is running the most recent of known GTIDs

Bug #1583173 reported by Bogdan Dobrelya
This bug affects 5 people
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  Medium      Bogdan Dobrelya
7.0.x               Fix Released   Critical    Denis Puchkin
8.0.x               Fix Released   Critical    Denis Puchkin
Mitaka              Fix Released   Critical    Sergii Golovatiuk

Bug Description

This bug is not easy to catch.
I caught it only a couple of times while running Jepsen tests for a few days.

Details:
When the seed (aka master) node was started a long time ago, and the OCF RA later reports "MySQL lost quorum or uninitialized" on the majority of the remaining DB nodes, the cluster either takes a *very* long time to auto-recover or fails to recover at all.

Only the seed node keeps running, even if its GTID is obsolete, that is, not the most recent among the other nodes. This requires manual recovery of the DB cluster nodes. For example, one may "nuke" all mysqld processes on the nodes and let the OCF RA pick the node with the most recent GTID. This makes for a sad UX, although it should not be a big deal.

Example snippet (4/5 nodes were affected): http://pastebin.com/Nrih7BT1

To fix this, the monitor action should perhaps check whether the current seed node (aka master) is running with a stale GTID, i.e. one that is not the most recent across the nodes, and report a failure if so.
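The proposed check can be illustrated with a minimal Python sketch. It is not the code from the actual fix (the real OCF RA is a shell script). It assumes that each node's last known GTID is available as a Galera-style `<cluster-uuid>:<seqno>` string, and the names `parse_gtid` and `monitor_seed` are hypothetical. As a further simplifying assumption, only nodes that share the seed's cluster UUID are compared.

```python
from typing import Dict, Tuple

OCF_SUCCESS = 0      # standard OCF return code: resource is healthy
OCF_ERR_GENERIC = 1  # standard OCF return code: generic failure


def parse_gtid(gtid: str) -> Tuple[str, int]:
    """Split a GTID of the assumed form '<cluster-uuid>:<seqno>'."""
    uuid, _, seqno = gtid.rpartition(":")
    if not uuid:
        raise ValueError(f"malformed GTID: {gtid!r}")
    return uuid, int(seqno)


def monitor_seed(seed: str, gtids: Dict[str, str]) -> int:
    """Return a failure code if the seed is behind any other known node.

    `gtids` maps node name -> last known GTID (hypothetical input; how
    the values are collected is outside the scope of this sketch).
    """
    seed_uuid, seed_seqno = parse_gtid(gtids[seed])
    for node, gtid in gtids.items():
        if node == seed:
            continue
        uuid, seqno = parse_gtid(gtid)
        # Assumption: only positions within the same cluster history
        # (same UUID) are comparable; other nodes are ignored here.
        if uuid == seed_uuid and seqno > seed_seqno:
            return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

With such a check, a seed whose seqno is lower than that of any peer would fail its monitor, which lets the cluster manager stop it instead of keeping a stale seed running indefinitely.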

Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
importance: Undecided → Medium
milestone: none → 10.0
tags: added: galera
summary: - MySQL OCF RA action monitor must check if a node is running the most
- recent of known GTIDs
+ MySQL OCF RA action monitor must check if a seed node is running the
+ most recent of known GTIDs
Changed in fuel:
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/318162

Changed in fuel:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/318162
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=8093431349441f6b486e45e8aab62e0a8927a8e2
Submitter: Jenkins
Branch: master

commit 8093431349441f6b486e45e8aab62e0a8927a8e2
Author: Bogdan Dobrelya <email address hidden>
Date: Wed May 18 17:01:54 2016 +0200

    Detect a split-brain for Galera OCF RA

    * One and only seed node (the one with the wsrep-new-cluster) shall
      be running, eventually.
    * For action monitor, check if the node is the seed one
      and is running the most recent GTID, or fail

    Closes-bug: #1583173

    Change-Id: Iaa4855d769fe1e0203fcfb9981413273e0e4dda2
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/361943

Sergii Golovatiuk (sgolovatiuk) wrote :

`pcs resource restart clone_p_mysqld` creates a split-brain

tags: added: area-library
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/361943
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=3478e0d1c8b35191e35a59a0f8cfc3f59030959f
Submitter: Jenkins
Branch: stable/mitaka

commit 3478e0d1c8b35191e35a59a0f8cfc3f59030959f
Author: Bogdan Dobrelya <email address hidden>
Date: Wed May 18 17:01:54 2016 +0200

    Detect a split-brain for Galera OCF RA

    * One and only seed node (the one with the wsrep-new-cluster) shall
      be running, eventually.
    * For action monitor, check if the node is the seed one
      and is running the most recent GTID, or fail

    Closes-bug: #1583173

    Change-Id: Iaa4855d769fe1e0203fcfb9981413273e0e4dda2
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 8093431349441f6b486e45e8aab62e0a8927a8e2)

tags: added: ct2 customer-found sla1 support
tags: added: on-verification
Dmitry Belyaninov (dbelyaninov) wrote :

Verified according to #1617400.
Cluster: 3 controllers + 1 compute
A few restarts: pcs resource restart clone_p_mysqld

There are no ERROR messages in the mysql logs.

Snapshot #206

tags: removed: on-verification
Alexander Rubtsov (arubtsov) wrote :

sla1 for 7.0-updates

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/371626

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/372471

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Mikhail Zhnichkov (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/371626

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/372522

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Mikhail Zhnichkov (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/372471

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/374219

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Mikhail Zhnichkov (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/372522
Reason: duplicate https://review.openstack.org/#/c/374219/

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/374219
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f9a2d479f3687157d2b17a927a09ce5f995522d6
Submitter: Jenkins
Branch: stable/7.0

commit f9a2d479f3687157d2b17a927a09ce5f995522d6
Author: Denis Puchkin <email address hidden>
Date: Wed Sep 21 17:38:54 2016 +0300

    Backport mysql OCF from stable/mitaka

    backport mysql ocf script from stable/mitaka

    Closes-bug: #1524826
    Closes-bug: #1542256
    Closes-bug: #1572239
    Closes-bug: #1572557
    Closes-bug: #1572601
    Closes-bug: #1574747
    Closes-bug: #1574497
    Closes-bug: #1576244
    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779
    Closes-bug: #1574999
    Closes-bug: #1576244
    Closes-bug: #1583173
    Closes-bug: #1585125

    Change-Id: I1cc6f95884a8fbd5c3418ede89bdf9ec6864bdc8

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/377597

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/377597
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=b3873f5f5a0bb1526b1269f163223ae48d6e21f5
Submitter: Jenkins
Branch: stable/8.0

commit b3873f5f5a0bb1526b1269f163223ae48d6e21f5
Author: Denis Puchkin <email address hidden>
Date: Tue Sep 27 13:20:25 2016 +0300

    Backport mysql OCF from stable/mitaka

    backport mysql ocf script from stable/mitaka

    Closes-bug: #1524826
    Closes-bug: #1542256
    Closes-bug: #1572239
    Closes-bug: #1572557
    Closes-bug: #1572601
    Closes-bug: #1574747
    Closes-bug: #1574497
    Closes-bug: #1576244
    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779
    Closes-bug: #1574999
    Closes-bug: #1576244
    Closes-bug: #1583173
    Closes-bug: #1585125

    Change-Id: I1cc6f95884a8fbd5c3418ede89bdf9ec6864bdc8

Dmitry (dtsapikov) wrote :

Verified on 7.0+MU6

tags: added: on-verification
tags: removed: on-verification
TatyanaGladysheva (tgladysheva) wrote :

Verified on 8.0 + MU4 updates.

The bug was verified according to #1617400.
Cluster: 3 controllers + 1 compute
A few restarts of the clone_p_mysql resource.

There are no ERROR messages in the mysql logs.
