MySQL OCF RA: when a node is down, it should prefer the wsrep-recover output to the grastate.dat file content

Bug #1574497 reported by Bogdan Dobrelya
Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Committed  High        Bogdan Dobrelya
6.1.x               Triaged        High        MOS Maintenance
7.0.x               Triaged        High        MOS Maintenance
8.0.x               Triaged        High        MOS Maintenance
Mitaka              Fix Released   High        Bogdan Dobrelya
Newton              Fix Committed  High        Bogdan Dobrelya

Bug Description

When a Galera node is down, the most relevant GTID should be taken from the output of the command "/usr/bin/mysqld_safe --wsrep-recover" rather than from the contents of the /var/lib/mysql/grastate.dat file, which may hold an undefined seqno (-1) after an unclean shutdown. Otherwise, the choice of the seed node that the other nodes must join may be imprecise.
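
The preference order described above can be sketched as follows. This is only an illustration in Python, not the resource agent's actual shell code: the function names are hypothetical, and the sample grastate.dat content is an assumed typical layout with "uuid:" and "seqno:" lines.

```python
import re

# Line printed by mysqld_safe after a successful position recovery,
# e.g. "... mysqld_safe WSREP: Recovered position <uuid>:<seqno>".
RECOVERED_RE = re.compile(
    r"WSREP: Recovered position\s+([0-9a-fA-F-]{36}):(-?\d+)"
)


def gtid_from_recover_output(output):
    """Return 'uuid:seqno' from --wsrep-recover output, or '' if absent."""
    found = RECOVERED_RE.findall(output)
    if not found:
        return ""
    uuid, seqno = found[-1]  # keep the last reported position
    return f"{uuid}:{seqno}"


def gtid_from_grastate(text):
    """Return 'uuid:seqno' from grastate.dat content, or '' if incomplete."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    if fields.get("uuid") and fields.get("seqno"):
        return f"{fields['uuid']}:{fields['seqno']}"
    return ""


def node_gtid(recover_output, grastate_text):
    """Prefer the recovered position; fall back to grastate.dat if empty."""
    return (gtid_from_recover_output(recover_output)
            or gtid_from_grastate(grastate_text))


# Sample inputs: the recover line is taken from this report,
# the grastate.dat text is an assumed example.
RECOVER = ("160425 08:37:26 mysqld_safe WSREP: Recovered position "
           "dc7a6c0c-0889-11e6-8326-478c77479e3b:23121")
GRASTATE = ("# GALERA saved state\n"
            "version: 2.1\n"
            "uuid:    dc7a6c0c-0889-11e6-8326-478c77479e3b\n"
            "seqno:   -1\n")

print(node_gtid(RECOVER, GRASTATE))
```

With these inputs the recovered position (seqno 23121) wins over the -1 stored in grastate.dat; the file is consulted only when the recovery output yields nothing.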

Steps to reproduce are given in the Galera reliability testing paper (https://goo.gl/VHyIIE). Briefly: deploy a five-node Galera cluster and run the given Jepsen cases to verify its self-healing capabilities.

Example. The GTIDs actually recorded in the node attributes:
Online: [ n1 n2 n3 n4 n5 ]

 Clone Set: p_mysql-clone [p_mysql]
     Started: [ n1 n2 ]
     Stopped: [ n3 n4 n5 ]

Node Attributes:
* Node n1:
    + gtid : dc7a6c0c-0889-11e6-8326-478c77479e3b:22693
* Node n2:
    + gtid : dc7a6c0c-0889-11e6-8326-478c77479e3b:22693
* Node n3:
    + gtid : dc7a6c0c-0889-11e6-8326-478c77479e3b:-1
* Node n4:
    + gtid : dc7a6c0c-0889-11e6-8326-478c77479e3b:-1
* Node n5:
    + gtid : dc7a6c0c-0889-11e6-8326-478c77479e3b:-1

Expected values for n3, n4 and n5, as reported by wsrep-recover:
root@n3:/# ssh n3 /usr/bin/mysqld_safe --wsrep-recover
160425 08:37:26 mysqld_safe WSREP: Recovered position dc7a6c0c-0889-11e6-8326-478c77479e3b:23121
root@n3:/# ssh n4 /usr/bin/mysqld_safe --wsrep-recover
160425 08:37:43 mysqld_safe WSREP: Recovered position dc7a6c0c-0889-11e6-8326-478c77479e3b:23785
root@n3:/# ssh n5 /usr/bin/mysqld_safe --wsrep-recover
160425 08:37:50 mysqld_safe WSREP: Recovered position dc7a6c0c-0889-11e6-8326-478c77479e3b:23785
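
To illustrate why the source of the GTID matters for seed selection, here is a minimal Python sketch using the values from the example above. The selection rule (highest seqno wins, ties broken by node name) and the function names are assumptions made for illustration; the resource agent's real election logic may differ.

```python
def parse_gtid(gtid):
    """Split 'uuid:seqno' into (uuid, int seqno)."""
    uuid, _, seqno = gtid.rpartition(":")
    return uuid, int(seqno)


def choose_seed(gtids):
    """Return the node with the highest seqno.

    Ties go to the node whose name sorts first (an assumed rule).
    """
    uuids = {parse_gtid(g)[0] for g in gtids.values()}
    if len(uuids) != 1:
        raise ValueError("nodes report different cluster UUIDs")
    return min(gtids, key=lambda node: (-parse_gtid(gtids[node])[1], node))


UUID = "dc7a6c0c-0889-11e6-8326-478c77479e3b"

# GTIDs as recorded in the node attributes (n3..n5 report -1).
from_attributes = {
    "n1": f"{UUID}:22693",
    "n2": f"{UUID}:22693",
    "n3": f"{UUID}:-1",
    "n4": f"{UUID}:-1",
    "n5": f"{UUID}:-1",
}

# Same cluster, with n3..n5 replaced by their wsrep-recover positions.
from_recovery = dict(
    from_attributes,
    n3=f"{UUID}:23121",
    n4=f"{UUID}:23785",
    n5=f"{UUID}:23785",
)

print(choose_seed(from_attributes))  # n1
print(choose_seed(from_recovery))    # n4
```

Under this assumed rule, the attribute values point at n1, while the recovered positions show that n4 and n5 hold the most advanced state.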

Changed in fuel:
importance: Undecided → High
milestone: none → 10.0
tags: added: area-library galera
description: updated
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/309891

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/309891

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/310111

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/310111
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=2c99cb0188ee4762d34bf8421c91ea0764cec389
Submitter: Jenkins
Branch: master

commit 2c99cb0188ee4762d34bf8421c91ea0764cec389
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Apr 26 09:00:54 2016 +0200

    Prefer the wsrep-recover to get the GTID

    And fallback to the grastate.dat only if an empty value.

    Closes-bug: #1574497

    Change-Id: I582adcfc71757c40619bffbbb398d4a3635b333e
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/312416

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/mitaka)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/312416
Reason: too early to backport, let's fix all of the issues discovered in the master first

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/312416
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=2db1c98ba594e0bb0891c077f8fa9fb4ac1f91e1
Submitter: Jenkins
Branch: stable/mitaka

commit 2db1c98ba594e0bb0891c077f8fa9fb4ac1f91e1
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Apr 26 09:00:54 2016 +0200

    Prefer the wsrep-recover to get the GTID

    And fallback to the grastate.dat only if an empty value.

    Fuel-CI: disable

    Closes-bug: #1574497

    Change-Id: I582adcfc71757c40619bffbbb398d4a3635b333e
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 2c99cb0188ee4762d34bf8421c91ea0764cec389)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/315989

tags: added: tech-debt
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/316802

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/317978

tags: added: on-verification
TatyanaGladysheva (tgladysheva) wrote :

Verified on 9.0 ISO #465.

Steps to verify:
1. Create an HA environment
2. Kill the mysqld process on a controller
3. Check the mysqld_safe.log and ocf-mysql-wss.log logs (located on the master node in the /var/log/remote/node-*.test.domain.local/ folder)

Actual results:
As the logs show, when a Galera node is down, the resource agent prefers the wsrep-recover output to the grastate.dat file content.
===mysqld_safe.log:===
2016-06-17T11:43:52.848814+00:00 notice: mysqld from pid file /var/run/resource-agents/mysql-wss/mysql-wss.pid ended
2016-06-17T11:44:47.863552+00:00 notice: Starting mysqld daemon with databases from /var/lib/mysql
2016-06-17T11:44:47.875919+00:00 notice: WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.bIUXl1' --pid-file='/var/lib/mysql/node-29.test.domain.local-recover.pid'
2016-06-17T11:44:50.014964+00:00 notice: WSREP: Recovered position 9c599751-3397-11e6-9a85-9a742d73be17:15780

===ocf-mysql-wss.log:===
2016-06-17T11:44:42.089812+00:00 err: ERROR: p_mysqld: mysql_status(): MySQL is not running
2016-06-17T11:44:47.190665+00:00 info: INFO: p_mysqld: get_node_gtid(): No GTID for node-29.test.domain.local
2016-06-17T11:44:47.199153+00:00 info: INFO: p_mysqld: validate_gtid(): GTID OK: 9c599751-3397-11e6-9a85-9a742d73be17:15780
2016-06-17T11:44:47.203796+00:00 info: INFO: p_mysqld: update_node_gtid(): Galera GTID: 9c599751-3397-11e6-9a85-9a742d73be17:15780
2016-06-17T11:44:47.315278+00:00 info: INFO: p_mysqld: mysql_start(): Starting MySQL
2016-06-17T11:44:47.336544+00:00 info: INFO: p_mysqld: check_if_sst(): No signs of SST found
2016-06-17T11:44:47.341930+00:00 info: INFO: p_mysqld: mysql_status(): PIDFile /var/run/resource-agents/mysql-wss/mysql-wss.pid of MySQL server not found. Sleeping for 2 seconds. 0 retries left
2016-06-17T11:44:49.351335+00:00 info: INFO: p_mysqld: mysql_status(): MySQL is not running
2016-06-17T11:44:52.377564+00:00 info: INFO: p_mysqld: check_if_sst(): MySQL process 26148 found
2016-06-17T11:44:52.427604+00:00 info: INFO: p_mysqld: check_if_sst(): SST is in progress
2016-06-17T11:44:52.432879+00:00 info: INFO: p_mysqld: mysql_start(): MySQL started
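
The check performed in step 3 could be automated roughly as below. This is a hedged Python sketch, not part of the verification that was actually run: the helper names are hypothetical, and it assumes the log line formats shown in the excerpts above.

```python
import re

# 'uuid:seqno' pattern, with a 36-character cluster UUID.
GTID = r"[0-9a-fA-F-]{36}:-?\d+"


def recovered_position(mysqld_safe_log):
    """Last position reported by wsrep-recover, or None."""
    found = re.findall(r"WSREP: Recovered position (" + GTID + ")",
                       mysqld_safe_log)
    return found[-1] if found else None


def agent_gtid(ocf_log):
    """Last GTID the resource agent recorded for the node, or None."""
    found = re.findall(r"update_node_gtid\(\): Galera GTID: (" + GTID + ")",
                       ocf_log)
    return found[-1] if found else None


# Lines copied from the log excerpts in this comment.
MYSQLD_SAFE_LOG = (
    "2016-06-17T11:44:50.014964+00:00 notice: WSREP: Recovered position "
    "9c599751-3397-11e6-9a85-9a742d73be17:15780\n"
)
OCF_LOG = (
    "2016-06-17T11:44:47.190665+00:00 info: INFO: p_mysqld: "
    "get_node_gtid(): No GTID for node-29.test.domain.local\n"
    "2016-06-17T11:44:47.203796+00:00 info: INFO: p_mysqld: "
    "update_node_gtid(): Galera GTID: "
    "9c599751-3397-11e6-9a85-9a742d73be17:15780\n"
)

# True when the agent's GTID equals the recovered position.
print(recovered_position(MYSQLD_SAFE_LOG) == agent_gtid(OCF_LOG))
```

A match between the two values indicates that the agent took the GTID from the recovery output rather than from a stale file.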

[root@nailgun remote]# shotgun2 short-report
cat /etc/fuel_build_id:
 465
cat /etc/fuel_build_number:
 465
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuel-misc-9.0.0-1.mos8454.noarch
 python-packetary-9.0.0-1.mos140.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-migrate-9.0.0-1.mos8454.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-notify-9.0.0-1.mos8454.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-utils-9.0.0-1.mos8454.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8742.noarch
 fuel-library9.0-9.0.0-1.mos8454.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-ostf-9.0.0-1.mos935.noarch
 fuelmenu-9.0.0-1.mos274.noarch
 fuel-nailgun-9.0.0-1.mos8742.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-m...


tags: removed: on-verification
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/317978

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/316802

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.1)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/315989

description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/374219

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/374219
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f9a2d479f3687157d2b17a927a09ce5f995522d6
Submitter: Jenkins
Branch: stable/7.0

commit f9a2d479f3687157d2b17a927a09ce5f995522d6
Author: Denis Puchkin <email address hidden>
Date: Wed Sep 21 17:38:54 2016 +0300

    Backport mysql OCF from stable/mitaka

    backport mysql ocf script from stable/mitaka

    Closes-bug: #1524826
    Closes-bug: #1542256
    Closes-bug: #1572239
    Closes-bug: #1572557
    Closes-bug: #1572601
    Closes-bug: #1574747
    Closes-bug: #1574497
    Closes-bug: #1576244
    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779
    Closes-bug: #1574999
    Closes-bug: #1576244
    Closes-bug: #1583173
    Closes-bug: #1585125

    Change-Id: I1cc6f95884a8fbd5c3418ede89bdf9ec6864bdc8

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/377597

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/377597
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=b3873f5f5a0bb1526b1269f163223ae48d6e21f5
Submitter: Jenkins
Branch: stable/8.0

commit b3873f5f5a0bb1526b1269f163223ae48d6e21f5
Author: Denis Puchkin <email address hidden>
Date: Tue Sep 27 13:20:25 2016 +0300

    Backport mysql OCF from stable/mitaka

    backport mysql ocf script from stable/mitaka

    Closes-bug: #1524826
    Closes-bug: #1542256
    Closes-bug: #1572239
    Closes-bug: #1572557
    Closes-bug: #1572601
    Closes-bug: #1574747
    Closes-bug: #1574497
    Closes-bug: #1576244
    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779
    Closes-bug: #1574999
    Closes-bug: #1576244
    Closes-bug: #1583173
    Closes-bug: #1585125

    Change-Id: I1cc6f95884a8fbd5c3418ede89bdf9ec6864bdc8
