Fuel for OpenStack

Pacemaker mysql resource shall have failcounts configured

Bug #1572440 reported by Bogdan Dobrelya on 2016-04-20

This bug affects 1 person

Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Committed  High        Bogdan Dobrelya  Fuel for OpenStack 10.0
Mitaka              Fix Released   High        Bogdan Dobrelya  Fuel for OpenStack 9.0

Bug Description

Without failcount handling configured, a resource that fails to start may be left stopped.

How to reproduce:
* Deploy a cluster
* Make it impossible for the Pacemaker mysql clone resource to start: add an exit 1 to the start action of the OCF RA.
* Issue crm resource cleanup p_mysql-clone && crm resource restart p_mysql-clone
* Check the transition summary with crm_simulate -Ls | grep -v "\-INF"
* Wait about 5 minutes, then recheck the transitions.

Expected:
The start operation must always be in the transition plan, for example:
Transition Summary:
* Start p_mysql:0 (n1)
* Start p_mysql:1 (n2)
* Start p_mysql:2 (n3)
* Start p_mysql:3 (n4)
* Start p_mysql:4 (n5)

Actual:
Pacemaker gives up starting the resource.

Solution:
Configure failure handling for the Pacemaker resource, for example as we do for the rabbit resource:
meta migration-threshold=10 failure-timeout=30s resource-stickiness=100
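
To illustrate why these meta attributes matter, here is a minimal Python sketch of how migration-threshold and failure-timeout interact. It is a simplified model, not Pacemaker code: the function names are hypothetical, and it ignores the cluster recheck interval that in practice governs when expired failures are noticed. It assumes Pacemaker's default start-failure-is-fatal=true, under which a failed start sets the fail count to INFINITY.

```python
import math

# Assumption: with start-failure-is-fatal=true (the Pacemaker default),
# a failed start is recorded as an INFINITY fail count on that node.
START_FAILURE = math.inf


def effective_fail_count(fail_count, seconds_since_last_failure, failure_timeout):
    """Apply failure-timeout to a recorded fail count.

    failure_timeout=None models the default (disabled): failures never
    expire on their own and need a manual cleanup.
    """
    if failure_timeout is not None and seconds_since_last_failure >= failure_timeout:
        return 0
    return fail_count


def node_allowed(fail_count, migration_threshold):
    """A node stays eligible while its fail count is below the threshold."""
    return fail_count < migration_threshold


def can_retry_start(seconds_since_last_failure, migration_threshold, failure_timeout):
    """Would the resource be allowed back on a node after a failed start?"""
    count = effective_fail_count(
        START_FAILURE, seconds_since_last_failure, failure_timeout
    )
    return node_allowed(count, migration_threshold)


# Five minutes after a failed start, without and with failure-timeout=30s.
without_timeout = can_retry_start(300, migration_threshold=10, failure_timeout=None)
with_timeout = can_retry_start(300, migration_threshold=10, failure_timeout=30)
```

In this model the node stays banned indefinitely when no failure-timeout is set, which matches the "gives up" behaviour described above, while a 30 second timeout lets the start be retried once the failure has expired.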

Bogdan Dobrelya (bogdando) on 2016-04-20

Changed in fuel:
importance:	Undecided → High
tags:	added: galera pacemaker
description:	updated

Bogdan Dobrelya (bogdando) on 2016-04-20

description:	updated
tags:	added: tech-debt
tags:	added: area-library

Dmitry Pyzhov (dpyzhov) on 2016-04-22

no longer affects:	fuel/newton

Andrey Maximov (maximov) wrote on 2016-04-26:

How does it affect users?

Bogdan Dobrelya (bogdando) wrote on 2016-05-09:

The user impact is that manual recovery of the cluster may be needed when Pacemaker gives up starting the DB nodes, so it affects DB availability.

OpenStack Infra (hudson-openstack) wrote on 2016-05-09: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/314031

Changed in fuel:
status:	Triaged → In Progress

Bogdan Dobrelya (bogdando) wrote on 2016-05-10:

How to verify:
The MySQL resource definition should look like:

primitive p_mysqld ocf:fuel:mysql-wss \
params config="/etc/mysql/my.cnf" socket="/var/run/mysqld/mysqld.sock" test_conf="/etc/mysql/user.cnf" \
meta failure-timeout=30s migration-threshold=10 resource-stickiness=100 \
... snip ...
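
The verification above could be scripted roughly as follows. This is only a sketch: it assumes the primitive definition has already been saved as text (for example from the crm shell's configuration output), and the helper name parse_meta and the EXPECTED table are made up for illustration. The sample text stops at the meta line because the rest of the definition is elided above.

```python
import shlex

# Expected meta attributes, taken from the definition shown above.
EXPECTED = {
    "failure-timeout": "30s",
    "migration-threshold": "10",
    "resource-stickiness": "100",
}

# Raw string so the trailing backslashes are kept literally.
SAMPLE = r"""primitive p_mysqld ocf:fuel:mysql-wss \
  params config="/etc/mysql/my.cnf" socket="/var/run/mysqld/mysqld.sock" test_conf="/etc/mysql/user.cnf" \
  meta failure-timeout=30s migration-threshold=10 resource-stickiness=100"""


def parse_meta(primitive_text):
    """Return the key=value pairs that follow the 'meta' keyword."""
    # Join backslash-continued lines into one logical line.
    joined = primitive_text.replace("\\\n", " ")
    meta = {}
    in_meta = False
    for token in shlex.split(joined):
        if token == "meta":
            in_meta = True
            continue
        if in_meta:
            if "=" not in token:
                break  # a following keyword such as 'op' ends the meta section
            key, value = token.split("=", 1)
            meta[key] = value
    return meta


def missing_meta(primitive_text, expected=EXPECTED):
    """Return the expected attributes that are absent or have another value."""
    found = parse_meta(primitive_text)
    return {k: v for k, v in expected.items() if found.get(k) != v}


problems = missing_meta(SAMPLE)
```

An empty result from missing_meta means all three attributes are present with the expected values.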

OpenStack Infra (hudson-openstack) wrote on 2016-05-11: Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/314031
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f55a2b98e0db2d5a54c09bbb806d53ef4b3cb794
Submitter: Jenkins
Branch: master

commit f55a2b98e0db2d5a54c09bbb806d53ef4b3cb794
Author: Bogdan Dobrelya <email address hidden>
Date: Mon May 9 11:53:35 2016 +0200

Do not stop DB/MQ by a Pacemaker quorum

Also configure fail modes for the DB resource
the same as for the MQ one.

Closes-bug: #1577689
Closes-bug: #1572440

Change-Id: I3e1ecf67b8bc205920ecd3157b3a422ecd9c6564
Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2016-05-18: Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/317934

OpenStack Infra (hudson-openstack) wrote on 2016-05-18: Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/317934
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=8c595c20fa155fab95d390939b50495503a0b028
Submitter: Jenkins
Branch: stable/mitaka

commit 8c595c20fa155fab95d390939b50495503a0b028
Author: Bogdan Dobrelya <email address hidden>
Date: Mon May 9 11:53:35 2016 +0200

Do not stop DB/MQ by a Pacemaker quorum

Also configure fail modes for the DB resource
the same as for the MQ one.

Closes-bug: #1577689
Closes-bug: #1572440

Change-Id: I3e1ecf67b8bc205920ecd3157b3a422ecd9c6564
Signed-off-by: Bogdan Dobrelya <email address hidden>
(cherry picked from commit f55a2b98e0db2d5a54c09bbb806d53ef4b3cb794)

Maksym Strukov (unbelll) wrote on 2016-06-16:

1. Deploy HA cluster
2. Edit /usr/lib/ocf/resource.d/fuel/mysql-wss and add `exit 1` to the start section
3. Run `crm resource cleanup clone_p_mysqld && crm resource restart clone_p_mysqld`
4. Check transition summary with `crm_simulate -Ls | grep -v "\-INF"`
You will get something like:
Transition Summary:
* Recover p_mysqld:0 (Started node-1.test.domain.local)
* Start p_mysqld:1 (node-3.test.domain.local)
* Start p_mysqld:2 (node-2.test.domain.local)

5. Repeat the last step every few minutes; periodically the following should appear:
* Start p_mysqld:2 (node-1.test.domain.local)

Verified as fixed in 9.0-mos-485
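
The check in steps 4 and 5 could be automated along these lines. This is a hedged sketch that parses already captured `crm_simulate -Ls` output. The function names are hypothetical, and the regular expression assumes the "* Action resource (details)" layout shown above, which may differ between Pacemaker versions.

```python
import re

# Matches lines such as "* Start p_mysqld:1 (node-3.test.domain.local)".
ACTION_RE = re.compile(r"^\s*\*\s+(\w+)\s+(\S+)\s+\((.*)\)\s*$")

SAMPLE_OUTPUT = """Transition Summary:
* Recover p_mysqld:0 (Started node-1.test.domain.local)
* Start p_mysqld:1 (node-3.test.domain.local)
* Start p_mysqld:2 (node-2.test.domain.local)
"""


def planned_actions(simulate_output):
    """Return (action, resource, details) tuples from the transition summary."""
    actions = []
    in_summary = False
    for line in simulate_output.splitlines():
        if line.strip() == "Transition Summary:":
            in_summary = True
            continue
        if in_summary:
            match = ACTION_RE.match(line)
            if match:
                actions.append(match.groups())
    return actions


def start_planned(simulate_output, resource_prefix):
    """True if at least one Start action is planned for the resource."""
    return any(
        action == "Start" and resource.startswith(resource_prefix)
        for action, resource, _details in planned_actions(simulate_output)
    )


ok = start_planned(SAMPLE_OUTPUT, "p_mysqld")
```

Running start_planned on output captured a few minutes apart gives a simple way to confirm that Pacemaker keeps scheduling starts instead of giving up.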
