Fuel for OpenStack

Wrong post-start notify exit code in RabbitMQ OCF causing additional resource failures in Pacemaker

Bug #1438699 reported by Bogdan Dobrelya on 2015-03-31

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Committed	High	Bogdan Dobrelya	Fuel for OpenStack 6.1
5.0.x	Invalid	High	Unassigned	Fuel for OpenStack 5.0-updates
5.1.x	Won't Fix	High	Denis Meltsaykin	Fuel for OpenStack 5.1.1-updates
6.0.x	Won't Fix	High	Denis Meltsaykin	Fuel for OpenStack 6.0-updates
6.1.x	Fix Committed	High	Bogdan Dobrelya	Fuel for OpenStack 6.1

Bug Description

Pacemaker sends the post-start notify event to all instances of the multistate RabbitMQ clone resource every time a RabbitMQ node starts anywhere in the cluster. An error in the OCF script logic causes the resource to be reported as $OCF_NOT_RUNNING while processing this event, which leads to additional restarts of the RabbitMQ resources in Pacemaker.

The message "Failed to join the cluster on post-start. Resource is failed" indicates this issue. Nodes that merely process the post-start notify should not report it, and the exit code for this event should be $OCF_SUCCESS on every node that is not joining the cluster. Only the node that has actually started, and thereby triggered the notify, is joining the cluster and may fail with this error message.
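The intended notify handling can be sketched as follows. This is a minimal Python illustration of the decision logic only, not the RabbitMQ OCF agent's actual code. It assumes the Pacemaker clone-notification environment variables OCF_RESKEY_CRM_meta_notify_type, OCF_RESKEY_CRM_meta_notify_operation and OCF_RESKEY_CRM_meta_notify_start_uname, the standard OCF return codes (OCF_SUCCESS = 0, OCF_NOT_RUNNING = 7), and a hypothetical join_cluster callable standing in for the agent's real join procedure.

```python
import sys
from typing import Callable, Mapping

# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def post_start_notify_rc(
    local_node: str,
    env: Mapping[str, str],
    join_cluster: Callable[[], bool],  # hypothetical stand-in for the agent's join logic
) -> int:
    """Return the exit code for a notify action on this clone instance (sketch)."""
    notify_type = env.get("OCF_RESKEY_CRM_meta_notify_type", "")
    operation = env.get("OCF_RESKEY_CRM_meta_notify_operation", "")
    if (notify_type, operation) != ("post", "start"):
        # Not a post-start notify; nothing to do in this sketch.
        return OCF_SUCCESS

    # Space-separated list of nodes on which instances are being started.
    starting = env.get("OCF_RESKEY_CRM_meta_notify_start_uname", "").split()
    if local_node not in starting:
        # This node only observes another node's start: it is not joining
        # the cluster, so it must not report a failure.
        return OCF_SUCCESS

    # Only the node that actually started tries to join and may fail.
    if join_cluster():
        return OCF_SUCCESS
    print("Failed to join the cluster on post-start. Resource is failed",
          file=sys.stderr)
    return OCF_NOT_RUNNING
```

For example, with OCF_RESKEY_CRM_meta_notify_start_uname set to "node-2" and a join that fails, node-1 returns 0 (it is only a bystander) while node-2 returns 7, so Pacemaker restarts only the instance that actually failed to join.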

Bogdan Dobrelya (bogdando) on 2015-03-31

Changed in fuel:
milestone:	none → 6.1
importance:	Undecided → High

OpenStack Infra (hudson-openstack) wrote on 2015-03-31: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/169320

Changed in fuel:
status:	New → In Progress

OpenStack Infra (hudson-openstack) on 2015-04-01

Changed in fuel:
assignee:	Bogdan Dobrelya (bogdando) → Bartlomiej Piotrowski (bpiotrowski)

OpenStack Infra (hudson-openstack) wrote on 2015-04-01: Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/169320
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=dd34e200da582afd996c276530bb761ebd59dbb0
Submitter: Jenkins
Branch: master

commit dd34e200da582afd996c276530bb761ebd59dbb0
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Mar 31 16:00:47 2015 +0200

Fix post-start notify exit code for rabbit OCF

    There is an error in OCF logic causing the resource to be reported
    as $OCF_NOT_RUNNING that leads to additional restarts for rabbitmq
    resources in pacemaker.

    The solution is to ensure that only the node which actually has
    started and is joining the cluster may fail with this error code,
    while the other nodes may not.

Closes-bug: #1438699

Change-Id: I8d3b6e8f76a6a89608e59a52081c21931e5654fb
Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status:	In Progress → Fix Committed

Denis Meltsaykin (dmeltsaykin) wrote on 2015-10-26:

Setting this to Won't Fix for 5.1.1-updates and 6.0-updates, as such a complex change cannot be delivered within the scope of a maintenance update. A possible workaround, backporting the RabbitMQ OCF script, is covered in detail in the Operations Guide of the official product documentation.
