Wrong post-start notify exit code in RabbitMQ OCF causing additional resource failures in Pacemaker
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Bogdan Dobrelya |
5.0.x | Invalid | High | Unassigned |
5.1.x | Won't Fix | High | Denis Meltsaykin |
6.0.x | Won't Fix | High | Denis Meltsaykin |
6.1.x | Fix Committed | High | Bogdan Dobrelya |
Bug Description
Pacemaker sends the post-start notify event to all instances of the multistate RabbitMQ clone resource every time a RabbitMQ node starts anywhere in the cluster. An error in the OCF script logic causes the resource to be reported as $OCF_NOT_RUNNING while handling this event, which leads to additional restarts of RabbitMQ resources in Pacemaker.
The log message "Failed to join the cluster on post-start. Resource is failed" indicates this issue. Nodes that are merely processing the post-start notify should not report it, and the exit code for this event should be $OCF_SUCCESS on every node that is not joining the cluster. Only the node that has actually started, and whose start triggered this notify, joins the cluster, so only that node may fail with this error message.
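To illustrate the intended behaviour, here is a minimal Python sketch of the decision a post-start notify handler could make. It assumes the standard Pacemaker clone-notification variables (`OCF_RESKEY_CRM_meta_notify_type`, `OCF_RESKEY_CRM_meta_notify_operation`, `OCF_RESKEY_CRM_meta_notify_start_uname`) and the standard OCF exit codes. The function `handle_post_start_notify` and the `join_cluster` callback are hypothetical; the real OCF script is written in shell, not Python. Returning `OCF_NOT_RUNNING` when the starting node itself fails to join is also an assumption.

```python
# Hypothetical sketch of the post-start notify decision; not the Fuel OCF script.
import logging
from typing import Callable, Mapping

# Standard OCF exit codes.
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def handle_post_start_notify(
    env: Mapping[str, str],
    local_node: str,
    join_cluster: Callable[[], bool],  # hypothetical join helper
) -> int:
    """Return the exit code this clone instance reports for a notify action."""
    notify_type = env.get("OCF_RESKEY_CRM_meta_notify_type", "")
    operation = env.get("OCF_RESKEY_CRM_meta_notify_operation", "")
    if (notify_type, operation) != ("post", "start"):
        # Other notify events are outside the scope of this sketch.
        return OCF_SUCCESS

    # Space-separated names of the nodes on which Pacemaker started an instance.
    started = env.get("OCF_RESKEY_CRM_meta_notify_start_uname", "").split()
    if local_node not in started:
        # This node only observes another node's start: it must not try
        # to join the cluster and must not report a failure.
        return OCF_SUCCESS

    # Only the node that actually started joins the cluster and may fail.
    if join_cluster():
        return OCF_SUCCESS
    logging.error("Failed to join the cluster on post-start. Resource is failed")
    return OCF_NOT_RUNNING  # assumed failure code for the joining node
```

The membership check runs before any join work, so a node that is only observing another node's start always returns `OCF_SUCCESS` and cannot trigger an extra restart.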
Changed in fuel:
milestone: none → 6.1
importance: Undecided → High
Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Bartlomiej Piotrowski (bpiotrowski)
Fix proposed to branch: master
Review: https://review.openstack.org/169320