RabbitMQ app in some cases may retain stopped w/o being noticed by OCF monitor action
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Fuel for OpenStack |
Fix Committed
|
High
|
Vladimir Kuklin | ||
5.1.x |
Won't Fix
|
High
|
Denis Meltsaykin | ||
6.0.x |
Won't Fix
|
High
|
Denis Meltsaykin |
Bug Description
Steps to reproduce:
0) deploy any HA cluster of 3 controllers.
Assume we have a node-1 as a primary controller with the rabbit multistate clone resource master running, and
node-2, node-3 as running multistate resource slaves.
1) Move rabbit master resource to node-2
2) wait for ostf ha passed
3) kill the node-2, which should be a rabbit master now
4) wait for ostf ha passed
5) power on node-2
6) wait for it joined the rabbit cluster (wait for ostf ha passed)
7) repeat 1-6
Expected:
A. The rabbitmq cluster assembles from two remaining nodes, having 1 master and 1 slave after the step #4, no longer than in 5 minutes.
B. The rabbitmq cluster assembles from three nodes, having 1 master and 2 slaves after the steps #2, #6, no longer than in 5 minutes.
C. At least one node is always available for AMQP connections and it's queues and messages synced with other nodes, if any available as well.
Actual: after the step #6, the node-2 will have its rabbit app stopped and will not be shown as running db node in rabbitmcrl report by other nodes and by itself.
Note, sometimes this case can be reproduced after few iterations, sometimes it may take up to 20 or 40 - it looks random.
ISO info:
build_id: 2015-05-25_20-55-26
build_number: '466'
Changed in fuel: | |
assignee: | nobody → Bogdan Dobrelya (bogdando) |
status: | New → In Progress |
description: | updated |
related sys tests update request https:/ /bugs.launchpad .net/fuel/ +bug/1458830