Number of Rabbitmq queues is growing from failover to failover
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Alexey Khivin | |
6.0.x | Won't Fix | High | Alexey Khivin | |
6.1.x | Fix Committed | High | Alexey Khivin | |
Bug Description
During cloud operation, some of the controllers may go down and come back, either as planned downtime or as a sudden failure (and this can happen even more often when the network is flapping). This is fine from the HA cluster health perspective, but the number of RabbitMQ queues grows every time the OpenStack services reconnect to the AMQP server. Over time this can exhaust connection limits or cause major performance issues.
For example, in the files attached you can see how an initial set of ~100 queues grew to ~1500 after 100 failovers.
Steps to reproduce:
1. Deploy any HA environment with at least 3 controllers and collect info about the queues from any controller:
"rabbitmqctl list_queues pid name arguments | column -t"
2. Shut down any controller non-gracefully (or kill its beam process with -9).
3. Wait until the OSTF HA health check passes without failures.
4. Bring the 'failed' controller up again (skip steps 4-5 if you killed the beam process instead).
5. Wait until the OSTF HA health check passes without failures.
6. Collect info about the queues from any controller again and compare it with the previous results.
7. Wait at least 60 minutes and repeat steps 2-6.
As a result, you will see that some of the queues created more than 1 hour ago with 'x-expires' set to 1 hour still persist.
This likely happens because the OpenStack services do not release unused, obsolete, or temporary (UUID-based) queues after a reconnect. It looks like this issue cannot be addressed by Fuel and should instead be addressed in oslo.messaging.
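To spot the leaked queues, the listings from steps 1 and 6 can be compared with standard tools. Below is a minimal sketch: the queue names and file names are fabricated examples, not real output from the attached logs; in practice the two files would be captured with "rabbitmqctl list_queues name | sort".

```shell
# Fabricated sample listings standing in for real before/after captures:
#   rabbitmqctl list_queues name | sort > queues_before.txt   (before failover)
#   rabbitmqctl list_queues name | sort > queues_after.txt    (after failover)
printf 'notifications.info\nreply_a1b2c3\n' > queues_before.txt
printf 'notifications.info\nreply_a1b2c3\nreply_d4e5f6\n' > queues_after.txt

# Print queues that exist only after the failover -- candidates for
# leaked UUID-based reply queues (comm -13 keeps lines unique to file 2):
comm -13 queues_before.txt queues_after.txt
```

With the sample data above, only the new reply queue appears, which is exactly the kind of queue expected to accumulate from failover to failover.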
Changed in fuel:
status: New → Confirmed
tags: added: release-notes
tags: added: release-notes-done removed: release-notes
Logs snapshot: https://drive.google.com/file/d/0B2t6uNOmX_phYmpoa1BzMUJ3UGM/view?usp=sharing