[library] RabbitMQ doesn't assemble after controller reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Committed | High | Vladimir Kuklin |
Bug Description
{"build_id": "2014-07-
Periodically, when a controller is rebooted, only 2 of the 3 controllers rejoin the cluster. After rebooting controller 2, only controllers 1 and 2 join the RabbitMQ cluster.
[root@node-1 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-1' ...
[{nodes,
{running_
{partitions,[]}]
...done.
[root@node-3 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-3' ...
[{nodes,
...done.
node-3 doesn't join the cluster because the rabbit app is stopped.
[root@node-3 ~]# rabbitmqctl status
Status of node 'rabbit@node-3' ...
[{pid,31333},
{running_
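For the state described above (the rabbit app stopped on node-3), a manual recovery could look roughly like the following. This is only a sketch and not part of the original report: the helper name `rejoin_cluster` and the `RABBITMQCTL` override variable are hypothetical, and the peer name `rabbit@node-1` is taken from the output above. It uses the standard `rabbitmqctl` subcommands `stop_app`, `join_cluster` and `start_app`; whether `join_cluster` tolerates a node that is already a cluster member depends on the RabbitMQ version.

```shell
# Hypothetical helper: re-attach a stopped node to an existing cluster.
# RABBITMQCTL can be overridden, for example with a stub when testing.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

rejoin_cluster() {
    peer="$1"   # e.g. rabbit@node-1

    # join_cluster requires the rabbit application to be stopped first.
    $RABBITMQCTL stop_app || return 1

    # Give up early if the join fails, leaving the app stopped.
    if ! $RABBITMQCTL join_cluster "$peer"; then
        echo "failed to join $peer" >&2
        return 1
    fi

    # Start the rabbit application again once the node has joined.
    $RABBITMQCTL start_app
}
```

On node-3 this would be invoked as `rejoin_cluster rabbit@node-1`, followed by `rabbitmqctl cluster_status` to confirm the result.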
summary: | RabbitMQ doesn't assemble after master reboot → RabbitMQ doesn't assemble after controller reboot |
summary: | RabbitMQ doesn't assemble after controller reboot → [library] RabbitMQ doesn't assemble after controller reboot |
tags: | added: library rabbitmq |
Changed in fuel: | |
assignee: | nobody → Vladimir Kuklin (vkuklin) |
Changed in fuel: | |
importance: | Undecided → High |
status: | New → Confirmed |
milestone: | none → 5.1 |
Changed in fuel: | |
status: | Confirmed → In Progress |
Reviewed: https://review.openstack.org/109821
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=5b1dc4c091824bbc9b51a02f3f18e0a3bf524dad
Submitter: Jenkins
Branch: master
commit 5b1dc4c091824bbc9b51a02f3f18e0a3bf524dad
Author: Vladimir Kuklin <email address hidden>
Date: Sat Jul 26 23:04:51 2014 +0400
Refactor rabbitmq OCF script
This refactoring adds improvements and fixes several
possible and already filed issues making rabbitmq
cluster reassembling in case of partial or complete
failure.
1) Use mnesia low-level commands instead of
status and cluster_status because these
commands will not block in case of
one of the nodes becoming inaccessible
2) Block access to rabbitmq port while
trying to start it to prevent interference
with client applications
3) Perform test of RMQ server start on promote
4) Do not check if we want to join the cluster -
simply join it - it is idempotent operation
5) Fix my_host() function determining if
our host is included into the list
6) Fix trim_var function to strip the line
instead of stripping the first argument
7) Stop slave node in case we failed
to join the master node. This will make
slave restart again and try to join again
8) Add debug option to monitor command
9) Add debug to several misc. functions
Closes-bug: #1346540
Change-Id: If5df451a6e2d72bf50c47c28d8a36b46045dd5cd
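Items 5 and 6 of the commit message mention fixes to `my_host()` and `trim_var`. The actual implementations live in the fuel-library OCF script and are not shown in this report, so the following is only an illustrative sketch of what such helpers might look like. The function bodies, the `MY_HOSTNAME` override variable, and the comparison on short hostnames are all assumptions.

```shell
# Hypothetical re-implementations for illustration only; the real
# functions in the OCF script may differ.

# trim_var: print the whole argument line with leading and trailing
# whitespace removed (all arguments are used, not just the first).
trim_var() {
    printf '%s\n' "$*" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# my_host: succeed if this host appears in the space-separated list
# passed as the first argument. Names are compared in short form, so
# "node-3.domain.tld" matches "node-3".
my_host() {
    me="${MY_HOSTNAME:-$(hostname -s)}"
    for entry in $1; do
        [ "${entry%%.*}" = "${me%%.*}" ] && return 0
    done
    return 1
}
```

For example, `my_host "node-1 node-2 node-3"` returns success on node-3 and failure on a host that is not in the list.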