First reported via https://bugzilla.redhat.com/show_bug.cgi?id=1585032
...
=INFO REPORT==== 31-May-2018::17:32:39 ===
closing AMQP connection <0.6475.0> (172.17.0.19:46030 -> 172.17.0.19:5672 - cinder-volume:118015:a9c3e789-694a-42b8-9104-6f11d93e2d0e)
This is an attempt to declare the queue master for the cinder-volume.hostgroup@tripleo_iscsi.hostgroup queue on overcloud-controller-1. It happens because we set the queue_master_locator option in rabbitmq.config:
{queue_master_locator, <<"min-masters">>},
So controller-0 decides that controller-1 has the fewest queue masters and tries to declare the queue master there.
However, at the time this is happening, controller-1 is restarting. Note the error is at 17:32:39, and then compare to the rabbit log on controller-1:
=INFO REPORT==== 31-May-2018::17:32:36 ===
Stopped RabbitMQ application
=INFO REPORT==== 31-May-2018::17:32:38 ===
Clustering with ['rabbit@overcloud-controller-0'] as disc node
=INFO REPORT==== 31-May-2018::17:32:41 ===
Starting RabbitMQ 3.6.5 on Erlang 18.3.4.7
So at 17:32:39, controller-1 has rejoined the cluster but has not yet started the rabbit app.
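To make the timing explicit, here is a small sketch that parses the timestamps quoted above and checks that the failed declare falls between "Clustering with ..." and "Starting RabbitMQ" on controller-1. The variable names are mine. It also assumes the clocks on both controllers are in sync, which the logs alone do not prove.

```python
from datetime import datetime

# Timestamp format used in the "=INFO REPORT====" headers quoted above.
FMT = "%d-%b-%Y::%H:%M:%S"


def parse(ts):
    """Parse a RabbitMQ log header timestamp into a datetime."""
    return datetime.strptime(ts, FMT)


# controller-1 events, copied from its rabbit log.
stopped = parse("31-May-2018::17:32:36")
clustered = parse("31-May-2018::17:32:38")
started = parse("31-May-2018::17:32:41")

# Time of the error seen on controller-0.
declare_failed = parse("31-May-2018::17:32:39")

# True when the declare happened after controller-1 rejoined the cluster
# but before its rabbit application began starting.
in_window = clustered <= declare_failed < started

# Seconds between the failed declare and the app start on controller-1.
gap_seconds = (started - declare_failed).total_seconds()

print(in_window, gap_seconds)
```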
This is probably a bug somewhere in the master locator code. I would expect it to verify the target node is actually up, not just clustered.
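A rough sketch of the check I would expect, in Python rather than Erlang, and not taken from the RabbitMQ source. The function name, the controller-2 node, and the queue master counts are all made up for illustration; only the "pick the node with the fewest masters" idea comes from the min-masters setting above.

```python
def pick_min_masters(master_counts, running_nodes=None):
    """Return the node hosting the fewest queue masters.

    master_counts: dict mapping node name -> number of queue masters.
    running_nodes: optional set of nodes whose rabbit app is running.
                   When given, only those nodes are candidates.
    """
    if running_nodes is None:
        candidates = dict(master_counts)
    else:
        candidates = {n: c for n, c in master_counts.items() if n in running_nodes}
    if not candidates:
        raise ValueError("no candidate node available for the queue master")
    # Sort names first so ties are broken deterministically.
    return min(sorted(candidates), key=candidates.get)


# Hypothetical counts: controller-1 just restarted, so it hosts no masters.
counts = {
    "rabbit@overcloud-controller-0": 12,
    "rabbit@overcloud-controller-1": 0,
    "rabbit@overcloud-controller-2": 11,
}

# Hypothetical view at 17:32:39: controller-1 is clustered but its app is down.
running = {"rabbit@overcloud-controller-0", "rabbit@overcloud-controller-2"}

naive = pick_min_masters(counts)          # considers every clustered node
safe = pick_min_masters(counts, running)  # considers only running nodes

print(naive)
print(safe)
```

With only cluster membership considered, the freshly restarted node always looks like the best target because it hosts nothing yet. Filtering on running nodes avoids that.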
...
More often than not, RabbitMQ seems to have trouble with min-masters: it selects a node that is not fully up, and the declare then fails.
https://review.openstack.org/#/c/587064/