Steps to reproduce:
1. Deploy cluster with the following parameters:
3 controllers+mongo, KVM, 5 GB RAM
1 compute+ceph, Supermicro, 16 GB RAM
Sahara, Ceilometer enabled, Ceph for volumes, Ceph for images, Ceph for ephemeral volumes
2. Disable rabbitmq:
pcs resource disable master_p_rabbitmq-server
Wait until the master and slaves have stopped.
3. Enable rabbitmq:
pcs resource enable master_p_rabbitmq-server
Wait until the master and slaves have started.
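The two "wait" steps above can be scripted as a polling loop. The following is a minimal sketch only: `wait_until` and its parameters are hypothetical names, and the predicate that decides whether the rabbitmq resource is stopped or started (for example, by inspecting the cluster status) is left to the caller, because that check is not described in this report.

```python
import time


def wait_until(predicate, timeout=300.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True if the predicate became true, False on timeout.
    sleep and clock are injectable so the helper can be tested
    without real delays.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass a function that returns True once the master and slaves have reached the desired state, for example `wait_until(rabbit_is_stopped)`, where `rabbit_is_stopped` is an assumed helper.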
Expected result:
Ceilometer collector successfully reconnects to rabbitmq.
Actual result:
On all controller nodes in /var/log/ceilometer/ceilometer-collector.log we can see the following errors:
2015-10-28 11:43:01.113 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:02.115 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.0.3:5673
2015-10-28 11:43:02.123 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.3:5673 is unreachable: timed out. Trying again in 30 seconds.
2015-10-28 11:43:32.154 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:33.155 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
2015-10-28 11:43:33.170 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.3:5673 closed the connection. Check login credentials: Socket closed
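To compare the three controllers, it may help to count the reconnect errors per AMQP endpoint in each ceilometer-collector.log. The sketch below assumes the log lines have exactly the layout quoted above; the function name and the regular expressions are illustrative and not part of oslo.messaging.

```python
import re
from collections import Counter

# "<date> <time> <pid> <LEVEL> <logger> [-] <message>", as in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"\[[^\]]*\] (?P<msg>.*)$"
)
# Endpoint as it appears in "AMQP server on X:P ..." or "AMQP server X:P ...".
HOST_RE = re.compile(r"AMQP server (?:on )?(?P<host>\d+\.\d+\.\d+\.\d+:\d+)")


def summarize_amqp_errors(lines):
    """Count impl_rabbit ERROR lines per AMQP endpoint."""
    errors = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "ERROR":
            continue
        if not m.group("logger").endswith("impl_rabbit"):
            continue
        host = HOST_RE.search(m.group("msg"))
        errors[host.group("host") if host else "unknown"] += 1
    return errors


# The six lines quoted in this report.
SAMPLE = """\
2015-10-28 11:43:01.113 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:02.115 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 192.168.0.3:5673
2015-10-28 11:43:02.123 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.0.3:5673 is unreachable: timed out. Trying again in 30 seconds.
2015-10-28 11:43:32.154 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds ...
2015-10-28 11:43:33.155 14829 INFO oslo.messaging._drivers.impl_rabbit [-] Connecting to AMQP server on 127.0.0.1:5673
2015-10-28 11:43:33.170 14829 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server 192.168.0.3:5673 closed the connection. Check login credentials: Socket closed
"""

if __name__ == "__main__":
    for endpoint, count in summarize_amqp_errors(SAMPLE.splitlines()).items():
        print(endpoint, count)
```

On a controller, the same function could be fed the lines of /var/log/ceilometer/ceilometer-collector.log instead of `SAMPLE`.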
Metering queue in rabbit is not empty:
root@node-1:~# rabbitmqctl list_queues | grep metering
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
q-metering-plugin.node-2 0
q-metering-plugin.node-3 0
q-metering-plugin_fanout_1feb6540cda34d758611354495b98bfb 0
q-metering-plugin_fanout_64fb6a3997c44ecea91bbf617a8920d5 0
q-metering-plugin_fanout_bae52586cac2478b960d851b690f1494 0
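The backlog check above can be automated by parsing the `rabbitmqctl list_queues` output. This is a minimal sketch that assumes the default two-column output (queue name, message count) shown in this report; the function name and the `name_filter` parameter are hypothetical.

```python
def nonempty_queues(listing, name_filter="metering"):
    """Return {queue_name: message_count} for matching queues with a backlog."""
    result = {}
    for line in listing.splitlines():
        parts = line.split()
        # Skip anything that is not "<name> <count>", e.g. status banners.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        name, count = parts[0], int(parts[1])
        if name_filter in name and count > 0:
            result[name] = count
    return result


# Output quoted in this report.
LISTING = """\
metering.sample 224
q-metering-plugin 0
q-metering-plugin.node-1 0
q-metering-plugin.node-2 0
q-metering-plugin.node-3 0
q-metering-plugin_fanout_1feb6540cda34d758611354495b98bfb 0
q-metering-plugin_fanout_64fb6a3997c44ecea91bbf617a8920d5 0
q-metering-plugin_fanout_bae52586cac2478b960d851b690f1494 0
"""

if __name__ == "__main__":
    print(nonempty_queues(LISTING))
```

A non-empty result for `metering.sample` that does not shrink over time suggests that no collector is consuming from the queue.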
After ~10 minutes the collector on one controller reconnects to rabbitmq, but the collectors on the other two controllers do not.
For comparison, all ceilometer-agent-notification services successfully reconnect to rabbitmq after the rabbit restart.
Workaround: restart ceilometer-collector; after the restart the collector successfully connects to rabbitmq.
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.0"
api: "1.0"
build_number: "58"
build_id: "2014-12-26_14-25-46"
astute_sha: "16b252d93be6aaa73030b8100cf8c5ca6a970a91"
fuellib_sha: "fde8ba5e11a1acaf819d402c645c731af450aff0"
ostf_sha: "a9afb68710d809570460c29d6c3293219d3624d4"
nailgun_sha: "5f91157daa6798ff522ca9f6d34e7e135f150a90"
fuelmain_sha: "81d38d6f2903b5a8b4bee79ca45a54b76c1361b8"