So, based on Ask's comment about notifications, I started looking into it. As it turns out, *if* you're running a version of kombu/amqp which supports the channel_errors object (version 2.1.4 seems to be when it was introduced: http://kombu.readthedocs.org/en/latest/changelog.html), the following simple patch resolves the issue (also attached):
--- impl_kombu.py.new 2013-08-22 21:52:54.711337602 +0000
+++ impl_kombu.py.orig 2013-08-22 21:52:37.727386558 +0000
@@ -488,7 +488,6 @@
         self.connection = None
         self.connection = kombu.connection.BrokerConnection(**params)
         self.connection_errors = self.connection.connection_errors
-        self.channel_errors = self.connection.channel_errors
         if self.memory_transport:
             # Kludge to speed up tests.
             self.connection.transport.polling_interval = 0.0
@@ -562,7 +561,7 @@
         while True:
             try:
                 return method(*args, **kwargs)
-            except (self.channel_errors, socket.timeout, IOError), e:
+            except (self.connection_errors, socket.timeout, IOError), e:
                 if error_callback:
                     error_callback(e)
             except Exception, e:
Basically, the except clause in ensure() should watch for channel errors and not connection errors.
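To make the idea concrete, here is a minimal Python 3 sketch of a retry loop that treats channel-level errors as a reason to reconnect. It is not the nova code: FakeChannelError, the reconnect callback and max_attempts are hypothetical names for illustration, and the sketch assumes the posted diff reads from the patched file to the original, so that the channel_errors lines are the ones the patch introduces.

```python
import socket


class FakeChannelError(Exception):
    """Hypothetical stand-in for a channel-level error such as ConsumerCancel."""


def ensure(method, reconnect, channel_errors, max_attempts=3):
    """Call method(); on a retryable error, reconnect and try again."""
    # The exception tuple is built from channel errors, mirroring the patch.
    retryable = tuple(channel_errors) + (socket.timeout, IOError)
    for _ in range(max_attempts):
        try:
            return method()
        except retryable as exc:
            reconnect(exc)
    raise RuntimeError("gave up after %d attempts" % max_attempts)


# Simulate a consumer that is cancelled once, then works after a reconnect.
attempts = []
reconnects = []


def consume():
    attempts.append(1)
    if len(attempts) == 1:
        raise FakeChannelError("tag '2'")
    return "message"


result = ensure(consume, reconnects.append, channel_errors=(FakeChannelError,))
print(result, len(reconnects))
```

Any exception outside the retryable tuple still propagates to the caller, which is what leaves a consumer stuck if the channel error is not in that tuple.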
I verified this in a two-node RabbitMQ cluster with nodes .139 and .141; .139 is the master at the start of the test.
The following is from the nova logs when .139 is stopped (and .141 is promoted to the master):
Notice, we're connected to 192.168.128.141:
2013-08-22 21:27:45.807 INFO nova.openstack.common.rpc.common [req-20aa6610-b0df-4730-9773-6024e47a6da7 None None] Connected to AMQP server on 192.168.128.141:5672
2013-08-22 21:27:45.843 INFO nova.openstack.common.rpc.common [req-c82c8ea0-aa8b-49b0-925c-b79399f011de None None] Connected to AMQP server on 192.168.128.141:5672
...
Then, we stop rabbit on .139 and see the following *channel* error:
2013-08-22 21:28:13.475 20003 ERROR nova.openstack.common.rpc.common [-] Failed to consume message from queue: tag u'2'
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common Traceback (most recent call last):
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 572, in ensure
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common     return method(*args, **kwargs)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common   File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/impl_kombu.py", line 654, in _consume
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common     return self.connection.drain_events(timeout=timeout)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common   File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 281, in drain_events
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common   File "/usr/local/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 91, in drain_events
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common     return connection.drain_events(**kwargs)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common   File "/usr/local/lib/python2.7/dist-packages/amqp/connection.py", line 286, in drain_events
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common     return amqp_method(channel, args)
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common   File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 1628, in _basic_cancel_notify
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common     raise ConsumerCancel('tag %r' % (consumer_tag, ))
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common ConsumerCancel: tag u'2'
2013-08-22 21:28:13.475 20003 TRACE nova.openstack.common.rpc.common
ensure() fails due to the channel error, which causes the service to reconnect. It reconnects to the same host (as it is now the only one alive):
2013-08-22 21:28:13.478 20003 INFO nova.openstack.common.rpc.common [-] Reconnecting to AMQP server on 192.168.128.141:5672
2013-08-22 21:28:13.510 20003 INFO nova.openstack.common.rpc.common [-] Connected to AMQP server on 192.168.128.141:5672
2013-08-22 21:28:17.007 INFO nova.openstack.common.rpc.common [req-482627bb-812e-4997-90c0-96fbf3c8de34 None None] Connected to AMQP server on 192.168.128.141:5672
Message processing then continues as per usual.
Running pip install --upgrade kombu works (even on Ubuntu 12.04) to get a kombu version that supports this. However, the ultimate solution will likely need to be more robust than this patch, as we should do our best to support the version shipping in the LTS release out of the box.
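One possible shape for a more robust version, sketched in Python 3 under stated assumptions: look up channel_errors defensively so that a connection object without that attribute still yields a usable exception tuple. OldConnection, NewConnection and retryable_errors are hypothetical names for illustration, not kombu or nova APIs, and catching connection errors alongside channel errors is a choice made for this sketch rather than what the patch does.

```python
import socket


class OldConnection:
    """Hypothetical stand-in for a connection without channel_errors."""
    connection_errors = (ConnectionError,)


class NewConnection(OldConnection):
    """Hypothetical stand-in for a connection that exposes channel_errors."""
    channel_errors = (LookupError,)


def retryable_errors(connection):
    """Build the exception tuple for the retry loop's except clause."""
    connection_errors = tuple(connection.connection_errors)
    # Fall back to an empty tuple when the attribute is missing.
    channel_errors = tuple(getattr(connection, "channel_errors", ()))
    return connection_errors + channel_errors + (socket.timeout, IOError)


old = retryable_errors(OldConnection())
new = retryable_errors(NewConnection())
print(LookupError in old, LookupError in new)
```

With the older stand-in the loop behaves as it did before the patch, so this only avoids an AttributeError; it does not by itself make the consumer recover from a cancel on a library version that never reports channel errors.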