A recent update to oslo.messaging to resolve #1789177 causes failures.
(The comments below are copied from the original bug):
After a partial upgrade (only one side, producers or consumers), there are a lot of MessageTimeout and DuplicateMessage errors in the logs. Downgrading back to 5.35.0-0ubuntu1~cloud0 fixed the problem.
Right after restarting n-ovs-agent, I can see a lot of errors in the rabbitmq log [1],
which are the same as the errors seen with the rabbitmq failover issue (the original issue of this LP).
Then, after I upgraded oslo.messaging in the neutron-api unit and restarted neutron-server, the errors below were gone and I was able to create instances again.
After upgrading oslo.messaging in n-ovs only, the exchanges they communicate over didn't match,
because the change to the exchanges they use depends on the publisher-consumer relation.
So I think there are two options:
1. Revert this patch for Q (the original failover problem will remain).
2. Upgrade both sides together in a maintenance window.
Thanks a lot
[1]
################################################################################
=ERROR REPORT==== 3-Feb-2021::03:25:26 ===
Channel error on connection <0.2379.1> (10.0.0.32:60430 -> 10.0.0.34:5672, vhost: 'openstack', user: 'neutron'), channel 1:
{amqp_error,not_found,
"no exchange 'reply_7da3cecc31b34bdeb96c866dc84e3044' in vhost 'openstack'", 'basic.publish'}
10.0.0.32 is the neutron-api unit.
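
To see which units are still publishing to exchanges that do not exist, the channel errors in [1] can be tallied per client address. The following is only a rough sketch, assuming the rabbitmq log keeps the layout shown in [1] (a "Channel error on connection" line followed by a "no exchange" line). The function name `missing_exchanges` and both regular expressions are made up for illustration and are not part of oslo.messaging or RabbitMQ.

```python
import re
from collections import Counter

# Sample taken from the log excerpt [1] in this report.
SAMPLE = """\
=ERROR REPORT==== 3-Feb-2021::03:25:26 ===
Channel error on connection <0.2379.1> (10.0.0.32:60430 -> 10.0.0.34:5672, vhost: 'openstack', user: 'neutron'), channel 1:
{amqp_error,not_found,
"no exchange 'reply_7da3cecc31b34bdeb96c866dc84e3044' in vhost 'openstack'", 'basic.publish'}
"""

# Hypothetical patterns, written against the layout of the sample above.
CONN_RE = re.compile(
    r"Channel error on connection \S+ "
    r"\((?P<src>[\d.]+):\d+ -> [\d.]+:\d+, "
    r"vhost: '(?P<vhost>[^']*)', user: '(?P<user>[^']*)'\)"
)
EXCH_RE = re.compile(r"no exchange '(?P<exchange>[^']+)' in vhost '[^']*'")


def missing_exchanges(log_text):
    """Count 'no exchange' errors keyed by (client IP, user, exchange)."""
    counts = Counter()
    current = None  # (client IP, user) of the most recent channel error
    for line in log_text.splitlines():
        conn = CONN_RE.search(line)
        if conn:
            current = (conn.group("src"), conn.group("user"))
            continue
        exch = EXCH_RE.search(line)
        if exch and current is not None:
            counts[current + (exch.group("exchange"),)] += 1
            current = None  # wait for the next channel error header
    return counts


if __name__ == "__main__":
    for (src, user, exchange), n in missing_exchanges(SAMPLE).items():
        print(f"{src} ({user}): {exchange} x{n}")
```

Client addresses that keep showing up here after a restart would be the units still running the other oslo.messaging version compared to their peers.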