nova-compute SSL connections cause RabbitMQ pods to OOM
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Invalid | Undecided | Unassigned |
RabbitMQ | New | Undecided | Unassigned |
oslo.messaging | New | Undecided | Unassigned |
Bug Description
We have a Rocky OpenStack deployment with 3 controllers and 500 compute nodes. At 15:58, nova-compute detected that its RabbitMQ connection was broken and then reconnected:
2021-07-05 15:58:28.633 8 ERROR oslo.messaging.
2021-07-05 15:58:29.656 8 INFO oslo.messaging.
RabbitMQ then reported that a huge number of connections had been closed by the clients:
=WARNING REPORT==== 5-Jul-2021:
closing AMQP connection <0.6345.754> (20.16.36.44:2451 -> 145.247.103.14:5671 - nova-compute:
client unexpectedly closed TCP connection
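To gauge how many compute nodes dropped their connections at the same moment, warnings like the one above can be tallied per client address. The following is a minimal sketch with a hypothetical helper (`count_client_closes`), assuming the legacy `=WARNING REPORT====` layout shown here, in which the reason line directly follows the `closing AMQP connection` line. The sample log and its addresses are made up for illustration.

```python
import re
from collections import Counter

# Assumed legacy log layout: captures the client address (left of "->") from a
# line such as "closing AMQP connection <0.1.0> (192.0.2.10:40001 -> ...".
CLOSING_RE = re.compile(r"closing AMQP connection <[^>]+> \((?P<client>[\d.]+):\d+ ->")
REASON = "client unexpectedly closed TCP connection"


def count_client_closes(log_text: str) -> Counter:
    """Count, per client IP, connections that RabbitMQ says the client closed."""
    counts: Counter = Counter()
    pending_client = None
    for line in log_text.splitlines():
        if line.startswith("="):
            # A new report header: forget any "closing" line left unmatched.
            pending_client = None
            continue
        match = CLOSING_RE.search(line)
        if match:
            pending_client = match.group("client")
        elif pending_client is not None and REASON in line:
            counts[pending_client] += 1
            pending_client = None
    return counts


# Hypothetical sample; addresses come from documentation-only IP ranges.
SAMPLE_LOG = """\
=WARNING REPORT==== (timestamp) ===
closing AMQP connection <0.1.0> (192.0.2.10:40001 -> 198.51.100.5:5671):
client unexpectedly closed TCP connection
=WARNING REPORT==== (timestamp) ===
closing AMQP connection <0.2.0> (192.0.2.10:40002 -> 198.51.100.5:5671):
client unexpectedly closed TCP connection
=WARNING REPORT==== (timestamp) ===
closing AMQP connection <0.3.0> (192.0.2.11:40003 -> 198.51.100.5:5671):
client unexpectedly closed TCP connection
"""

if __name__ == "__main__":
    for client, n in count_client_closes(SAMPLE_LOG).most_common():
        print(client, n)
```

Pairing each reason line with the preceding `closing` line (rather than counting reason lines alone) keeps the tally attributable to individual compute hosts.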
After 10 minutes, the cluster was blocked by the memory alarm (memory high watermark of 0.4):
=INFO REPORT==== 5-Jul-2021:
vm_memory_
*******
*** Publishers will be blocked until this alarm clears ***
*******
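With a relative watermark, the memory alarm trips once RabbitMQ's memory use exceeds the watermark fraction multiplied by the total memory the node detects. The arithmetic sketch below, using hypothetical helper names and hypothetical sizes (not values from this report), computes that threshold and how far it sits below a given pod memory limit.

```python
GIB = 2**30


def memory_alarm_threshold(detected_total_bytes: int, watermark: float = 0.4) -> int:
    """Bytes of memory use at which a relative high watermark raises the alarm."""
    if not 0.0 < watermark <= 1.0:
        raise ValueError("relative watermark must be in (0, 1]")
    # Relative watermark: a fraction of the total memory RabbitMQ detects.
    return int(detected_total_bytes * watermark)


def headroom_below_limit(detected_total_bytes: int, pod_limit_bytes: int,
                         watermark: float = 0.4) -> int:
    """Bytes between the alarm threshold and a pod memory limit.

    A negative result means the pod limit would be reached before the alarm fires.
    """
    return pod_limit_bytes - memory_alarm_threshold(detected_total_bytes, watermark)


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    print("threshold (GiB):", memory_alarm_threshold(16 * GIB) / GIB)
    print("headroom (GiB):", headroom_below_limit(64 * GIB, 16 * GIB) / GIB)
```

Blocking publishers does not by itself release memory that is already allocated, so a positive headroom alone does not rule out an OOM.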
However, even after the publishers were blocked, memory usage in the RabbitMQ pod kept growing. In the end the node ran out of memory (OOM) and the system forced the pod to restart.
amqp release: 2.5.2
oslo.messaging release: 8.1.4
OpenStack release: Rocky
Changed in oslo.messaging: | |
assignee: | nobody → peiran wei (james940928) |
assignee: | peiran wei (james940928) → nobody |
Changed in nova: | |
status: | New → Invalid |
We saw that this issue appears to be related to "Upgrading to pike version causes rabbit timeouts with ssl". Having noticed that, we upgraded amqp and oslo.messaging to 2.5.2 and 8.1.4 respectively, but the bug still exists.