Octavia load balancers are staying OFFLINE

Bug #2032833 reported by Quentin GROLLEAU
This bug affects 1 person
Affects: octavia | Status: New | Importance: Undecided | Assigned to: Unassigned | Milestone: none

Bug Description

We have discovered that after some time the health manager stops updating the operating status of the load balancers.

We are running Octavia Zed with Python 3.10 in a venv with these Python libraries:
https://paste.opendev.org/show/bqGsJUSKaJeKxp6evvoS/

After a while, we no longer see any logs like:
2023-08-22 17:02:40.672 6 DEBUG octavia.amphorae.drivers.health.heartbeat_udp [-] Received packet from ('X.X.X.X', 61403) dorecv /var/lib/openstack/lib/python3.10/site-packages/octavia/amphorae/drivers/health/heartbeat_udp.py:95

and when we take a Guru Meditation Report (GMR) of process 6, we get:
https://paste.opendev.org/show/bty8vb4FxzAHqA6wDXH7/

/var/lib/openstack/lib/python3.10/site-packages/octavia/amphorae/drivers/health/heartbeat_udp.py:95 in dorecv
    `LOG.debug('Received packet from %s', srcaddr)`

/usr/lib/python3.10/logging/__init__.py:1835 in debug
    `self.log(DEBUG, msg, *args, **kwargs)`

/usr/lib/python3.10/logging/__init__.py:1879 in log
    `self.logger.log(level, msg, *args, **kwargs)`

/usr/lib/python3.10/logging/__init__.py:1547 in log
    `self._log(level, msg, args, **kwargs)`

/usr/lib/python3.10/logging/__init__.py:1624 in _log
    `self.handle(record)`

/usr/lib/python3.10/logging/__init__.py:1634 in handle
    `self.callHandlers(record)`

/usr/lib/python3.10/logging/__init__.py:1696 in callHandlers
    `hdlr.handle(record)`

/usr/lib/python3.10/logging/__init__.py:966 in handle
    `self.acquire()`

/usr/lib/python3.10/logging/__init__.py:917 in acquire
    `self.lock.acquire()`

/var/lib/openstack/lib/python3.10/site-packages/oslo_log/pipe_mutex.py:95 in acquire
    `eventlet.hubs.trampoline(self.rfd, read=True)`

/var/lib/openstack/lib/python3.10/site-packages/eventlet/hubs/__init__.py:159 in trampoline
    `return hub.switch()`

/var/lib/openstack/lib/python3.10/site-packages/eventlet/hubs/hub.py:313 in switch
    `return self.greenlet.switch()`

But I don't think the health manager is using eventlet
([O345] Usage of Python eventlet module not allowed --> https://github.com/openstack/octavia/blob/stable/zed/HACKING.rst)
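For illustration, the blocking pattern in the traceback can be modeled with a pipe-backed lock. This is a simplified, stdlib-only sketch of the idea behind oslo.log's PipeMutex (the class name, the timeout, and the return values are ours for demonstration, not oslo.log's API): acquire() waits for a token byte to appear on a pipe, and if no cooperative scheduler ever writes the token back, the wait never completes.

```python
import os
import select

# Simplified model (assumption: loosely based on oslo.log's PipeMutex
# concept, not its real implementation): the lock token is a byte in a pipe.
class PipeMutexSketch:
    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        os.write(self.wfd, b'-')  # one token in the pipe == unlocked

    def acquire(self, timeout=0.1):
        # The real PipeMutex calls eventlet.hubs.trampoline(self.rfd,
        # read=True) here; without a running eventlet hub that call never
        # returns. We use select() with a timeout so the sketch can show
        # the stall instead of hanging forever.
        ready, _, _ = select.select([self.rfd], [], [], timeout)
        if not ready:
            return False  # in the real deadlock this point is never reached
        os.read(self.rfd, 1)  # consume the token: lock is now held
        return True

    def release(self):
        os.write(self.wfd, b'-')  # put the token back: lock is free again

m = PipeMutexSketch()
assert m.acquire() is True    # token available: lock acquired
assert m.acquire() is False   # token gone, nothing hands it back: stalls
m.release()
assert m.acquire() is True    # token returned: lock acquired again
```

The point of the sketch is only the stall in the second acquire(): a process that never runs the scheduler responsible for handing the token back ends up waiting on the pipe indefinitely, which matches the greenlet.switch() frame at the bottom of the GMR traceback.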

And we are using oslo.log 5.0.2, which includes this patch: https://github.com/openstack/oslo.log/commit/94b9dc32ec1f52a582adbd97fe2847f7c87d6c17

The configuration we use:
https://paste.opendev.org/show/b6n2SRVG8VvAvNsKX3Rd/

If you have any ideas, don't hesitate to share them.

Gregory Thiemonge (gthiemonge) wrote :

Hi,

it seems that oslo.log 5.0.2 (and 5.0.1) was blocked in the requirements on master because of similar issues:

https://review.opendev.org/c/openstack/requirements/+/864573

there's a related launchpad issue in neutron:
https://bugs.launchpad.net/neutron/+bug/1995091

it's not entirely clear from that report, but it appears that, in one of their testing patches, they tried commenting out the LOG.<method> calls

our CI jobs still use the following versions (see https://6df0d90b08f38024b3b0-ddacc1cdacca3582cad1fbbc53a5babe.ssl.cf2.rackcdn.com/892285/1/check/octavia-v2-dsvm-scenario-amphora-v2/beaa8d0/controller/logs/pip3-freeze.txt):

eventlet==0.33.1
oslo.log==5.0.0

Quentin GROLLEAU (quentin.grolleau) wrote :

Hello,

Thank you very much for the response!

Ah, good point, we'll go back to oslo.log 5.0.0.

We also found out that a parameter was added to fix this:
https://github.com/openstack/oslo.log/commit/de615d9370681a2834cebe88acfa81b919da340c

Setting it to False for Octavia also fixes the issue (fix_eventlet=False).

Have a nice day !

Quentin
