Comment 0 for bug 1380220

Roman Podoliaka (rpodolyaka) wrote : OpenStack services consume a lot of CPU time when oslo.messaging is used

On a newly deployed cluster, after creating some load (e.g. running Rally scenarios), top shows that many OpenStack services start consuming CPU time heavily: http://paste.openstack.org/show/120460/

This happens because those services poll open sockets excessively (http://paste.openstack.org/show/120461/), using a very small timeout value (close to 0, while the eventlet default is 60).

Further investigation showed that services which don't use oslo.messaging weren't affected.

It turns out that the CPython 2.6/2.7 implementation of condition variables plays badly with the eventlet event loop. oslo.messaging has a place in the code (https://gerrit.mirantis.com/gitweb?p=openstack/oslo.messaging.git;a=blob;f=oslo/messaging/_drivers/impl_rabbit.py;h=dfed27851a36143e31448c77772e2a77597c94c6;hb=45d0e2742aa29c242f027de5edb54ba3db95cc33#l857) where it tries to put the current thread to sleep until some condition becomes true, passing a sane timeout value (24.0 s). Unfortunately, CPython provides its own implementation of condition variables and doesn't use the corresponding pthreads calls. In CPython 2.6/2.7, wait(timeout) for condition variables is implemented as a loop that polls after a short sleep (https://github.com/akheron/cpython/blob/2.7/Lib/threading.py#L344-L369). Those sleeps of 0.0005 to 0.05 seconds are the values that eventually get passed to poll()/epoll_wait() in eventlet, causing the process to wake up much more often than it should (as there are no socket events to process). And user space <-> kernel space switches are expensive.
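To give a feel for the numbers, here is a rough sketch of the sleep schedule such a polling wait produces. It is a simulation only, not the CPython code: the function name polling_delays and its parameters are made up for illustration, the constants are the ones quoted above, and it assumes the condition is never notified, so the full timeout elapses.

```python
def polling_delays(timeout, initial_delay=0.0005, max_delay=0.05):
    """Simulate the sleeps of a poll-style wait(timeout) that is never notified.

    Hypothetical helper: nothing actually sleeps here, the durations are
    only collected so they can be counted and inspected.
    """
    delays = []
    elapsed = 0.0
    delay = initial_delay
    while True:
        remaining = timeout - elapsed
        if remaining <= 0:
            break
        # Back off exponentially, but never sleep longer than max_delay
        # or longer than the time left until the timeout expires.
        delay = min(delay * 2, remaining, max_delay)
        delays.append(delay)
        elapsed += delay
    return delays


if __name__ == "__main__":
    delays = polling_delays(24.0)
    print("wakeups for one 24 s wait:", len(delays))
    print("longest single sleep:", max(delays))
```

Because no single sleep exceeds 0.05 s, one 24 s wait costs at least 480 wakeups in this model, where a single blocking call with the full timeout would need one.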

FWIW, PyPy and CPython 3.2+ shouldn't have this bug, but their compatibility with eventlet is an open question.

There are at least two ways to fix this:

1) backport changes to thread.c and threading.py from CPython 3.2 to CPython 2.6/2.7, build and use custom packages

2) add a workaround to oslo.messaging (don't use a condition variable in that particular place)

The former might affect CPython stability and would have to be thoroughly tested, so the latter seems to be a 'good enough' workaround for now.
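As a sketch of what option 2 could look like in principle, one way to wait without a condition variable is to block on a file descriptor, so the full timeout is passed to select() in a single call. This is not the oslo.messaging fix: the PipeEvent class is hypothetical, it relies on POSIX pipes, and it assumes that under eventlet the monkey-patched select would hand the same timeout to the hub.

```python
import os
import select


class PipeEvent(object):
    """Hypothetical one-shot event built on a pipe instead of a Condition."""

    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()

    def set(self):
        # Writing one byte makes the read end readable and wakes the waiter.
        os.write(self._write_fd, b"x")

    def wait(self, timeout):
        # A single blocking call with the real timeout, no polling loop.
        readable, _, _ = select.select([self._read_fd], [], [], timeout)
        return bool(readable)

    def close(self):
        os.close(self._read_fd)
        os.close(self._write_fd)


if __name__ == "__main__":
    event = PipeEvent()
    print("signalled before set():", event.wait(0.05))
    event.set()
    print("signalled after set():", event.wait(0.05))
    event.close()
```

The trade-off is two extra file descriptors per event, which may or may not be acceptable depending on how many such waits a service has.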