On a newly deployed cluster, after creating some load (e.g. running Rally scenarios), top shows that many OpenStack services start to consume CPU time heavily: http://paste.openstack.org/show/120460/
This happens because those services poll open sockets excessively (http://paste.openstack.org/show/120461/) using a very small timeout value (close to 0, while the eventlet default is 60).
Further investigation showed that services which didn't use oslo.messaging weren't affected.
It turns out that the CPython 2.6/2.7 implementation of condition variables plays badly with the eventlet event loop. oslo.messaging has a place in the code (https://gerrit.mirantis.com/gitweb?p=openstack/oslo.messaging.git;a=blob;f=oslo/messaging/_drivers/impl_rabbit.py;h=dfed27851a36143e31448c77772e2a77597c94c6;hb=45d0e2742aa29c242f027de5edb54ba3db95cc33#l857) where it tries to put the current thread to sleep until some condition is true, passing a sane timeout value (24.0 s). Unfortunately, CPython provides its own implementation of condition variables and doesn't use the corresponding pthreads calls. In CPython 2.6/2.7, wait(timeout) for condition variables is implemented as a loop that polls after a short sleep (https://github.com/akheron/cpython/blob/2.7/Lib/threading.py#L344-L369). Sleeps of 0.0005 to 0.05 seconds are the values eventually passed to poll()/epoll_wait() in eventlet, causing the process to wake up much more often than it should (as there are no socket events to process). These user space <-> kernel space switches are expensive.
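To give a feel for how often such a wait wakes up, here is a rough sketch that simulates the sleep schedule of a timed wait that is never notified. It is modelled on the loop in CPython 2.7's threading.py as I understand it (delay starts at 0.0005, is doubled before each sleep, and is capped at 0.05); the function name and the simulated clock are my own, and nothing here touches oslo.messaging or eventlet.

```python
def simulate_wait_delays(timeout, initial_delay=0.0005, max_delay=0.05):
    """Return the sleep intervals a CPython 2.7-style timed wait would use.

    Assumes nobody ever notifies the condition, so the loop only ends
    when the timeout expires. Uses a simulated clock instead of sleeping.
    """
    delays = []
    now = 0.0                    # simulated clock, in seconds
    endtime = now + timeout
    delay = initial_delay
    while True:
        remaining = endtime - now
        if remaining <= 0:
            break
        # Same back-off rule as the 2.7 loop: double, but never sleep
        # longer than 0.05 s or past the deadline.
        delay = min(delay * 2, remaining, max_delay)
        delays.append(delay)
        now += delay             # pretend we slept for `delay` seconds
    return delays


delays = simulate_wait_delays(24.0)
print("wakeups during one 24.0 s wait:", len(delays))
print("shortest sleep: %s s, longest sleep: %s s" % (min(delays), max(delays)))
```

Each entry in the list corresponds to one short sleep, which under eventlet ends up as one poll()/epoll_wait() call with that timeout, so a single 24.0 s wait turns into hundreds of wakeups instead of one.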
FWIW, PyPy and CPython 3.2+ shouldn't have this bug, but their compatibility with eventlet is an open question.
There are at least two ways to fix this:
1) backport changes to thread.c and threading.py from CPython 3.2 to CPython 2.6/2.7, build and use custom packages
2) add a workaround to oslo.messaging (don't use a condition variable in that particular place)
The former might affect CPython stability and would need to be thoroughly tested, so the latter seems to be a 'good enough' workaround for now.
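For illustration only, one way to wait with a timeout without going through threading.Condition is to block in a single select() call on a pipe. The PipeEvent class below is hypothetical and is not the change proposed for oslo.messaging; it assumes a POSIX system, and how it behaves under eventlet monkey patching has not been verified here.

```python
import os
import select


class PipeEvent(object):
    """Hypothetical one-shot event that waits in a single select() call."""

    def __init__(self):
        # The read end becomes readable as soon as set() writes a byte.
        self._rfd, self._wfd = os.pipe()

    def set(self):
        os.write(self._wfd, b"x")

    def wait(self, timeout):
        # One blocking call with the full timeout, no short-sleep loop.
        readable, _, _ = select.select([self._rfd], [], [], timeout)
        if readable:
            os.read(self._rfd, 1)   # consume the byte so the event resets
            return True
        return False

    def close(self):
        os.close(self._rfd)
        os.close(self._wfd)


ev = PipeEvent()
timed_out_result = ev.wait(0.01)    # nothing set yet, so this times out
ev.set()
signalled_result = ev.wait(1.0)     # returns as soon as the byte is seen
ev.close()
print(timed_out_result, signalled_result)
```

The point of the sketch is only that the timeout handed to the kernel is the one the caller asked for, rather than a series of 0.0005 to 0.05 second sleeps.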