Further investigation shows that oslo_messaging is stuck in wait_condition(), which is called from get() in oslo_messaging/_drivers/pool.py:
{code}
    def get(self):
        """Return an item from the pool, when one is available.

        This may cause the calling thread to block.
        """
        with self._cond:
            while True:
                try:
                    ttl_watch, item = self._items.pop()
                    self.expire()
                    return item
                except IndexError:
                    pass

                if self._current_size < self._max_size:
                    self._current_size += 1
                    break

                wait_condition(self._cond)

        # We've grabbed a slot and dropped the lock, now do the creation
        try:
            return self.create()
        except Exception:
            with self._cond:
                self._current_size -= 1
            raise
{code}
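To make the blocking behaviour concrete, here is a minimal sketch of the same pattern. TinyPool, put() and the deadline_s parameter are hypothetical names made up for this illustration and are not part of oslo.messaging; the deadline exists only so that the sketch terminates instead of hanging. It assumes that a blocked get() can only make progress once another caller returns an item and notifies the condition.

```python
import threading
import time


class TinyPool:
    """Hypothetical stand-in for the pool above, not the oslo.messaging class."""

    def __init__(self, max_size):
        self._cond = threading.Condition()
        self._items = []
        self._max_size = max_size
        self._current_size = 0

    def get(self, deadline_s):
        # deadline_s is an addition for this sketch so a starved caller gives up.
        deadline = time.monotonic() + deadline_s
        with self._cond:
            while True:
                if self._items:
                    return self._items.pop()
                if self._current_size < self._max_size:
                    self._current_size += 1
                    return object()  # stands in for self.create()
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None  # starved: nothing came back in time
                # Releases the lock while blocked, re-acquires it on wake-up.
                self._cond.wait(timeout=remaining)

    def put(self, item):
        # Hand an item back and wake one waiter.
        with self._cond:
            self._items.append(item)
            self._cond.notify()


pool = TinyPool(max_size=1)
first = pool.get(deadline_s=0.2)    # takes the only slot
starved = pool.get(deadline_s=0.2)  # pool exhausted and nothing is returned
pool.put(first)
reused = pool.get(deadline_s=0.2)   # succeeds once the item is back
```

If the real pool behaves like this sketch, a get() that never returns would point at items not being handed back (or at the pool size being too small) rather than at the wait itself.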
In the problematic case, when requests and the service hang, self._cond is in the following state:
self._cond = <Condition(<_RLock owner='MainThread' count=1>, 0)>
On our system, cond.wait() is called with a timeout of 1 second, i.e. the first branch below is taken:
{code}
# TODO(harlowja): remove this when we no longer have to support 2.7
if sys.version_info[0:2] < (3, 2):
    def wait_condition(cond):
        # FIXME(markmc): timeout needed to allow keyboard interrupt
        # http://bugs.python.org/issue8844
        cond.wait(timeout=1)
else:
    def wait_condition(cond):
        cond.wait()
{code}
{code}
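For reference when reading the state above: as far as I understand the Python 2.7 repr, the trailing 0 is the number of registered waiters, and owner/count describe the underlying RLock. The sketch below (Python 3, standard library only, with made-up names probe, results and held) is only meant to show what a healthy Condition does: the lock is held while a thread is inside the "with" block, and it is released for the duration of wait() and re-acquired before wait() returns. It does not reproduce the hang.

```python
import threading

cond = threading.Condition()  # backed by an RLock by default
results = {}
held = threading.Event()


def probe():
    # Non-blocking attempt while the main thread is inside "with cond".
    results["while_held"] = cond.acquire(blocking=False)
    if results["while_held"]:
        cond.release()
    held.set()
    # This blocks until the main thread releases the lock inside wait().
    with cond:
        results["during_wait"] = True
        cond.notify()


with cond:
    t = threading.Thread(target=probe)
    t.start()
    held.wait()  # the probe has made its non-blocking attempt
    # wait() drops the lock, so the probe can get in and notify us.
    results["notified"] = cond.wait(timeout=5)
t.join()
```

So an RLock showing an owner and count=1 is what one would expect whenever some thread is inside the "with self._cond" block and not currently blocked in wait(); on its own it does not prove that the RLock is broken.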
There seems to be some problem with RLock. What do you think?