deferToThread cannot wait for a thread in the same threadpool
Bug #1447208 reported by Blake Rouse
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
MAAS | Fix Released | Critical | Blake Rouse | |
Bug Description
This is the core issue we identified at the sprint. We need to make sure this is not being done, as it will cause regiond to deadlock and can also cause timeout errors.
Any deferToThread call made from inside a thread, where that thread then waits for the result, can fail. This is because only 10 threads can run at once: if all 10 threads in the pool are each waiting on another task that cannot start because the pool is full, a deadlock occurs.
There are two ways around this:
1. Do not wait for a thread from inside a thread.
2. Defer to a different threadpool than the one the thread is using. *We might be able to alternate between pools to fix this issue.*
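Both workarounds can be sketched with the same standard-library analogy (`concurrent.futures`, not Twisted's threadpool). The function names below are hypothetical, and the pool size of 2 is an assumption standing in for the 10-thread limit. This illustrates the idea and is not the change made in the related branches.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # assumption: stands in for the 10-thread limit


def inner():
    return "inner done"


def chain_without_waiting():
    """Option 1: a pool thread never blocks on work queued to its own pool."""
    all_started = threading.Barrier(POOL_SIZE)

    def outer():
        all_started.wait()
        # Hand the future back instead of waiting on it inside the pool.
        return pool.submit(inner)

    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        outer_futures = [pool.submit(outer) for _ in range(POOL_SIZE)]
        # Only the calling thread, which is outside the pool, waits.
        inner_futures = [f.result() for f in outer_futures]
        return [f.result() for f in inner_futures]


def wait_on_separate_pool(timeout=5.0):
    """Option 2: pool threads wait only on work running in a different pool."""
    all_started = threading.Barrier(POOL_SIZE)

    def outer():
        all_started.wait()
        return inner_pool.submit(inner).result(timeout=timeout)

    with ThreadPoolExecutor(max_workers=POOL_SIZE) as outer_pool, \
            ThreadPoolExecutor(max_workers=POOL_SIZE) as inner_pool:
        futures = [outer_pool.submit(outer) for _ in range(POOL_SIZE)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(chain_without_waiting())
    print(wait_on_separate_pool())
```

Option 2 only holds as long as tasks in the second pool never wait on the first pool in turn. Otherwise the same cycle can reappear across the two pools.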
Related branches
lp://staging/~blake-rouse/maas/cordinate-notifies
- Gavin Panella (community): Approve
Diff: 398 lines (+99/-60), 2 files modified
src/maasserver/websockets/listener.py (+54/-23)
src/maasserver/websockets/tests/test_listener.py (+45/-37)
lp://staging/~blake-rouse/maas/deferLater-deallocate-ip
- Raphaël Badin (community): Approve
Diff: 185 lines (+47/-16), 2 files modified
src/maasserver/models/node.py (+17/-7)
src/maasserver/models/tests/test_node.py (+30/-9)
description: updated
summary:
- deferToThread should not wait for a thread in the same threadpool
+ deferToThread cannot wait for a thread in the same threadpool
Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
tags: added: oil
Changed in maas:
status: Fix Committed → Fix Released
I just came across this issue today again:
https://bugs.launchpad.net/maas/+bug/1446915
=> /var/log/maas/regiond.log <==
2015-04-27 15:08:53 [-] 127.0.0.1 - - [27/Apr/2015:15:08:52 +0000] "GET /MAAS/metadata//2012-03-01/user-data HTTP/1.1" 200 3439 "-" "python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-35-generic"
2015-04-27 15:08:55 [-] 127.0.0.1 - - [27/Apr/2015:15:08:54 +0000] "GET /MAAS/api/1.0/nodes/?nodes=node-a0754dc0-c4cd-11e3-824b-00163efc5068&op=deployment_status HTTP/1.1" 200 64 "-" "Go 1.1 package http"
2015-04-27 15:08:58 [-] 127.0.0.1 - - [27/Apr/2015:15:08:58 +0000] "GET /MAAS/api/1.0/nodes/?agent_name=a4a9be1b-5c0a-424d-81d7-35a43d9478b1&id=node-a5224922-ae98-11e3-b194-00163efc5068&op=list HTTP/1.1" 200 2 "-" "Go 1.1 package http"
2015-04-27 15:08:59 [root] ERROR:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/api/support.py", line 52, in __call__
    response = upcall(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/decorators/vary.py", line 19, in inner_func
    response = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/piston/resource.py", line 167, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/usr/lib/python2.7/dist-packages/piston/resource.py", line 165, in __call__
    result = meth(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/api/support.py", line 200, in dispatch
    return function(self, request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/api/nodes.py", line 412, in start
    node.start(request.user, user_data=user_data)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 399, in call_within_transaction
    return func_within_txn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 339, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/node.py", line 1920, in start
    claims = self.claim_static_ip_addresses()
  File "/usr/lib/python2.7/dist-packages/maasserver/models/node.py", line 1742, in claim_static_ip_addresses
    alloc_type=alloc_type, requested_address=requested_address)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/macaddress.py", line 358, in claim_static_ips
    interface, alloc_type, requested_address, user=user)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/macaddress.py", line 247, in _allocate_static_address
    user=user)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/staticipaddress.py", line 179, in allocate_new
    alloc_type, user, hostname=hostname).wait(30)
  File "/usr/lib/python2.7/dist-packages/crochet/_eventloop.py", line 217, in wait
    result = self._result(timeout)
  File "/usr/lib/python2.7/dist-packages/crochet/_eventloop.py", line 195, in _result
    raise TimeoutError()
TimeoutError
I also came across: https://bugs.launchpad.net/maas/+bug/1379370 ... I wonder if there's something related t...