Non-responsive zookeeper leads to spinning agent and traceback
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pyjuju | In Progress | Wishlist | Kapil Thangavelu |
txzookeeper | Fix Committed | Wishlist | Kapil Thangavelu |
Bug Description
When the ZooKeeper server is unresponsive, the client agent makes things worse by not backing off its requests.
Traceback:
2012-10-30 08:33:49,921: twisted@ERROR: Traceback (most recent call last):
2012-10-30 08:33:49,922: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,922: twisted@ERROR: if self._check_
2012-10-30 08:33:49,922: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,922: twisted@ERROR: self, error)
2012-10-30 08:33:49,922: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,922: twisted@ERROR: result = f(*args, **kw)
2012-10-30 08:33:49,922: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,922: twisted@ERROR: return _inlineCallback
2012-10-30 08:33:49,922: twisted@ERROR: --- <exception caught here> ---
2012-10-30 08:33:49,922: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,923: twisted@ERROR: result = g.send(result)
2012-10-30 08:33:49,923: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,923: twisted@ERROR: raise error
2012-10-30 08:33:49,923: twisted@ERROR: zookeeper.
2012-10-30 08:33:49,923: twisted@ERROR: Traceback (most recent call last):
2012-10-30 08:33:49,923: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,923: twisted@ERROR: if self._check_
2012-10-30 08:33:49,923: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,923: twisted@ERROR: self, error)
2012-10-30 08:33:49,924: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,924: twisted@ERROR: result = f(*args, **kw)
2012-10-30 08:33:49,924: twisted@ERROR: File "/usr/lib/
2012-10-30 08:33:49,924: twisted@ERROR: return _inlineCallback
2012-10-30 08:33:49,924: twisted@ERROR: --- <exception caught here> ---
2012-10-30 08:33:49,924: twisted@ERROR: File "/usr/lib/
Related branches
- Kapil Thangavelu: Pending requested
Diff: 3336 lines (+2225/-241), 27 files modified
.bzrignore (+4/-0)
Makefile (+6/-0)
debian/changelog (+18/-0)
setup.py (+6/-3)
txzookeeper/__init__.py (+5/-2)
txzookeeper/client.py (+159/-79)
txzookeeper/lock.py (+6/-3)
txzookeeper/managed.py (+413/-0)
txzookeeper/node.py (+7/-26)
txzookeeper/queue.py (+5/-2)
txzookeeper/retry.py (+359/-0)
txzookeeper/tests/__init__.py (+57/-7)
txzookeeper/tests/common.py (+27/-2)
txzookeeper/tests/proxy.py (+97/-0)
txzookeeper/tests/test_client.py (+63/-77)
txzookeeper/tests/test_conn_failure.py (+194/-0)
txzookeeper/tests/test_lock.py (+4/-1)
txzookeeper/tests/test_managed.py (+309/-0)
txzookeeper/tests/test_node.py (+6/-3)
txzookeeper/tests/test_queue.py (+5/-2)
txzookeeper/tests/test_retry.py (+332/-0)
txzookeeper/tests/test_security.py (+4/-1)
txzookeeper/tests/test_session.py (+111/-20)
txzookeeper/tests/test_utils.py (+4/-1)
txzookeeper/tests/utils.py (+4/-1)
txzookeeper/todo.txt (+1/-9)
txzookeeper/utils.py (+19/-2)
Changed in txzookeeper: | |
status: | New → In Progress |
importance: | Undecided → High |
assignee: | nobody → Clint Byrum (clint-fewbar) |
Changed in txzookeeper: | |
assignee: | Clint Byrum (clint-fewbar) → Kapil Thangavelu (hazmat) |
Changed in juju: | |
assignee: | nobody → Kapil Thangavelu (hazmat) |
status: | New → In Progress |
importance: | Undecided → Critical |
milestone: | none → 0.8 |
Changed in txzookeeper: | |
status: | In Progress → Fix Committed |
Changed in juju: | |
importance: | Critical → Wishlist |
Changed in txzookeeper: | |
importance: | High → Wishlist |
It looks like txzookeeper.managed.SessionClient needs to handle ConnectionLossException so that it doesn't spray exceptions, since this is a normal condition like the others. We also need to back off and give the ZooKeeper server a little breathing room while retrying.
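
For illustration only, here is a minimal sketch of the kind of backoff being asked for, written in plain synchronous Python rather than against txzookeeper or Twisted. It is not the code from the linked branch. The names `ConnectionLoss`, `backoff_delays`, and `retry_with_backoff`, and the default delay values, are all hypothetical; `ConnectionLoss` merely stands in for zookeeper.ConnectionLossException.

```python
import random
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for zookeeper.ConnectionLossException."""


def backoff_delays(base=0.5, factor=2.0, cap=30.0, max_retries=5):
    """Yield max_retries delays that grow by `factor` and never exceed `cap`."""
    delay = base
    for _ in range(max_retries):
        yield min(delay, cap)
        delay *= factor


def retry_with_backoff(operation, sleep=time.sleep, jitter=random.random,
                       **backoff_options):
    """Call operation(), retrying on ConnectionLoss with growing pauses.

    Connection loss is treated as an expected condition: it is swallowed
    and retried instead of being logged as a traceback each time. Each
    pause is between 50% and 100% of the scheduled delay, so many agents
    do not all retry at the same instant.
    """
    for delay in backoff_delays(**backoff_options):
        try:
            return operation()
        except ConnectionLoss:
            sleep(delay * (0.5 + 0.5 * jitter()))
    # Retries exhausted: make one last attempt and let any error propagate.
    return operation()
```

The `sleep` and `jitter` parameters are injectable so the schedule can be checked without real waiting. In a Twisted-based client the pause would need to be a non-blocking delayed call rather than `time.sleep`; that adaptation is left out of this sketch.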