geard, zuul, and jenkins do not handle function registration cleanly
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Core Infrastructure | Fix Released | High | James E. Blair | |
Bug Description
We moved zuul today from the old 8GB host to a new 30GB host so that the zuul scratch git space could be hosted on a tmpfs. When we did this, the gearman plugin on the jenkins masters did not register all of its functions with gearman. It looked like something caused geard to crash, resulting in many stack traces (included below) in gearman-server.log. Our best guess is that whatever caused those stack traces also broke job registration for the remaining jobs.
We worked around this by stopping all of the jenkins masters and then bringing them online one at a time, so that each could register its functions in its own time slice. This seems to have worked around the problem well enough.
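The staggered restart described above could be scripted roughly as follows. This is only a sketch of the idea, not what was actually run: `start_one_at_a_time`, the `start` and `is_registered` callables, and the timeout values are all hypothetical, and how a master is started or how its registration is checked is left to the caller.

```python
import time


def start_one_at_a_time(masters, start, is_registered,
                        timeout=300.0, poll=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Start each master in turn, waiting for it to finish registering.

    `start(master)` and `is_registered(master)` are hypothetical callables
    supplied by the caller; nothing here talks to jenkins or gearman itself.
    """
    started = []
    for master in masters:
        start(master)
        deadline = clock() + timeout
        # Do not move on to the next master until this one has registered,
        # so that registrations never overlap.
        while not is_registered(master):
            if clock() >= deadline:
                raise TimeoutError(f"{master} did not finish registering")
            sleep(poll)
        started.append(master)
    return started
```

Waiting on an explicit check, rather than sleeping for a fixed interval between masters, keeps the registrations from overlapping even when one master is slow.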
From gearman-server.log:
2014-01-17 18:32:56,058 ERROR gear.BaseClient
Traceback (most recent call last):
File "/usr/local/
self.
File "/usr/local/
self.
File "/usr/local/
self.
File "/usr/local/
functions = self._getFuncti
File "/usr/local/
functions[
KeyError: 'stop:jenkins03
2014-01-17 18:34:20,450 ERROR gear.BaseClient
Traceback (most recent call last):
File "/usr/local/
self.
File "/usr/local/
self.
File "/usr/local/
self.
File "/usr/local/
functions = self._getFuncti
File "/usr/local/
functions[
KeyError: 'stop:jenkins03
From zuul debug.log:
2014-01-17 18:34:26,057 ERROR zuul.Gearman: Exception while checking functions
Traceback (most recent call last):
File "/usr/local/
connection.
File "/usr/local/
raise TimeoutError()
TimeoutError
2014-01-17 18:34:26,058 DEBUG zuul.Gearman: Function set_description
2014-01-17 18:35:56,058 ERROR zuul.Gearman: Exception while checking functions
Traceback (most recent call last):
File "/usr/local/
connection.
File "/usr/local/
raise TimeoutError()
TimeoutError
2014-01-17 18:35:56,058 DEBUG zuul.Gearman: Function set_description
Changed in openstack-ci:
assignee: nobody → James E. Blair (corvus)
This was fixed in openstack-infra/gear change Id6d4569bed6cd51dc4f1698184c54a0cb343fb0d, which handles an exception in the status command. This change is included in the latest gear release, 0.5.3.
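For illustration, the failure in the gearman-server.log tracebacks (a KeyError raised while the function table is being built) is the kind of error that goes away if the status code tolerates a job naming a function that is not in the table. The sketch below is not the code from the gear change: `function_status` and its arguments are hypothetical, and the assumption that the missing key came from a queued job is a guess based only on the traceback.

```python
def function_status(registered, queued_jobs):
    """Build a status table: {function_name: [queued, running, workers]}.

    `registered` maps function name -> number of workers (hypothetical input).
    `queued_jobs` is an iterable of (function_name, is_running) pairs.
    """
    functions = {name: [0, 0, workers] for name, workers in registered.items()}
    for name, running in queued_jobs:
        # A job may name a function that no worker currently has registered.
        # Indexing functions[name] directly would raise KeyError here, so
        # create a zero-worker entry instead.
        stats = functions.setdefault(name, [0, 0, 0])
        stats[0] += 1
        if running:
            stats[1] += 1
    return functions
```

With this shape, a job for an unregistered function such as `stop:jenkins03` shows up in the status output with zero workers instead of aborting the whole status request.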