Provisioning failed with UTAHBMProvisioningException: Cobbler command failed: system edit --name=dx-autopilot-nvidia --netboot-enabled=N
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
UTAH |
Triaged
|
High
|
Unassigned |
Bug Description
Provisioning of a system with an nvidia card failed with:
2013-01-16 02:34:33,387 dx-autopilot-nvidia WARNING: Return code: 1
2013-01-16 02:34:33,387 dx-autopilot-nvidia ERROR: Cobbler command failed: system edit --name=
Traceback (most recent call last):
File "/usr/lib/
machine.
File "/usr/lib/
raise err
UTAHBMProvision
2013-01-16 02:34:33,477 dx-autopilot-nvidia ERROR: Failed to install client. Writing empty reports.
Blocking daily publication of unity indicators.
http://
Output of the console attached.
Changed in utah: | |
status: | New → Triaged |
importance: | Undecided → High |
tags: | added: ue-desktop |
One thing I believe we're doing pretty bad is that debug logs for utah server
side aren't present in the jenkins jobs artifacts. This makes problems more
difficult to troubleshoot and require investigation just to get the error error
message that should be in the log (a separate bug will be opened to track this
issue).
In this case, what I find looking at the debug log file for the utah server is:
2013-01-16 02:34:33,168 dx-autopilot-nvidia DEBUG: Running command: sudo cobbler system edit --name= dx-autopilot- nvidia --netboot-enabled=N KeyError' >:'1IHI2xVY6vlr yAQpbOCeeGDIRUI OnAn6PQ= =' dx-autopilot- nvidia --netboot-enabled=N
2013-01-16 02:34:33,168 dx-autopilot-nvidia DEBUG: Output follows:
2013-01-16 02:34:33,373 dx-autopilot-nvidia DEBUG: ### ERROR ###
2013-01-16 02:34:33,373 dx-autopilot-nvidia DEBUG: Unexpected remote error, check the server side logs for further info
2013-01-16 02:34:33,374 dx-autopilot-nvidia DEBUG: <type 'exceptions.
2013-01-16 02:34:33,386 dx-autopilot-nvidia DEBUG:
2013-01-16 02:34:33,387 dx-autopilot-nvidia WARNING: Return code: 1
2013-01-16 02:34:33,387 dx-autopilot-nvidia ERROR: Cobbler command failed: system edit --name=
Since the cobbler command output recommends to look into the cobbler log, I've
done that and what I've found is:
Wed Jan 16 02:24:30 2013 - DEBUG | get_item; ['system', 'dx-autopilot- nvidia' ] nvidia' ] KeyError' > QpbOCeeGDIRUIOn An6PQ== ' python2. 7/dist- packages/ cobbler/ remote. py", line 1944, in _dispatch handle( *params) python2. 7/dist- packages/ cobbler/ remote. py", line 857, in xapi_object_edit check_access( token," xedit_% s" % object_type, token) python2. 7/dist- packages/ cobbler/ remote. py", line 1716, in check_access validate_ token(token) python2. 7/dist- packages/ cobbler/ remote. py", line 1660, in __validate_token __invalidate_ expired_ tokens( ) python2. 7/dist- packages/ cobbler/ remote. py", line 1626, in __invalidate_ expired_ tokens cache[token]
Wed Jan 16 02:24:30 2013 - DEBUG | done with get_item; ['system', 'dx-autopilot-
Wed Jan 16 02:24:30 2013 - DEBUG | REMOTE CLI Authorized; user(?)
Wed Jan 16 02:34:33 2013 - DEBUG | REMOTE expiring token; user(<DIRECT>)
Wed Jan 16 02:34:33 2013 - DEBUG | REMOTE expiring token; user(<DIRECT>)
Wed Jan 16 02:34:33 2013 - DEBUG | REMOTE CLI Authorized; user(?)
Wed Jan 16 02:34:33 2013 - INFO | Exception occured: <type 'exceptions.
Wed Jan 16 02:34:33 2013 - INFO | Exception value: '1IHI2xVY6vlryA
Wed Jan 16 02:34:33 2013 - DEBUG | get_item; ['system', 'dx-autopilot-ati']
Wed Jan 16 02:34:33 2013 - INFO | Exception Info:
File "/usr/lib/
return method_
File "/usr/lib/
self.
File "/usr/lib/
validated = self.__
File "/usr/lib/
self.
File "/usr/lib/
del self.token_
So the real problem here, based on the exception traceback, is that cobbler
tried to remove a token from an internal cache that wasn't there.
I've looked into the cobbler issues in github and haven't found any that
matches this problem, so I'm not sure if it's caused by an unexpected usage of
the cobbler commands or a race condition (I see in logs that ati and nvidia
test cases were running in parallel).