PCJ race between process-job-source.py and celery can generate OOPS
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Triaged | Critical | Unassigned |
Bug Description
I got OOPS-7d0f700be19191e98139cdab67a81ea7, which is:
InvalidTransi
Traceback (most recent call last):
Module lazr.jobrunner.
self.
Module lp.services.
super(
Module lazr.jobrunner.
job.
Module lp.services.
self.
Module lp.services.
raise InvalidTransiti
InvalidTransition: Transition from Running to Running is invalid.
<oops-message-0>: {'target_
This was because the job had been picked up by celery at almost exactly the same time:
[2014-04-30 09:23:13,769: DEBUG3/
[2014-04-30 09:23:13,881: INFO/PoolWorker-3] Running <PlainPackageCo
2014-04-30 09:23:13 DEBUG Trying to acquire lease for job in state Waiting
2014-04-30 09:23:13 INFO Running <PlainPackageCo
2014-04-30 09:23:14 INFO Job resulted in OOPS: OOPS-7d0f700be19191e98139cdab67a81ea7
So this is harmless in that the copy happened anyway, but Critical by Launchpad bug policy since it shouldn't generate an OOPS.
I thought the point of acquiring a lease for the job was that it couldn't be picked up by another job runner. Does celery not honour that?
Changed in launchpad:
status: New → Triaged
I think the problem may be in lazr.jobrunner. RunJob.run does indeed do a job.acquireLease(), but it doesn't commit the transaction at that point (unlike JobRunner.runAll) so other processes won't see it.
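To illustrate why an uncommitted lease does not protect the job, here is a minimal sketch using Python's sqlite3 module. It is not Launchpad or lazr.jobrunner code: the `job` table, the `lease_holder` column and the function name are all hypothetical, and sqlite3 stands in for the real database. The only thing it shows is that a second connection cannot see a lease written by the first until the first one commits.

```python
import os
import sqlite3
import tempfile


def lease_seen_by_other_runner(commit_after_acquire):
    """Return the lease holder a second connection sees for job 1.

    Hypothetical schema and names, for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        # Create one job in state Waiting with no lease.
        setup = sqlite3.connect(path)
        setup.execute(
            "CREATE TABLE job ("
            "id INTEGER PRIMARY KEY, status TEXT, lease_holder TEXT)"
        )
        setup.execute("INSERT INTO job VALUES (1, 'Waiting', NULL)")
        setup.commit()
        setup.close()

        runner_a = sqlite3.connect(path)  # stands in for one job runner
        runner_b = sqlite3.connect(path)  # stands in for the other runner
        try:
            # Runner A "acquires the lease" inside an open transaction.
            runner_a.execute(
                "UPDATE job SET lease_holder = 'runner-a' WHERE id = 1"
            )
            if commit_after_acquire:
                runner_a.commit()

            # Runner B checks whether the job is already leased.
            rows = runner_b.execute(
                "SELECT lease_holder FROM job WHERE id = 1"
            ).fetchall()
            return rows[0][0]
        finally:
            runner_a.close()
            runner_b.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("without commit:", lease_seen_by_other_runner(False))
    print("with commit:   ", lease_seen_by_other_runner(True))
```

Without the commit, the second connection still sees the job as unleased and would go on to run it, which matches the race described above. With the commit, the lease is visible immediately.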