Comment 1 for bug 1931567

Revision history for this message
Ian Booth (wallyworld) wrote :

Looking at a database dump, the issue is that the parent operation consisted of 39 actions across 39 units - 38 of them are marked as completed, 1 is marked as aborting (the one we are looking at here 3783)

An aborting action only happens if someone has run juju cancel-action. the action would be killed by juju and then marked as aborted but that hasn't happened so it's still in aborting state. Sometimes the process can get hung and juju will forcibly kill it but that hasn't happened - perhaps the unit agent got shut down before this could happen.

The logs show the unit agent is making an API call to fail the action (set status to failed),
but, the parent operation itself is marked as completed but it's really not because 1 action is not complete yet (still aborting) so juju gets confused.

Given the unit agent appears to be trying to set the action to failed, we can try to set the parent operation state back to running; this should allow things to progress

db.operations.update({"_id" : "f7afc459-639a-44e6-8bf1-ad5928637772:3754"},{$set: { "status" : "running"}});

We will need to loosen how strict juju is with checking for expected state so that in cases like this juju will mark the offended action as failed even if the parent thinks it is already complete.