cleanup after cancelled jobs

Bug #1449149 reported by Curtis Hovey
This bug affects 1 person
Affects         Status    Importance   Assigned to   Milestone
juju-ci-tools   Triaged   Medium       Unassigned

Bug Description

Many of CI's scripts will terminate and clean up when given SIGINT, but cancelling a job doesn't send the right signal to the right script.

We want a universal way to guarantee that when a job is cancelled, juju is told to clean up (destroy-environment) and that resources are released from the substrate.

Basically, we want to handle SIGTERM the same way we handle SIGINT.
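
One way to get there, sketched in Python: install a SIGTERM handler that raises KeyboardInterrupt, so any cleanup path that already responds to SIGINT also runs on SIGTERM. The function names below are hypothetical and not taken from juju-ci-tools, and the sketch assumes the script that owns the cleanup is the process that actually receives the signal, which the description above says is not always the case.

```python
import signal


def raise_keyboard_interrupt(signum, frame):
    """Treat the delivered signal as if the user had pressed Ctrl-C."""
    raise KeyboardInterrupt('received signal {}'.format(signum))


def handle_sigterm_like_sigint():
    """Route SIGTERM through the same KeyboardInterrupt path as SIGINT.

    Hypothetical helper. Returns the previous SIGTERM handler so the
    caller can restore it later. Must be called from the main thread.
    """
    return signal.signal(signal.SIGTERM, raise_keyboard_interrupt)
```

Calling handle_sigterm_like_sigint() once at script start, before any environment is bootstrapped, would be enough for existing `except KeyboardInterrupt` and `finally` blocks to fire on SIGTERM. It does not help if the signal goes to a parent shell or wrapper that never forwards it.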

Tags: tech-debt
Aaron Bentley (abentley)
description: updated
Curtis Hovey (sinzui)
Changed in juju-ci-tools:
importance: Medium → High
Nicholas Skaggs (nskaggs) wrote :

Aaron's notes from e-mails:

I'm pretty sure that we expect BootstrapManager.dump_all_logs to be a
best-effort mechanism that does all it can and does not raise exceptions
if the underlying operation fails. The tear_down is in the same finally
clause as dump_all_logs (and has been for a long time). So if
dump_all_logs does raise an exception, we won't attempt to clean up.

I think it's correct to expect dump_all_logs to not raise an exception,
but we cannot have a guarantee. People make mistakes. Library code can
raise surprising exceptions. In fact, we already handle
KeyboardInterrupt. So we should handle dump_all_logs exceptions. Maybe
something like this:

try:
    self.dump_all_logs()
except Exception:
    # Best effort: record the failure, then still clean up.
    logging.exception('dump_all_logs raised an exception')
self.tear_down()
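
To make the effect concrete, here is a small self-contained sketch of the same pattern. FakeManager is a hypothetical stand-in for BootstrapManager, written only for illustration; its dump_all_logs always fails so that the cleanup path can be observed.

```python
import logging


class FakeManager:
    """Hypothetical stand-in for BootstrapManager, for illustration only."""

    def __init__(self):
        self.torn_down = False

    def dump_all_logs(self):
        # Simulate a surprising failure from library code.
        raise RuntimeError('log collection failed')

    def tear_down(self):
        self.torn_down = True

    def finish(self):
        # Same shape as the snippet above: log the failure, then clean up.
        try:
            self.dump_all_logs()
        except Exception:
            logging.exception('dump_all_logs raised an exception')
        self.tear_down()


manager = FakeManager()
manager.finish()
print(manager.torn_down)
```

Note that `except Exception` does not catch KeyboardInterrupt, which derives from BaseException. An interrupt that arrives during dump_all_logs would still skip tear_down unless the call is also wrapped in try/finally.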

summary: - Cleanup after cancelled jobs
+ Need to test juju cleanup
summary: - Need to test juju cleanup
+ cleanup after cancelled jobs
Changed in juju-ci-tools:
importance: High → Medium