destroy-environment errors and hangs forever
Bug #863510 reported by Gustavo Niemeyer
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pyjuju | Fix Released | Medium | Jim Baker |
Bug Description
As seen in ftests:
+++ juju destroy-environment
2011-09-29 22:46:08,163 INFO Destroying environment 'sample' (type: ec2)...
2011-09-29 22:46:09,966 INFO Waiting on 3 EC2 instances to transition to terminated state, this may take a while
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
Failure: twisted.
Related branches
lp://staging/~jimbaker/pyjuju/remove-sec-grp-do-not-ignore-exception
- Gustavo Niemeyer: Approve
- Diff: 102 lines (+39/-24), 2 files modified
  juju/providers/ec2/securitygroup.py (+4/-23)
  juju/providers/ec2/tests/test_securitygroup.py (+35/-1)
Changed in juju:
status: New → In Progress
assignee: nobody → Jim Baker (jimbaker)
milestone: none → eureka
importance: Undecided → Medium
Changed in juju:
status: In Progress → Fix Released
The destroy-environment command currently does not attempt any
retries. The question is whether the command itself should retry, or
whether its user should re-run it when necessary.
It's not clear that the specific exception reported in this bug,
TimeoutError, is more relevant than other possible errors. The general
expectation of our code using txaws is that it wraps exceptions in
EC2Error (which might be ignored or further wrapped as
ProviderInteractionError). But regardless, in this code path, if there
is an error, environment destruction is stopped. This includes taking
too long. This suggests at least that the command itself must be
retried.
Any such error results in the exit code being set to 1. (This is
contrary to bug report #697093; as I mention there, the problem is in
our testing, not in the code itself.) So a script can retry this
command automatically whenever that exit code is returned.
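For anyone scripting this, a minimal sketch of such a retry loop might look like the following. The helper names (retry_until_success, destroy_environment), the attempt limit, and the delay are all assumptions made for illustration and are not part of juju. The sketch also assumes that juju is on the PATH and that any confirmation prompt is handled by the caller.

```python
import subprocess
import time


def retry_until_success(run, max_attempts=5, delay=30.0, sleep=time.sleep):
    """Call run() until it returns exit code 0 or the attempts run out.

    run is any callable returning an integer exit code. Returns the
    number of attempts used on success; raises RuntimeError otherwise.
    """
    for attempt in range(1, max_attempts + 1):
        if run() == 0:
            return attempt
        # Pause between attempts, but not after the final failure.
        if attempt < max_attempts:
            sleep(delay)
    raise RuntimeError(
        "command still failing after %d attempts" % max_attempts)


def destroy_environment():
    # Hypothetical wrapper: runs the CLI and returns its exit code.
    # Assumes `juju` is on PATH; prompt handling is left to the caller.
    return subprocess.call(["juju", "destroy-environment"])
```

A script would then call retry_until_success(destroy_environment) and treat a RuntimeError as a destruction that needs manual attention.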
The last point to consider is whether it is possible for environment
destruction to become wedged. Trying to provoke this suggests that it
might be possible, but only for a short period of time: re-running
destroy-environment eventually succeeds. Machines are always linked to
their security group, so it's always possible (eventually) to iterate
over them and destroy them. Although this doesn't guarantee that all
security groups will be deleted, so long as the link to the machine is
gone, a subsequent bootstrap process will succeed.
Hence this analysis suggests that the proper way to resolve this bug
report is to better document the assumptions, especially for cases
where this command is scripted. We may also want to look at adding a
test for this scenario to the test suite.