HTTP 500 error on block-storage-relation-changed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
block-storage-broker (Juju Charms Collection) | Fix Committed | High | Chad Smith |
Bug Description
When trying to deploy my team's project (lp:uci-engine) I frequently hit a problem when it tries to attach the volume for the block storage broker:
2014-08-18 15:32:20 INFO juju-log block-storage:31: Attaching ci-airline-ts- postgres/0 nova volume (ce01ad08-
2014-08-18 15:33:22 INFO block-storage-
2014-08-18 15:33:23 ERROR juju-log block-storage:31: ERROR: Command 'nova volume-attach 8a1df19d-
2014-08-18 15:33:23 ERROR juju.worker.uniter uniter.go:482 hook failed: exit status 1
If I go back and rerun the same command by hand after the failure, it works, so I suspect there's a race where the block storage broker doesn't wait long enough for the instance or the volume to become available, and doesn't retry.
I'll attach a juju unit log from the bsb instance in case that helps.
Related branches
- Chad Smith (community): Approve
- David Britton (community): Needs Fixing
- Paul Larson: Pending (requested)
Diff: 217 lines (+78/-34), 3 files modified
- hooks/test_hooks.py (+4/-4)
- hooks/test_util.py (+38/-13)
- hooks/util.py (+36/-17)
Changed in block-storage-broker (Juju Charms Collection):
assignee: nobody → Chad Smith (chad.smith)
importance: Undecided → High
status: New → Triaged
tags: added: openstack
tags: added: papercut
Changed in block-storage-broker (Juju Charms Collection):
status: In Progress → Fix Committed
tags: removed: openstack
Charles, thanks for filing this bug; attaching the log made it easier to diagnose.
We currently have 10 retries over a total of ~50 seconds while waiting for the volume to transition to "available" status reported by nova volume-list.
Unfortunately, we don't currently retry on spurious 500 errors from the nova volume-attach command, because I expected that command to succeed once the volume is already reporting "available" status. Since this is a frequent occurrence for you, I'll pull together a branch that retries nova commands a configurable number of times on 500 errors.
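The retry behavior described above could be sketched as follows. This is a hypothetical illustration only, not the actual charm code in hooks/util.py: a generic command runner that re-invokes a CLI command when its output suggests a transient HTTP 500, and the function name, retry count, and delay are assumptions.

```python
import subprocess
import time


def run_with_retries(cmd, retries=10, delay=5):
    """Run a CLI command, retrying when its output mentions an HTTP 500.

    Hypothetical sketch: the real retry logic lives in the charm's
    hooks/util.py. `retries` and `delay` stand in for the configurable
    retry count and backoff mentioned above.
    """
    for attempt in range(1, retries + 1):
        try:
            return subprocess.check_output(
                cmd, stderr=subprocess.STDOUT, universal_newlines=True)
        except subprocess.CalledProcessError as err:
            # Only retry on what looks like a transient 500 error;
            # re-raise immediately on any other failure, or once the
            # retry budget is exhausted.
            if "500" not in (err.output or "") or attempt == retries:
                raise
            time.sleep(delay)
```

A caller would then attempt the attach via something like `run_with_retries(["nova", "volume-attach", instance_id, volume_id])`, so a one-off 500 from nova no longer fails the hook outright.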