HTTP 500 error on block-storage-relation-changed

Bug #1358907 reported by Paul Larson
This bug affects 1 person
Affects Status Importance Assigned to Milestone
block-storage-broker (Juju Charms Collection)
Fix Committed
Chad Smith

Bug Description

When trying to deploy my team's project (lp:uci-engine) I frequently hit a problem when it tries to attach the volume for the block storage broker:
2014-08-18 15:32:20 INFO juju-log block-storage:31: Attaching ci-airline-ts- postgres/0 nova volume (ce01ad08-e7cb-49ed-b619-0ae342bc3b69)
2014-08-18 15:33:22 INFO block-storage-relation-changed ERROR: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-7be1e4eb-03a8-411b-88dd-bdfd0301d03a)
2014-08-18 15:33:23 ERROR juju-log block-storage:31: ERROR: Command 'nova volume-attach 8a1df19d-540b-46ba-a40b-e7e41fc16057 ce01ad08-e7cb-49ed-b619- 0ae342bc3b69 auto | egrep -o "/dev/vd[b-z]"' returned non-zero exit status 1
2014-08-18 15:33:23 ERROR juju.worker.uniter uniter.go:482 hook failed: exit status 1

If I go back and try to rerun that same command by hand after the failure, it works, so I suspect there's just some race where the block storage broker doesn't sufficiently wait for either the instance or the volume to become available, and doesn't retry?

I'll attach a juju unit log from the bsb instance in case that helps.

Tags: papercut

Related branches

Revision history for this message
Paul Larson (pwlars) wrote :
Changed in block-storage-broker (Juju Charms Collection):
assignee: nobody → Chad Smith (chad.smith)
importance: Undecided → High
status: New → Triaged
tags: added: openstack
tags: added: papercut
Revision history for this message
Chad Smith (chad.smith) wrote :

Charles, thanks for this bug and attaching the log made it easier.

We currently have 10 retries over a total of ~50 seconds while waiting for the volume to transition to "available" status reported by nova volume-list.

Unfortunately, we don't currently retry on spurious 500 errors from nova attach-volume command because I expected that to work when a node is already reporting "available" status. It sounds like this is a frequent occurrence for you, so I'll pull together a branch that specifically retries a configurable number of times for nova commands on 500 errors.

Chad Smith (chad.smith)
Changed in block-storage-broker (Juju Charms Collection):
status: Triaged → In Progress
Revision history for this message
Paul Larson (pwlars) wrote :

So far it seems better, I'd like to do some more testing. I've deployed 3 times so far and here were the results:
1 - Everything worked, volume was mounted
2 - Volume was mounted, but so was a second one that it created. Still doesn't seem to see the first all the time.
3 - 2014-08-25 20:47:54 ERROR juju-log block-storage:26: Error: Multiple volumes are associated with ci-airline-ts-postgres/0 nova volume. Cannot get_volume_id. This time it saw the two previous volumes and failed completely. Not the HTTP500 error though, this was the other problem I described where we have to remove the volumes before deploying.

Revision history for this message
Paul Larson (pwlars) wrote :

So I've done a few more iterations now without deleting the volume and seen it work more often than not. So while there may still be some corner cases, this *is* an overall improvement I think based on my previous experience.

Revision history for this message
Chad Smith (chad.smith) wrote :

Paul, thanks for the deploy on this.

I think the condition of attempting to create a second volume when one with a label postgresql/0 is already present is a separate bug in how we process nova describe-volumes. I'll try to hit this issue on my side today, but it shouldn't be related to this branch. Generally storage tries to leave around the created volume even when the unit dies, so that the storage data isn't lost. Then that same volume gets remounted when a unit of the same name name is restarted/recreated. It looks like something in the nova describe-volumes didn't find a matching volume label for that unit so it tried to create a new one. I'll try reproducing that case with some additional debug logs to see if I can resolve the problem.

Chad Smith (chad.smith)
Changed in block-storage-broker (Juju Charms Collection):
status: In Progress → Fix Committed
tags: removed: openstack
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.