Juju Charms Collection
block-storage-broker package

HTTP 500 error on block-storage-relation-changed

Bug #1358907 reported by Paul Larson on 2014-08-19

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
block-storage-broker (Juju Charms Collection)	Fix Committed	High	Chad Smith

Bug Description

When trying to deploy my team's project (lp:uci-engine) I frequently hit a problem when it tries to attach the volume for the block storage broker:
2014-08-18 15:32:20 INFO juju-log block-storage:31: Attaching ci-airline-ts-postgres/0 nova volume (ce01ad08-e7cb-49ed-b619-0ae342bc3b69)
2014-08-18 15:33:22 INFO block-storage-relation-changed ERROR: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-7be1e4eb-03a8-411b-88dd-bdfd0301d03a)
2014-08-18 15:33:23 ERROR juju-log block-storage:31: ERROR: Command 'nova volume-attach 8a1df19d-540b-46ba-a40b-e7e41fc16057 ce01ad08-e7cb-49ed-b619-0ae342bc3b69 auto | egrep -o "/dev/vd[b-z]"' returned non-zero exit status 1
2014-08-18 15:33:23 ERROR juju.worker.uniter uniter.go:482 hook failed: exit status 1

If I rerun that same command by hand after the failure, it works, so I suspect there's a race: the block storage broker doesn't wait long enough for either the instance or the volume to become available, and doesn't retry.

I'll attach a juju unit log from the bsb instance in case that helps.

Related branches

lp://staging/~chad.smith/charms/precise/block-storage-broker/bsb-retries-on-volume-create-and-attach

Merged into lp://staging/charms/block-storage-broker at revision 56

Chad Smith (community): Approve on 2014-09-09

David Britton (community): Needs Fixing on 2014-09-09

Paul Larson: Pending requested 2014-08-22

Paul Larson (pwlars) wrote on 2014-08-19:

unit-ci-airline-ts-block-storage-broker-0.log (23.8 KiB, text/plain)

Charles Butler (lazypower) on 2014-08-19

Changed in block-storage-broker (Juju Charms Collection):
assignee:	nobody → Chad Smith (chad.smith)
importance:	Undecided → High
status:	New → Triaged
tags:	added: openstack
tags:	added: papercut

Chad Smith (chad.smith) wrote on 2014-08-19:

Charles, thanks for this bug; attaching the log made it easier.

We currently have 10 retries over a total of ~50 seconds while waiting for the volume to transition to "available" status reported by nova volume-list.
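As an illustration of the polling described above, here is a minimal Python sketch. It is not the charm's actual code: the function name, the `get_status` callable, and the 5-second delay (inferred from 10 retries over ~50 seconds) are all assumptions.

```python
import time


def wait_for_volume_status(get_status, wanted="available", retries=10,
                           delay=5.0, sleep=time.sleep):
    """Poll get_status() until it returns `wanted` or the retries run out.

    get_status is a hypothetical callable that returns the volume's current
    status string, for example parsed from `nova volume-list` output.
    Returns the number of checks that were needed.
    """
    for attempt in range(1, retries + 1):
        if get_status() == wanted:
            return attempt
        # Do not sleep after the final check; just fall through and fail.
        if attempt < retries:
            sleep(delay)
    raise RuntimeError(
        "volume did not reach %r after %d checks" % (wanted, retries))
```

The `sleep` parameter exists only so the wait can be replaced in tests; a caller would normally leave it at its default.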

Unfortunately, we don't currently retry on spurious 500 errors from the nova volume-attach command, because I expected that to work once a volume is already reporting "available" status. It sounds like this is a frequent occurrence for you, so I'll pull together a branch that specifically retries nova commands a configurable number of times on 500 errors.
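A rough sketch of what retrying a nova command on 500 errors could look like is shown here. The function name, defaults, and the check for the literal text "(HTTP 500)" (taken from the error in the log above) are assumptions, not the contents of the actual branch.

```python
import subprocess
import time


def run_with_retries(command, max_retries=3, delay=5.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run `command` (a list of arguments) and return its stdout.

    A failed run is retried only when its output mentions "(HTTP 500)",
    up to max_retries extra attempts. Any other failure raises at once.
    """
    attempt = 0
    while True:
        result = run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        output = (result.stdout or "") + (result.stderr or "")
        if "(HTTP 500)" not in output or attempt >= max_retries:
            raise RuntimeError(
                "%s failed: %s" % (" ".join(command), output.strip()))
        attempt += 1
        sleep(delay)
```

With this shape, the failing call from the log would be passed as a list such as `["nova", "volume-attach", instance_id, volume_id, "auto"]`, and `max_retries` would come from charm configuration.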

Chad Smith (chad.smith) on 2014-08-22

Changed in block-storage-broker (Juju Charms Collection):
status:	Triaged → In Progress

Paul Larson (pwlars) wrote on 2014-08-25:

So far it seems better, I'd like to do some more testing. I've deployed 3 times so far and here were the results:
1 - Everything worked, volume was mounted
2 - Volume was mounted, but so was a second one that it created. Still doesn't seem to see the first all the time.
3 - 2014-08-25 20:47:54 ERROR juju-log block-storage:26: Error: Multiple volumes are associated with ci-airline-ts-postgres/0 nova volume. Cannot get_volume_id. This time it saw the two previous volumes and failed completely. Not the HTTP 500 error though; this was the other problem I described, where we have to remove the volumes before deploying.

Paul Larson (pwlars) wrote on 2014-08-26:

So I've done a few more iterations now without deleting the volume and seen it work more often than not. So while there may still be some corner cases, this *is* an overall improvement I think based on my previous experience.

Chad Smith (chad.smith) wrote on 2014-08-26:

Paul, thanks for the deploy on this.

I think the condition of attempting to create a second volume when one with a label postgresql/0 is already present is a separate bug in how we process nova describe-volumes. I'll try to hit this issue on my side today, but it shouldn't be related to this branch. Generally storage tries to leave around the created volume even when the unit dies, so that the storage data isn't lost. Then that same volume gets remounted when a unit of the same name is restarted/recreated. It looks like something in the nova describe-volumes didn't find a matching volume label for that unit, so it tried to create a new one. I'll try reproducing that case with some additional debug logs to see if I can resolve the problem.
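To make the label-matching behaviour concrete, here is a hedged Python sketch of a lookup by unit label. The label format ("<unit name> nova volume") is only inferred from the log messages in this report, and the list of (volume_id, label) pairs is a hypothetical stand-in for parsed volume listings; the charm's real get_volume_id may differ.

```python
def get_volume_id(volumes, unit_name):
    """Return the id of the one volume labelled for unit_name, or None.

    volumes is a hypothetical list of (volume_id, label) pairs.
    None means no volume matched, so the caller would create a new one.
    """
    label = "%s nova volume" % unit_name  # assumed label format
    matches = [vol_id for vol_id, vol_label in volumes if vol_label == label]
    if len(matches) > 1:
        # Mirrors the failure Paul saw on his third deploy.
        raise ValueError(
            "Multiple volumes are associated with %s. Cannot get_volume_id."
            % label)
    return matches[0] if matches else None
```

Under these assumptions, a lookup that wrongly returns None for an existing volume would explain the second volume being created, and two leftover volumes with the same label would explain the "Multiple volumes" error.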

Chad Smith (chad.smith) on 2014-09-10

Changed in block-storage-broker (Juju Charms Collection):
status:	In Progress → Fix Committed

Edward Hope-Morley (hopem) on 2014-10-24

tags:	removed: openstack
