Sporadic failure when creating XenServer VMs from machine images

Bug #732801 reported by Rick Harris
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Johannes Erdfelt
Milestone: 2011.2

Bug Description

When trying to spin up XenServer VMs using machine style images (kernel outside of the image), we're seeing sporadic failures.

The instance goes to 'active' (which is probably wrong), but the compute node generates this traceback: http://paste.openstack.org/show/857/

It's possible there is a race here, and we could fix it by polling or by introducing a carefully placed `sleep 2`.

Plan to Fix:

1. The first step is to figure out why the failed VMs are showing up as 'active' instead of 'failed'.
2. Once that's fixed, I plan to write a test harness that spins up 10 instances and counts how many fail.
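
A rough sketch of what the harness in step 2 could look like. This is illustrative only: `count_failures` and the `spawn` callable are hypothetical names, not part of nova, and the sketch assumes step 1 is done so that a failed spawn actually raises instead of reporting 'active'.

```python
def count_failures(spawn, attempts=10):
    """Call spawn(i) `attempts` times and collect (index, exception) pairs.

    `spawn` is a hypothetical callable that creates one instance and
    raises an exception if the instance does not come up correctly.
    """
    failures = []
    for i in range(attempts):
        try:
            spawn(i)
        except Exception as exc:
            # Record the failure and keep going so every attempt is counted.
            failures.append((i, exc))
    return failures


if __name__ == "__main__":
    def flaky_spawn(i):
        # Stand-in for a real spawn call: fails on every even attempt.
        if i % 2 == 0:
            raise RuntimeError("device node missing for instance %d" % i)

    failed = count_failures(flaky_spawn)
    print("%d of 10 spawns failed" % len(failed))
```

Collecting the exceptions rather than stopping at the first one makes it possible to see whether the two different tracebacks occur at different rates.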

Rick Harris (rconradharris) wrote :

Also received this, a completely different error: http://paste.openstack.org/show/859/

Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → High
status: New → Confirmed
Armando Migliaccio (armando-migliaccio) wrote :

Hi Rick, do you still see this or could os.popen('udevsettle') in vm_utils have solved the problem?

Thanks,
Armando

Johannes Erdfelt (johannes.erdfelt) wrote :

I have also run into this bug a handful of times. In the case of my system, it appears to be a race condition where the device node isn't created quickly enough.

udevsettle (or udevadm settle) is likely to work, but it assumes a system is running udev. I've attached a patch which just polls until the device is created, which appears to be what most other code in nova does already in similar situations.
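
For illustration, a minimal sketch of the polling approach described above. This is not the attached patch: the function name `wait_for_device`, the timeout, and the polling interval are all assumptions chosen for the example.

```python
import os
import time


def wait_for_device(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists; return False if `timeout` seconds pass first.

    Hypothetical helper: the name and default values are illustrative,
    not taken from nova.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly so the loop does not spin while the node is created.
        time.sleep(interval)
```

Unlike calling udevsettle, this only depends on the device node eventually appearing, so it works whether or not the system runs udev. The caller still has to decide what to do when the timeout expires.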

Changed in nova:
assignee: nobody → Johannes Erdfelt (johannes.erdfelt)
Thierry Carrez (ttx)
Changed in nova:
status: Confirmed → In Progress
Thierry Carrez (ttx)
Changed in nova:
milestone: none → cactus-rc
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: cactus-rc → 2011.2
status: Fix Committed → Fix Released