Comment 4 for bug 1706377

Adam Vinsh (adam-vinsh) wrote :

Hey all.
Just started a big live-migration project in our Pike-based cluster. Testing showed that migrations were working well. However, after a batch of about 200 instances, 3 have now lost customer data. This manifests as a migration going into the error state described in this bug. We then retry the migration to the same destination and it completes. The customer then reports that, on the affected instances, all of their data has been wiped back to the base image they first deployed months ago. My guess is that the local disk image isn't fully copied over before it's deleted at the source, so nova just boots from the base image.

Indeed, the libvirt logs for these failures contain this signature:
drv_co_pwritev: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
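
For anyone wanting to check their own hosts for the same signature, here is a minimal sketch of a log scan. It is only an illustration: the helper names (find_signature, scan_log_dir) are made up for this example, and the default directory /var/log/libvirt/qemu is an assumption about where the per-instance QEMU logs live, so adjust it for your deployment.

```python
import sys
from pathlib import Path

# Assertion text quoted in this comment.
SIGNATURE = "Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed"


def find_signature(lines, signature=SIGNATURE):
    """Return (line_number, line) pairs for lines containing the signature."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if signature in line
    ]


def scan_log_dir(log_dir):
    """Map each *.log file directly under log_dir to its matching lines."""
    hits = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as fh:
            matches = find_signature(fh)
        if matches:
            hits[str(path)] = matches
    return hits


if __name__ == "__main__":
    # Assumed default location; pass a different directory as the first argument.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/log/libvirt/qemu"
    for path, matches in scan_log_dir(log_dir).items():
        for n, line in matches:
            print(f"{path}:{n}: {line}")
```

Run on a compute host, this prints one line per match with the log file and line number. A hit only shows that the assertion fired for that instance; it does not by itself confirm data loss.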

Does anyone working on this have any current info on how we can fix this in our Pike cluster? The same issue hasn't happened in our Queens cluster. We can't upgrade the Pike cluster yet, for various reasons that themselves require a bunch of live migrations first.

-Adam