Looking at logs from this job:
http://logs.openstack.org/83/252283/96/experimental/gate-tempest-dsvm-multinode-live-migration/469db74/
In the QEMU logfile on the target host we see:
> 2016-01-26 10:30:51.895+0000: shutting down
> load of migration failed
The fact that we see the "shutting down" message before the "load of migration failed" message strongly suggests that libvirt is proactively shutting down QEMU, i.e. the failure to load migration data is *not* the cause of the problem - it is just a side effect of libvirt shutting down QEMU.
There is no sign of errors in the dest host libvirt logs - it is simply told by the source host that migration has failed and to shut down the new QEMU.
Going back to the source host, we see migration has started successfully and transferred a lot of data. e.g. the last status update before failure shows:
2016-01-26 10:30:51.197+0000: 8591: debug : qemuMonitorJSONCommandWithFd:264 : Send command '{"execute":"query-migrate","id":"libvirt-144"}' for write with FD -1
2016-01-26 10:30:51.197+0000: 8591: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7f1e0c00db30 msg={"execute":"query-migrate","id":"libvirt-144"} fd=-1
2016-01-26 10:30:51.199+0000: 8590: debug : qemuMonitorIOWrite:504 : QEMU_MONITOR_IO_WRITE: mon=0x7f1e0c00db30 buf={"execute":"query-migrate","id":"libvirt-144"} len=48 ret=48 errno=11
2016-01-26 10:30:51.199+0000: 8590: debug : qemuMonitorIOProcess:396 : QEMU_MONITOR_IO_PROCESS: mon=0x7f1e0c00db30 buf={"return": {"expected-downtime": 106, "status": "active", "setup-time": 2, "total-time": 6613, "ram": {"total": 67776512, "remaining": 323584, "mbps": 95.288421, "transferred": 82742303, "duplicate": 4631, "dirty-pages-rate": 310, "skipped": 0, "normal-bytes": 83189760, "normal": 20310}}, "id": "libvirt-144"} len=312
2016-01-26 10:30:51.199+0000: 8590: debug : qemuMonitorJSONIOProcessLine:157 : Line [{"return": {"expected-downtime": 106, "status": "active", "setup-time": 2, "total-time": 6613, "ram": {"total": 67776512, "remaining": 323584, "mbps": 95.288421, "transferred": 82742303, "duplicate": 4631, "dirty-pages-rate": 310, "skipped": 0, "normal-bytes": 83189760, "normal": 20310}}, "id": "libvirt-144"}]
2016-01-26 10:30:51.199+0000: 8590: debug : qemuMonitorJSONIOProcessLine:177 : QEMU_MONITOR_RECV_REPLY: mon=0x7f1e0c00db30 reply={"return": {"expected-downtime": 106, "status": "active", "setup-time": 2, "total-time": 6613, "ram": {"total": 67776512, "remaining": 323584, "mbps": 95.288421, "transferred": 82742303, "duplicate": 4631, "dirty-pages-rate": 310, "skipped": 0, "normal-bytes": 83189760, "normal": 20310}}, "id": "libvirt-144"}
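For reference, the same counters can be read over the libvirt API rather than scraped from the debug logs. A minimal sketch, assuming the libvirt-python binding (the connection URI and guest name are illustrative):

import libvirt

# Sketch only: URI and guest name are assumptions, not taken from this job.
# dom.jobStats() exposes the counters libvirt fetches from QEMU's
# 'query-migrate', as seen in the reply above.
conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000001')

stats = dom.jobStats()  # snapshot of the currently running migration job
print("transferred=%s remaining=%s total=%s" % (
    stats.get('memory_processed'),
    stats.get('memory_remaining'),
    stats.get('memory_total')))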
What is interesting is that we see some I/O error events issued - the first is
2016-01-26 10:30:46.888+0000: 8590: debug : qemuMonitorJSONIOProcessLine:172 : QEMU_MONITOR_RECV_EVENT: mon=0x7f1e0c00db30 event={"timestamp": {"seconds": 1453804246, "microseconds": 883853}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "operation": "write", "action": "report"}}
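(These BLOCK_IO_ERROR notifications are also surfaced to management applications as libvirt events. A minimal sketch of watching for them, assuming the libvirt-python binding - the connection URI is illustrative:)

import libvirt

# Sketch only: a client such as nova could register for I/O error events.
# 'action' is an integer mapping to none/pause/report, i.e. the same
# "action" field shown in the QMP event above.
def io_error_cb(conn, dom, src_path, dev_alias, action, opaque):
    print("I/O error on %s (%s), action=%d" % (dev_alias, src_path, action))

libvirt.virEventRegisterDefaultImpl()    # register event loop before connecting
conn = libvirt.open('qemu:///system')
conn.domainEventRegisterAny(None,
                            libvirt.VIR_DOMAIN_EVENT_ID_IO_ERROR,
                            io_error_cb, None)

while True:
    libvirt.virEventRunDefaultImpl()     # dispatch pending events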
Initially migration is still continuing despite these I/O error reports.
After we have received quite a lot of them though, we eventually see migration fail:
2016-01-26 10:30:51.251+0000: 8591: debug : qemuMonitorSend:959 : QEMU_MONITOR_SEND_MSG: mon=0x7f1e0c00db30 msg={"execute":"query-migrate","id":"libvirt-145"}
2016-01-26 10:30:51.891+0000: 8590: debug : qemuMonitorIOProcess:396 : QEMU_MONITOR_IO_PROCESS: mon=0x7f1e0c00db30 buf={"return": {"status": "failed"}, "id": "libvirt-145"}
That last 'query-migrate' command took a surprisingly long time to get a reply - around 600ms - and in between the command & reply we get a tonne of block I/O errors reported.
What I'm not sure of is why the block I/O errors would cause migration to abort - that shouldn't happen unless libvirt is explicitly asked for it, and nova doesn't ask for that.
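For context, aborting migration on guest I/O errors is opt-in at the libvirt API level via the abort-on-error migration flag. A minimal sketch of what a caller would have to do to opt in, assuming the libvirt-python binding (URIs and guest name are illustrative - this is not what nova passes):

import libvirt

# Sketch only: migration aborts on guest disk I/O errors only if the caller
# explicitly sets VIR_MIGRATE_ABORT_ON_ERROR; nova does not set this flag.
flags = (libvirt.VIR_MIGRATE_LIVE |
         libvirt.VIR_MIGRATE_PEER2PEER |
         libvirt.VIR_MIGRATE_ABORT_ON_ERROR)

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('instance-00000001')
dom.migrateToURI3('qemu+ssh://dest-host/system', {}, flags)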