Comment 6 for bug 1608446

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

So Anna Babich managed to reproduce this on another environment and this looks interesting.

As Georgy pointed out, the corresponding apparmor profile file is empty after reboot:

http://paste.openstack.org/show/553601/

Pay attention to the file modification time, which says "12.47". The error on start of the VM is the same:

http://paste.openstack.org/raw/553602/

However, if you look at the kern.log, you'll see that the compute node was not up until 12.48:

http://paste.openstack.org/show/553609/

which effectively means the apparmor profile file was created *before* the node was rebooted.

Note there is a number of "^@" symbols in the kern.log - this is how less util displays 0 bytes.

What happens here is that a node is stopped abruptly and the FS buffers are not properly flushed to disks, which basically means some data may get corrupted or be missing after the node gets back online.

This is also confirmed by the nova-compute.log:

http://paste.openstack.org/show/553604/
http://paste.openstack.org/show/553610/ (note ^@)

^ there are no log entries that instance 88d7b107-a956-4949-9b2f-5166684f3fe5 was ever *created* on this node, only that nova tried to resume it after restart of the node.

I think what happens here is that you try to boot a VM in the automated test case, then you stop the node abruptly before libvirt flushes FS cache buffers to disks and that's how the apparmor profile file happens to be empty on the node start, which ultimately fails the VM start.

While we could probably "harden" libvirt to overwrite the profile on domain start, I'd argue that you should modify the test case to do a graceful restart instead (shutdown -r), so that all the processes receive TERM signals and get a chance to flush the FS buffers and do other kind of cleanups before termination.