Reviewed: https://review.openstack.org/485752
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=14c38ac0f253036da79f9d07aedf7dfd5778fde8
Submitter: Jenkins
Branch: master
commit 14c38ac0f253036da79f9d07aedf7dfd5778fde8
Author: Kashyap Chamarthy <email address hidden>
Date: Thu Jul 20 19:01:23 2017 +0200
libvirt: Post-migration, set cache value for Cinder volume(s)
This was noticed in a downstream bug: when a Nova instance with a
Cinder volume (in this case, both the Nova instance storage _and_ the
Cinder volume are located on Ceph) is migrated to a target Compute
node, the disk cache value for the Cinder volume gets changed. I.e.
the QEMU command-line for the Cinder volume stored on Ceph changes as
follows:
Pre-migration, QEMU command-line for the Nova instance:
[...] -drive file=rbd:volumes/volume-[...],cache=writeback
Post-migration, QEMU command-line for the Nova instance:
[...] -drive file=rbd:volumes/volume-[...],cache=none
Furthermore, Jason Dillaman from Ceph confirms RBD cache being enabled
pre-migration:
$ ceph --admin-daemon /var/run/qemu/ceph-client.openstack.[...] \
      config get rbd_cache
{
    "rbd_cache": "true"
}
And disabled, post-migration:
$ ceph --admin-daemon /var/run/qemu/ceph-client.openstack.[...] \
      config get rbd_cache
{
    "rbd_cache": "false"
}
This change in cache value post-migration causes I/O latency on the
Cinder volume.
From a chat with Daniel Berrangé on IRC: prior to live migration, Nova
rewrites all the <disk> elements and passes the updated guest XML
across to the target libvirt, but it never calls _set_cache_mode()
when doing this. So `nova.conf`'s `writeback` setting gets lost,
leaving the default `cache=none` setting. This mistake (of leaving the
cache value at the default 'none') will of course be corrected when
you reboot the guest on the target later.
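Conceptually, the cache mode comes from `nova.conf`'s `disk_cachemodes` option, which maps a disk source type to a cache mode (e.g. `block=writeback`). A minimal, self-contained sketch of that mapping, with simplified, hypothetical helper names rather than the real Nova code:

```python
# Hypothetical sketch of how a disk cache mode is resolved from
# nova.conf-style "disk_cachemodes" entries; NOT the actual Nova code.

def parse_cachemodes(entries):
    """Turn entries like ['block=writeback', 'file=directsync']
    into a {source_type: cache_mode} dict."""
    modes = {}
    for entry in entries:
        source_type, _, mode = entry.partition('=')
        modes[source_type] = mode
    return modes

def resolve_cache_mode(modes, source_type, default='none'):
    """Pick the configured cache mode for a disk's source type,
    falling back to the safe default of cache=none."""
    return modes.get(source_type, default)

modes = parse_cachemodes(['block=writeback'])
print(resolve_cache_mode(modes, 'block'))  # configured: writeback
print(resolve_cache_mode(modes, 'file'))   # unconfigured: falls back to none
```

Skipping _set_cache_mode() on the migration path is exactly the "unconfigured" branch above: the rewritten `<disk>` element silently falls back to `cache=none`.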
So:
- Call _set_cache_mode() in the _get_volume_config() method, because
  it is a callback function to _update_volume_xml() in
  nova/virt/libvirt/migration.py.
- And remove duplicate calls to _set_cache_mode() in
  _get_guest_storage_config() and attach_volume().
- Fix broken unit tests; adjust test_get_volume_config() to reflect
  the disk cache mode.
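The shape of the fix can be sketched as follows. The classes and signatures here are hypothetical, heavily simplified stand-ins for the Nova libvirt driver, only meant to show why setting the cache mode in the shared callback covers both the attach path and the migration path:

```python
# Hypothetical, simplified sketch of the fix; NOT the real Nova code.
# The idea: apply the cache mode inside the one method that the
# migration-time XML rewrite uses as its callback, so every caller
# inherits the configured value.

class FakeDiskConfig:
    def __init__(self):
        self.driver_cache = None  # stand-in for the <driver cache=...> attr

class FakeLibvirtDriver:
    def __init__(self, cachemodes):
        # e.g. {'block': 'writeback'}, parsed from nova.conf disk_cachemodes
        self.cachemodes = cachemodes

    def _set_cache_mode(self, conf, source_type='block'):
        conf.driver_cache = self.cachemodes.get(source_type, 'none')

    def _get_volume_config(self, disk_info):
        conf = FakeDiskConfig()
        # ... fill in source path, bus, target dev from disk_info ...
        # The fix: set the cache mode here, so both attach_volume()
        # and the live-migration XML rewrite get the right value.
        self._set_cache_mode(conf)
        return conf

    def _update_volume_xml(self, disk_info):
        # Migration path: rebuilds the <disk> config via the callback.
        return self._get_volume_config(disk_info)

driver = FakeLibvirtDriver({'block': 'writeback'})
print(driver._update_volume_xml({}).driver_cache)  # writeback, not none
```

With the call centralized in _get_volume_config(), the duplicate _set_cache_mode() calls in _get_guest_storage_config() and attach_volume() become redundant, which is why the patch removes them.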
Thanks: Jason Dillaman of Ceph for observing the change in cache modes
in a downstream bug analysis, Daniel Berrangé for help in analysis
from a Nova libvirt driver POV, and Stefan Hajnoczi from QEMU for help
on I/O latency instrumentation with `perf`.
Closes-bug: 1706083
Change-Id: I4184382b49dd2193d6a21bfe02ea973d02d8b09f