Reviewed: https://review.openstack.org/489198
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=306b3f1aac4fc30ee14a7f5f66474d68860768fa
Submitter: Jenkins
Branch: stable/ocata
commit 306b3f1aac4fc30ee14a7f5f66474d68860768fa
Author: Kashyap Chamarthy <email address hidden>
Date: Thu Jul 20 19:01:23 2017 +0200
libvirt: Post-migration, set cache value for Cinder volume(s)
This was noticed in a downstream bug: when a Nova instance with a Cinder
volume (in this case, both the Nova instance storage _and_ the Cinder
volume are located on Ceph) is migrated to a target Compute node, the
disk cache value for the Cinder volume changes. That is, the QEMU
command-line for the Cinder volume stored on Ceph changes as follows:
Pre-migration, QEMU command-line for the Nova instance:
    [...] -drive file=rbd:volumes/volume-[...],cache=writeback
Post-migration, QEMU command-line for the Nova instance:
    [...] -drive file=rbd:volumes/volume-[...],cache=none
Furthermore, Jason Dillaman from Ceph confirms RBD cache being enabled
pre-migration:
    $ ceph --admin-daemon /var/run/qemu/ceph-client.openstack.[...] \
        config get rbd_cache
    {
        "rbd_cache": "true"
    }
And disabled, post-migration:
    $ ceph --admin-daemon /var/run/qemu/ceph-client.openstack.[...] \
        config get rbd_cache
    {
        "rbd_cache": "false"
    }
This change in cache value post-migration causes I/O latency on the
Cinder volume.
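The cache change described above can be checked by pulling the `cache=` value out of a `-drive` argument. The following is only a minimal sketch: the helper name `drive_cache_mode` and the volume name `volume-example` are hypothetical, and it assumes the argument contains no escaped commas.

```python
def drive_cache_mode(drive_arg):
    """Return the cache= value from a QEMU -drive argument, or None.

    Assumes plain comma-separated key=value options (no ",," escapes).
    """
    for option in drive_arg.split(","):
        # Split on the first "=" only, since values may contain "=" too.
        key, sep, value = option.partition("=")
        if sep and key.strip() == "cache":
            return value.strip()
    return None


# Hypothetical -drive arguments modelled on the command lines above.
pre_migration = "file=rbd:volumes/volume-example,cache=writeback"
post_migration = "file=rbd:volumes/volume-example,cache=none"

print(drive_cache_mode(pre_migration), drive_cache_mode(post_migration))
```

Comparing the two values before and after a migration would flag the regression described here.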
From a chat with Daniel Berrangé on IRC: Prior to live migration, Nova
rewrites all the <disk> elements and passes this updated guest XML
across to the target libvirt, but it never calls _set_cache_mode()
when doing so. So the `writeback` setting from `nova.conf` gets lost,
leaving the default `cache=none` setting. This mistake (of leaving
the cache value at the default 'none') will of course be corrected when
you reboot the guest on the target later.
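To illustrate how the setting gets lost, here is a simplified model of a cache-mode lookup. It is not Nova's actual code: `DiskConfig`, `parse_cachemodes`, and `set_cache_mode` are hypothetical names, and the `network=writeback` entry format is assumed to mirror the `nova.conf` setting mentioned above.

```python
from dataclasses import dataclass


@dataclass
class DiskConfig:
    """Hypothetical stand-in for a guest <disk> configuration object."""
    source_type: str             # e.g. "network" for a Ceph-backed volume
    driver_cache: str = "none"   # default when nothing overrides it


def parse_cachemodes(entries):
    """Turn entries such as "network=writeback" into a dict."""
    modes = {}
    for entry in entries:
        disk_type, sep, mode = entry.partition("=")
        if sep:
            modes[disk_type.strip()] = mode.strip()
    return modes


def set_cache_mode(conf, cachemodes):
    """Apply the configured cache mode, keeping the current one otherwise."""
    conf.driver_cache = cachemodes.get(conf.source_type, conf.driver_cache)


cachemodes = parse_cachemodes(["network=writeback"])

# Disk config rebuilt for migration without the lookup: stays at "none".
migrated = DiskConfig(source_type="network")
cache_without_lookup = migrated.driver_cache

# Same config once the lookup is applied: picks up "writeback".
set_cache_mode(migrated, cachemodes)
cache_with_lookup = migrated.driver_cache
```

In this model, skipping `set_cache_mode()` when the disk element is rebuilt is exactly what leaves the volume at the default value.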
So:
- Call _set_cache_mode() in _get_volume_config() method -- because it
  is a callback function to _update_volume_xml() in
  nova/virt/libvirt/migration.py.
- And remove duplicate calls to _set_cache_mode() in
  _get_guest_storage_config() and attach_volume().
- Fix broken unit tests; adjust test_get_volume_config() to reflect
  the disk cache mode.
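The shape of the fix listed above can be sketched as follows. This is a toy model under stated assumptions, not the Nova implementation: `SketchDriver`, the dict-based disk config, and the `connection_info` layout are hypothetical; only the method names are borrowed from the list above.

```python
class SketchDriver:
    """Toy driver showing where the cache mode is applied."""

    def __init__(self, cachemodes):
        # Hypothetical mapping, e.g. {"network": "writeback"}.
        self.cachemodes = cachemodes

    def _set_cache_mode(self, conf):
        conf["driver_cache"] = self.cachemodes.get(
            conf["source_type"], conf["driver_cache"])

    def _get_volume_config(self, connection_info):
        # Build the disk config, then set the cache mode here so that
        # every caller gets it, including the migration callback path.
        conf = {
            "source_type": connection_info["source_type"],
            "driver_cache": "none",
        }
        self._set_cache_mode(conf)
        return conf


def update_volume_xml(volumes, get_volume_config):
    """Stand-in for the migration step that rebuilds each <disk>."""
    return [get_volume_config(volume) for volume in volumes]


driver = SketchDriver({"network": "writeback"})
rebuilt = update_volume_xml(
    [{"source_type": "network"}], driver._get_volume_config)
```

Putting the call inside the one method that all paths share is also what makes the separate calls in the other code paths redundant.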
Thanks: Jason Dillaman of Ceph for observing the change in cache modes
in a downstream bug analysis, Daniel Berrangé for help in
analysis from a Nova libvirt driver POV, and Stefan Hajnoczi
from QEMU for help on I/O latency instrumentation with `perf`.
Conflicts [stable/ocata]:
- libvirt/driver.py: The _get_scsi_controller() method from Git master
  isn't in Ocata, so adjust the _get_guest_storage_config() method
  accordingly.
- Fix unit test conflicts in the method
  test_attach_volume_with_vir_domain_affect_live_flag().
Closes-bug: 1706083
Change-Id: I4184382b49dd2193d6a21bfe02ea973d02d8b09f
(cherry picked from commit 14c38ac0f253036da79f9d07aedf7dfd5778fde8)