2020-11-09 07:06:57 |
Christian Ehrhardt |
description |
We have several thousands of virtual machines with pc-i440fx-wily machine type. Hypervisors run on ubuntu 16.04 and ubuntu 18.04.
We have several problems when we try to migrate those machines to hypervisors with ubuntu 20.04.
* Linux guests migrate OK, but for some reason Windows guests (with the same XML domain definition) do not. We get the following error:
---
qemu-system-x86_64: Features 0x8000002 unsupported. Allowed features: 0x71000002
qemu-system-x86_64: Failed to load virtio-console:virtio
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-console
---
I tried to investigate this issue and discovered the following:
- the missing feature is VIRTIO_F_ANY_LAYOUT for some of the virtio devices
- on xenial and bionic VIRTIO_F_ANY_LAYOUT is enabled for pc-i440fx-wily guests, observe:
---
# virsh qemu-monitor-command some-guest --hmp info qtree | grep any_layout
any_layout = true
any_layout = true
any_layout = false
any_layout = true
---
- on focal it is disabled
---
# virsh qemu-monitor-command some-guest2 --hmp info qtree | grep any_layout
any_layout = false
any_layout = true
any_layout = false
any_layout = false
---
I tried (helplessly) to compare the source code of the bionic and focal branches of qemu. It looks like this block of code is included for pc-i440fx-wily in the focal branch, and this is where any_layout is disabled:
---
GlobalProperty hw_compat_2_3[] = {
{ "virtio-blk-pci", "any_layout", "off" },
{ "virtio-balloon-pci", "any_layout", "off" },
{ "virtio-serial-pci", "any_layout", "off" },
{ "virtio-9p-pci", "any_layout", "off" },
{ "virtio-rng-pci", "any_layout", "off" },
{ TYPE_PCI_DEVICE, "x-pcie-lnksta-dllla", "off" },
{ "migration", "send-configuration", "off" },
{ "migration", "send-section-footer", "off" },
{ "migration", "store-global-state", "off" },
};
---
* We also have another problem that *might* be linked to the broken definition of pc-i440fx-wily. I am not sure, so I'll just mention it (maybe it will be obvious to someone familiar with the source code that this problem is also due to the broken definition of pc-i440fx-wily in focal and hence part of the same issue).
Even if the migration bionic → focal succeeds, it is impossible to migrate the guest back (focal → bionic). The error is:
---
operation failed: guest CPU doesn't match specification: extra features: arat
--- |
[Impact]
 * History: Xenial's qemu was once released with a machine type that was
   very broken. This was later fixed in bug 1621042, but for
   compatibility reasons we need to carry the broken type as well (e.g. to
   allow migrations of guests started back then). In bug 1829868 we
   realized that and "fixed the type to be as bad as it was originally".
 * New Issue: Between Bionic and Focal the qemu code changed (again)
   the way compat features are stored and assigned. While forward porting
   our delta the wily type became "too non bad", meaning it is now closer
   to a proper qemu 2.3/2.4 type than it used to be, but that is
   not what we need. We need it to be exactly the same mix&match of
   2.3/2.4 features it was from the beginning.
* This bug has identified an issue due to that difference, the fix shall
again get this type in sync.
[Test Case]
 * The balloon driver of Windows guests can be affected by this change of
   attributes. So if you have started a Windows guest with the wily
   machine type on xenial and migrate it to focal, it will fail as reported
   by the bug opener below. Migrating such a machine is a valid test and
   was done on the PPA in comment 17.
 * These types carry more than just what failed in that Windows guest. To
   get the full list of compat attributes, comments #12 & #13 show how to
   get those from gdb in 4.2 and 2.11 respectively. The list should match
   what bionic had (without the fix, the one of Focal is different).
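
The list comparison described above could be sketched as follows, assuming the compat attributes of each qemu version were saved to a text file with one entry per line. The `driver.property=value` line format, the file names and the helper name `compare_compat` are hypothetical; the gdb output from the comments would have to be reshaped into that form first.

```shell
# Hypothetical helper: compare two dumps of compat properties.
# Assumed input format (not from the bug): one "driver.property=value"
# entry per line, in any order.
compare_compat() {
    cc_a=$(mktemp) || return 1
    cc_b=$(mktemp) || { rm -f "$cc_a"; return 1; }
    sort -u "$1" > "$cc_a"
    sort -u "$2" > "$cc_b"
    # comm -3 hides the common lines: entries only in the first dump are
    # printed unindented, entries only in the second are tab-indented.
    comm -3 "$cc_a" "$cc_b"
    rm -f "$cc_a" "$cc_b"
}

# Example call (file names are placeholders):
#   compare_compat bionic-wily.txt focal-wily.txt
```

Empty output means both dumps carry the same set of properties, which is the expected result once the type is back in sync.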
[Where problems could occur]
 * We are changing a type meant for compatibility with very old machines.
   So I'd expect potential problems in migration (or save/restore) of those
   very old guests.
   Fortunately that type hasn't been the default for more than 4 years now
   and has long been discouraged, and the changes are isolated to this
   type.
   Furthermore, even if there are guests with that old type out there, they
   are likely on very old xenial systems; we only change >=Focal to be able
   to receive those correctly, and on >Focal there should be (hopefully)
   next to none of these super old machine types.
[Other Info]
* To be clear, we are trying to keep an older and older compat base alive
here. But if possible anyone affected should consider upgrading the
guest machine types whenever there are major host OS upgrades. That
needs a guest restart, so only doable on scheduled downtimes.
https://wiki.ubuntu.com/QemuKVMMigration#Upgrade_machine_type
--- --- ---
We have several thousands of virtual machines with pc-i440fx-wily machine type. Hypervisors run on ubuntu 16.04 and ubuntu 18.04.
We have several problems when we try to migrate those machines to hypervisors with ubuntu 20.04.
* Linux guests migrate OK, but for some reason Windows guests (with the same XML domain definition) do not. We get the following error:
---
qemu-system-x86_64: Features 0x8000002 unsupported. Allowed features: 0x71000002
qemu-system-x86_64: Failed to load virtio-console:virtio
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-console
---
I tried to investigate this issue and discovered the following:
- the missing feature is VIRTIO_F_ANY_LAYOUT for some of the virtio devices
- on xenial and bionic VIRTIO_F_ANY_LAYOUT is enabled for pc-i440fx-wily guests, observe:
---
# virsh qemu-monitor-command some-guest --hmp info qtree | grep any_layout
any_layout = true
any_layout = true
any_layout = false
any_layout = true
---
- on focal it is disabled
---
# virsh qemu-monitor-command some-guest2 --hmp info qtree | grep any_layout
any_layout = false
any_layout = true
any_layout = false
any_layout = false
---
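
As a quick sanity check of the two masks in the error message, here is a minimal shell sketch that computes which offered feature bits are not in the allowed set. The helper name `missing_features` is made up for illustration; the only outside fact used is that VIRTIO_F_ANY_LAYOUT is feature bit 27 in the virtio specification.

```shell
# Hypothetical helper: print the feature bits that are set in the first
# mask (offered by the source) but not in the second (allowed by the
# destination).
missing_features() {
    printf '0x%x\n' $(( $1 & ~$2 ))
}

# Masks taken from the error message above.
missing_features 0x8000002 0x71000002    # prints 0x8000000

# VIRTIO_F_ANY_LAYOUT is bit 27, i.e. the same value:
printf '0x%x\n' $(( 1 << 27 ))           # prints 0x8000000
```

The only bit rejected by the destination is bit 27, which is consistent with any_layout being the property that differs between the two hosts.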
I tried (helplessly) to compare the source code of the bionic and focal branches of qemu. It looks like this block of code is included for pc-i440fx-wily in the focal branch, and this is where any_layout is disabled:
---
GlobalProperty hw_compat_2_3[] = {
{ "virtio-blk-pci", "any_layout", "off" },
{ "virtio-balloon-pci", "any_layout", "off" },
{ "virtio-serial-pci", "any_layout", "off" },
{ "virtio-9p-pci", "any_layout", "off" },
{ "virtio-rng-pci", "any_layout", "off" },
{ TYPE_PCI_DEVICE, "x-pcie-lnksta-dllla", "off" },
{ "migration", "send-configuration", "off" },
{ "migration", "send-section-footer", "off" },
{ "migration", "store-global-state", "off" },
};
---
* We also have another problem that *might* be linked to the broken definition of pc-i440fx-wily. I am not sure, so I'll just mention it (maybe it will be obvious to someone familiar with the source code that this problem is also due to the broken definition of pc-i440fx-wily in focal and hence part of the same issue).
Even if the migration bionic → focal succeeds, it is impossible to migrate the guest back (focal → bionic). The error is:
---
operation failed: guest CPU doesn't match specification: extra features: arat
--- |
|