No live migration on aarch64

Bug #1756118 reported by Marcin Juszkiewicz
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Medium      Eric Xie
Queens                    Triaged       Medium      Unassigned

Bug Description

On AArch64 we use cpu_mode='host-passthrough', which should allow us to do live migration between compute nodes running the same CPUs.

But it does not:

-----------------------------------------------------------------------------
estuary@ref-compute-2:~$ openstack server migrate --live ref-compute-1 --wait 38da6986-2f76-486d-877b-438560d7aa05
Migration pre-check error: CPU doesn't have compatibility.

XML error: Missing CPU model name

Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult (HTTP 400) (Request-ID: req-cea6f860-e78b-469b-9527-a703114b372b)
-----------------------------------------------------------------------------

virsh itself is able to live migrate between two XGene1 nodes:

-----------------------------------------------------------------------------
root@cb-r1-m1-c1n1:/var/log/libvirt# virsh migrate --copy-storage-all --live debian-cloud-image qemu+ssh://root@10.101.3.103/system tcp://10.101.3.103

2018-03-15 15:15:23.217+0000: initiating migration
2018-03-15 15:15:25.740+0000: shutting down, reason=migrated
2018-03-15T15:15:25.741113Z qemu-system-aarch64: terminating on signal 15 from pid 573 (/usr/sbin/libvirtd)

root@debian:/var/lib/libvirt/images# virsh list --all
 Id Name State
----------------------------------------------------
 9 debian-cloud-image running
-----------------------------------------------------------------------------

I will dig further to find out what is wrong, where, and why.

More info: https://bugzilla.redhat.com/show_bug.cgi?id=1430987

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Wishlist
tags: added: libvirt
Marcin Juszkiewicz (hrw)
description: updated
Matt Riedemann (mriedem) wrote :

This isn't a 'bug' since it was never supported before anyway.

Marcin Juszkiewicz (hrw) wrote :

For us it is a regression: on Newton with libvirt 2.2.10, live migration at the nova level worked fine on aarch64.

On Queens we have libvirt 3.8/3.10, and live migration fails at the nova level but works at the libvirt level.

Matt Riedemann (mriedem) wrote :

OK, removed the 'Wishlist' importance since this used to work. FWIW, nothing obvious jumps out at me in recent changes to the 'compare_cpu' code in the libvirt driver.

Changed in nova:
importance: Wishlist → Undecided
tags: added: aarch64 live-migration
melanie witt (melwitt) wrote :

Okay, according to the BZ comment [0], it's expected that libvirt doesn't know how to detect the CPU model on AArch64. And in the past, it was wrongly being reported by libvirt, which was a bug in libvirt.

So, it seems since they fixed the bug, it breaks in nova now (because nova expects CPU model to be in the XML). When it was erroneously there, things worked. Now that it's gone, things don't work.

To fix it, I think we need to add something like this at [1]:

  # Libvirt doesn't know how to detect the host CPU model on AArch64.
  if cpu.arch == fields.Architecture.AARCH64:
      LOG.debug("Libvirt doesn't know how to detect the host CPU model on AArch64. Skipping CPU comparison")
      return

We need to double-check with someone like kashyap to confirm whether there's a more correct way to handle it other than bypassing it altogether.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1430987#c2
[1] https://github.com/openstack/nova/blob/8b6b65f/nova/virt/libvirt/driver.py#L6761

melanie witt (melwitt) wrote :

Marking it as High because it's a regression in Queens.

Changed in nova:
importance: Undecided → High
status: New → Triaged
Kevin Zhao (kevin-zhao)
Changed in nova:
assignee: nobody → Kevin Zhao (kevin-zhao)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/589769

Changed in nova:
status: Triaged → In Progress
melanie witt (melwitt) wrote :

Lowering this to Medium, as it has been a regression since Queens.

Changed in nova:
importance: High → Medium
Kevin Zhao (kevin-zhao)
Changed in nova:
assignee: Kevin Zhao (kevin-zhao) → nobody
Kashyap Chamarthy (kashyapc) wrote :

It is true that libvirt does not know how to detect the host CPU model on AArch64, but even if it _wanted_ to know, it could not, because even `/proc/cpuinfo` on AArch64 doesn't show anything interesting. There are lots of vendors making different AArch64 CPUs, and they are not easily comparable. They all differ in various ways. (This is also confirmed by Jiri Denemark of libvirt.)

So it doesn't make sense to do a CPU compatibility check on AArch64. And the AArch64 folks themselves recommend 'host-passthrough' as the way to run KVM guests on AArch64.

So with that in mind, I'd suggest the message be something like:

  "Host CPU compatibility check does not make
   sense on AArch64; skip CPU comparison"

Marcin Juszkiewicz (hrw) wrote :

In theory, these fields from /proc/cpuinfo could be used to at least tell whether we are trying to migrate to the same CPU:

X-Gene 1:

CPU implementer : 0x50
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 1

ThunderX:

CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x0a1
CPU revision : 0

ThunderX2:

CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x0af
CPU revision : 1

HiSilicon D05:

CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part : 0xd08
CPU revision : 2
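
As a rough sketch of that idea (this is not what nova or libvirt does; the function names and the choice to compare all five fields, including the revision, are assumptions made for illustration), the fields could be parsed and compared like this. It only looks at the first CPU entry, so it would not cope with hosts that mix different core types.

```python
# Fields from /proc/cpuinfo that, taken together, identify an AArch64 CPU.
ID_FIELDS = (
    "CPU implementer",
    "CPU architecture",
    "CPU variant",
    "CPU part",
    "CPU revision",
)

# Sample values copied from the listings in this comment.
XGENE1 = """CPU implementer : 0x50
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 1
"""

THUNDERX = """CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x0a1
CPU revision : 0
"""

THUNDERX2 = """CPU implementer : 0x43
CPU architecture: 8
CPU variant : 0x1
CPU part : 0x0af
CPU revision : 1
"""


def parse_cpu_identity(cpuinfo_text):
    """Return the ID_FIELDS values of the first CPU entry as a tuple of ints."""
    values = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only the first occurrence of each field (the first CPU entry).
        if sep and key in ID_FIELDS and key not in values:
            values[key] = value.strip()
    missing = [name for name in ID_FIELDS if name not in values]
    if missing:
        raise ValueError("missing cpuinfo fields: %s" % ", ".join(missing))
    # Base 0 lets int() accept both "8" and "0x0a1".
    return tuple(int(values[name], 0) for name in ID_FIELDS)


def same_cpu(source_cpuinfo, dest_cpuinfo):
    """True if both hosts report identical CPU identification fields."""
    return parse_cpu_identity(source_cpuinfo) == parse_cpu_identity(dest_cpuinfo)
```

On a real host the input would be the contents of /proc/cpuinfo read on the source and on the destination node, for example `same_cpu(open("/proc/cpuinfo").read(), text_from_other_host)`.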

Eric Xie (eric-xie)
Changed in nova:
assignee: nobody → Eric Xie (eric-xie)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/589769
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4bb54ae86981d186e41dcc325d171ab951beb7b6
Submitter: Zuul
Branch: master

commit 4bb54ae86981d186e41dcc325d171ab951beb7b6
Author: Kevin Zhao <email address hidden>
Date: Wed Aug 8 17:04:52 2018 +0800

    Skip cpu comparison on AArch64

    Host CPU compatibility check does not make sense on AArch64,
    this patch skips CPU comparison.

    Closes-bug: #1756118

    Change-Id: I0ef4b954b7f4ae65b6c0f96580c5f9472a2b873c
    Signed-off-by: Kevin Zhao <email address hidden>

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/699115

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/699116

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/699117

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/rocky)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/699117

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/stein)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/699116

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by Lee Yarwood (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/699115
