Inconsistent nested KVM status with race conditions across multiple hosts: rmmod: ERROR: Module kvm_intel is in use

Bug #1853465 reported by Nobuto Murata
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
qemu (Ubuntu)    Fix Released  Undecided   Unassigned
  Bionic         Incomplete    Undecided   Unassigned

Bug Description

Ubuntu bionic
qemu-system-x86: 1:2.11+dfsg-1ubuntu7.20

When qemu-system-x86 is installed, nested KVM is enabled by default through the file shipped with the package:

/etc/modprobe.d/qemu-system-x86.conf:options kvm_intel nested=1

and postinst (/var/lib/dpkg/info/qemu-system-x86.postinst):

# If the host had already installed kvm_intel.ko without nested=1, then
# re-load it now, honoring whatever is in qemu-system-x86.modprobe
if [ "$1" = configure ] ; then
        INTEL_NESTED=/sys/module/kvm_intel/parameters/nested
        if grep -q kvm_intel /proc/modules && [ -f $INTEL_NESTED ]; then
                v=`cat $INTEL_NESTED`
                if [ "x$v" != "xY" ]; then
                        rmmod kvm_intel && modprobe kvm_intel || true
                fi
        fi
fi

However, we found that some of our 10+ hosts somehow had nested KVM disabled after the package installation. We then found the error "rmmod: ERROR: Module kvm_intel is in use" logged during that phase:

2019-11-18 17:29:55 DEBUG install Setting up qemu-system-x86 (1:2.11+dfsg-1ubuntu7.20) ...
2019-11-18 17:29:55 DEBUG install rmmod: ERROR: Module kvm_intel is in use
2019-11-18 17:29:55 DEBUG install Setting up qemu-kvm (1:2.11+dfsg-1ubuntu7.20) ...
2019-11-18 17:29:55 DEBUG install Setting up libpangocairo-1.0-0:amd64 (1.40.14-1ubuntu0.1) ...

After running `rmmod kvm_intel && modprobe kvm_intel` by hand, nested KVM was enabled properly. So there seems to be some sort of race condition during the installation.
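The manual check-and-reload above can be wrapped in a small POSIX sh helper. This is a hypothetical sketch, not part of the qemu package: the function name `nested_enabled` is illustrative, and it assumes the parameter file reads `Y`/`N` (what the postinst compares against), while also accepting `1`/`0` to be safe.

```sh
#!/bin/sh
# Hypothetical helper, not shipped by qemu-system-x86.
# Returns 0 if the given module parameter file reports nested as enabled,
# 1 if it reports disabled, 2 if the file is missing (module not loaded).
nested_enabled() {
    param=${1:-/sys/module/kvm_intel/parameters/nested}
    [ -r "$param" ] || return 2
    case "$(cat "$param")" in
        Y|y|1) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example (needs root, and nothing may be holding kvm_intel):
#   nested_enabled || { rmmod kvm_intel && modprobe kvm_intel; }
#   nested_enabled && echo "nested: on" || echo "nested: off"
```

Checking the value again after the reload, rather than trusting the reload's exit status, is what would have exposed the silent failure seen on the affected hosts.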

FWIW, at the same time as the rmmod failure, the kernel appeared to be running the L1TF check:

Nov 18 17:29:52 host kernel: [ 347.125789] audit: type=1400 audit(1574098192.232:25): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/libvirtd//qemu_bridge_helper" pid=31696 comm="apparmor_parser"
Nov 18 17:29:55 host kernel: [ 350.078487] ip6_tables: (C) 2000-2006 Netfilter Core Team
Nov 18 17:29:55 host kernel: [ 350.209707] Ebtables v2.0 registered
Nov 18 17:29:55 host kernel: [ 350.464461] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
Nov 18 17:29:56 host kernel: [ 351.438393] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.

My goal here is a consistent nested KVM status, because if it is inconsistent across multiple hosts, live migration fails with the following error:

[instance: afd27b8f-30df-4eab-b18a-5c269ce97d06] Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: vmx: libvirtError: operation failed
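Since the mismatch only surfaces at migration time, it can help to compare the flag across hosts up front. A minimal sketch, assuming passwordless SSH to each compute host; the function name `check_consistent` and the host names are hypothetical. It reads `host value` pairs on stdin, echoes them, and fails if more than one distinct value appears.

```sh
#!/bin/sh
# Hypothetical helper: reads "host value" lines on stdin, prints them,
# and returns 1 if the hosts do not all report the same value.
# An unreachable host yields an empty value, which also counts as a mismatch.
check_consistent() {
    awk '
        { print; seen[$2] = 1 }
        END {
            n = 0
            for (v in seen) n++
            if (n > 1) {
                print "MISMATCH: " n " distinct values" > "/dev/stderr"
                exit 1
            }
        }'
}

# Example collection loop (host names are placeholders):
#   for h in compute1 compute2 compute3; do
#       printf '%s %s\n' "$h" "$(ssh "$h" cat /sys/module/kvm_intel/parameters/nested)"
#   done | check_consistent
```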

The command executed during the automated installation by the Juju nova-compute charm:
Commandline: apt-get --assume-yes --option=Dpkg::Options::=--force-confold install nova-compute genisoimage librbd1 python-six python-psutil xfsprogs nfs-common open-iscsi nova-compute-kvm

# apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:2.11+dfsg-1ubuntu7.20
  Candidate: 1:2.11+dfsg-1ubuntu7.20
  Version table:
 *** 1:2.11+dfsg-1ubuntu7.20 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu bionic-security/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2.11+dfsg-1ubuntu7 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

Tags: cpe-onsite
Christian Ehrhardt (paelzer) wrote:

Hi Nobuto,
thanks for this high-quality bug report.

I'm glad that running "rmmod kvm_intel && modprobe kvm_intel" later worked for you.
I have a few different thoughts on this, split into the sections below ...

---

The obvious solution, as for most kinds of initialization issues, would be to reboot after install, which also works reliably. I'm only not suggesting it as a real fix because I generally dislike reboots :-)

---

Unfortunately the postinst isn't allowed to "wait a while until it resolves", so I'm not going that way. Surely we could do a fast, non-sleeping loop, but that would not make it reliable, which is what you'd want. After all, you already know that your manual reload can enable it.

Could you give it a try and see whether an (almost) non-sleeping retry fixes your issue "statistically reliably"?
Then we'd know a bit more about the size of that race window.
Let me know if you need my help for a custom build with that.
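For reference, the (almost) non-sleeping retry could look roughly like the following. This is a hypothetical sketch, not code from the qemu package: `retry` and the attempt count are illustrative choices, and a loop like this only narrows the race window rather than closing it.

```sh
#!/bin/sh
# Hypothetical generic helper: run a command up to N times, stopping at the
# first success. There is deliberately no sleep between attempts.
# Returns 0 on success, otherwise the exit status of the last attempt.
retry() {
    attempts=$1; shift
    i=1
    while :; do
        "$@" && return 0
        rc=$?
        [ "$i" -ge "$attempts" ] && return "$rc"
        i=$((i + 1))
    done
}

# Sketch of how the postinst reload might use it (runs as root in postinst):
#   retry 10 rmmod kvm_intel && modprobe kvm_intel || true
```

Keeping the trailing `|| true` mirrors the existing postinst, so a persistent failure still does not abort the package configuration.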

Heads up: this surely won't be an SRU on its own.
For something not really supported, I wouldn't want to make everyone download a new version. But there are enough other updates that we could tuck it in alongside one of them.

---

Please keep in mind that, strictly speaking and in general, nested is "as good as possible but not supported" [1]. But that has never stopped us from helping as much as possible.

Nested support has improved a lot over time, and I expect it may be supported soon and then also enabled by default in the module, which will eliminate this whole toggle issue.
That default has been in place for quite some time for AMD (>2.6.32) and is available for Intel with kernel 4.20 and later [2]. Since then it has become even more stable, so on Bionic, if you use nested, consider using the HWE kernels.
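To tell whether a host's running kernel is new enough for the upstream nested default, a version comparison against the Intel threshold cited above (4.20) is a reasonable heuristic. A sketch assuming GNU `sort -V` (coreutils) is available; the function name is hypothetical. Distribution kernels may carry backports, so reading `/sys/module/kvm_intel/parameters/nested` after boot remains the authoritative check.

```sh
#!/bin/sh
# Hypothetical helper: returns 0 if kernel version $1 (default: the running
# kernel) is at least 4.20, the upstream version cited for Intel nested=1.
kernel_defaults_nested() {
    ver=${1:-$(uname -r)}
    ver=${ver%%-*}          # "4.15.0-70-generic" -> "4.15.0"
    lowest=$(printf '%s\n%s\n' "4.20" "$ver" | sort -V | head -n 1)
    [ "$lowest" = "4.20" ]
}

# On Bionic, the HWE kernel can be installed with:
#   apt-get install --install-recommends linux-generic-hwe-18.04
```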

---

Another option is to stay on 4.15 but override the default.
You could drop an /etc/modprobe.d/*.conf file in place before kvm_intel is ever loaded.
Even earlier, you can set kvm_intel.nested=1 on the kernel command line, which flips the default from 0 to 1 and avoids the later races.

That requires controlling your deployment, but it might be an option if upgrading to the HWE kernel isn't viable for you.
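Both overrides can be sketched as follows. The file name `kvm-intel-nested.conf` and the helper names are hypothetical; the `options kvm_intel nested=1` line matches the one qemu-system-x86 already ships, and `kvm_intel.nested=1` is the kernel command line form mentioned above. The optional root argument is an assumption meant for writing into an image or chroot before first boot.

```sh
#!/bin/sh
# Hypothetical helper: persist nested=1 for kvm_intel so it applies the
# first time the module is loaded. $1 is an optional alternate root.
write_nested_conf() {
    conf="${1:-}/etc/modprobe.d/kvm-intel-nested.conf"
    mkdir -p "$(dirname "$conf")"
    printf 'options kvm_intel nested=1\n' > "$conf"
}

# Hypothetical check: returns 0 if the kernel command line (default
# /proc/cmdline) already contains the exact token kvm_intel.nested=1.
cmdline_requests_nested() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qxF 'kvm_intel.nested=1'
}

# For the command-line route on Ubuntu: add kvm_intel.nested=1 to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, run update-grub, reboot.
```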

---

I think your best options, in order, are:
1. Use the HWE kernel. It is available right now and will work right away (I prefer that, as you also get plenty of fixes for nested).
2. Control your deployment by overriding the default on the kernel command line. That also makes sure it is 1 right from the beginning.
3. Reboot after install: silly but effective, and some people might prefer it.
4. We can try the retry loop approach, but I'm not really convinced of it, and after all the SRU team might like it even less and reject it.

Let me know what you think and whether #1 or #2 will work for you.
If you insist on trying #4, let me know if you need my support for a test build.

P.S. Since Cosmic and later already have a recent enough kernel, I'll mark it Fix Released but add a Bionic task.

[1]: https://git.launchpad.net/ubuntu/+source/qemu/tree/debian/qemu-system-x86.README.Debian?h=ubuntu/bionic-devel
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1e58e5...


Changed in qemu (Ubuntu):
status: New → Incomplete
status: Incomplete → Fix Released
Changed in qemu (Ubuntu Bionic):
status: New → Incomplete
Nobuto Murata (nobuto) wrote:

Hi Christian,

Thank you for the detailed response. Just to clarify, I'm actually not trying to use nested KVM here, but to have a consistent flag across multiple hosts so that live migration of the first-level KVM VMs won't fail with:

> [instance: afd27b8f-30df-4eab-b18a-5c269ce97d06] Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: vmx: libvirtError: operation failed

In any case, if newer kernels don't need the rmmod trick in the postinst and the flag is enabled at boot as you wrote, then tracking down the root cause of the race condition is not the best use of our time. So we will take one of #1, #2, or #3 you suggested above as the way forward.

I will leave this as Incomplete and let it expire unless other users are willing to test it further. Thanks again for the suggestions!
