xm migrate --live panics hypervisor

Bug #1515145 reported by Volker
This bug affects 2 people
Affects: xen (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have two HP ProLiant DL380 Gen8 servers with shared iSCSI storage in a Xen cluster configuration. They run Ubuntu 12.04 amd64 and are fully patched and freshly rebooted.

When I run
 /usr/sbin/xm migrate --live <some-domain-id> <ip-of-other-host>
on one server, the "sending" system reboots instantly and the VM only appears in an undefined state on the "receiving" host.

After redirecting the hypervisor's console output to COM1, I was able to retrieve this error message:

(XEN) [2015-11-11 07:50:05] traps.c:3073: GPF (0000): ffff82c48017fa31 -> ffff82c4802070d8
(XEN) [2015-11-11 07:50:15]
(XEN) [2015-11-11 07:50:15] ****************************************
(XEN) [2015-11-11 07:50:15] Panic on CPU 10:
(XEN) [2015-11-11 07:50:15] mm lock held by sh_page_fault__guest_4
(XEN) [2015-11-11 07:50:15] ****************************************
(XEN) [2015-11-11 07:50:15]
(XEN) [2015-11-11 07:50:15] Manual reset required ('noreboot' specified)

After replacing "/boot/xen-4.1-amd64.gz" with the file from the Debian wheezy package "xen-hypervisor-4.1-amd64_4.1.4-3+deb7u9_amd64.deb", the problem is gone.

Tags: patch
Volker (volker-reiss) wrote :

uname -srvm
Linux 3.2.0-94-generic #134-Ubuntu SMP Fri Nov 6 18:16:45 UTC 2015 x86_64

Working hypervisor versions:

(XEN) Xen version 4.1.6.1 (Ubuntu 4.1.6.1-0ubuntu0.12.04.5) (<email address hidden>) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) Wed Mar 11 15:00:28 UTC 2015

(XEN) Xen version 4.1.4 (Debian 4.1.4-3+deb7u9) (<email address hidden>) (gcc version 4.7.2 (Debian 4.7.2-5) ) Sat Oct 31 06:17:33 UTC 2015

Non-working hypervisor version:

(XEN) Xen version 4.1.6.1 (Ubuntu 4.1.6.1-0ubuntu0.12.04.6) (<email address hidden>) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) Wed Sep 2 18:06:32 UTC 2015

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xen (Ubuntu):
status: New → Confirmed
Philipp Hahn (pmhahn) wrote :

Reverting the following patches makes the crash go away (4.1.6.1-0ubuntu0.12.04.8):
* xsa97-hap-4.1-prereq.patch
* xsa97-hap-4.2-prereq.patch
* xsa97-hap-4.2.patch

Philipp Hahn (pmhahn) wrote :

debian/patches/xsa97-hap-4.1-prereq.patch is the combination of
 * commit 301493fb027648db6808b66d1ccf849f524b8422
   x86/mm: dedup the various copies of the shadow lock functions
 * commit eca988de7381e5efe58792dc166611e4523d33b3
   x86/mm: merge the shadow, hap and log-dirty locks into a single paging lock.

but it is missing
 * commit 5bf494a7bf3674f32ebaab1b70b76e5f174812a3
   x86/mm: Make MM locks recursive.

Without that commit, a code path that re-acquires the already-held MM lock on the same CPU triggers the "mm lock held by ..." panic and crashes the hypervisor.

With the patch applied "xm migrate -c -l $dom $host" works again.
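The difference between a non-recursive and a recursive MM lock can be sketched as a small user-space C model. This is a hypothetical, single-threaded simulation, NOT Xen's mm-locks.h: the type `mm_lock_t`, the functions `mm_lock_strict`, `mm_lock_recursive` and `mm_unlock`, and the return codes are all illustrative names. The "strict" variant refuses re-entry from the CPU that already holds the lock (the situation that produced the panic above), while the "recursive" variant, in the spirit of commit 5bf494a's title, just counts nesting levels.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified model of a per-domain "mm lock".
 * This is NOT Xen's mm-locks.h: it is a single-threaded simulation
 * that only tracks ownership, so no real spinning ever happens. */
typedef struct {
    int locker;                  /* CPU id of the holder, -1 if free */
    int recurse_cnt;             /* nesting depth of the current holder */
    const char *locker_function; /* who took the lock, for diagnostics */
} mm_lock_t;

#define MM_LOCK_INIT { -1, 0, "nobody" }

#define MM_LOCK_OK      0  /* lock acquired (or re-acquired) */
#define MM_LOCK_BUSY    1  /* held by another CPU: real code would spin */
#define MM_LOCK_PANIC  -1  /* same CPU re-entered a non-recursive lock */

/* Shared bookkeeping for a fresh acquisition by `cpu`. */
static void take(mm_lock_t *l, int cpu, const char *func)
{
    l->locker = cpu;
    l->recurse_cnt = 1;
    l->locker_function = func;
}

/* Non-recursive acquire: re-entry from the owning CPU is fatal,
 * mirroring the "mm lock held by ..." panic in the report. */
static int mm_lock_strict(mm_lock_t *l, int cpu, const char *func)
{
    if (l->locker == cpu) {
        fprintf(stderr, "Panic on CPU %d:\nmm lock held by %s\n",
                cpu, l->locker_function);
        return MM_LOCK_PANIC;
    }
    if (l->locker != -1)
        return MM_LOCK_BUSY;
    take(l, cpu, func);
    return MM_LOCK_OK;
}

/* Recursive acquire: the owning CPU may nest; only the count grows. */
static int mm_lock_recursive(mm_lock_t *l, int cpu, const char *func)
{
    if (l->locker == cpu) {
        l->recurse_cnt++;
        return MM_LOCK_OK;
    }
    if (l->locker != -1)
        return MM_LOCK_BUSY;
    take(l, cpu, func);
    return MM_LOCK_OK;
}

/* Release one level; the lock becomes free when the count hits zero. */
static void mm_unlock(mm_lock_t *l)
{
    assert(l->recurse_cnt > 0); /* unlocking a free lock is a bug */
    if (--l->recurse_cnt == 0) {
        l->locker = -1;
        l->locker_function = "nobody";
    }
}
```

With the strict variant, a second `mm_lock_strict(&l, 10, ...)` on CPU 10 returns `MM_LOCK_PANIC` and prints a message shaped like the one in the report; with the recursive variant the same call succeeds and the lock is only released after the matching number of `mm_unlock` calls. The model deliberately ignores lock ordering, real spinning and concurrently running CPUs.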

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "x86/mm: Make MM locks recursive." seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Philipp Hahn (pmhahn) wrote :

The first patch does not work!

The backport was quite complicated, as that patch sits in the middle of this series:
4) eca988d x86/mm: merge the shadow, hap and log-dirty locks into a single paging lock.
3) 5bf494a x86/mm: Make MM locks recursive.
2) 301493f x86/mm: dedup the various copies of the shadow lock functions
1) 3b0bcb8 x86/mm/p2m: Move p2m code in HVMOP_[gs]et_mem_access into p2m.c

2) and 4) are already in debian/patches/xsa97-hap-4.1-prereq.patch, so this patch also contains the parts of eca988d that were skipped because 5bf494a had not been back-ported.

This patch also requires a back-port of 3b0bcb890955e1c3e2fde10a026c5d85481e6fb8 (1), as compilation fails otherwise; I'll attach that next. As an alternative, it could be enough to '#include "../mm-locks.h"' instead, but I didn't test that.

Philipp Hahn (pmhahn) wrote :

Backport 3b0bcb8 x86/mm/p2m: Move p2m code in HVMOP_[gs]et_mem_access into p2m.c

Philipp Hahn (pmhahn) wrote :

Backport 3b0bcb8 x86/mm/p2m: Move p2m code in HVMOP_[gs]et_mem_access into p2m.c
needs f488040222f34df97deb2470f14cef7fb9599810 to fix the build on i386.

f488040 Fix 32-bit build after p2m series
3b0bcb8 x86/mm/p2m: Move p2m code in HVMOP_[gs]et_mem_access into p2m.c
