Xen 32bit dom0 on 64bit hypervisor: bad page flags
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Confirmed | High | Unassigned | |
| Wily | Fix Released | High | Unassigned | |
| Xenial | Fix Released | High | Unassigned | |
| xen (Ubuntu) | Invalid | High | Unassigned | |
| Wily | Invalid | Undecided | Unassigned | |
| Xenial | Invalid | Undecided | Unassigned | |
Bug Description
This problem is a combination of running certain versions of a 32bit Linux kernel dom0 on certain versions of the 64bit Xen hypervisor, together with certain memory clamping settings (dom0_mem=xM without also setting the max limit). A configuration sketch follows below, then the version combinations tested.
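For reference, dom0_mem is a Xen hypervisor command line option. A minimal sketch of the two forms of the setting, assuming the stock Ubuntu/Debian GRUB layout (the 2048M value is only illustrative):

```
# /etc/default/grub
# Clamp dom0 memory without a max limit (the form discussed in this bug):
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=2048M"
# Versus clamping with an explicit upper limit:
#GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=2048M,max:2048M"
```

Run update-grub and reboot for the change to take effect.

Tested combinations: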
Xen 4.4.2 + Linux 3.13.x
Xen 4.5.0 + Linux 3.19.x
Xen 4.6.0 + Linux 4.0.x
Xen 4.6.0 + Linux 4.1.x
-> all boot without messages
Xen 4.5.1 + Linux 4.2.x
Xen 4.6.0 + Linux 4.2.x
Xen 4.6.0 + Linux 4.3.x
* dom0_mem 512M, 4096M, or unlimited
-> boot without messages
* dom0_mem between 1024M and 3072M (inclusive)
-> bad page messages (but finishes boot)
Xen 4.6.0 + Linux 4.4.x
Xen 4.6.0 + Linux 4.5.x
Xen 4.6.0 + Linux 4.6-rc6
The boot for 512M, 4096M, and unlimited looks good as well, though trying to
start a domU without dom0_mem set caused a crash when ballooning (but I
think this should be a separate bug).
With dom0_mem set anywhere between 1G and 3G, these versions still produce
the bad page flags message and additionally panic and reboot.
The bad page bug generally looks like this (the pfn numbers seem to be towards the end of the allocated range):
[ 8.980150] BUG: Bad page state in process swapper/0 pfn:7fc22
[ 8.980238] page:f4566550 count:0 mapcount:0 mapping: (null) index:0x0
[ 8.980328] flags: 0x7000400(reserved)
[ 8.980486] page dumped because: PAGE_FLAGS_
[ 8.980575] bad because of flags:
[ 8.980688] flags: 0x400(reserved)
[ 8.980844] Modules linked in:
[ 8.980960] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B 4.2.0-19-generic #23-Ubuntu
[ 8.981084] Hardware name: Supermicro H8SGL/H8SGL, BIOS 3.0 08/31/2012
[ 8.981177] c1a649a7 23e07668 00000000 e9cafce4 c175e501 f4566550 e9cafd08 c1166897
[ 8.981608] c19750a4 e9d183ec 0007fc22 007fffff c1975630 c1978e86 00000001 e9cafd74
[ 8.982074] c1169f83 00000002 00000141 0004a872 c1af3644 00000000 ee44bce4 ee44bce4
[ 8.982506] Call Trace:
[ 8.982582] [<c175e501>] dump_stack+
[ 8.982666] [<c1166897>] bad_page+0xb7/0x110
[ 8.982749] [<c1169f83>] get_page_
[ 8.982838] [<c116a4f3>] __alloc_
[ 8.982926] [<c122ee62>] ? find_entry.
[ 8.983013] [<c11b0f75>] ? kmem_cache_
[ 8.983102] [<c10b1c96>] ? __raw_callee_
[ 8.983223] [<c11b0ddd>] ? __kmalloc+
[ 8.983308] [<c119cc2e>] __vmalloc_
[ 8.983433] [<c1148fa7>] ? bpf_prog_
[ 8.983518] [<c119cd96>] __vmalloc_
[ 8.983604] [<c1148fa7>] ? bpf_prog_
[ 8.983689] [<c119cdd4>] __vmalloc+0x34/0x40
[ 8.983773] [<c1148fa7>] ? bpf_prog_
[ 8.983859] [<c1148fa7>] bpf_prog_
[ 8.983944] [<c167cc8c>] bpf_prog_
[ 8.984034] [<c1b6741e>] ? bsp_pm_
[ 8.984121] [<c1b68401>] ptp_classifier_
[ 8.984207] [<c1b6749a>] sock_init+0x7c/0x83
[ 8.984291] [<c100211a>] do_one_
[ 8.984376] [<c1b6741e>] ? bsp_pm_
[ 8.984463] [<c1b1654c>] ? repair_
[ 8.984551] [<c1b16cf6>] ? kernel_
[ 8.984726] [<c1755fb0>] kernel_
[ 8.984846] [<c10929b1>] ? schedule_
[ 8.984932] [<c1764141>] ret_from_
[ 8.985019] [<c1755fa0>] ? rest_init+0x70/0x70
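For readers decoding the trace: flags: 0x400 corresponds to PG_reserved, and the truncated "page dumped because:" line is the page allocator's sanity check on pages coming off the free list. A simplified paraphrase of that check, modeled on the 4.2-era shape of mm/page_alloc.c (not the exact Ubuntu source):

```c
/*
 * Simplified paraphrase of the allocator sanity check in
 * mm/page_alloc.c (4.2-era shape, not the exact Ubuntu source).
 * A page handed out by the allocator must have no users and none
 * of the PAGE_FLAGS_CHECK_AT_PREP bits set. PG_reserved (the
 * 0x400 seen in the log above) is one of those bits, so a page
 * that early memory init wrongly left marked reserved trips
 * bad_page() and produces the "BUG: Bad page state" dump.
 */
static inline int check_new_page(struct page *page)
{
	if (unlikely(page_mapcount(page) ||
		     page->mapping != NULL ||
		     atomic_read(&page->_count) != 0 ||
		     (page->flags & PAGE_FLAGS_CHECK_AT_PREP))) {
		bad_page(page, "PAGE_FLAGS_CHECK_AT_PREP flag set",
			 page->flags & PAGE_FLAGS_CHECK_AT_PREP);
		return 1;	/* reject the page */
	}
	return 0;
}
```

In other words, the check itself appears to be working as intended; the pages reaching the allocator look like they were never released from their reserved state during early memory initialization.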
break-fix: 92923ca3aacef63
description: updated
tags: added: patch
description: updated
Changed in xen (Ubuntu Wily): status: New → Invalid
Changed in xen (Ubuntu Xenial): status: New → Invalid
tags: added: kernel-bug-break-fix
Changed in linux (Ubuntu Wily): status: Fix Released → Confirmed
Changed in linux (Ubuntu Xenial): status: Fix Released → Confirmed
Changed in linux (Ubuntu): status: Fix Released → Confirmed
description: updated
Changed in linux (Ubuntu Wily): status: Confirmed → Fix Released
Changed in linux (Ubuntu Xenial): status: Confirmed → Fix Released
I marked this as affecting both Xen and the Linux kernel because of the interaction between the two. I am not even sure whether the panic/reboot is actually caused by the bad page problem or by something else. On the other hand, setting dom0_mem to certain values avoids that panic, so maybe the bad page issue just leads to more serious problems with newer kernels.