qemu segfaults after re-attaching ceph volume to instance
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | New | Undecided | Unassigned |
qemu (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Incomplete | Undecided | Unassigned |
Artful | Incomplete | Undecided | Unassigned |
Bug Description
I have OpenStack compute nodes with qemu-system-x86, using Ceph as the storage backend for base disks and volumes (no local storage).
When I create a new volume on Ceph and attach it to an instance, everything works.
When I detach the volume and re-attach it again, I am able to crash my instance within a limited number of repeats. Sometimes it happens on the second try, sometimes on the 6th or 9th; in most cases the instance does not survive 10 cycles.
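The same attach/detach cycle can also be driven at the OpenStack layer instead of through virsh; a minimal sketch, assuming the openstack CLI is available and using placeholder server/volume names (the actual reproduction below uses virsh directly):
SERVER=test-instance
VOLUME=test-volume
while true; do
  # attach the Ceph-backed volume, give qemu a moment, then detach it again
  openstack server add volume "$SERVER" "$VOLUME"
  sleep 5
  openstack server remove volume "$SERVER" "$VOLUME"
  sleep 5
done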
Steps to reproduce:
- create an instance
- create a volume in Ceph
- define the volume in disk.xml: http:// (a sketch of such a disk definition follows the loop below)
- now run a loop:
while true; do
  virsh attach-device instance-000022e8 disk.xml;
  sleep 5;
  virsh detach-disk instance-000022e8 vdb --live;
  sleep 5;
done
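Since the disk.xml link above is truncated, here is a minimal sketch of what such an RBD disk definition could look like, written out via a shell heredoc. The pool/volume name, cephx user, secret UUID and monitor address are placeholders, not values from this report:
cat > disk.xml <<'EOF'
<disk type='network' device='disk'>
  <!-- raw RBD-backed volume; cache mode is only an example setting -->
  <driver name='qemu' type='raw' cache='writeback'/>
  <!-- cephx auth: user and secret UUID are placeholders -->
  <auth username='cinder'>
    <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
  </auth>
  <!-- pool/volume name and monitor address are placeholders -->
  <source protocol='rbd' name='volumes/volume-test'>
    <host name='192.168.0.10' port='6789'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>
EOF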
After a few iterations, the instance crashes.
Logs:
kernel: [3866704.245319] traps: qemu-system-
or
kernel: [7252748.718834] qemu-system-
Ubuntu Xenial 16.04.3 with cloud-archive@Ocata repositories
kernel: 4.4.0-109-generic
qemu-system-x86 1:2.8+dfsg-
libvirt-bin 2.5.0-3ubuntu5.
ceph/rados: 10.2.10-1xenial
@Corey / James - I have no Ceph environment around at all; also, this is reported against a cloud-archive qemu (Ocata, if I read it correctly).
Can you confirm this issue, and if so, are there further insights on how to handle it?
@Crazik - to what extent could you try, on your existing setup, different qemu & libvirt versions such as those of Ubuntu Cloud Archive Pike (2.10) and Queens (2.11) from [1]?
If you can, it might also be worth updating the storage nodes (Ceph) independently of the compute-node qemu/libvirt; that way we might more easily get a feeling for which area a potential fix would be in.
[1]: https://wiki.ubuntu.com/OpenStack/CloudArchive
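A minimal sketch of how one of those Cloud Archive pockets could be enabled on a Xenial compute node for such a test; Pike is used here only as an example, and the package names are assumed to match the existing Ocata setup:
# enable the Pike Ubuntu Cloud Archive and pull in the newer qemu/libvirt
sudo add-apt-repository cloud-archive:pike
sudo apt update
sudo apt install qemu-system-x86 libvirt-bin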