On Wed, Nov 08, 2017 at 11:56:02AM -0000, ChristianEhrhardt wrote:
> Torkoal (our Jenkins node) was idle atm and Ryan reported he had seen the issues there before, so trying there as well.
> This is LTS + HWE - Kernel 4.10.0-38-generic, qemu: 1:2.5+dfsg-5ubuntu10
>
> I thought about your case since you seem just to start a lot of them and reboot,
> this shouldn't be so much different to:
> $ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 label=daily release=artful
> $ for i in {1..30}; do uvt-kvm create --log-console-output --password=ubuntu artful-${i}-bug1730717 release=artful arch=amd64 label=daily; done
> $ for i in {1..30}; do uvt-kvm wait --insecure artful-${i}-bug1730717; done
> $ for i in {1..30}; do uvt-kvm ssh --insecure artful-${i}-bug1730717 "sudo reboot"; done
> $ sudo grep "soft lockup" /var/log/libvirt/qemu/artful-*-bug1730717.log
Sounds like it's similar, but maybe you have to put the system under
load - you might need more instances, or maybe start a whole bunch first
and get them to run something memory intensive before running that same
test again. In the cloud there will be buildds and tests running on the
compute nodes too, as well as these 'empty' instances that I use to
reproduce the problem.
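One way to sketch that load step (assuming the guest names from the loop above, and that stress-ng is available in the guests - both assumptions, not something from this thread):

```shell
# Hypothetical sketch: push each guest under memory pressure before
# repeating the reboot test. stress-ng runs two workers dirtying 75%
# of guest RAM for two minutes.
stress_guests() {
  local n=$1 runner=${2:-}   # pass runner="echo" for a dry run
  for i in $(seq 1 "$n"); do
    # One ssh per guest; $runner lets us print instead of execute.
    $runner uvt-kvm ssh --insecure "artful-${i}-bug1730717" \
      "sudo stress-ng --vm 2 --vm-bytes 75% --timeout 120"
  done
}

# Dry run: print the commands that would be executed for 30 guests.
stress_guests 30 echo
```

After that, re-running the reboot loop and the `grep "soft lockup"` check from above should say whether load makes the difference.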
> Waiting for your feedback if you can trigger the same issue on a non-
> busy openstack system (could after all be some openstack magic at work
> that makes it behave differently).
I don't have access to a non-busy cloud, I'm afraid.
ANYWAY! My results are in. I created an image by booting the stock
artful cloud image and installing the mainline kernel v4.14-rc8
(39dae59d66acd86d1de24294bd2f343fd5e7a625) packages, on lcy01 (the busy
cloud that exhibits this problem).
I started 34 (17 × 2 in two runs - that's all I could squeeze in before
I hit my quota) instances, and they were all good. This isn't definitive
proof, but it looks like that kernel might be good.
Cheers,
--
Iain Lane [ <email address hidden> ]
Debian Developer [ <email address hidden> ]
Ubuntu Developer [ <email address hidden> ]