On Wed, Nov 08, 2017 at 11:56:02AM -0000, ChristianEhrhardt wrote:
> Torkoal (our Jenkins node) was idle atm and Ryan reported he had seen the issues there before, so trying there as well.
> This is LTS + HWE - Kernel 4.10.0-38-generic, qemu: 1:2.5+dfsg-5ubuntu10
>
> I thought about your case since you seem just to start a lot of them and reboot,
> this shouldn't be so much different to:
> $ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 label=daily release=artful
> $ for i in {1..30}; do uvt-kvm create --log-console-output --password=ubuntu artful-${i}-bug1730717 release=artful arch=amd64 label=daily; done
> $ for i in {1..30}; do uvt-kvm wait --insecure artful-${i}-bug1730717; done
> $ for i in {1..30}; do uvt-kvm ssh --insecure artful-${i}-bug1730717 "sudo reboot"; done
> $ sudo grep "soft lockup" /var/log/libvirt/qemu/artful-*-bug1730717.log
Sounds like it's similar, but maybe you have to put the system under
load - you might need more instances, or maybe start a whole bunch first
and get them to run something memory intensive before running that same
test again. In the cloud there will be buildds and tests running on the
compute nodes too, as well as these 'empty' instances that I use to
reproduce the problem.
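One way to sketch that load step (assuming the guest names from the loop above, and that stress-ng is available in the guests - both assumptions, not something from this thread):

```shell
# Hypothetical sketch: push each guest under memory pressure before
# repeating the reboot test. stress-ng runs two workers dirtying 75%
# of guest RAM for two minutes.
stress_guests() {
  local n=$1 runner=${2:-}   # pass runner="echo" for a dry run
  for i in $(seq 1 "$n"); do
    # One ssh per guest; $runner lets us print instead of execute.
    $runner uvt-kvm ssh --insecure "artful-${i}-bug1730717" \
      "sudo stress-ng --vm 2 --vm-bytes 75% --timeout 120"
  done
}

# Dry run: print the commands that would be executed for 30 guests.
stress_guests 30 echo
```

After that, re-running the reboot loop and the `grep "soft lockup"` check from above should say whether load makes the difference.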
> Waiting for your feedback if you can trigger the same issue on a non-
> busy openstack system (could after all be some openstack magic at work
> that makes it behave differently).
I don't have access to a non-busy cloud, I'm afraid.
ANYWAY! My results are in. I created an image by booting the stock
artful cloud image and installing the mainline kernel v4.14-rc8
(39dae59d66acd86d1de24294bd2f343fd5e7a625) packages, on lcy01 (the busy
cloud that exhibits this problem).
I started 34 (17 × 2 in two runs - that's all I could squeeze in before
I hit my quota) instances, and they were all good. This isn't definitive
proof, but it looks like that kernel might be good.
Cheers,
--
Iain Lane [ <email address hidden> ]
Debian Developer [ <email address hidden> ]
Ubuntu Developer [ <email address hidden> ]