Description
===========
This is current master branch (wallaby) of OpenStack.
We see this regularly, but it's intermittent.
We are seeing nova instances that do not transition to ACTIVE within five minutes. Investigating this led us to find that libvirtd seems to go into a tight loop on an instance delete.
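For reference, the check that times out here is essentially a poll on the server status with a five-minute cap. A minimal sketch of that wait with openstacksdk follows; the cloud name and server name are placeholders, not taken from the Octavia tests.

# Minimal sketch (assumption, not from the Octavia suite) of waiting for a
# nova server to reach ACTIVE with a five-minute cap, using openstacksdk.
import openstack

conn = openstack.connect(cloud="devstack")             # assumed clouds.yaml entry
server = conn.compute.find_server("octavia-test-vm")   # hypothetical server name

# Raises openstack.exceptions.ResourceTimeout if ACTIVE is not reached within
# 300 seconds, or ResourceFailure if the server goes to ERROR first.
server = conn.compute.wait_for_server(server, status="ACTIVE",
                                      failures=["ERROR"], interval=5, wait=300)
print(server.status)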
When running the Octavia scenario test suite, we occasionally see nova instances fail to become ACTIVE in a timely manner, causing timeouts and failures. In investigating this issue we found the libvirtd log was 136MB.
Most of the file is full of this repeating:
2020-10-28 23:45:06.330+0000: 20852: debug : qemuMonitorIO:767 : Error on monitor internal error: End of file from qemu monitor
2020-10-28 23:45:06.330+0000: 20852: debug : qemuMonitorIO:788 : Triggering EOF callback
2020-10-28 23:45:06.330+0000: 20852: debug : qemuProcessHandleMonitorEOF:301 : Received EOF on 0x7f6278014ca0 'instance-00000001'
2020-10-28 23:45:06.330+0000: 20852: debug : qemuProcessHandleMonitorEOF:305 : Domain is being destroyed, EOF is expected
The 136MB log is here: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_c77/759973/3/check/octavia-v2-dsvm-scenario/c77fe63/controller/logs/libvirt/libvirtd_log.txt
The overall job logs are here: https://zuul.opendev.org/t/openstack/build/c77fe63a94ef4298872ad5f40c5df7d4/logs
Here is a snippet for the lead in to the repeated lines: http://paste.openstack.org/show/799559/
It appears to be a tight loop, repeating many times per second.
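To put a rough number on that, one way (a sketch against a downloaded copy of the log linked above, not something run in the job itself) is to count the EOF callback lines per timestamp second:

# Rough sketch: count "Received EOF" debug lines per second in a local copy of
# libvirtd_log.txt to show how tight the loop is. The filename is an assumption.
from collections import Counter

counts = Counter()
with open("libvirtd_log.txt", errors="replace") as log:
    for line in log:
        if "qemuProcessHandleMonitorEOF" in line and "Received EOF" in line:
            # Timestamps look like "2020-10-28 23:45:06.330+0000: ...", so the
            # text before the first "." identifies the second.
            counts[line.split(".", 1)[0]] += 1

for second, n in sorted(counts.items()):
    print(second, n)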
Eventually it does stop and things seem to go back to normal in nova.
Here is the snippet of the end of the loop in the log: http://paste.openstack.org/show/799560/