
virt-customize segfaulting in cs9/wallaby jobs

Bug #2018356 reported by Cédric Jeanneret
This bug affects 1 person
Affects    Status         Importance    Assigned to    Milestone
tripleo    Fix Released   Critical      Unassigned

Bug Description

Wallaby jobs on CentOS Stream 9 (CS9) fail whenever they call virt-customize. For instance:

https://review.rdoproject.org/zuul/build/83b40b2588134ce693f17281dad8ff8e
https://review.rdoproject.org/zuul/build/d28576c077354ebd9bf609435964a96f
https://review.rdoproject.org/zuul/build/932e38f94b5c47b5a1b9a219aa8a90cb
https://review.rdoproject.org/zuul/build/28dc9e4a737e402b9ce8590213b8c20e

The job reports:
fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "virt-customize -m 4096 --smp 4 -v --run-command 'mkdir -p /etc/ci' --upload /etc/ci/mirror_info.sh:/etc/ci/mirror_info.sh -a $HOME/overcloud-hardened-uefi-full.qcow2 > /home/zuul/modify_image.log 2>&1\n", "delta": "0:01:12.389395", "end": "2023-05-02 21:05:09.721902", "msg": "non-zero return code", "rc": 1, "start": "2023-05-02 21:03:57.332507", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

Digging into the generated log:
[...]
supermin: mounting new root on /root
[ 5.685391] EXT4-fs (sdb): mounting ext2 file system using the ext4 subsystem
supermin: deleting initramfs files
[ 5.702610] EXT4-fs (sdb): mounted filesystem without journal. Quota mode: none.
supermin: chroot
[ 5.818280] init[1]: segfault at 55bac8273000 ip 00007f150d2b9cd7 sp 00007fff564ef8c8 error 6 in libc.so.6[7f150d228000+175000] likely on CPU 3 (core 3, socket 0)
[ 5.820734] Code: 00 00 c5 7d e7 8f 20 20 00 00 c5 7d e7 97 40 20 00 00 c5 7d e7 9f 60 20 00 00 c5 7d e7 a7 00 30 00 00 c5 7d e7 af 20 30 00 00 <c5> 7d e7 b7 40 30 00 00 c5 7d e7 bf 60 30 00 00 48 83 ef 80 ff c9
[ 5.822793] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 5.822903] CPU: 3 PID: 1 Comm: init Not tainted 5.14.0-305.el9.x86_64 #1
[...]

Full log: https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby/83b40b2/logs/undercloud/home/zuul/modify_image.log.txt.gz

Changed in tripleo:
importance: Undecided → Critical
Cédric Jeanneret (cjeanner) wrote:

Failed job:

guestfs-tools.x86_64 1.50.1-2.el9 @quickstart-centos-appstreams
libguestfs.x86_64 1:1.50.1-3.el9 @quickstart-centos-appstreams
libguestfs-appliance.x86_64 1:1.50.1-3.el9 @quickstart-centos-appstreams
libguestfs-xfs.x86_64 1:1.50.1-3.el9 @quickstart-centos-appstreams
kernel.x86_64 5.14.0-302.el9 @anaconda
kernel.x86_64 5.14.0-305.el9 @quickstart-centos-base

Booted: Linux np0003794374 5.14.0-302.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 20 05:35:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Last success:
guestfs-tools.x86_64 1.50.1-2.el9 @quickstart-centos-appstreams
libguestfs.x86_64 1:1.50.1-3.el9 @quickstart-centos-appstreams
libguestfs-appliance.x86_64 1:1.50.1-3.el9 @quickstart-centos-appstreams
libguestfs-xfs.x86_64 1:1.50.1-3.el9 @quickstart-centos-appstreams
kernel.x86_64 5.14.0-302.el9 @anaconda

Booted: Linux np0003788646 5.14.0-302.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 20 05:35:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

So far, no notable differences...

Cédric Jeanneret (cjeanner) wrote:

Found a partial answer:

In the code calling virt-customize[1], we set a couple of environment variables:
LIBGUESTFS_BACKEND=direct
LIBGUESTFS_BACKEND_SETTINGS=force_tcg

With both variables set on a held node, I hit the same crash. If I remove LIBGUESTFS_BACKEND_SETTINGS, it passes just fine.

I'm not sure yet what changed or why, nor whether we can safely remove that parameter, but that's a first clue.

OpenStack Infra (hudson-openstack) wrote: Fix proposed to tripleo-quickstart-extras (master)
Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/882138
Committed: https://opendev.org/openstack/tripleo-quickstart-extras/commit/bed8aa011577a09077c00c63f52a5490af7b22f5
Submitter: "Zuul (22348)"
Branch: master

commit bed8aa011577a09077c00c63f52a5490af7b22f5
Author: Cédric Jeanneret <email address hidden>
Date: Wed May 3 13:03:27 2023 +0200

    Remove emulation enforcing

    Lately this is making the Wallaby on CS9 line crumble. After some tests,
    it seems, at least on CS9, we're able to get rid of this option - and
    should, since it's crashing virt-customize.

    Change-Id: I4e3cbe4507cbe7d1471f75cb41af99f84725b3ad
    Closes-Bug: #2018356
    Related-Bug: #1743749

Changed in tripleo:
status: In Progress → Fix Released
Cédric Jeanneret (cjeanner) wrote:

It seems the crash is back.

Here's the diff of the installed packages:
diff success.txt today.txt | grep -Ev '(^[0-9]|openstack|tripleo|python)' | awk '{print $1" "$2" "$3}'
< ansible-config_template.noarch 2.0.1-0.20230328103949.7951228.el9
---
> ansible-config_template.noarch 2.0.1-0.20230504231039.8d14af7.el9
< apr.x86_64 1.7.0-11.el9
< apr-util.x86_64 1.6.1-20.el9
< apr-util-bdb.x86_64 1.6.1-20.el9
< apr-util-openssl.x86_64 1.6.1-20.el9
< centos-logos-httpd.noarch 90.4-1.el9
< certmonger.x86_64 0.79.14-5.el9
< double-conversion.x86_64 3.1.5-6.el9
> dracut-config-generic.x86_64 057-21.git20230214.el9
< driverctl.noarch 0.111-2.el9
< hdparm.x86_64 9.62-2.el9
< httpd.x86_64 2.4.53-11.el9
< httpd-core.x86_64 2.4.53-11.el9
< httpd-filesystem.noarch 2.4.53-11.el9
< httpd-tools.x86_64 2.4.53-11.el9
< libuv.x86_64 1:1.42.0-1.el9
< lm_sensors-libs.x86_64 3.6.0-10.el9
< mailcap.noarch 2.1.49-5.el9
< mariadb-connector-c.x86_64 3.2.6-1.el9
< mariadb-connector-c-config.noarch 3.2.6-1.el9
< mod_http2.x86_64 1.15.19-4.el9
< mod_lua.x86_64 2.4.53-11.el9
< net-snmp.x86_64 1:5.9.1-9.el9
< net-snmp-agent-libs.x86_64 1:5.9.1-9.el9
< net-snmp-libs.x86_64 1:5.9.1-9.el9
< nmap.x86_64 3:7.92-1.el9
< openldap-compat.x86_64 2.6.2-3.el9
---
< pcp.x86_64 6.0.1-4.el9
< pcp-conf.x86_64 6.0.1-4.el9
< pcp-libs.x86_64 6.0.1-4.el9
< pcp-selinux.x86_64 6.0.1-4.el9
< pcp-system-tools.x86_64 6.0.1-4.el9
< perl-Term-ReadLine.noarch 1.17-480.el9
---
< subunit-filters.noarch 1.4.0-6.el9s
< sysstat.x86_64 12.5.4-5.el9
< tmpwatch.x86_64 2.11-20.el9
---
< tuned.noarch 2.20.0-1.el9
< tuned-profiles-cpu-partitioning.noarch 2.20.0-1.el9
< virt-what.x86_64 1.25-3.el9

The "success" content comes from yesterday's testproject run:
https://review.rdoproject.org/zuul/build/e25258f358fa4ca198d76837b1a50d6f

I don't see anything directly related to libvirt/guestfs, though. Yes, many packages from the successful run are missing from the failed job (whose log is from today)...
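The pipeline above still lets diff's "---" separators through. Below is a minimal sketch of an alternative. It assumes success.txt and today.txt contain `dnf list installed`-style output with one package per line (name.arch, version, repo). `normalize_pkgs` and `pkg_diff` are hypothetical helper names. The script needs bash because it uses process substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list package-level changes between two saved package
# lists. Assumes `dnf list installed`-style input with one package per line:
# "<name.arch> <version> <repo>".

# Reduce a package list to sorted, unique "name.arch version" entries.
# Header lines (which do not have exactly three fields) are dropped, along
# with the openstack/tripleo/python packages that the grep filter above skips.
normalize_pkgs() {
    awk 'NF == 3 { print $1 " " $2 }' "$1" \
        | grep -Ev '(openstack|tripleo|python)' \
        | LC_ALL=C sort -u
}

# Print "< entry" for entries found only in the first file and "> entry" for
# entries found only in the second. This mirrors diff's markers but omits
# hunk headers and "---" separators.
pkg_diff() {
    # comm -3 suppresses lines common to both inputs. Lines unique to the
    # second input are prefixed with a single tab.
    LC_ALL=C comm -3 <(normalize_pkgs "$1") <(normalize_pkgs "$2") \
        | awk -F'\t' '{ if ($1 == "") print "> " $2; else print "< " $1 }'
}
```

Usage: `pkg_diff success.txt today.txt`. Comparing sorted sets with comm, rather than diffing the raw lists, produces one marked line per changed package. The output contains no "---" lines that need filtering afterwards.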
