riscv64 build - hang at the end of the build with hanging "dbus-test-runner"

Bug #1891158 reported by Christian Ehrhardt 
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
dbus-test-runner (Ubuntu)
New
Undecided
Unassigned
dee (Ubuntu)
Confirmed
Medium
Unassigned

Bug Description

The builds for riscv64 on e.g. https://launchpad.net/ubuntu/+source/dee/1.2.7+17.10.20170616-6build1 hang eventually.

All other architectures work fine.

Build log tail will be like:
...
  symlinking changelog.Debian.gz in dee-tools to file in libdee-1.0-4
pkgstripfiles: Running PNG optimization (using 8 cpus) for package dee-tools ...
pkgstripfiles: No PNG files.
dpkg-deb: building package 'dee-tools' in '../dee-tools_1.2.7+17.10.20170616-6build1_riscv64.deb'.
pkgstriptranslations: libdee-1.0-4-dbgsym does not contain translations, skipping
pkgstriptranslations: no translation files, not creating tarball
pkgmaintainermangler: Not overriding Maintainer for domain lists.ubuntu.com
pkgstripfiles: processing control file: debian/.debhelper/libdee-1.0-4/dbgsym-root/DEBIAN/control, package libdee-1.0-4-dbgsym, directory debian/.debhelper/libdee-1.0-4/dbgsym-root
dpkg-deb: building package 'libdee-1.0-4-dbgsym' in 'debian/.debhelper/scratch-space/build-libdee-1.0-4/libdee-1.0-4-dbgsym_1.2.7+17.10.20170616-6build1_riscv64.deb'.
 Renaming libdee-1.0-4-dbgsym_1.2.7+17.10.20170616-6build1_riscv64.deb to libdee-1.0-4-dbgsym_1.2.7+17.10.20170616-6build1_riscv64.ddeb
pkgstriptranslations: dee-tools-dbgsym does not contain translations, skipping
pkgstriptranslations: no translation files, not creating tarball
pkgmaintainermangler: Not overriding Maintainer for domain lists.ubuntu.com
pkgstripfiles: processing control file: debian/.debhelper/dee-tools/dbgsym-root/DEBIAN/control, package dee-tools-dbgsym, directory debian/.debhelper/dee-tools/dbgsym-root
dpkg-deb: building package 'dee-tools-dbgsym' in 'debian/.debhelper/scratch-space/build-dee-tools/dee-tools-dbgsym_1.2.7+17.10.20170616-6build1_riscv64.deb'.
 Renaming dee-tools-dbgsym_1.2.7+17.10.20170616-6build1_riscv64.deb to dee-tools-dbgsym_1.2.7+17.10.20170616-6build1_riscv64.ddeb
 dpkg-genbuildinfo --build=any
 dpkg-genchanges --build=any -mLaunchpad Build Daemon <email address hidden> >../dee_1.2.7+17.10.20170616-6build1_riscv64.changes
dpkg-genchanges: info: binary-only arch-specific upload (source code and arch-indep packages not included)
 dpkg-source --after-build .
dpkg-buildpackage: info: binary-only upload (no source included)

Then it will be stuck until at 24h a timeout will reap the builder.

It turned out (thanks cjwatson) that on these systems there is a leftover process

Zombie schroot process
Looks like there's a dbus-daemon process still running in the background of the build that the build didn't clean up properly
buildd 481987 0.0 0.0 5244 2780 ? S Aug10 0:00 dbus-daemon --config-file /usr/share/dbus-test-runner/session.conf --print-address
# ls -l /proc/481987/cwd
lrwxrwxrwx 1 buildd buildd 0 Aug 11 10:08 /proc/481987/cwd -> /home/buildd/build-PACKAGEBUILD-19646792/chroot-autobuild/build/dee-1HAMJl/dee-1.2.7+17.10.20170616/tests

On #ubuntu-devel we discussed this and agreed that we should not just retry until it slips through by accident, we might never be able to build it again.

Instead we should try looking for what actually hangs on riscv64 and why.

tags: added: update-excuse
Changed in dee (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
summary: - riscv64 build issue hang at the end of the build (groovy)
+ riscv64 build - hang at the end of the build (groovy)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: riscv64 build - hang at the end of the build (groovy)

If you catch the log at the right time there is a related test failure:

** (dbus-test-runner:1577514): WARNING **: 12:15:11.723: Timing out at maximum wait of 60 seconds.
(dbus-test-runner:1577514): libdbustest-CRITICAL **: 12:15:11.736: dbus_test_service_run: assertion 'all_tasks(service, all_tasks_finished_helper, NULL)' failed

This most likely is what leaves the processes around.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Good case:

task-0: Shutting down
DBus daemon: Shutdown
make[4]: Leaving directory '/<<BUILDDIR>>/dee-1.2.7+17.10.20170616/tests'

Bad case:

task-0: /ModelTags/SharedModel/TwoTags: OK

** (dbus-test-runner:2028709): WARNING **: 12:41:16.457: Timing out at maximum wait of 60 seconds.

(dbus-test-runner:2028709): libdbustest-CRITICAL **: 12:41:16.464: dbus_test_service_run: assertion 'all_tasks(service, all_tasks_finished_helper, NULL)' failed
task-0: Shutting down
DBus daemon: Shutdown
make[4]: *** [Makefile:1151: test] Error 255
make[4]: Leaving directory '/<<PKGBUILDDIR>>/tests'

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

When lowering the timeout to trigger the same on x86 (to see if the hang occurs there as well) I can see that the test breaks as it does on riscv64, but the hang does not happen.

task-0: /Model/Transaction/Target2ChangeRemoveAppend: OK

** (dbus-test-runner:15767): WARNING **: 11:26:03.671: Timing out at maximum wait of 1 seconds.
task-0: /Model/Transaction/SignalOrder: OK
task-0: /Model/Transaction/ConcurrentModification: OK
task-0: /Model/Transaction/DoubleCommit: OK
task-0: /Model/Transaction/BasicTypes: OK
task-0: /Model/Transaction/Rows/Allocation: OK

(dbus-test-runner:15767): libdbustest-CRITICAL **: 11:26:03.671: dbus_test_service_run: assertion 'all_tasks(service, all_tasks_finished_helper, NULL)' failed
task-0: Shutting down
DBus daemon: Shutdown
make[4]: *** [Makefile:1151: test] Error 255
make[4]: Leaving directory '/<<BUILDDIR>>/dee-1.2.7+17.10.20170616/tests'
make[3]: *** [Makefile:939: check-am] Error 2
make[3]: Leaving directory '/<<BUILDDIR>>/dee-1.2.7+17.10.20170616/tests'
kill: (15773): No such process
make[2]: *** [Makefile:525: check-recursive] Error 1
make[2]: Leaving directory '/<<BUILDDIR>>/dee-1.2.7+17.10.20170616'
dh_auto_test: error: make -j4 check VERBOSE=1 returned exit code 2
make[1]: *** [debian/rules:26: override_dh_auto_test] Error 25
make[1]: Leaving directory '/<<BUILDDIR>>/dee-1.2.7+17.10.20170616'
make: *** [debian/rules:14: binary] Error 2
dpkg-buildpackage: error: fakeroot debian/rules binary subprocess returned exit status 2
semop(1): encountered an error: Invalid argument
--------------------------------------------------------------------------------
Build finished at 2020-08-12T11:26:03Z

From https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4200/+build/19818311

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It turned out that this has happened in the past and then resolved, so this is a comeback.
For "dee" we can bump the timeout or skip the test on riscv64 for the immediate problem

But the "dbus-test-runner timeout makes riscv64 builds" hang is a problem on its own.
Therefore I added a bug task to dbus-test-runner

summary: - riscv64 build - hang at the end of the build (groovy)
+ riscv64 build - hang at the end of the build with hanging "dbus-test-
+ runner"
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

From IRC:
[13:34] <cpaelzer> Laney: ha - timeout = 1 on x86 triggers the same warnings/erorrs, but NOT the hang of the builder
[13:34] <cpaelzer> https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4200/+build/19818311 is an example
[13:34] <Laney> interesting
[13:35] <Laney> perhaps riscv64 does process reaping differently
[13:37] <wgrant> Laney, cpaelzer: dbus-test-runner is what generally gets stuck, I haven't worked out why.
[13:37] <wgrant> But there are a couple of packages that were consistently hanging because it didn't die for a while. But they seemed to eventually fix themselves by around maybe September.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Bumping the timeout isn't enough :-/
It still gets into the hang and breaks build.

Chances are we are not actually facing a timeout issue, but something around
  /var/log/launchpad-buildd/default.log-20200812:2020-08-11 12:45:42+0000 [-] Build log: (dbus-test-runner:1807632): libdbustest-CRITICAL **: 12:45:42.692: dbus_test_service_run: assertion 'all_tasks(service, all_tasks_finished_helper, NULL)' failed

leaving behind the process.

Therefore - for the time being to unblock things - skip the test on riscv64.
Since that is avoidance and not a fix I'll leave this bug up.

Maybe a debug&fix in dbus-test-runner will eventually fix things for real.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

When running the build in a local groovy qemu riscv64 guest the test left no dbus-test-runner processes for debugging around :-/

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

https://launchpad.net/ubuntu/+source/dee/1.2.7+17.10.20170616-6ubuntu1 build fine now.
It depends on ICU, but otherwise is ready to go now.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The problem is Ubuntu only, as [1] is good.
And the real fix should be in dbus-test-runner, but without repro it will be unclear if anyone gets to that or if we just hope it resolves itslef (again).

But due to the above it makes no sense to submit the fix to Debian - as for them it would reduce test coverage.

[1]: https://buildd.debian.org/status/fetch.php?pkg=dee&arch=riscv64&ver=1.2.7%2B17.10.20170616-6%2Bb1&stamp=1591129930&raw=0

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.