landscape-common.postinst stuck with defunct who

Bug #277038 reported by Martin von Gagern
This bug affects 1 person
Affects: landscape-client (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: landscape-common

During an update of Intrepid (I think from Alpha 6 to Beta), Adept was updating landscape-common for several minutes without any progress. Investigating the issue with ps got me this process call stack:

/usr/bin/dpkg --status-fd 3 --configure <lots of packages...>
/usr/bin/perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/landscape-common.postinst configure
/bin/sh /var/lib/dpkg/info/landscape-common.postinst configure
/bin/sh -e /usr/sbin/update-motd
run-parts --lsbsysinit /etc/update-motd.d
/bin/sh /etc/update-motd.d/50-landscape-sysinfo
/usr/bin/python /usr/bin/landscape-sysinfo
[who] <defunct>

Looks like who terminated, but landscape-sysinfo wasn't ready for dead children. Killing who had no effect, with either SIGTERM or SIGKILL. Sending SIGKILL to its parent, landscape-sysinfo, however, resulted in Adept resuming its operations. landscape-common wasn't listed in the output of "dpkg --audit" after all of this, but I don't know whether the update actually worked as expected despite this problem.
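The behaviour described above (signals to a defunct process do nothing, and only the parent can clear it) can be reproduced in isolation. The following is a minimal sketch, assuming a POSIX system and Python 3; the function name demonstrate_zombie and the exit status 7 are made up for illustration and have nothing to do with landscape-sysinfo itself.

```python
import os
import signal
import subprocess
import sys


def demonstrate_zombie():
    """Let a child exit, signal it while defunct, then reap it (hypothetical helper)."""
    # The child exits right away with an arbitrary status of 7.
    child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(7)"])

    # Block until the child has exited, but with WNOWAIT so that it is NOT
    # reaped yet: from here on it is a zombie, shown as <defunct> by ps.
    os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)

    # The kernel accepts the signal, but a zombie has no code left to stop,
    # so nothing changes. This matches SIGTERM/SIGKILL "giving no result".
    os.kill(child.pid, signal.SIGKILL)

    # Only the parent collecting the exit status removes the zombie.
    return child.wait()
```

The returned value is the child's original exit status, not a signal number, which shows that the SIGKILL sent to the zombie was ignored. If the parent never waits, the entry stays until the parent itself exits, which is consistent with killing landscape-sysinfo unblocking the upgrade.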

Andreas Hasenack (ahasenack) wrote :

I'm not sure I understand the lines you posted. Was [who] a leaf of the process tree starting at dpkg --configure?

Martin von Gagern (gagern) wrote :

Yes. The "tree" starting at dpkg was linear, with one child for each process, except for who which was a leaf.

I just had a look at the code. /usr/lib/python2.5/site-packages/landscape/lib/sysstats.py uses twisted.internet.utils.getProcessOutputAndValue to call "who -q". Therefore the actual cause of this bug might also lie in python-twisted-core, which might for some reason have failed to reap this dead child. Should this bug here therefore be marked as affecting twisted as well as landscape-client?

I cannot reproduce the issue by calling landscape-sysinfo manually. If I hadn't been in such a hurry, I might have thought of stracing landscape-sysinfo, but it's too late for that now.
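For comparison, here is a rough sketch of a defensive way to collect a command's output without risking an indefinite hang. It deliberately uses the standard library's subprocess module rather than Twisted, so it is not what sysstats.py does; the helper name run_with_timeout and the 5 second default are assumptions for illustration. It returns an (out, err, code) tuple, the same shape that getProcessOutputAndValue delivers through its Deferred.

```python
import subprocess


def run_with_timeout(argv, timeout=5.0):
    """Run argv and return (stdout, stderr, returncode); hypothetical helper.

    The child is always waited on, so it cannot linger as a zombie, and a
    child that hangs is killed once the timeout expires.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        # A second communicate() drains the pipes and reaps the killed child.
        out, err = proc.communicate()
    return out, err, proc.returncode

# Example call with the command named in the report:
#   run_with_timeout(["who", "-q"])
```

With something like this in a postinst-triggered script, a stuck or unreaped helper would cost at most the timeout instead of blocking dpkg.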

Andreas Hasenack (ahasenack) wrote :

It could also be stuck somewhere else. We had a problem before with the whole process being stopped because dpkg was asking a configuration question, and this was unexpected. I don't know Adept, so I don't know how it would handle that situation.

Martin von Gagern (gagern) wrote :

I'm not sure enough to rule that out completely, but it sounds unlikely to me, for the following reasons. Either landscape-sysinfo itself would be asking a question; that doesn't seem to be in its job description, especially when called without parameters, so I think this unlikely. Or some other process would be asking a question; then there would be nothing to prevent landscape-sysinfo from reaping its defunct child. Ergo, no questions involved.

Andreas Hasenack (ahasenack) wrote :

Marking it as confirmed as we have a few duplicates already, albeit not all dupes have enough details to tell if it's "who" that is stuck.

Changed in landscape-client:
status: New → Confirmed

Martin von Gagern (gagern) wrote : Re: [Bug 277038] Re: landscape-common.postinst stuck with defunct who

Andreas Hasenack wrote:
> not all dupes have enough details to tell if it's "who" that is stuck.

If the error lies in the way landscape-sysinfo executes other
processes via python-twisted-core, then other commands invoked
using this mechanism might turn into zombies as well.

Andreas Hasenack (ahasenack) wrote :

For a moment I thought this was fixed in bug #257346, but the two other bug reports we got (#291282 and #293598) seem to have versions with the fix.

Mike Pontillo (mpontillo) wrote :

Another data point: this is happening in the service monitor in MAAS when using `getProcessOutputAndValue`. (See also bug #1793448)
