rootwrap filter fail when killing a pid that doesn't exist
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | High | Thierry Carrez | | |
Bug Description
I've seen this several times, but only now tracked it down.
I get the following error when running nova-network:
2012-06-07 14:26:58 CRITICAL nova [-] Unexpected error while running command.
Command: sudo nova-rootwrap kill -9 4514
Exit code: 99
Stdout: 'Unauthorized command: kill -9 4514\n'
Stderr: ''
The issue is that we're running kill_dhcp in linux_net. kill_dhcp gets its pid not from a running process, but from a pid file in the network directory (e.g., /var/lib/
Thus, if the pid file still exists but dnsmasq is not running, the above kill command is trying to kill a pid that doesn't exist.
However, the process of applying the filter is the following:
try:
    command = os.readlink(
    # NOTE(dprince): /proc/PID/exe may have ' (deleted)' on
    # the end if an executable is updated or deleted
    if command.endswith(" (deleted)"):
        command = command[:command.rindex(" ")]
    if command not in self.args[1]:
        # Affected executable not in accepted list
        return False
except (ValueError, OSError):
    # Incorrect PID
    return False
Importantly, if the file in /proc does not exist, the filter fails. In this case, because the process is no longer running, the read fails and you get a filter failure.
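The failure mode can be reproduced outside nova with a small sketch. The function below is a hypothetical stand-in for the filter's PID check, not the actual rootwrap code: the name `kill_target_allowed`, the `proc_root` parameter, and the `/proc/<pid>/exe` layout (taken from the NOTE comment in the excerpt) are assumptions made for illustration.

```python
import os


def kill_target_allowed(pid, allowed_executables, proc_root="/proc"):
    """Return True only if pid's executable is in allowed_executables.

    Hypothetical stand-in for the filter's PID check; proc_root is a
    parameter only so the check can be tried against a fake directory.
    """
    try:
        command = os.readlink(os.path.join(proc_root, str(int(pid)), "exe"))
    except (ValueError, OSError):
        # Non-numeric PID, or no such process: this looks the same as a
        # disallowed target, so the command is rejected.
        return False
    suffix = " (deleted)"
    if command.endswith(suffix):
        # The executable was updated or deleted while the process ran.
        command = command[:-len(suffix)]
    return command in allowed_executables
```

With a stale pid file, the PID has no entry under /proc, so a check of this shape returns False and the kill is reported as unauthorized, even though the target would have been an allowed dnsmasq process had it still been running.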
Perhaps for the kill filter it should be able to run kill commands for pids that do not exist? Either that, or we should raise some kind of specific exception that could be caught and ignored by higher-level code like kill_dhcp that is perfectly happy if the PID no longer exists. Right now the rootwrap failure will prevent nova-network from booting until you figure out you need to clear out the old dnsmasq pid.
tags: | added: rootwarp |
tags: | added: rootwrap removed: rootwarp |
Changed in nova: | |
milestone: | none → folsom-2 |
status: | Fix Committed → Fix Released |
Changed in nova: | |
milestone: | folsom-2 → 2012.2 |
Nice find.
The trick is that allowing kill on PIDs that no longer (or do not yet) exist creates an (admittedly minimal) flaw in the filter. And you can't really raise a higher-level error, since the caller and the filter run in two different processes.
The solution, I think, is to accept that nova-rootwrap returns "Unauthorized command" in that specific case. nova-rootwrap exits with code 99 in that case (it otherwise returns the return code of the shell command it ran). This can be achieved by calling utils.execute with check_exit_code=[0, 99].
Would that work for you?
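As a rough illustration of that idea, the sketch below treats exit code 99 as acceptable alongside 0. It uses `subprocess` directly rather than nova's `utils.execute`; the helper name `run_tolerating` and its `ok_codes` parameter are hypothetical, and only the 0/99 convention comes from the description above.

```python
import subprocess


def run_tolerating(cmd, ok_codes=(0, 99)):
    """Run cmd and return its exit code, raising if it is not in ok_codes."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode not in ok_codes:
        # Any other exit code is still treated as a real failure.
        raise subprocess.CalledProcessError(
            result.returncode, cmd, output=result.stdout, stderr=result.stderr)
    return result.returncode
```

A caller such as kill_dhcp could then read a return value of 99 as "nothing to kill". The trade-off is that 99 is also what a genuinely unauthorized command produces, so that case would be tolerated at the same call site.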