On Wed, Jan 22, 2014 at 05:11:03PM -0000, Clint Byrum wrote:
> The other one is the one that would sweep up the mess we occasionally
> see when something misbehaves.
> I'd like to see Ubuntu's shutdown do more to protect against that
> failure mode.
I would, too, but I don't agree that the method he proposes actually does
this. Killing processes and unmounting devices in a loop is basically what
we do already; the key difference is that some filesystems (potentially
even including the root filesystem) may require additional daemon processes
for their operation. This is the case, for example, if you have network
filesystems mounted and are using NetworkManager, or if you use
GSS-encrypted NFS or iSCSI. So "kill all processes and unmount all
filesystems in a loop" is not a reliable shutdown mechanism; it just moves
the problem cases somewhere that Lennart apparently isn't seeing them.
One of the problems we've seen repeatedly in trying to get a clean shutdown
involves NetworkManager's child processes *being* killed while they're still
needed as part of managing the network. This is not a bug that's fixed by
killing more processes.
There may be other failure scenarios that need to be addressed. Part of the
problem has been a lack of information about what's actually holding the
root filesystem open in these cases. There's a pending merge proposal on
sysvinit that should help us gather this information.