snapd

Bug #1867616
Comment #4

Maciej Borzecki (maciek-borzecki) wrote on 2020-03-16: Re: refresh of snapd failed, maybe because of concurrent refresh of core

From looking at the source code and the logs, I believe this is what happens:
- snapd snap is updated
- snapd requests a restart
- we reach the code in daemon.Stop() where we attempt a graceful shutdown of http connections
- the shutdown code hits the default timeout (25s, which matches the log) while there are still active connections
- the listen socket is closed as the first step of the shutdown, so no new connections can be accepted in the meantime
- the error of the tomb is set and propagated up the call stack
- snapd exits with an error, which triggers the failover handling that reverts the new revision

I suspect a slow client is hanging on the snapd API socket and keeping its connection alive. We should consider either swallowing the error, like we do for system restarts, or falling back to a forceful http.Server.Close() when the timeout is hit.