Comment 0 for bug 1699850

Julian Andres Klode (juliank) wrote:

[Impact]

apt-daily.service is started by a timer that depends on network-online.target (once the fixes for bug 1686470 have landed everywhere).

At boot, that is mostly sufficient to ensure the network is online, but it does not seem to work every time, and we may disagree with network-manager and friends about what the "online" state means.

On resume, network-online.target is still active, so when the timer catches up on missed runs, the service is started immediately. Depending on timing, network connectivity may not be available yet; the service then fails and is only retried 12 hours later.

[Proposed solution]
Introduce a new apt-helper command, wait-online, that tries to connect() to the remote hosts listed in sources.list until one connection succeeds or a TIMEOUT is reached. The proposed algorithm looks roughly like this:

while (time elapsed < TIMEOUT):
  for each entry:
    host = gethostbyname(entry's hostname)
    if host failed:
      continue
    fd = start a non-blocking connect() to host
    if fd is invalid:
      continue

    all fds += fd

    if poll(all fds, 100 ms timeout) finds a writable fd whose SO_ERROR is 0 (i.e. connected):
      exit(0)

exit(42) # timeout
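The pseudocode leaves "each entry" abstract. As a rough illustration only, the hypothetical host_of() below extracts a host and port from a one-line sources.list entry, defaulting to port 80 for http:// and 443 for https://. It is a simplified stand-in, not apt's sources.list parser: deb822 .sources files, credentials in URIs, IPv6 literals and other URI schemes are not handled.

```cpp
// Hypothetical helper: pull (host, port) out of a one-line sources.list entry.
// Simplified stand-in for apt's real parser; see the limitations noted above.
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

std::optional<std::pair<std::string, std::string>> host_of(const std::string &line) {
    std::istringstream in(line);
    std::string type, uri;
    if (!(in >> type) || (type != "deb" && type != "deb-src"))
        return std::nullopt;  // blank line, comment, or unknown entry type
    if (!(in >> uri))
        return std::nullopt;
    assert(!uri.empty());  // operator>> succeeded, so front()/back() are safe
    if (uri.front() == '[') {  // skip an options block such as "[arch=amd64]"
        while (uri.back() != ']' && (in >> uri)) {
        }
        if (!(in >> uri))
            return std::nullopt;
    }
    std::string port, rest;
    if (uri.rfind("http://", 0) == 0) {
        port = "80";
        rest = uri.substr(7);
    } else if (uri.rfind("https://", 0) == 0) {
        port = "443";
        rest = uri.substr(8);
    } else {
        return std::nullopt;  // cdrom:, file:, and other schemes are skipped
    }
    std::string host = rest.substr(0, rest.find('/'));  // authority part of the URI
    const auto colon = host.find(':');
    if (colon != std::string::npos) {  // an explicit port overrides the scheme default
        port = host.substr(colon + 1);
        host.erase(colon);
    }
    if (host.empty())
        return std::nullopt;
    return std::make_pair(host, port);
}
```

For example, "deb http://archive.ubuntu.com/ubuntu xenial main" yields ("archive.ubuntu.com", "80"), while comment and cdrom: lines yield nothing. A real implementation would more likely reuse apt's existing source list parsing.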

There are two things to consider:
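* gethostbyname() and connect() may fail while the network is not up yet, so we need to retry (we may need to sleep somewhere to avoid a busy loop).
* If poll() does not find a connected socket, it has most likely already waited for its 100 ms timeout, so no extra sleep is needed in that case.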
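A minimal C++ sketch of the loop above, folding in both considerations. It is an illustration under assumptions, not apt's implementation: Endpoint, start_connect(), any_connected() and wait_online() are hypothetical names, it uses getaddrinfo() in place of the obsolete gethostbyname(), and it only tries the first address each host resolves to. It deviates from the pseudocode in two deliberate ways: sockets whose connect() fails are closed and dropped (otherwise poll() would report them immediately on every call, turning the 100 ms wait into a busy loop), and pending sockets are still polled when a new connection attempt fails. When there is nothing to poll at all, it sleeps 100 ms instead.

```cpp
// Hypothetical sketch of the proposed "apt-helper wait-online" loop; not apt's code.
#include <netdb.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <chrono>
#include <string>
#include <thread>
#include <vector>

struct Endpoint {
    std::string host;     // e.g. "archive.ubuntu.com"
    std::string service;  // port number or service name, e.g. "80"
};

// Resolve ep and start a non-blocking connect() to its first address.
// Returns the socket, or -1 if resolution or connect() failed outright
// (typical while the network is still coming up), so the caller retries later.
static int start_connect(const Endpoint &ep) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    addrinfo *res = nullptr;
    if (getaddrinfo(ep.host.c_str(), ep.service.c_str(), &hints, &res) != 0)
        return -1;
    int fd = socket(res->ai_family, res->ai_socktype | SOCK_NONBLOCK | SOCK_CLOEXEC,
                    res->ai_protocol);
    if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) != 0 && errno != EINPROGRESS) {
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

// Wait up to 100 ms for pending connects; true if one succeeded.
// Sockets whose connect() failed are closed and removed from fds.
static bool any_connected(std::vector<pollfd> &fds) {
    assert(!fds.empty());
    if (poll(fds.data(), fds.size(), 100) <= 0)
        return false;  // timed out (so we already waited) or failed, e.g. EINTR
    for (auto it = fds.begin(); it != fds.end();) {
        if (it->revents == 0) {
            ++it;
            continue;
        }
        int err = 0;
        socklen_t len = sizeof err;
        if (getsockopt(it->fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
            return true;  // writable with no pending error: connected
        close(it->fd);
        it = fds.erase(it);
    }
    return false;
}

// Returns 0 as soon as any endpoint accepts a TCP connection, 42 on timeout.
int wait_online(const std::vector<Endpoint> &entries, std::chrono::milliseconds timeout) {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + timeout;
    std::vector<pollfd> fds;
    int status = 42;
    while (status != 0 && Clock::now() < deadline) {
        for (const auto &ep : entries) {
            int fd = start_connect(ep);
            if (fd >= 0)
                fds.push_back({fd, POLLOUT, 0});
            if (!fds.empty() && any_connected(fds)) {
                status = 0;
                break;
            }
        }
        // Nothing in flight (e.g. DNS not answering yet): sleep instead of spinning.
        if (status != 0 && fds.empty())
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    for (const pollfd &p : fds)
        close(p.fd);
    return status;
}
```

Called as wait_online(endpoints, std::chrono::seconds(30)), it would return 0 as soon as any host accepts a TCP connection, or 42 after roughly 30 seconds.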

I believe the timeout should be around 30 seconds.

On the systemd side, we add the following to the [Service] section of apt-daily.service:
  RestartForceExitStatus=42
  RestartSec=15m

This makes systemd restart the service 15 minutes later when it exits with status 42 (the wait-online timeout).