Comment 37 for bug 606491

Stéphane Gourichon (stephane-gourichon-lpad) wrote :

It's June 2014 and this bug still affects the latest LTS release, Trusty Tahr.

# Why we are here

Some people say "fix the noise" and others say "it's not noise".
What's needed is to fix the noise and keep the signal.

Has anyone used Kevin's patches (see comments #34 and #35)?

My guess is that many people have made local changes to their systems and not bothered to follow up afterwards, which is why the bug still occurs in the latest LTS release.

I'm patching my system now but it's important to fix this.

# Why this is important

Reason 1: Noise in mailboxes is a more serious issue than it appears. Each uninformative message received from a system makes it more likely that important messages will be missed.

Reason 2: The current situation produces the same message in normal operation (anacron instances overlap) and in a dangerous situation (security updates not applied for extended periods of time, see comment #31).

Please raise this bug's "importance".

To have the bug fixed once and for all, let's clear up the situation.

# Why this is a bug in anacron and not only apt or whatever.

Normal operations:

* (1) Long job duration in normal operation.
* (2) Asynchronous uncontrollable job start times making overlaps happen in normal operation (e.g. daily).
* (3) Overlaps are reported by e-mail.

Any issue arising only from (1), (2) or (3) is a bug in anacron.

Abnormal operations:

* (4) Some jobs get stuck forever (abnormal operation).
* (5) Stuck jobs prevent other jobs from running for a possibly unlimited time.

Any issue arising from (4) or (5) may be an anacron bug or a wish to make anacron more robust, just as we generally expect our systems to robustly stop buggy programs without crashing the computer.

## Normal operations

(1) Some jobs are designed to wait for a long time (up to half an hour) even when everything is fine, for example /etc/cron.daily/apt. Some people don't see this because of their configuration. My fresh 14.04 Trusty has the package update-notifier-common installed, which seems sufficient to trigger a sleep of up to half an hour on every invocation of that job. This is normal operation.
(2) anacron is set up to run on several occasions to minimize delay. For example, on top of running daily at 07:30, it also runs at boot and on resume from suspend. Nothing prevents a boot or resume minutes before 07:30, a gap shorter than the normal duration of some jobs.
(3) (nothing more to say)

(1)+(2) make overlaps part of normal operation.
(1)+(2)+(3) make *noise* in the mailbox about overlaps.

(1)+(2)+(3) make this bug an *actual anacron bug*.
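
To make the timing argument concrete, here is a minimal sketch in Python. It is not anacron code: the function name `overlaps` and the 07:20 resume time are assumptions chosen for illustration, while the 07:30 scheduled run and the sleep of up to half an hour come from the points above.

```python
def overlaps(first_start_s, job_duration_s, second_start_s):
    """Return True if the second anacron invocation starts while the
    job launched by the first invocation is still running."""
    return first_start_s <= second_start_s < first_start_s + job_duration_s

# Times in seconds since midnight.
resume = 7 * 3600 + 20 * 60      # assumed: resume from suspend at 07:20
scheduled = 7 * 3600 + 30 * 60   # scheduled daily run at 07:30

# In normal operation the job may sleep anywhere from 0 to 1800 seconds.
sleeps = range(0, 1801, 60)
overlapping = [s for s in sleeps if overlaps(resume, s, scheduled)]
print(f"{len(overlapping)} of {len(sleeps)} sampled sleep durations overlap")
```

With these assumed times, every sleep longer than ten minutes produces an overlap, and therefore a mail, although nothing is wrong.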

## Abnormal operations

(4) Some jobs break and get stuck forever. This should not happen on ordinary users' machines. Sysadmins who care even a little wish to be notified. Heavyweight sysadmins already have other ways to get notified and/or get jobs killed automatically.
(5) Prevented jobs may include security updates, which makes this a serious issue (see comment #31).

(1)+(3)+(4) make *signal* in the mailbox about stuck jobs, but it looks like noise.
(5) Prevented jobs make the whole issue serious.

# What to do?

Now we know where the anacron bug is and where the feature wish is.

## Raise bug importance

Noise in mailboxes is a more serious issue than it appears. Each uninformative message received from a system makes it more likely that important messages will be missed. In addition, the current situation produces the same message in harmless and dangerous situations.

For these reasons, I request that the importance of this bug be raised.

## Fix anacron bug: disable noise mail that reports overlap because it's really noise.

Actually (3) reports overlaps, not stuck jobs. On a server running all the time and never rebooting, it somehow only reports stuck jobs, but Ubuntu is not only for servers that never reboot. Turning on or resuming your system minutes before a scheduled run is normal operation for many computers. On such machines there's no way (3) can be reliably used to detect stuck jobs, yet it makes noise.

Kevin's patch (comment #34) fixes (3). No more noise in the mailbox, but no more signal either.

## Grant anacron's wish: get mail about *stuck jobs* (not overlap) because that's what sysadmins really need.

Kevin's patch (comment #35) actually reports stuck jobs because it can measure the duration and react to it.

It is important because it makes an explicit signal saying that there's a stuck job, not some dull "already started" noise.
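
The idea of reacting to duration rather than to mere overlap can be sketched as follows. This is only an illustration in Python, not Kevin's patch: the names `classify_running_job` and `should_send_mail` and the six-hour threshold are hypothetical, chosen to sit well above the half-hour sleep seen in normal operation.

```python
from datetime import datetime, timedelta

# Hypothetical threshold, far above the normal half-hour sleep.
STUCK_AFTER = timedelta(hours=6)

def classify_running_job(started_at, now, stuck_after=STUCK_AFTER):
    """Classify a job that is still running when anacron is invoked again."""
    elapsed = now - started_at
    return "stuck" if elapsed >= stuck_after else "overlap"

def should_send_mail(started_at, now):
    """Mail only for stuck jobs; a plain overlap stays silent."""
    return classify_running_job(started_at, now) == "stuck"

start = datetime(2014, 6, 1, 7, 20)
print(should_send_mail(start, datetime(2014, 6, 1, 7, 30)))  # ten minutes in
print(should_send_mail(start, datetime(2014, 6, 2, 7, 30)))  # a day later
```

The mail sent in the second case can then say explicitly that a job has been running for a day, instead of a generic "already started".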

## Grant another anacron wish: be more resilient to stuck jobs

Find a clean solution to ensure that other jobs are not just ignored when one job gets stuck.
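
One possible direction, shown only as a sketch and not as a proposal for how anacron should implement it, is to give each job a time limit so that a stuck job is stopped and reported while the following jobs still run. The function `run_jobs` and its `timeout_s` parameter are hypothetical; the example relies on Python's standard `subprocess.run`, which kills the child process when the timeout expires.

```python
import subprocess
import sys

def run_jobs(jobs, timeout_s):
    """Run each command in order. A job exceeding timeout_s is killed and
    reported as stuck, and the remaining jobs still run."""
    results = []
    for cmd in jobs:
        try:
            proc = subprocess.run(cmd, timeout=timeout_s)
            results.append("exit %d" % proc.returncode)
        except subprocess.TimeoutExpired:
            results.append("stuck")
    return results

# Stand-ins for real jobs: one that hangs and one that finishes at once.
hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
quick = [sys.executable, "-c", "pass"]
print(run_jobs([hanging, quick], timeout_s=1))
```

A real limit would have to be much longer than any legitimate job duration, including the half-hour sleep of normal operation.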

Thank you for your attention and for any comment.