[SRU] segfault in log.c code causes phone reboot loops
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical System Image | Fix Released | Critical | Ondrej Kubik |
upstart | Fix Committed | Undecided | James Hunt |
upstart (Ubuntu) | Fix Released | Critical | James Hunt |
Utopic | Won't Fix | Critical | Unassigned |
Vivid | Won't Fix | Critical | James Hunt |
Wily | Fix Released | Critical | James Hunt |
upstart (Ubuntu RTM) | Fix Released | Critical | Unassigned |
Bug Description
= Summary =
The version of Upstart in vivid is affected by a couple of bugs relating
to flushing data from early-boot jobs to disk, both of which can result
in a crash:
== Problem 1 ==
An internal list is mishandled, meaning a crash can occur at random.
== Problem 2 ==
Jobs which spawn processes in the background and then themselves exit can
cause a crash.
= Explanation of how Upstart flushes early job output =
If an Upstart job starts *and ends* early in the boot sequence (before
the log partition is mounted and writable) and produces output to its
stdout/stderr, Upstart will cache the output for later flushing by
adding the 'Log' object associated with the 'Job' to a list.
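The caching step can be pictured with the following minimal sketch. It assumes simplified, hypothetical stand-ins for Upstart's 'Log' and libnih's 'NihListElem' (the names `Log`, `ListElem`, `unflushed` and `defer_log` are illustrative only, not Upstart's real code). It shows the point made in the bug: the 'Log' is not linked into the list itself but is reached through an intermediary node.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-ins for Upstart's 'Log' and libnih's
 * 'NihListElem'; the real structures carry many more fields. */
typedef struct log {
    char   *buf;   /* output captured before the log partition was writable */
    size_t  len;
} Log;

typedef struct list_elem {
    struct list_elem *prev;
    struct list_elem *next;
    Log              *log;  /* payload: the 'Log' is reached via this node */
} ListElem;

/* Circular list head: an empty list points back at itself. */
static ListElem unflushed = { &unflushed, &unflushed, NULL };

static int list_empty(const ListElem *head)
{
    return head->next == head;
}

/* Cache a finished job's output: the 'Log' is not linked directly,
 * so wrap it in a new intermediary node and append that to the list. */
static ListElem *defer_log(Log *log)
{
    ListElem *elem;

    assert(log != NULL);
    elem = malloc(sizeof *elem);
    if (elem == NULL)
        return NULL;

    elem->log = log;
    elem->prev = unflushed.prev;
    elem->next = &unflushed;
    unflushed.prev->next = elem;
    unflushed.prev = elem;
    return elem;
}
```

Because the node and the 'Log' are separate allocations, anything that later removes entries must deal with both, which is exactly where Problem 1 arises.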
When the log partition is mounted writable, the
/etc/init/
notify-
of early-boot job output which takes the form of iterating the
'log_unflushed_
= Code Specifics =
There are 2 issues (note that the numbers used below match those used in
the Summary).
== Problem 1 detail ==
Due to a bug in the way the 'log_unflushed_
'Log' cannot be added to the list directly, so it is added via an
intermediary ('NihListElem') node), a crash can result when iterating
the list, since the 'Log' is freed but NOT the intermediary node. The
implication is that the intermediary node can remain on the list after
its 'Log' has been freed, so a later iteration may attempt to
dereference already-freed data, resulting in a crash.
== Problem 2 detail ==
If a job spawns a process in the background and then itself exits, that
job's 'Log' entry will be added to the 'log_unflushed_
if the background process produces output and then exits before Upstart
attempts to flush the original job's data to disk, the 'NihIo'
corresponding to the log will be serviced automatically and the data
flushed to disk. The problem comes when Upstart receives the
notification to flush the 'log_unflushed_
now contains an entry which has already been freed (since all its data
has already been flushed). The result is an assertion failure.
= Fix =
== Problem 1 fix ==
Correct the 'log_unflushed_
'NihListElem' (which will automatically free the 'Log' object), not by
simply freeing the 'Log' object itself.
* Branch: lp:~jamesodhunt/ubuntu/vivid/upstart/bug-1447756/
* New Upstart test added to avoid regression?: Yes.
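The Problem 1 fix can be illustrated with a minimal sketch. It assumes hypothetical, simplified types (`Log`, `ListElem`, `elem_free`, `clear_unflushed` are illustrative names, not Upstart's code) and hand-written ownership in place of nih_alloc's parent/child references. Removing an entry releases the intermediary node together with the 'Log' it owns, and the next pointer is saved before the current node is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; in Upstart the node would be a libnih
 * 'NihListElem' and ownership would be expressed with nih_alloc. */
typedef struct log {
    char   *buf;
    size_t  len;
} Log;

typedef struct list_elem {
    struct list_elem *prev;
    struct list_elem *next;
    Log              *log;   /* owned by the node: freed along with it */
} ListElem;

static ListElem unflushed = { &unflushed, &unflushed, NULL };

/* Append a heap-allocated 'Log', transferring ownership to a new node. */
static int defer_log(Log *log)
{
    ListElem *elem = malloc(sizeof *elem);

    assert(log != NULL);
    if (elem == NULL)
        return -1;
    elem->log = log;
    elem->prev = unflushed.prev;
    elem->next = &unflushed;
    unflushed.prev->next = elem;
    unflushed.prev = elem;
    return 0;
}

/* Unlink the node and free it together with the 'Log' it owns. */
static void elem_free(ListElem *elem)
{
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
    free(elem->log->buf);
    free(elem->log);
    free(elem);
}

/* Drain the list, returning how many entries were removed. Freeing only
 * elem->log here (and leaving the node linked) would reproduce the
 * dangling-node hazard described in Problem 1. */
static size_t clear_unflushed(void)
{
    size_t removed = 0;
    ListElem *next;

    for (ListElem *elem = unflushed.next; elem != &unflushed; elem = next) {
        next = elem->next;   /* saved before elem is freed */
        elem_free(elem);
        removed++;
    }
    assert(unflushed.next == &unflushed && unflushed.prev == &unflushed);
    return removed;
}
```

Tying the payload's lifetime to the node means there is a single object to release per entry, so the list can never hold a node whose 'Log' has already gone.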
== Problem 2 fix ==
Correct the assumption that the only entries in the
'log_unflushed_
there is in fact any data to flush; if not, remove the entry from the
'log_unflushed_
automatically by the 'NihIo'.
* Branch: lp:~jamesodhunt/upstart/bug-1447756-the-actual-fix
* New Upstart test added to avoid regression?: Yes.
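The Problem 2 fix (check whether an entry still has anything to flush and simply drop it if not) can be sketched as follows. This is a simplified model under stated assumptions: the hypothetical `pending` field stands in for the data a 'Log' still has buffered, "writing to disk" is replaced by adding to a counter, and an entry already serviced by its 'NihIo' is modelled as one with zero pending bytes rather than a freed object. None of these names come from Upstart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: 'pending' is how much output is still buffered. */
typedef struct log {
    size_t pending;
} Log;

typedef struct list_elem {
    struct list_elem *prev;
    struct list_elem *next;
    Log              *log;
} ListElem;

static ListElem unflushed = { &unflushed, &unflushed, NULL };
static size_t   written;   /* stands in for bytes written to disk */

/* Queue a 'Log' using caller-provided node storage. */
static void defer_log(ListElem *elem, Log *log)
{
    assert(elem != NULL && log != NULL);
    elem->log = log;
    elem->prev = unflushed.prev;
    elem->next = &unflushed;
    unflushed.prev->next = elem;
    unflushed.prev = elem;
}

/* Flush every queued entry. Entries with nothing left (already flushed
 * by their I/O watch) are removed without being treated as an error.
 * Returns the number of such already-empty entries. */
static size_t flush_unflushed(void)
{
    size_t skipped = 0;
    ListElem *next;

    for (ListElem *e = unflushed.next; e != &unflushed; e = next) {
        next = e->next;
        if (e->log->pending == 0) {
            skipped++;                   /* nothing to do: just drop it */
        } else {
            written += e->log->pending;  /* "write" the buffered output */
            e->log->pending = 0;
        }
        e->prev->next = e->next;         /* unlink in both cases */
        e->next->prev = e->prev;
        e->prev = e->next = e;
    }
    return skipped;
}
```

For example, queuing logs with 10, 0 and 5 pending bytes and then flushing writes 15 bytes, skips one entry, and leaves the list empty.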
= Workarounds =
If a system is affected by this bug, it will manifest as a crash early in the boot sequence.
To overcome the issue, either:
a) Boot by adding "--no-log" to the kernel command-line.
b) Disable the flush-early-job-log job (assuming the machine is bootable) by running the following:
$ echo manual | sudo tee -a /etc/init/
= Impact =
The issue has been present in Upstart since logging was introduced, but
no known instances of crashes relating to these problems were reported
before this bug (which relates to the issue being seen on a very small
subset of specific Ubuntu Touch phone hardware where Upstart is used as
the system init daemon).
Note that vivid still uses Upstart for managing the graphical session,
but now uses systemd by default for the system init daemon. Since the session (Upstart) init does not even require
a flush-early-
= Test Case =
This bug is extremely hard to reproduce, so the approach is simply to check that the internal list can be iterated correctly by:
1) Booting the system with Upstart
(select the Upstart option from the grub menu or add "init=/
2) Running the following on a system booted with Upstart:
$ for i in $(seq 17); do sudo start flush-early-
= Regression Potential =
None expected:
- As noted in Impact, the problems fixed by this version of Upstart have not been observed on server/desktop systems before.
- The fix is already in wily and no problems have been reported.
- See Impact.
= Original Description =
We recently started getting reports from phone users that their devices go into a reboot loop after changing the language or getting an OTA upgrade (either of which ends with a reboot of the phone).
After a bit of research we collected the log at http://
This shows a segfault of Upstart's init binary in the log.c code:
[ 6.999083]init: log.c:819: Assertion failed in log_clear_
[ 7.000279]init: Caught abort, core dumped
[ 7.467176]Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600
Related branches
- Upstart Reviewers: Pending requested
- Diff: 230 lines (+167/-3), 3 files modified:
  ChangeLog (+18/-0)
  init/log.c (+42/-0)
  init/tests/test_log.c (+107/-3)
Changed in upstart (Ubuntu): | |
importance: | Undecided → Critical |
status: | New → Confirmed |
Changed in canonical-devices-system-image: | |
importance: | Undecided → Critical |
milestone: | none → ww17-2015 |
status: | New → Confirmed |
tags: | added: hotfix |
Changed in upstart (Ubuntu): | |
assignee: | nobody → James Hunt (jamesodhunt) |
Changed in upstart: | |
assignee: | nobody → James Hunt (jamesodhunt) |
Changed in canonical-devices-system-image: | |
milestone: | ww17-2015 → ww19-ota |
Changed in canonical-devices-system-image: | |
assignee: | nobody → Ondrej Kubik (w-ondra) |
Changed in canonical-devices-system-image: | |
status: | Confirmed → Fix Committed |
Changed in upstart (Ubuntu): | |
status: | Confirmed → In Progress |
Changed in upstart: | |
status: | New → In Progress |
Changed in canonical-devices-system-image: | |
milestone: | ww19-ota → ww22-2015 |
status: | Fix Committed → In Progress |
tags: | added: patch |
Changed in upstart: | |
status: | In Progress → Fix Committed |
Changed in upstart (Ubuntu Utopic): | |
status: | New → Won't Fix |
Changed in canonical-devices-system-image: | |
status: | In Progress → Fix Released |
Changed in upstart (Ubuntu Vivid): | |
status: | New → In Progress |
assignee: | nobody → James Hunt (jamesodhunt) |
description: | updated |
summary: |
- segfault in log.c code causes phone reboot loops + [SRU] segfault in log.c code causes phone reboot loops |
tags: | removed: verification-failed |
Changed in upstart (Ubuntu Vivid): | |
importance: | Undecided → Critical |
Changed in upstart (Ubuntu RTM): | |
importance: | Undecided → Critical |
Changed in upstart (Ubuntu Utopic): | |
importance: | Undecided → Critical |
Changed in upstart (Ubuntu Vivid): | |
status: | Fix Committed → Won't Fix |
Note for the record that this bug has so far only been reported on the ubuntu-rtm branch, not the ubuntu branch, of the upstart package. However, the differences between these branches are negligible and include no changes to the upstream code.