[kernel panic] init: log.c:786: Assertion failed in log_clear_unflushed: log->remote_closed

Bug #935585 reported by Raphael Gradenwitz
This bug affects 14 people
Affects            Status         Importance   Assigned to   Milestone
upstart (Ubuntu)   Fix Released   High         James Hunt
Precise            Fix Released   High         James Hunt

Bug Description

Since updating upstart from 1.4-0ubuntu6 to 1.4-0ubuntu8, I get a kernel panic (see picture).

Raphael Gradenwitz (raphael-gradenwitz) wrote :
summary: - init: log.c:786: Assertion failes in log_cleasr_unflushed:
+ init: log.c:786: Assertion failes in log_clear_unflushed:
log->remote_closed
Raphael Gradenwitz (raphael-gradenwitz) wrote : Re: init: log.c:786: Assertion failes in log_clear_unflushed: log->remote_closed

Downgrading to 1.4-0ubuntu7 (in the recovery console) did _not_ fix it.
However, downgrading to 1.3-0ubuntu11 (from https://launchpad.net/ubuntu/precise/+package/upstart) fixed it, and I could boot again without a kernel panic.

btw: Ubuntu 12.04, amd64

Fabien Tassin (fta) wrote :

Same here, precise 32bit.

I have a slightly different stack (but the same assert()). Booting on older kernels doesn't help. Rescue kernels are ok.

I also found independently that upstart 1.4-0ubuntu8 and u7 are NOK while u6 is OK so the culprit is u7.

Changed in upstart:
status: New → Confirmed
Fabien Tassin (fta)
summary: - init: log.c:786: Assertion failes in log_clear_unflushed:
+ [kernel panic] init: log.c:786: Assertion failed in log_clear_unflushed:
log->remote_closed
James Hunt (jamesodhunt)
affects: upstart → ubuntu
affects: ubuntu → upstart (Ubuntu)
Changed in upstart (Ubuntu):
assignee: nobody → James Hunt (jamesodhunt)
James Hunt (jamesodhunt) wrote :

Whilst we investigate this issue, please disable logging by adding "--no-log" to the kernel command-line. It is also possible to stop the log from being flushed by disabling the "flush-early-job-log" Upstart job (/etc/init/flush-early-job-log.conf), for example like this:

$ echo manual | sudo tee /etc/init/flush-early-job-log.override
$ sudo reboot
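
A minimal sketch for checking whether either of the two workarounds above is in effect. The function names `has_no_log` and `flush_job_disabled` are hypothetical helpers and not part of Upstart; the paths default to /proc/cmdline and /etc/init and can be overridden with an argument.

```shell
# Hypothetical helpers (illustrative names, not shipped with Upstart).

# Succeeds if the kernel command line contains the whole word "--no-log".
# $1: file holding the command line (defaults to /proc/cmdline).
has_no_log() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each whitespace-separated word on its own line, then match exactly.
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qxF -- '--no-log'
}

# Succeeds if the flush-early-job-log override contains a "manual" line.
# $1: directory holding the job files (defaults to /etc/init).
flush_job_disabled() {
    init_dir="${1:-/etc/init}"
    grep -qx 'manual' "$init_dir/flush-early-job-log.override" 2>/dev/null
}
```

For example, `has_no_log && echo "logging disabled for this boot"` reports on the running kernel. The override can be undone later with `sudo rm /etc/init/flush-early-job-log.override`.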

Fabien Tassin (fta) wrote :

Alternatively, since I've already downgraded to -ubuntu6, I simply put upstart on hold until a fix is available.

$ echo upstart hold | sudo dpkg --set-selections
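
A hedged sketch for checking the hold afterwards. `is_on_hold` is a hypothetical helper; it assumes the usual two-column output of `dpkg --get-selections` (package name, then selection state) on standard input.

```shell
# Hypothetical helper: reads `dpkg --get-selections` output on stdin and
# succeeds if the package named in $1 is in the "hold" state.
is_on_hold() {
    awk -v pkg="$1" '$1 == pkg && $2 == "hold" { held = 1 } END { exit !held }'
}

# Usage (needs dpkg, so shown as comments only):
#   dpkg --get-selections upstart | is_on_hold upstart && echo "upstart is held"
# Releasing the hold once a fixed package is available:
#   echo upstart install | sudo dpkg --set-selections
```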

James Hunt (jamesodhunt) wrote :

Can those affected by this bug confirm they have a /var/log/upstart/ directory? Also, attaching the output of the following command would be useful:

    /sbin/initctl show-config

Fabien Tassin (fta) wrote :

I do have that directory, containing a bunch of files.
Here is my initctl config.

Fabien Tassin (fta) wrote :

I compared my initctl config between an impacted box and another, quite similar one. The main difference is that the impacted box has xinetd installed. Could that be it?

Raphael Gradenwitz (raphael-gradenwitz) wrote :

I did the same workaround as in #5

/var/log/upstart/ directory exists.

Fabien Tassin (fta) wrote :

neither #7 nor #9 has 'cgroup-lite', looks like a good candidate.

James Hunt (jamesodhunt)
Changed in upstart (Ubuntu):
importance: Undecided → High
James Hunt (jamesodhunt) wrote :

Please would those affected by this bug test the new upstart build version '1.4-0ubuntu9~bug935585' in the PPA below and provide feedback:

https://launchpad.net/~jamesodhunt/+archive/bug-935585/

Since we have still not managed to recreate this problem directly, it would also be helpful to know whether the assertion fails on every boot or whether the problem is intermittent.

Peter Silva (peter-bsqt) wrote :

For me it is every boot, and /var/log/upstart has 52 files in it.

James Hunt (jamesodhunt) wrote :

@Peter: Thank you for the information.

We are still unsure exactly what scenario is causing this assertion failure so would encourage all those experiencing this issue to try the Upstart version in comment #11. Feedback on this will provide us with valuable information which will help us to resolve this issue.

Raphael Gradenwitz (raphael-gradenwitz) wrote :

I could recreate this kernel panic over and over.
The workaround from #4 (--no-log) was necessary each time to get past the bug and boot.

BUT...
vvvvvvvvvvvvvv
> #11 works! <
^^^^^^^^^^^^

Thanks!

Fabien Tassin (fta) wrote :

@jamesodhunt: works for me. no more panic. great, thanks.

Peter Silva (peter-bsqt) wrote :

@jamesodhunt: works for me also, no panic running 17 now.

James Hunt (jamesodhunt) wrote :

Out of interest, are affected users using SSD devices?

Fabien Tassin (fta) wrote :

@jamesodhunt: i'm not. 2 HD here.

Chris Peach (peachris+ubuntu) wrote :

I run two very similar virtual machines on a VMware ESX server, and only one of them is affected. The main difference is that the affected machine uses LVM.

I was able to make it boot using the --no-log kernel parameter. As a more permanent solution, I used the following line successfully:
$ echo manual | sudo tee /etc/init/flush-early-job-log.override

Please let me know if you would like to learn more about my LVM2 setup.

Peter Silva (peter-bsqt) wrote :

No SSD on the desktop system, just two HDDs.

Raphael Gradenwitz (raphael-gradenwitz) wrote :

No SSD, no LVM here but Multi-Boot (hth)

Jonathan Aquilina (eagles051387) wrote :

I can confirm this as well. One interesting thing we tried was reverting to the initial 3.0.0 precise server kernel and then booting back into the 3.2.0-17 kernel, which worked for a while before the same issue surfaced.

running precise 64bit

our server is running lvm but not on an SSD drive

Jonathan Aquilina (eagles051387) wrote :

One thing I forgot to add: there is currently nothing virtualized on this machine, but we are in the process of setting up KVM and OpenStack.

Radek Zajic (radek-zajic) wrote :

I've upgraded to precise today (from oneiric) via do-release-upgrade -d. The bug also affected my system; the fix in #11 helped.

The hardware is Intel(R) Atom(TM) CPU 230 @ 1.60GHz, nVidia ION chipset (prestigio ION PC)

root@router-barrandov:~# lspci
00:00.0 Host bridge: NVIDIA Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: NVIDIA Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: NVIDIA Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: NVIDIA Corporation MCP79 Co-processor (rev b1)
00:04.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:06.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:06.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: NVIDIA Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: NVIDIA Corporation MCP79 PCI Bridge (rev b1)
00:0a.0 Ethernet controller: NVIDIA Corporation MCP79 Ethernet (rev b1)
00:0b.0 SATA controller: NVIDIA Corporation MCP79 AHCI Controller (rev b1)
00:10.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
02:00.0 VGA compatible controller: NVIDIA Corporation ION VGA (rev b1)

root@router-barrandov:~# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 002: ID 152d:2338 JMicron Technology Corp. / JMicron USA Technology Corp. JM20337 Hi-Speed USB to SATA & PATA Combo Bridge
Bus 004 Device 002: ID 0458:0708 KYE Systems Corp. (Mouse Systems)
Bus 004 Device 003: ID 03f0:6004 Hewlett-Packard DeskJet 5550

2,5" SATA HDD, 3,5" HDD attached via USB (mass storage)

twelve17 (spam-twelve17) wrote :

I confirm that the PPA from #11 worked for me as well. I am running Ubuntu 12 beta under VirtualBox. (For what it's worth, the host is not running an SSD, and the guest is pretending to have a regular SATA drive.)

Chris Peach (peachris+ubuntu) wrote :

I successfully tested the patched upstart (1.4-0ubuntu9~bug935585) from the PPA mentioned in comment #11.

Of course, I had first disabled my workaround:
root@vmwareguest:~# rm /etc/init/flush-early-job-log.override

Then I installed the patched upstart and restarted the system half a dozen times without a hiccup. Good work!

t3rmin (matt-thetrents) wrote :

I experience this bug on kernel 3.2.0-17 and 3.2.0-18, but not when I select 3.0.0-16 (which I assume is a leftover from before upgrading to precise) at the boot menu. Installing the upstart PPA from #11 allowed me to boot into the 3.2 kernels.

Travis Rhoden (trhoden) wrote :

I also ran into this problem with both 3.2.0-17 and -18. I used the new Upstart from #11, and can now boot up successfully.

alessandro ciancaglini (alo) wrote :

We also ran into this problem with 3.2.0-18 on 2 servers. The strange thing is that on a third server everything went OK (same model, an old Dell 860).

We used the new Upstart from #11, and can now boot up all servers successfully.

thanks!

Roman Yepishev (rye) wrote :

Right now I am able to reproduce this when I force the filesystem check (touch /forcefsck) on a real machine. I can't reproduce this with similar kvm setup. As Daviey pointed out, this may be a race condition.

Roman Yepishev (rye) wrote :

After enabling verbose mode I get the following:

[timestamp] init: log.c:786: Assertion failed in log_clear_unflushed: log->remote_closed
[timestamp] init: Caught abort, core dumped
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
[timestamp] Kernel panic - not syncing: Attempted to kill init!

Roman Yepishev (rye) wrote :

The PPA version works properly, I am unable to reproduce the issue any more.

David Kranz (david-kranz) wrote :

I had this problem when rebooting after installing openstack. This ppa fixed the problem.

Kevin Jackson (kevin-linuxservices) wrote :

Oh great - I get this too.
I worked around it by booting to recovery mode, doing an fsck and resuming boot.

Karl (kh2l) wrote :

This happens for me as well on a VMware VM. I thought that LVM might be the cause, as my first install used VMware Easy Install, which most likely didn't use LVM, but I tried a non-LVM configuration and it also broke.

The fix available here does fix the issue for me, though: once it's installed, the issue goes away.

Chris Peach (peachris+ubuntu) wrote :

Now my other VMware guest is affected, namely when trying to boot the new kernel 3.2.0-18-29 (x86_64). On this machine, I had not installed the patch from the PPA mentioned above. This VM does not use LVM. The only unusual feature of this VM is that it uses ecryptfs to encrypt one home directory. To make this machine boot, I had to enter the “--no-log” kernel parameter.

Steve Langasek (vorlon)
Changed in upstart (Ubuntu Precise):
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 1.4-0ubuntu9

---------------
upstart (1.4-0ubuntu9) precise; urgency=low

  [ Steve Langasek ]
  * debian/conf/failsafe.conf: instead of waiting for the 'runlevel' event
    before considering failsafe done, stop this job as soon as we're
    starting rc-sysinit; that way, any delays in /etc/rcS.d will not cause
    confusing messages about networking delays when the network is not the
    problem. (LP: #950662)

  [ James Hunt ]
  * init/log.c:log_read_watch(): Set remote_closed for scenarios where error
    handler never called. (LP: #935585)

  [ Serge Hallyn ]
  * debian/conf/power-status-changed.conf: shut down on getting SIGPWR.
    Unprivileged tasks can't send this signal. In particular this will
    allow clean shutdown of containers from the host.
    (See http://www.makelinux.net/man/7/P/power-status-changed)

  [ Stéphane Graber ]
  * Rename Serge's job to shutdown.conf to avoid a name conflict with the
    event power-status-changed.
 -- Stephane Graber <email address hidden> Fri, 16 Mar 2012 13:48:04 -0400
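
To check whether an installed package already carries this fix, its version can be compared against the first build that contained it. This is a minimal sketch, assuming `dpkg` is available; `has_fix` is a hypothetical helper name. The threshold is the PPA version from this thread because, in Debian version ordering, a `~` suffix sorts before the bare version, so `1.4-0ubuntu9~bug935585` is lower than the official `1.4-0ubuntu9` and both count as fixed.

```shell
# Hypothetical helper: succeeds if version string $1 is at least the PPA
# test build, which sorts just below the official 1.4-0ubuntu9 release.
has_fix() {
    dpkg --compare-versions "$1" ge '1.4-0ubuntu9~bug935585'
}

# Usage (queries the installed package, so shown as comments only):
#   has_fix "$(dpkg-query -W -f='${Version}' upstart)" && echo "fix installed"
```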

Changed in upstart (Ubuntu Precise):
status: In Progress → Fix Released