init: post-start can cause respawn to hang

Bug #711635 reported by B. J. Potter
58
This bug affects 9 people
Affects Status Importance Assigned to Milestone
upstart
Triaged
Medium
Unassigned
mysql-5.1 (Ubuntu)
Invalid
High
Unassigned
Lucid
Invalid
Undecided
Unassigned
Natty
Invalid
High
Unassigned
Oneiric
Invalid
High
Unassigned
Precise
Invalid
High
Unassigned
mysql-5.5 (Ubuntu)
Fix Released
High
Clint Byrum
Lucid
Invalid
Undecided
Unassigned
Natty
Invalid
Undecided
Unassigned
Oneiric
Invalid
Undecided
Unassigned
Precise
Fix Released
High
Clint Byrum
mysql-dfsg-5.1 (Ubuntu)
Invalid
High
Unassigned
Lucid
Won't Fix
High
Unassigned
Natty
Invalid
Undecided
Unassigned
Oneiric
Invalid
Undecided
Unassigned
Precise
Invalid
High
Unassigned
upstart (Ubuntu)
Triaged
Medium
James Hunt
Lucid
Won't Fix
Undecided
Unassigned
Natty
Won't Fix
Undecided
Unassigned
Oneiric
Won't Fix
Undecided
Unassigned
Precise
Won't Fix
Medium
Unassigned

Bug Description

== SRU Justification ==

= IMPACT =

Users who have inadvertently broken their mysql configuration will be met with a hung package upgrade process on any updates, and no easy way to see that fact.

= TEST CASE: =

1. sudo apt-get install mysql-server
2. edit /etc/mysql/my.cnf and add an invalid variable to the [mysqld] section
3. sudo apt-get --reinstall install mysql-server - affected system will hang forever on the 'start' of mysql
4. in another terminal, verify status of mysql job is 'start/respawn'
5. in that second terminal, run 'sudo stop mysql'
6. first terminal should now be "unlocked"
7. install updated package
8. install should error out or move on quietly after trying to start mysql, but not hang indefinitely.

= Dev Fix =

This was fixed in the 5.5 packages by making sure that in the post-start if the job is in 'respawn' that the post-start is aborted immediately.

= REGRESSION POTENTIAL =

Its possible that this code will cause some systems to error out where they would keep retrying forever. This might be particularly problematic in systems that use configuration management like puppet or chef, where they would boot the system, and then run the config management to apply the correct configs.

========

I believe that it is looking for some input when updating mysql. The updating just stops when mysql is being updated. There is no dialog box to be found. This has happened to me several times on different computers.

ProblemType: Package
DistroRelease: Ubuntu 10.04
Package: mysql-server-5.1 5.1.41-3ubuntu12.9
ProcVersionSignature: Ubuntu 2.6.32-27.49-generic 2.6.32.26+drm33.12
Uname: Linux 2.6.32-27-generic x86_64
NonfreeKernelModules: wl nvidia
Architecture: amd64
Date: Tue Feb 1 18:32:38 2011
ErrorMessage: subprocess installed post-installation script killed by signal (Terminated)
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MySQLConf.etc.mysql.conf.d.mysqld.safe.syslog.cnf:
 [mysqld_safe]
 syslog
MySQLVarLibDirListing: ['mysql', 'debian-5.1.flag']
SourcePackage: mysql-dfsg-5.1
Title: package mysql-server-5.1 5.1.41-3ubuntu12.9 failed to install/upgrade: subprocess installed post-installation script killed by signal (Terminated)
mtime.conffile..etc.apparmor.d.usr.sbin.mysqld: 2010-03-06T11:30:30.842210

Revision history for this message
B. J. Potter (bjcubsfan) wrote :
Revision history for this message
Scott Moser (smoser) wrote :

I can confirm this behavior. To reproduce, I did:
 * launch ec2 instance (ami-fa01f193 us-east-1 ubuntu-lucid-10.04-amd64-server-20110201.1)
 * disable lucid-updates
 * sudo apt-get update && sudo apt-get install mysql-server-5.1
 * modify /etc/apparmor.d/usr.sbin.mysqld, adding hashes to the two lines below:
  # /var/lib/mysql/ r,
  # /var/lib/mysql/** rwk,
 * enable -updates
 * sudo apt-get update && sudo apt-get dist-upgrade
 * system hangs on:
   Setting up mysql-server-core-5.1 (5.1.41-3ubuntu12.9) ...
   Setting up mysql-server-5.1 (5.1.41-3ubuntu12.9) ...
   Installing new version of config file /etc/init/mysql.conf ...

These 4 processes are running:
 7476 pts/2 Ss+ 0:00 /usr/bin/dpkg --status-fd 14 --configure libmysqlclient16 mysql-client-core-5.1 mysql-client-5.1 mysql-server-core-5.1 mysql-server-5.1
 7486 pts/2 S+ 0:00 /usr/bin/perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/mysql-server-5.1.postinst configure 5.1.41-3ubuntu12.7
 7496 pts/2 S+ 0:00 /bin/bash -e /var/lib/dpkg/info/mysql-server-5.1.postinst configure 5.1.41-3ubuntu12.7
 7703 pts/2 S+ 0:00 start mysql

because you commented out the lines in the config file, mysql cannot access /var/lib/mysql. I suspect you have app armour messages in your dmesg indicating this.

I'm not sure if I should call this bug invalid or confirmed.

Changed in mysql-dfsg-5.1 (Ubuntu):
importance: Undecided → Low
status: New → Confirmed
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

I think there is a bug in upstart here, as start shouldn't hang forever. It should error out pretty early actually. This seems to be a problem with post-start and respawn handling.

I created this job to simulate what mysqld does in this case, which is, start, and fail, while the post-start script loops for 30 seconds pinging mysql before exitting with a return code of 1.

# test-pstart

respawn
script
  sleep 2
  exit 1
end script

post-start script
  sleep 30
  exit 1
end script

# EOF

start test-pstart will hang forever.

Adding

respawn limit 3 120

saying only respawn it 3 times in 120 seconds before giving up, produces an immediate failure...

start: Job failed to start

Removing the 'exit 1' of the post-start doesn't change this.. just having the post-start wait 30 seconds seems to produce the error, but only when a respawn limit of any kind is specified. I've even tried respawn limit 10000 10000 .. this still produces the error where bare 'respawn' does not.

If nothing else, this behavior is *confusing*, and a warning in the man page for upstart that respawn may cause your start commands to hang indefinitely without a limit is in order, which is why I've added upstart to this bug report.

For the mysql-5.1 package, we should add a respawn limit to work around the problem. Since this is changing a default behavior, we should be very conservative. Three times in 120 seconds seems that.. as that means in 2 minutes mysql exited with a non-zero exit code. If mysql fails more than that.. it may be necessary for an administrator to step in and fix data/problems.

Marking Triaged in mysql. Agree with the importance of Low, as this only affects people whose mysqld's are severely broken already.

Changed in mysql-dfsg-5.1 (Ubuntu):
status: Confirmed → Triaged
summary: - mysql update fails waiting for input
+ mysql update fails waiting for upstart job to respawn
summary: - mysql update fails waiting for upstart job to respawn
+ init: post-start can cause respawn to hang
Changed in upstart:
status: New → Triaged
importance: Undecided → Medium
Changed in mysql-5.1 (Ubuntu):
status: New → Triaged
Changed in mysql-5.5 (Ubuntu):
status: New → Triaged
Changed in mysql-5.1 (Ubuntu):
importance: Undecided → Low
Changed in mysql-5.5 (Ubuntu):
importance: Undecided → Low
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

In reading the upstart code, specifically in init/job_process.c in the job_process_terminated() function, I believe the problem is the
assumption that there is nothing we need to do when the post-start exits with a non-zero status. In fact, I think this should mark the
job as failed, *even* if there is a respawn stanza, as a post-start delays the 'started' event, and so the job is not yet in the status where the job writer has verified that it has reached a state of working.

Perhaps when post-start exits with non-zero, the job goal should be changed to stop?

In the mean time, as a workaround, if a post-start detects problems with the main process, it should most likely *stop* the job itself. In the case of mysql, this would simply mean running 'stop' if mysql fails to start within 30 seconds.

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

I spent some time on this today, and I think that upstart's handling of respawn + post-start is probably making jobs harder than they need to be.

In the mysql case, we need to exit the mysqladmin ping loop if the process goes into 'respawn', so that the process can in fact be respawned... otherwise it waits. This also causes the usual respawn limit to not be hit because we're respawning 1 time every 30 seconds. In fact, just the lag of forking and running mysqladmin ping is enough to prevent us from restarting 10 times in 5 seconds, which is the default limit.

So IMO, upstart should have a way to respawn independent of post-start. But that is a much bigger change and needs more thought. For jobs with a post-start, they need to make sure the post-start is reactive to the respawn status by exitting when it is detected, and considering this in any respawn limits.

So, I'll be uploading a fix to precise's mysql-5.5 package which will fail with 2 respawns in 5 seconds, and exits the mysqladmin loop if status is respawn. This causes the job to report a failure to start if things are truly broken, which is what we want. It will also cause respawn to give up faster if mysqld exits, but I think that is fine given that it is a large database daemon and probably isn't going to respond well to the rapid respawning.

Changed in mysql-5.5 (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Clint Byrum (clint-fewbar)
importance: Low → High
Changed in mysql-5.1 (Ubuntu):
importance: Low → High
Changed in mysql-dfsg-5.1 (Ubuntu):
importance: Low → High
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

One final thought, I'm raising the importance to High. Failing to detect and report a serious problem in this sort of daemon seems like an important problem!

Changed in mysql-5.1 (Ubuntu Lucid):
status: New → Invalid
Changed in mysql-5.1 (Ubuntu Precise):
status: Triaged → Invalid
Changed in mysql-5.5 (Ubuntu Precise):
status: In Progress → Fix Committed
Changed in mysql-dfsg-5.1 (Ubuntu Precise):
status: Triaged → Invalid
Changed in mysql-dfsg-5.1 (Ubuntu Lucid):
importance: Undecided → High
status: New → Triaged
Changed in mysql-5.1 (Ubuntu Oneiric):
importance: Undecided → High
status: New → Triaged
Changed in mysql-5.1 (Ubuntu Natty):
importance: Undecided → High
status: New → Triaged
Changed in mysql-dfsg-5.1 (Ubuntu Natty):
status: New → Invalid
Changed in mysql-dfsg-5.1 (Ubuntu Oneiric):
status: New → Invalid
tags: added: precise rls-mgr-p-tracking
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mysql-5.5 - 5.5.22-0ubuntu1

---------------
mysql-5.5 (5.5.22-0ubuntu1) precise; urgency=low

  * New upstream release (LP: #965523)
  * d/mysql-server-5.5.mysql.upstart: Add respawn limit and code
    to exit ping loop if mysqld needs to be respawned. (LP: #711635)
  * New upstream bugfix release.
  * Refreshed patches
  * d/control: Build-Dep on dh-apparmor (LP: #948481)
  * d/p/fix-mysqldump-test.patch: Fixes unreliable test introduced in
    new upstream version. http://bugs.mysql.com/bug.php?id=64401
  * d/control: Re-add empty meta-package for mysql-testsuite so users
    will get the 5.5 version on upgrade and when installing
    mysql-testsuite. (LP: #959683)
 -- Clint Byrum <email address hidden> Thu, 29 Mar 2012 17:31:30 -0700

Changed in mysql-5.5 (Ubuntu Precise):
status: Fix Committed → Fix Released
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mysql-5.5 (Ubuntu Lucid):
status: New → Confirmed
Changed in mysql-5.5 (Ubuntu Natty):
status: New → Confirmed
Changed in mysql-5.5 (Ubuntu Oneiric):
status: New → Confirmed
Changed in mysql-5.5 (Ubuntu Lucid):
status: Confirmed → Invalid
Changed in mysql-5.5 (Ubuntu Natty):
status: Confirmed → Invalid
Changed in mysql-5.5 (Ubuntu Oneiric):
status: Confirmed → Invalid
description: updated
Revision history for this message
Chris Halse Rogers (raof) wrote :

I've rejected the lucid upload here; 5.1.62 has been uploaded by the security team to -proposed, so this will need to be rebased on that.

Steve Langasek (vorlon)
Changed in upstart (Ubuntu):
assignee: nobody → James Hunt (jamesodhunt)
importance: Undecided → Medium
status: New → Triaged
Changed in upstart (Ubuntu Lucid):
status: New → Won't Fix
Steve Langasek (vorlon)
Changed in upstart (Ubuntu Natty):
status: New → Won't Fix
Changed in upstart (Ubuntu Oneiric):
status: New → Won't Fix
Steve Langasek (vorlon)
Changed in upstart (Ubuntu Precise):
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
dino99 (9d9) wrote :
Changed in mysql-5.1 (Ubuntu Oneiric):
status: Triaged → Invalid
Changed in mysql-5.1 (Ubuntu Natty):
status: Triaged → Invalid
Revision history for this message
Rolf Leggewie (r0lf) wrote :

lucid has seen the end of its life and is no longer receiving any updates. Marking the lucid task for this ticket as "Won't Fix".

Changed in mysql-dfsg-5.1 (Ubuntu Lucid):
status: Triaged → Won't Fix
Revision history for this message
Steve Langasek (vorlon) wrote :

The Precise Pangolin has reached end of life, so this bug will not be fixed for that release

Changed in upstart (Ubuntu Precise):
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Related blueprints

Remote bug watches

Bug watches keep track of this bug in other bug trackers.