I think there is a bug in upstart here: start shouldn't hang forever, and it should actually error out fairly early. The problem seems to lie in how post-start and respawn are handled together.
I created this job to simulate what mysqld does in this case: the main process starts and fails, while the post-start script loops for 30 seconds pinging mysql before exiting with a return code of 1.
# test-pstart
respawn
script
sleep 2
exit 1
end script
post-start script
sleep 30
exit 1
end script
# EOF
Running 'start test-pstart' hangs forever.
Adding
respawn limit 3 120
(i.e. respawn at most 3 times within 120 seconds before giving up) produces an immediate failure:
start: Job failed to start
Removing the 'exit 1' from the post-start script doesn't change this. Just having the post-start wait 30 seconds seems to produce the error, but only when a respawn limit of any kind is specified. I've even tried 'respawn limit 10000 10000', and it still produces the error, whereas bare 'respawn' does not.
If nothing else, this behavior is *confusing*, and the upstart man page should warn that 'respawn' without a limit may cause start commands to hang indefinitely. That is why I've added upstart to this bug report.
For the mysql-5.1 package, we should add a respawn limit to work around the problem. Since this changes a default behavior, we should be very conservative. Three times in 120 seconds seems conservative enough, since hitting that limit means mysql repeatedly exited with a non-zero exit code within 2 minutes. If mysql fails more often than that, an administrator probably needs to step in and fix the data or other underlying problems.
Marking Triaged in mysql. I agree with the importance of Low, as this only affects people whose mysqld is already severely broken.
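A minimal sketch of what that change might look like, assuming the package's upstart job lives at /etc/init/mysql.conf and already uses bare 'respawn'. Only the relevant stanzas are shown; everything else in the existing job is left unchanged:

# mysql.conf (excerpt; path /etc/init/mysql.conf is an assumption)
respawn
# Stop the job if it is respawned more than 3 times within 120 seconds.
respawn limit 3 120
# ... remaining stanzas of the existing job unchanged ...
# EOF

Going by the test-pstart results, 'start mysql' should then fail with "start: Job failed to start" instead of hanging, although this has not been verified against the real mysql job here.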