In reading the upstart code, specifically the job_process_terminated() function in init/job_process.c, I believe the problem is the assumption that nothing needs to be done when the post-start process exits with a non-zero status. I think this should instead mark the job as failed, *even* if there is a respawn stanza. Post-start delays the 'started' event, so the job has not yet reached the point where the job writer has confirmed that it is working.
Perhaps when post-start exits with a non-zero status, the job goal should be changed to stop?
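As a rough illustration of the proposed behavior, here is a heavily simplified C model. The `JobModel` struct, `Goal` enum and `on_post_start_exit()` function are hypothetical and do not correspond to upstart's real structures or to the actual body of job_process_terminated(). The sketch only shows the decision being suggested: a non-zero post-start exit marks the job failed and sets its goal to stop, with the respawn flag deliberately not consulted.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; NOT upstart's real job types. */
typedef enum { GOAL_START, GOAL_STOP } Goal;

typedef struct {
    Goal goal;
    bool failed;
    bool has_respawn; /* present only to show that it is ignored below */
} JobModel;

/* Proposed handling when the post-start process exits with `status`.
 * 'started' has not been emitted yet, so a failure here means the job
 * never reached a working state: fail it and stop, even with respawn. */
void on_post_start_exit(JobModel *job, int status)
{
    if (status == 0)
        return;           /* post-start succeeded: continue towards 'started' */

    job->failed = true;   /* mark the job as failed */
    job->goal = GOAL_STOP; /* respawn is intentionally not checked */
}
```

The point of ignoring `has_respawn` is that respawn is meant for a job that was already running; a post-start failure means it never got there.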
In the meantime, as a workaround, if a post-start script detects problems with the main process, it should most likely *stop* the job itself. In the case of mysql, this would simply mean running 'stop' if mysql fails to start within 30 seconds.
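The workaround's logic can be sketched the same way. In a real job this would normally live in the post-start script stanza; the C below only models the idea of "wait up to 30 seconds, otherwise stop the job". `wait_or_stop()`, `mysql_ready()` and `stop_mysql()` are hypothetical names, and using `mysqladmin ping` as the readiness check is an assumption, not something upstart provides.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical callback types: a readiness probe and a stop action. */
typedef bool (*ReadyFn)(void *ctx);
typedef void (*StopFn)(void *ctx);

/* Poll is_ready roughly once per second for up to timeout_s seconds.
 * If the service never reports ready, call stop_job and return false. */
bool wait_or_stop(ReadyFn is_ready, StopFn stop_job, void *ctx, unsigned timeout_s)
{
    for (unsigned waited = 0; ; waited++) {
        if (is_ready(ctx))
            return true;   /* main process came up in time */
        if (waited >= timeout_s)
            break;         /* timed out: give up waiting */
        sleep(1);
    }
    stop_job(ctx);         /* stop the job ourselves rather than leave it half-started */
    return false;
}

/* Example wiring for mysql (assumptions: mysqladmin is installed and
 * "stop mysql" is the upstart command that stops this job). */
bool mysql_ready(void *ctx)
{
    (void)ctx;
    return system("mysqladmin ping >/dev/null 2>&1") == 0;
}

void stop_mysql(void *ctx)
{
    (void)ctx;
    if (system("stop mysql") != 0) {
        /* Nothing more a post-start step can do if stopping fails. */
    }
}
```

A call such as `wait_or_stop(mysql_ready, stop_mysql, NULL, 30)` would then give mysql 30 seconds to answer before the job is stopped.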