jenkins can start in a bad state

Bug #1288947 reported by Francis Ginther
This bug affects 2 people
Affects             Status        Importance  Assigned to     Milestone
Ubuntu CI Engine    Fix Released  Critical    Chris Johnston
Ubuntu CI Services  Fix Released  Critical    Chris Johnston

Bug Description

Found a case where the deploy worked but jenkins was not usable, which also caused the lander_jenkins_worker to fail. Looking at the logs, one line from the lander_jenkins deployment stands out:

from /var/log/juju/unit-lander-jenkins-sub-0.log:
HOOK The selected http port (8080) seems to be in use by another program

More debugging showed that jenkins wasn't even running.

So we have the following issues:
1) Jenkins may not come up in a usable state. Can we check for that and restart it if it's on the wrong port?
2) Jenkins may go down; we need to be able to restart it automatically.
3) The lander_jenkins worker does not handle jenkins failure gracefully; this should just lead to a retry.
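
For issue 3, a retry wrapper is one possible shape for the worker side. The following is only a minimal sketch, not the worker's actual code: the `retry` helper, its parameters, and the choice to retry on `OSError` (which covers connection errors in Python 3) are all assumptions made for illustration.

```python
import time


def retry(operation, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Hypothetical helper: retries only on OSError (which includes
    ConnectionError) and multiplies the wait by `backoff` each time.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff
```

The worker would wrap whatever call talks to jenkins in `retry(...)`, so a jenkins that is briefly down leads to another attempt rather than an immediate failure.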

Tags: airline


Francis Ginther (fginther) wrote :
tags: added: airline
Chris Johnston (cjohnston) wrote :

I just had this happen.

Changed in ubuntu-ci-services-itself:
status: New → Confirmed
importance: Undecided → Critical
Andy Doan (doanac)
Changed in ubuntu-ci-services-itself:
milestone: none → phase-0
Andy Doan (doanac) wrote :

Here's some more information I've been collecting.

The jenkins charm itself seems to restart jenkins, but after it's done you'll see something like:

root@juju-hpcloud-machine-5:~# /etc/init.d/jenkins status
2 instances of jenkins are running at the moment
but the pidfile /var/run/jenkins/jenkins.pid is missing

So at this point we have some rogue jenkins/java process holding on to port 8080.

The lander subordinate charm then runs and uses charmhelpers to restart the service:

2014-03-07 20:37:19 INFO juju-log Updating master jenkins.
2014-03-07 20:37:20 INFO juju-log Restarting jenkins.
2014-03-07 20:37:20 INFO config-changed * Stopping Jenkins Continuous Integration Server jenkins
2014-03-07 20:37:20 INFO config-changed ...done.
2014-03-07 20:37:20 INFO config-changed * Starting Jenkins Continuous Integration Server jenkins
2014-03-07 20:37:20 INFO config-changed The selected http port (8080) seems to be in use by another program
2014-03-07 20:37:20 INFO config-changed Please select another port to use for jenkins
2014-03-07 20:37:20 INFO config-changed ...done.
2014-03-07 20:37:20 INFO config-changed restart: Unknown instance:

The interesting thing at this point is that the "stop" part seems to have actually killed the java process, and port 8080 is free right afterwards. Another interesting thing is that I see this in hpcloud but not canonistack.

Given the root problem is with jenkins or its charm, I'm hoping we can deal with this from our charm:

def restart_jenkins(config):
    juju_info('Restarting jenkins.')
    core.host.service_stop('jenkins')
    core.host.service_start('jenkins')

/me wonders if we can't add a time.sleep(10) in between the stop/start calls there, so that the java process has really closed the port by the time we start.
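
A variation on the fixed sleep would be to poll the port and only start once nothing answers on it. This is a rough sketch under assumptions, not what the charm does: `port_is_free`, `wait_for_port_free`, and `restart_service` are hypothetical names, the `stop`/`start` callables stand in for the service_stop/service_start calls above, and "free" is approximated as "a TCP connect to the port fails".

```python
import socket
import time


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return False
    except OSError:
        # Refused or timed out: treat the port as no longer in use.
        return True


def wait_for_port_free(port, timeout=30.0, interval=0.5, host="127.0.0.1"):
    """Poll until the port is free; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def restart_service(stop, start, port=8080, timeout=30.0):
    """Stop, wait for the port to be released, then start."""
    stop()
    if not wait_for_port_free(port, timeout=timeout):
        raise RuntimeError("port %d still in use after %.0fs" % (port, timeout))
    start()
```

Compared with a fixed sleep, this returns as soon as the port is released and fails loudly if it never is, instead of starting jenkins into the "port in use" error again.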

Andy Doan (doanac) wrote :

Chris had a better idea. This problem doesn't exist with our ci-train jenkins, so we can just revert the change we made a couple of days ago and use that jenkins charm instead. This should avoid dumb hacks like the one I suggested above.

Changed in ubuntu-ci-services-itself:
status: Confirmed → In Progress
assignee: nobody → Chris Johnston (cjohnston)
Changed in ubuntu-ci-services-itself:
status: In Progress → Fix Committed
Changed in ubuntu-ci-services-itself:
status: Fix Committed → Fix Released
Ursula Junque (ursinha)
Changed in uci-engine:
assignee: nobody → Chris Johnston (cjohnston)
importance: Undecided → Critical
milestone: none → phase-0
status: New → Fix Released