python-jujuclient

Deployer fails because juju thinks it is upgrading

Bug #1460171 reported by Curtis Hovey on 2015-05-29

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
juju-ci-tools	Fix Released	Critical	Curtis Hovey
juju-core	Fix Released	High	Ian Booth	juju-core 1.24-beta6
python-jujuclient	Fix Released	High	Ian Booth

Bug Description

The maas deployer job is failing. Deployer is blocked/disconnected because juju says it is upgrading.
http://reports.vapour.ws/releases/2709/job/maas-1_7-deployer/attempt/397

Juju cannot be upgrading, or at least, it is not possible to upgrade in this case because the version in test is 1.24-beta6, which is the newest version in the test streams. We see a download of
https://swift.canonistack.canonical.com/v1/AUTH_526ad877f3e3464589dc1145dfeaac60/juju-dist/testing/tools/releases/juju-1.24-beta6-trusty-amd64.tgz
and there is no greater version in the streams that were created, as confirmed in the log output.

This regression may be caused by...which would be ironic
Commit 2b71c0d Merge pull request #2441 from wallyworld/tools-upgrade-before-api …

Related branches

lp://staging/~wallyworld/python-jujuclient/retry-on-upgrade

Merged into lp://staging/python-jujuclient at revision 59

Adam Collard (community): Needs Fixing on 2015-06-01

Kapil Thangavelu: Approve on 2015-06-01

John A Meinel (community): Approve on 2015-06-01

Ian Booth (wallyworld) wrote on 2015-05-30:

The term "upgrade" may mean one of two things:

1. Juju running upgrade steps to upgrade an older environment
2. Juju upgrading agent tools

The message in this bug refers to item 1, but with the change in behaviour of Juju bootstrap, is now poorly worded.
What happens now is:

1. Juju bootstraps and starts the machine agent on the bootstrap node
2. the machine agent delays activating the full API until:
a. any upgrade steps are run
b. it has determined that no agent upgrades are needed <-- this is new
3. once all upgrade (agent or steps) related tasks are finished, the full api is enabled

So if a deploy is attempted before the full api is enabled, the "upgrade in progress" error is returned.
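
The sequence above can be pictured with a toy model. This is only an illustrative sketch, not Juju's code: the class, attribute, and exception names are all hypothetical, and the real state server is written in Go. It shows a request being rejected up front until both conditions from step 2 are met.

```python
class UpgradeInProgressError(Exception):
    """Hypothetical stand-in for the "upgrade in progress" error."""


class ToyStateServer:
    """Toy model: the full API is enabled only after both checks finish."""

    def __init__(self):
        self.upgrade_steps_done = False        # step 2a
        self.agent_upgrade_check_done = False  # step 2b (the new condition)

    @property
    def full_api_enabled(self):
        return self.upgrade_steps_done and self.agent_upgrade_check_done

    def deploy(self, service):
        # Reject the request up front instead of accepting it and
        # disconnecting the client part way through.
        if not self.full_api_enabled:
            raise UpgradeInProgressError("upgrade in progress")
        return "deploying " + service
```

In this model a deploy attempted immediately after bootstrap fails with the error, and the same call succeeds once both flags are set.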

Before the above change, the deployer would connect immediately after bootstrap and if an implicit upgrade were done, the deployer would be disconnected part way through its deployment process.

Now what happens is more correct - any attempt to do work with Juju while the state server is not ready is rejected up front, rather than accepting a connection and then disconnecting.

The same response in this bug would happen if the user typed fast and did a deploy immediately after bootstrap - they would be told to try again in a short time.

Ideally here the deployer would "do the right thing" and retry.

Ian Booth (wallyworld) wrote on 2015-05-30:

I question whether this is a regression - the same "error" or behaviour would always have been possible/likely if the user did:

juju upgrade-juju && juju-deploy

The deployer would receive the message about an upgrade being in progress.

Without the change in behaviour

juju bootstrap && juju-deploy

also failed, but in a way that is less obvious and less useful - an unexpected disconnect.

Now the state server doesn't even accept the deploy request in the first place until it is ready to act on it, and instead returns an error which the deployer can retry on.

Ian Booth (wallyworld) wrote on 2015-05-30:

A solution would be to delay availability of the API, limited or otherwise, until after the agent upgrade check. But this would have 2 issues:

1. juju status would be slightly delayed until it started working
2. would still not solve the agent upgrade error that the deployer would get when an agent upgrade is running

The limited API would still be available while the upgrade steps are running, but only once the agent upgrade check has completed.

Ian Booth (wallyworld) wrote on 2015-05-31:

Interestingly, it doesn't always fail. On AWS:

juju bootstrap --upload-tools && juju --debug deployer --deploy-delay 10 --config ~/landscape-scalable.yaml

works fine.

Another option is to teach the juju-deployer how to retry on upgrade errors. This would be much easier to implement.

Ian Booth (wallyworld) wrote on 2015-05-31:

The python-jujuclient code has been modified so that if any RPC call results in an "upgrade in progress" error, then the call will be retried.

This change improves the robustness of the deployer overall.

https://code.launchpad.net/~wallyworld/python-jujuclient/retry-on-upgrade/+merge/260658
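
The retry behaviour described in this comment could look roughly like the sketch below. It is not the code from the merge proposal: `RPCError`, `is_upgrade_in_progress`, `call_with_retry`, and the default retry count and delay are all hypothetical, and matching on the error text is an assumption about how the error is reported.

```python
import time


class RPCError(Exception):
    """Hypothetical error raised when an RPC reply carries an error message."""


def is_upgrade_in_progress(exc):
    # Assumption: the server's error text contains this phrase.
    return "upgrade in progress" in str(exc).lower()


def call_with_retry(rpc, retries=10, delay=2.0, sleep=time.sleep):
    """Call rpc() and retry while it fails with an "upgrade in progress" error."""
    for attempt in range(retries + 1):
        try:
            return rpc()
        except RPCError as exc:
            # Give up on unrelated errors, or once the retry budget is spent.
            if not is_upgrade_in_progress(exc) or attempt == retries:
                raise
            sleep(delay)
```

Only the "upgrade in progress" case is retried; any other error is raised on the first attempt, so genuine failures are not hidden behind the retry loop.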

Changed in juju-core:
assignee:	nobody → Ian Booth (wallyworld)
status:	Triaged → In Progress
Changed in python-jujuclient:
assignee:	nobody → Ian Booth (wallyworld)
status:	New → In Progress

David Britton (dpb) on 2015-06-01

Changed in python-jujuclient:
status:	In Progress → Fix Committed

David Britton (dpb) on 2015-06-01

Changed in python-jujuclient:
importance:	Undecided → High

Curtis Hovey (sinzui) wrote on 2015-06-02:

I changed the deployer test to call client.wait_for_started() after bootstrap. We can see that 17 seconds can pass before the bootstrapped state-server is ready.
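
Waiting for readiness before deploying can be sketched as a generic polling loop. This is not the implementation of client.wait_for_started(): `wait_until`, its parameters, and the `is_ready` callback are hypothetical names used only to show the idea of polling with a timeout.

```python
import time


def wait_until(is_ready, timeout=60.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test harness would call something like this between bootstrap and deploy, with a timeout comfortably above the 17 seconds observed here.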

Changed in juju-ci-tools:
assignee:	nobody → Curtis Hovey (sinzui)
importance:	Undecided → Critical
status:	New → Fix Released

Curtis Hovey (sinzui) on 2015-06-02

Changed in juju-core:
importance:	Critical → High

Ian Booth (wallyworld) wrote on 2015-06-02:

I added code to juju bootstrap to delay the exit of the bootstrap command until the API is fully available. This will also alleviate the problem, without the need for a delay in deployer.

Changed in juju-core:
status:	In Progress → Fix Committed

Curtis Hovey (sinzui) on 2015-06-02

tags:	added: tech-debt

Curtis Hovey (sinzui) on 2015-06-02

Changed in juju-core:
status:	Fix Committed → Fix Released

Curtis Hovey (sinzui) on 2016-06-23

Changed in python-jujuclient:
status:	Fix Committed → Fix Released
