Juju does not state it is upgrading

Bug #1645478 reported by Curtis Hovey
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
Critical
John A Meinel
juju-ci-tools
Incomplete
Undecided
Unassigned

Bug Description

As seen at
    http://reports.vapour.ws/releases/issue/563c3bae749a5650a6256ca4

get_juju_output raises when calling out to the juju client after an upgrade, retries succeed.

This might be a timing case where the controller hiccuped after the upgrade. The call to status reports that all machines and units upgraded.

This is only seen in lxd, manual, and openstack upgrades. maas, aws did not fail.

The root issue is that there is no way to see an upgrade is in progress until you get an error.

Related branches

Curtis Hovey (sinzui)
Changed in juju:
assignee: nobody → Alexis Bruemmer (alexis-bruemmer)
milestone: none → 2.1-beta2
Curtis Hovey (sinzui)
tags: added: openstack-provider
description: updated
Revision history for this message
Anastasia (anastasia-macmood) wrote :

It seems that manual* tests' logs have more information.

I can see before command ends:
23:06:28 DEBUG juju.api monitor.go:35 RPC connection died
23:06:28 DEBUG juju.api monitor.go:35 RPC connection died

Changed in juju:
assignee: Alexis Bruemmer (alexis-bruemmer) → Ian Booth (wallyworld)
Revision history for this message
Ian Booth (wallyworld) wrote :

This simply looks like a case of the CI scripts not waiting long enough for the upgrade to be completed. The controller logs show the upgrade completes at 00:28:33

2016-11-30 00:28:33 INFO juju.upgrade upgrade.go:108 All upgrade steps completed successfully
2016-11-30 00:28:33 INFO juju.worker.upgradesteps worker.go:225 upgrade to 2.1-beta2 completed successfully.
2016-11-30 00:28:33 DEBUG juju.apiserver request_notifier.go:115 <- [3] machine-0 {"request-id":15,"type":"Machiner","version":1,"request":"SetStatus","params":"'params redacted'"}
2016-11-30 00:28:33 DEBUG juju.apiserver request_notifier.go:140 -> [3] machine-0 27.267173ms {"request-id":15,"response":"'body redacted'"} Machiner[""].SetStatus
2016-11-30 00:28:33 DEBUG juju.worker.dependency engine.go:500 "upgrade-steps-flag" manifold worker stopped: gate unlocked

And yet the CI logs show the script giving up 2 seconds earlier:

started upgrade to 2.1-beta2
00:28:20 INFO cmd supercommand.go:465 command finished
2016-11-30 00:28:31 ERROR Command '('juju', '--show-log', 'model-config', '-m', 'prodstack-upgrade-from-2-0-0-xenial-amd64:prodstack-upgrade-from-2-0-0-xenial-amd64', 'agent-metadata-url')' returned non-zero exit status 1

stderr:
00:28:31 INFO juju.cmd supercommand.go:63 running juju [2.1-beta2 gc go1.6.2]
00:28:31 ERROR cmd supercommand.go:458 upgrade in progress (upgrade in progress)

Changed in juju:
status: Triaged → Incomplete
Revision history for this message
Ian Booth (wallyworld) wrote :

Further down in the controller logs, it shows the controller starting all the workers after the upgrade but that gets cut short as the controller gets killed by the CI scripts, or so it appears.

Changed in juju:
milestone: 2.1-beta2 → none
Revision history for this message
Aaron Bentley (abentley) wrote :

There has seems to have been a change in juju's behaviour. The frequency of this issue spiked on Nov 25. Before then, was extremely rare.

Our scripts wait for juju to report that its version updated. For example:

started upgrade to 2.1-beta2
06:43:25 DEBUG juju.api monitor.go:35 RPC connection died
06:43:25 DEBUG juju.api monitor.go:35 RPC connection died
06:43:25 INFO cmd supercommand.go:465 command finished
2.0.0: 0 ......................................................................
............................................
2016-12-01 06:43:54 ERROR Command '('juju', '--debug', 'model-config', '-m', 'manual-upgrade-trusty-amd64:manual-upgrade-trusty-amd64', 'agent-metadata-url')' returned non-zero exit status 1

You can see that we waited until machine 0's juju version was 2.1-beta2 (and no longer 2.0.0).

If there is a better way to determine when an upgrade has completed, we would be willing to consider it, but in the meantime, it looks like there has been a timing change where juju is reporting that it has upgraded before it has completed.

Changed in juju:
status: Incomplete → Triaged
Changed in juju-ci-tools:
status: New → Incomplete
Revision history for this message
Aaron Bentley (abentley) wrote :

It's also worth noting that the docs say "juju status" can be used to verify that the upgrade has completed, and this is no longer true.

https://jujucharms.com/docs/2.0/models-upgrade#verifying-the-upgrade

Curtis Hovey (sinzui)
summary: - juju model-config errors after upgrade
+ Juju does not state it is upgrading
Revision history for this message
Curtis Hovey (sinzui) wrote :

We have merged a branch that effectively polls for juju to start working again. The commands status, list-controllers, show-controller work. there is no indication that an upgrade is in progress. Other commands error (A non-graceful exit).

In the past juju status outright failed during an upgrade. By the time status came back, we saw the agents in status showed the correct version. Now juju status does not fail for a long period, it shows the agents have switched, but this is lo longer a valid indication upgrades are complete. The only was I know to learn an upgrade is in progress aft status show the new agent version is to try commands and read error messages for "upgrade in progress".

description: updated
Curtis Hovey (sinzui)
tags: added: ui
Changed in juju:
assignee: Ian Booth (wallyworld) → Horacio Durán (hduran-8)
status: Triaged → In Progress
Revision history for this message
Horacio Durán (hduran-8) wrote :

I proposed https://github.com/juju/juju/pull/6763 as a way to give more visibility into the upgrading process.

Changed in juju:
milestone: none → 2.2.0-alpha1
Changed in juju:
milestone: 2.2-alpha1 → 2.2-beta1
Changed in juju:
assignee: Horacio Durán (hduran-8) → John A Meinel (jameinel)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.