Maas with http_proxy, Juju 1.24.6 -> 1.24.7, upgrade never finishes

Bug #1509097 reported by Ramesh Isaac
This bug affects 4 people
Affects     Status         Importance   Assigned to     Milestone
falkor      Fix Released   High         David Britton
juju-core   Fix Released   High         Ian Booth
1.24        Won't Fix      High         Unassigned
1.25        Won't Fix      High         Unassigned

Bug Description

2015-10-22 17:11:07,803 DEBUG CMD: ['juju', 'bootstrap', '-e', 'datacenter-maas', '--to', 'vmLDS.genpod2']
2015-10-22 17:17:58,493 DEBUG STDERR:
Bootstrapping environment "datacenter-maas"
Starting new instance for initial state server
Launching instance
 - /MAAS/api/1.0/nodes/node-efb78b56-78fd-11e5-96b7-70ca9b03c908/
Installing Juju agent on bootstrap instance
Waiting for address
Attempting to connect to vmLDS.genpod2:22
Attempting to connect to vmLDS.genpod2:22
Attempting to connect to 10.20.0.11:22
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
b6:5d:00:9d:f6:4c:ad:42:3e:6e:7c:00:79:43:28:69.
Please contact your system administrator.
Add correct host key in /home/localadmin/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /home/localadmin/.ssh/known_hosts:1
  remove with: ssh-keygen -f "/home/localadmin/.ssh/known_hosts" -R 10.20.0.11
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Running apt-get upgrade
Installing package: curl
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: rsyslog-gnutls
Installing package: cloud-utils
Installing package: cloud-image-utils
Installing package: tmux
Fetching tools: curl -sSfw 'tools from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' --retry 10 -o $bin/tools.tar.gz <https://streams.canonical.com/juju/tools/releases/juju-1.24.6-trusty-amd64.tgz>
Bootstrapping Juju machine agent
Starting Juju machine agent (jujud-machine-0)
Bootstrap agent installed
Waiting for API to become available
[previous line repeated 59 more times]
ERROR upgrade in progress - Juju functionality is limited

2015-10-22 17:17:58,494 WARNING Hit error with juju bootstrap: Command '['juju', 'bootstrap', '-e', 'datacenter-maas', '--to', 'vmLDS.genpod2']' returned non-zero exit status 1

Tags: landscape
Revision history for this message
David Britton (dpb) wrote :

machine-0.log from the stuck upgrade

Revision history for this message
David Britton (dpb) wrote :
summary: - juju-deployer fails deployment of bundle
+ Juju 1.24.6 -> 1.24.7, upgrade never finishes
description: updated
information type: Proprietary → Public
affects: falkor → juju-core
tags: added: kanban-cross-team landscape
Revision history for this message
David Britton (dpb) wrote : Re: Juju 1.24.6 -> 1.24.7, upgrade never finishes

If this happens, make sure your juju client is upgraded:

  sudo apt-get update; sudo apt-get dist-upgrade

Don't wait for the "upgrade in progress" state to clear on its own; it never finishes.

Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Martin Packman (gz) wrote :

Looks like it boils down to this:

2015-10-22 21:51:08 INFO juju.worker.upgrader upgrader.go:147 desired tool version: 1.24.7
...
2015-10-22 21:51:08 DEBUG juju.environs.tools urls.go:108 trying datasource "keystone catalog"
...
2015-10-22 21:59:37 ERROR juju.worker runner.go:223 exited "upgrader": no tools available

Did the 1.24.7 tools get put in the openstack deployment's catalog before the upgrade was triggered? I'm not sure exactly how this is managed in these deployments, but either the machines need to be able to reach streams.canonical.com or the internal mirror needs to be up to date.

The wrapping script's report of this as a bootstrap failure is confusing. It looks like the bootstrap to the old version actually succeeded, but presumably a bootstrap to 1.24.7 would have failed with no tools found.
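
The three log lines quoted above can be pulled out of a machine-0.log automatically. The following is only a rough sketch: the regular expression is inferred from the layout of the excerpt above, and the function name `summarize_upgrader` is made up for illustration; neither comes from Juju itself.

```python
import re

# Layout inferred from the excerpt above (an assumption, not a Juju spec):
# "<date> <time> <LEVEL> <module> <file>:<line> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<source>\S+:\d+) (?P<msg>.*)$"
)


def summarize_upgrader(lines):
    """Return (desired_version, datasources_tried, last_upgrader_error)."""
    desired, datasources, error = None, [], None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip "..." elisions and lines in any other format
        msg = match.group("msg")
        if msg.startswith("desired tool version: "):
            desired = msg.split(": ", 1)[1]
        elif msg.startswith("trying datasource "):
            datasources.append(msg.split('"')[1])
        elif match.group("level") == "ERROR" and 'exited "upgrader"' in msg:
            error = msg
    return desired, datasources, error


if __name__ == "__main__":
    sample = [
        "2015-10-22 21:51:08 INFO juju.worker.upgrader upgrader.go:147 desired tool version: 1.24.7",
        "...",
        '2015-10-22 21:51:08 DEBUG juju.environs.tools urls.go:108 trying datasource "keystone catalog"',
        "...",
        '2015-10-22 21:59:37 ERROR juju.worker runner.go:223 exited "upgrader": no tools available',
    ]
    print(summarize_upgrader(sample))
```

A desired version paired with a "no tools available" error is the signature of this bug: the agent knows what it should upgrade to but cannot download it.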

tags: removed: kanban-cross-team
Revision history for this message
Ramesh Isaac (rci) wrote :

This failed on retry. Is this something different, or is there something that needs to be cleaned up before trying again?

2015-10-22 18:36:34,982 DEBUG Bootstrapping juju to vmLDS
2015-10-22 18:36:34,982 DEBUG CMD: ['juju', 'bootstrap', '-e', 'datacenter-maas', '--to', 'vmLDS.genpod2']
2015-10-22 18:51:30,902 DEBUG STDERR:
Bootstrapping environment "datacenter-maas"
Starting new instance for initial state server
Launching instance
 - /MAAS/api/1.0/nodes/node-e3b8cc64-7909-11e5-abc6-70ca9b03c908/
ERROR failed to bootstrap environment: bootstrap instance started but did not change to Deployed state: instance "/MAAS/api/1.0/nodes/node-e3b8cc64-7909-11e5-abc6-70ca9b03c908/" is started but not deployed

2015-10-22 18:51:30,902 WARNING Hit error with juju bootstrap: Command '['juju', 'bootstrap', '-e', 'datacenter-maas', '--to', 'vmLDS.genpod2']' returned non-zero exit status 1
2015-10-22 18:51:30,903 DEBUG CMD: ['juju', 'status', '-e', 'datacenter-maas']
2015-10-22 19:01:34,734 DEBUG STDERR:
ERROR Unable to connect to environment "datacenter-maas".
Please check your credentials or use 'juju bootstrap' to create a new environment.

Error details:
unable to connect to "wss://10.20.0.11:17070/environment/1f755837-233d-4933-89c4-52257b0c9f10/api"

2015-10-22 19:01:34,736 DEBUG Checking juju status: [FAIL]

Attaching log from this last run, deploy.cfg has not changed.

Thanks,

-Ramesh

Revision history for this message
David Britton (dpb) wrote :

Ramesh, on comment #7, you can try

   rm -rf ~/.config/falkor

And try again.

If you find this works, please file a separate bug.

Thanks!

summary: - Juju 1.24.6 -> 1.24.7, upgrade never finishes
+ Maas with http_proxy, Juju 1.24.6 -> 1.24.7, upgrade never finishes
Revision history for this message
David Britton (dpb) wrote :

FYI, just to head it off: --upgrade-tools was not used, just a regular 'juju bootstrap' (the command is in the first line of the bug description).

Revision history for this message
Ian Booth (wallyworld) wrote :

So the issue here is that the client used to bootstrap is probably not behind a proxy and can see streams.canonical.com. But the MAAS environment itself can't.

What happens at bootstrap if --no-auto-upgrade is NOT specified:

1. juju bootstrap command run on user's PC checks for latest tools versions

2. if a tools version more recent than the client's is found, this is recorded

3. when the new state server comes up, it sees if the bootstrap has recorded a newer tools version

4. bootstrap machine attempts to get these new tools

5. if bootstrap machine cannot fetch the tools, the upgrade worker will fail and just keep restarting

6. the "Waiting for API to become available" message is printed forever because the upgrade worker never finishes

So, there's an issue here: the client used to check for tools availability has different internet access from the actual bootstrap machine, which is behind a proxy.
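
The six steps above can be condensed into a toy model. This is not Juju's code, only a sketch of the described behaviour: the function names, the version lists and the returned strings are all invented for illustration, and versions are assumed to be plain dotted numbers such as "1.24.7".

```python
def parse_version(text):
    """Turn a dotted version such as '1.24.7' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def bootstrap_outcome(client_version, seen_by_client, seen_by_server,
                      auto_upgrade=True):
    """Model steps 1-6: the client picks the target, the server must fetch it."""
    # Step 1: the client looks up the newest tools it can see.
    newest = max(seen_by_client, key=parse_version, default=client_version)
    if not auto_upgrade or parse_version(newest) <= parse_version(client_version):
        return "no upgrade requested"
    # Steps 2-4: the target is recorded and the state server tries to fetch it.
    if newest in seen_by_server:
        return "upgraded to " + newest
    # Steps 5-6: the upgrade worker keeps restarting and the API never appears.
    return "upgrade to " + newest + " never finishes"


if __name__ == "__main__":
    # Client can reach the public streams; the proxied server only has 1.24.6.
    print(bootstrap_outcome("1.24.6", ["1.24.6", "1.24.7"], ["1.24.6"]))
```

The failure only needs the two lists to differ: as long as the client can see a newer version than the server can fetch, the bootstrap ends in the stuck state.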

We can look to solve this by deferring the tools check to the bootstrap machine, not the client. We should also seriously consider how the upgrade worker behaves when no tools can be found.

For now, the best way to work around the issue is to use the --no-auto-upgrade option when bootstrapping, which avoids the tools check altogether.

Andrew Wilkins (axwalk)
Changed in juju-core:
milestone: none → 1.24.8
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Triaged
assignee: Andrew Wilkins (axwalk) → nobody
milestone: 1.24.8 → none
Revision history for this message
Ian Booth (wallyworld) wrote :

What I recommend here is that:

1. the immediate issue can be solved for 1.24 and 1.25 by ensuring the --no-auto-upgrade arg is used

2. we will improve the behaviour of how auto upgrades are done for 1.26

3. in Juju 2.0, we will not auto upgrade by default

Changed in juju-core:
importance: Critical → High
milestone: none → 1.26-alpha1
Revision history for this message
David Britton (dpb) wrote : Re: [Bug 1509097] Re: Maas with http_proxy, Juju 1.24.6 -> 1.24.7, upgrade never finishes

Hi Ian --

dpb@helo:bin[0]$ juju bootstrap -v --no-auto-upgrade
error: flag provided but not defined: --no-auto-upgrade
dpb@helo:bin[0]$ juju --version
1.24.7-vivid-amd64

I already know about the trick of modifying the environment, which is what
we do in other situations. Maybe we need to do that here as a dirty
workaround...

  agent-version: $(juju --version)
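
One detail of the line above: `juju --version` prints the series and architecture as well (1.24.7-vivid-amd64 in the session shown). Assuming agent-version expects only the version number, a small helper could strip the suffix first. The helper name is hypothetical, and whether Juju honours the setting at bootstrap is a separate question.

```python
def agent_version_from_client(version_output):
    """Drop the trailing '-<series>-<arch>' from `juju --version` output."""
    # rsplit from the right so a tag such as '1.26-alpha1' stays intact.
    return version_output.strip().rsplit("-", 2)[0]


if __name__ == "__main__":
    print("agent-version: " + agent_version_from_client("1.24.7-vivid-amd64"))
```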

David Britton (dpb)
Changed in falkor:
status: New → In Progress
importance: Undecided → High
assignee: nobody → David Britton (davidpbritton)
Revision history for this message
Ian Booth (wallyworld) wrote :

Hey David

Damn. 1.24 does not have --no-auto-upgrade.
And I don't think setting agent-version will work. The code seems to override that with the latest tools version it finds.

It could be that the only workaround is to ensure the latest 1.24.7 tools are available in the tools cache behind the proxy.
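
Before bootstrapping, it may help to check from a machine inside the MAAS network whether the tools tarball is actually reachable. The sketch below rests on assumptions: the URL pattern is extrapolated from the 1.24.6 URL in the bootstrap log in the bug description, and the function names and the proxy address in the usage example are placeholders.

```python
import urllib.request

# Base URL taken from the "Fetching tools" line in the bug description.
TOOLS_BASE = "https://streams.canonical.com/juju/tools/releases"


def tools_url(version, series="trusty", arch="amd64"):
    """Build a tarball URL following the pattern seen in the bootstrap log."""
    return "{}/juju-{}-{}-{}.tgz".format(TOOLS_BASE, version, series, arch)


def can_fetch(url, proxy=None, timeout=10):
    """Return True if a HEAD request for url succeeds, optionally via a proxy."""
    handlers = []
    if proxy:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
    # With no explicit proxy, the default opener uses http_proxy/https_proxy
    # from the environment, if they are set.
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


if __name__ == "__main__":
    url = tools_url("1.24.7")
    # "http://proxy.example:3128" is a placeholder, not a real proxy.
    print(url, can_fetch(url, proxy="http://proxy.example:3128"))
```

If this returns False on the bootstrap node's network but True on the client's, the two machines disagree about which tools exist, which is the condition this bug depends on.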

Ian Booth (wallyworld)
Changed in juju-core:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
Revision history for this message
Ramesh Isaac (rci) wrote :

Hi,

I have removed ~/.config/falkor and am re-trying the install.

Regarding comment #10, the bootstrap is behind a proxy, and I don't really understand the comments about something running on my PC?

Regards,

-Ramesh

Revision history for this message
Ramesh Isaac (rci) wrote :

Re-attempt seems to be failing with some kind of juju registration issue?

2015-10-23 11:19:27,030 DEBUG found 196 children
2015-10-23 11:19:27,031 DEBUG Add Juju machine for 'gp2-prod3.genpod2': delivered
2015-10-23 11:19:27,031 DEBUG Add Juju machine for 'gp2-prod1.genpod2': delivered
2015-10-23 11:19:27,032 DEBUG Add Juju machine for 'gp2-cont3.genpod2': delivered
2015-10-23 11:19:27,032 DEBUG Add Juju machine for 'gp2-cont2.genpod2': delivered
2015-10-23 11:19:27,032 DEBUG Activity: id:1 children:[(u'delivered', 4), (u'waiting', 112), (u'succeeded', 80)]
2015-10-23 11:19:27,033 DEBUG Waiting for cloud-install completion. (102m32s)

Attaching the log file for this attempt, deploy.cfg has not changed.

Thanks,

-Ramesh

Changed in falkor:
status: In Progress → Fix Committed
Revision history for this message
David Britton (dpb) wrote :

On Fri, Oct 23, 2015 at 03:22:38PM -0000, Ramesh Isaac wrote:
> Re-attempt seems to be failing with some kind of juju registration
> issue?

Hi Ramesh --

Please start a new bug if you notice something wrong. We already have
the original issue identified.

Thanks!

Revision history for this message
Ian Booth (wallyworld) wrote :

Marked as Won't Fix for 1.24, as the issue can be solved by ensuring recent tools are available to MAAS.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Is --no-auto-upgrade something we should be backporting to 1.24?

Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
David Britton (dpb)
Changed in falkor:
milestone: none → 0.13
milestone: 0.13 → 0.11.2
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released
Revision history for this message
David Britton (dpb) wrote :

Fixed in 0.13 as well, just so it's clear.

Changed in falkor:
status: Fix Committed → Fix Released