upgrades are broken in master 1.24-alpha1

Bug #1434070 reported by Curtis Hovey
This bug affects 1 person
Affects: juju-core
Status: Fix Released
Importance: Critical
Assigned to: Tim Penhey
Milestone: none listed

Bug Description

All 14 upgrade jobs failed when testing master 1.24-alpha1 at commit f1dc51a.

Curtis Hovey (sinzui) wrote :

Attached is the machine-2 log, which might have a clue about the problem.

Curtis Hovey (sinzui) wrote :

Attached is the all-machines log with the credentials removed.

Dimiter Naydenov (dimitern) wrote :

I think the issue here is actually machine 0 not being able to reconnect to the API server after the upgrade, most likely due to a recent change that forces clients that use LoginV2 with an empty envUUID (i.e. connecting at the root path, not /environments/<uuid>) into a restricted API root with only the EnvironmentManager and UserManager facades. I can see that the machine agent for 0 logs in OK, but then it gets an error when trying to call Agent.GetEntities via the API.
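
To make the suspected behaviour concrete, here is a minimal Go sketch of a server choosing a restricted root for a version 2 login with no environment UUID. The names facadesFor and call, the error text, and the contents of the full facade set are assumptions for illustration only, not juju's actual code; the sketch also assumes that older login versions still get the full root.

```go
package main

import "fmt"

// facadesFor is a hypothetical stand-in for the server-side choice of API root.
// A version 2 (or later) login with an empty environment UUID gets only the
// two facades named in this report; every other case gets an illustrative
// "full" set that also includes Agent.
func facadesFor(loginVersion int, envUUID string) map[string]bool {
	if loginVersion >= 2 && envUUID == "" {
		return map[string]bool{"EnvironmentManager": true, "UserManager": true}
	}
	return map[string]bool{"Agent": true, "EnvironmentManager": true, "UserManager": true}
}

// call returns an error when the requested facade is not served by the root.
// The error wording is made up for this sketch.
func call(root map[string]bool, facade, method string) error {
	if !root[facade] {
		return fmt.Errorf("facade %q not served by this root: cannot call %s.%s", facade, facade, method)
	}
	return nil
}

func main() {
	// Agent without an environment UUID: login succeeds, Agent call fails.
	fmt.Println(call(facadesFor(2, ""), "Agent", "GetEntities"))
	// Same agent once it supplies an environment UUID.
	fmt.Println(call(facadesFor(2, "some-env-uuid"), "Agent", "GetEntities"))
}
```

This matches the symptom described above: the login itself is accepted, and the failure only shows up on the first Agent call.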

Tim Penhey (thumper)
Changed in juju-core:
assignee: nobody → Tim Penhey (thumper)
Tim Penhey (thumper)
Changed in juju-core:
status: Triaged → In Progress
Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Tim Penhey (thumper) wrote :

Yes indeed that was the case. The agents didn't have an environment UUID saved in the agent config, so they were trying to connect to the API at the root with a modern login version (i.e. Version 2). This only serves the UserManager and EnvironmentManager facades.

This is apparent from the Login request using version 2, and showing that the Agent object isn't there because there is no environment.
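
As a rough illustration of why the missing UUID matters on the agent side, the Go sketch below picks the dial path from the agent config. The AgentConfig type and the apiPath helper are hypothetical, and the path layout simply follows the /environments/<uuid> form mentioned earlier in this report rather than juju's real URL scheme.

```go
package main

import "fmt"

// AgentConfig is a hypothetical, cut-down agent configuration.
type AgentConfig struct {
	Tag     string
	EnvUUID string // empty for agents whose config predates the upgrade
}

// apiPath returns the path the agent would dial. With no environment UUID
// saved, the agent falls back to the root path, which (per this report)
// only serves UserManager and EnvironmentManager to version 2 logins.
func apiPath(cfg AgentConfig) string {
	if cfg.EnvUUID == "" {
		return "/"
	}
	return "/environments/" + cfg.EnvUUID
}

func main() {
	old := AgentConfig{Tag: "machine-0"}
	fixed := AgentConfig{Tag: "machine-0", EnvUUID: "some-env-uuid"}
	fmt.Println(apiPath(old))
	fmt.Println(apiPath(fixed))
}
```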

The upgrades still appear to be failing, but since I have confirmed this works locally going from 1.21.3 -> 1.24-alpha1, I'd like to see the log files. None of the build jobs have an accurate machine-0.log.

Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → In Progress
Curtis Hovey (sinzui) wrote :

This is still broken. CI also upgraded to stable 1.22.0.

Curtis Hovey (sinzui) wrote :

Attached is the machine-0 log from the aws-upgrade-trusty-amd64 job, taken from
    http://reports.vapour.ws/releases/2465

Curtis Hovey (sinzui) wrote :

Attached is the all-machines log from the aws-upgrade-trusty-amd64 job, taken from
    http://reports.vapour.ws/releases/2465

Dimiter Naydenov (dimitern) wrote :

Looking at the machine-0.log, there is some issue with the generated TLS certificate:

2015-03-20 05:38:57 INFO juju.worker runner.go:261 start "api"
2015-03-20 05:38:57 INFO juju.api apiclient.go:327 dialing "wss://localhost:17070/"
2015-03-20 05:38:57 INFO juju.api apiclient.go:335 error dialing "wss://localhost:17070/": websocket.Dial wss://localhost:17070/: dial tcp 127.0.0.1:17070: connection refused
2015-03-20 05:38:57 ERROR juju.worker runner.go:219 exited "api": unable to connect to "wss://localhost:17070/"
2015-03-20 05:38:57 INFO juju.worker runner.go:253 restarting "api" in 3s
2015-03-20 05:38:58 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, not juju-mongodb
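
The last log line is the standard Go x509 hostname mismatch: the certificate lists localhost and juju-apiserver as DNS names, while the mongo connection asked for juju-mongodb. The sketch below reproduces that check in isolation with a throwaway self-signed certificate; the newCert helper and the certificate details other than the DNS names are assumptions for illustration and say nothing about how juju generates its certificates.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCert builds a throwaway self-signed certificate valid for dnsNames.
func newCert(dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     dnsNames,
	}
	// Self-signed: the template is also the parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newCert([]string{"localhost", "juju-apiserver"})
	if err != nil {
		panic(err)
	}
	// VerifyHostname returns nil on a match and an error otherwise.
	for _, host := range []string{"localhost", "juju-mongodb"} {
		fmt.Printf("%s: %v\n", host, cert.VerifyHostname(host))
	}
}
```

If this reading is right, the handshake would succeed once the certificate also carried juju-mongodb among its DNS names, or once the client dialed with a server name the certificate already covers.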

Dimiter Naydenov (dimitern) wrote :

I suspect that being more lenient in the test changed in https://github.com/juju/juju/pull/1858/files might have masked the issue a bit.

Curtis Hovey (sinzui) wrote :

I am closing this bug. Tim's branch may have fixed the 1.21.3 upgrade cases. CI has switched to 1.22.0, and it cannot upgrade to 1.23-beta1, which has fine revisions from earlier this week, nor can it upgrade to 1.24-alpha1.

Changed in juju-core:
status: In Progress → Fix Released
Menno Finlay-Smits (menno.smits) wrote :

Regarding the test changes in https://github.com/juju/juju/pull/1858 ... this change has nothing to do with the problem here. The problem being fixed in 1858 has always been there. There was always a small chance that TestLoginsDuringUpgrade could fail like that, ever since it was written. That's purely a timing issue.
