remove-machine fails to remove machines in rackspace/openstack

Bug #1677425 reported by Christopher Lee
This bug affects 1 person
Affects         Status        Importance  Assigned to      Milestone
Canonical Juju  Fix Released  High        Heather Lanigan
2.3             Fix Released  High        Andrew Wilkins

Bug Description

As seen at: http://reports.vapour.ws/releases/issue/5613e69e749a56133636dde4

This commit may have introduced the recent failures: https://github.com/juju/juju/commit/7e402e543cce1917c6de1e0abc96c66fe65d28ad

Removing the machine fails: the machine still appears in status output 240 seconds after the remove command is issued.
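
For illustration, the failing check amounts to polling status until the machine disappears or a 240-second deadline passes. The sketch below is not the actual CI test code; `list_machines` is a hypothetical callable standing in for whatever parses the status output, and the 5-second poll interval is an assumption.

```python
import time
from typing import Callable, Iterable


def wait_for_machine_removal(
    list_machines: Callable[[], Iterable[str]],  # hypothetical: returns machine ids seen in status
    machine_id: str,
    timeout: float = 240.0,   # the deadline mentioned in this report
    interval: float = 5.0,    # assumed poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once machine_id is absent from status, False on timeout."""
    deadline = clock() + timeout
    while True:
        if machine_id not in set(list_machines()):
            return True
        if clock() >= deadline:
            # The machine is still listed after the deadline: this is the failure seen here.
            return False
        sleep(interval)
```

The clock and sleep functions are injectable only so the loop can be exercised without actually waiting.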

Changed in juju:
milestone: none → 2.2-beta3
Changed in juju:
status: Confirmed → In Progress
Heather Lanigan (hmlanigan) wrote :

https://github.com/juju/juju/pull/7175

Easy reproducer in the PR.

Ian Booth (wallyworld)
Changed in juju:
milestone: 2.2-beta3 → 2.2-beta2
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Andrew Wilkins (axwalk) wrote :

This is still failing sometimes in 2.3.

I think the issue is that the host machine is taking a long time to provision, and the container sits there in the "dying" state during that time. When the host finally comes up, its machine agent sees the container is dying and proceeds to remove it from state.

We should short-circuit the removal of containers when the host machine hasn't yet been provisioned.
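
A rough sketch of that short-circuit, as a toy model rather than Juju's actual state code (which is written in Go). The `Machine` class, its fields, and `remove_container` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Machine:
    """Hypothetical, simplified machine record."""
    id: str
    life: str = "alive"              # "alive" or "dying"
    provisioned: bool = False        # True once the host has an instance
    parent_id: Optional[str] = None  # set for containers, None for hosts


def remove_container(machines: Dict[str, Machine], container_id: str) -> str:
    """Remove a container and report which path was taken."""
    container = machines[container_id]
    if container.parent_id is None:
        raise ValueError(f"{container_id} is not a container")
    host = machines[container.parent_id]
    if not host.provisioned:
        # Short-circuit: no host agent is running yet to clean up the
        # container, so drop the record right away instead of waiting.
        del machines[container_id]
        return "removed"
    # Normal path: mark it dying and let the host's machine agent finish.
    container.life = "dying"
    return "dying"
```

With an unprovisioned host the container record disappears immediately; with a provisioned host it is only marked dying, as happens today.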

Andrew Wilkins (axwalk) wrote :

So that was not quite right. From the machine agent log of the host, I can see that the container provisioner is failing because it can't install lxd-client from the archives:

"""
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/l/lxd/lxd-client_2.0.11-0ubuntu1~16.04.2_amd64.deb 404 Not Found [IP: 201:67c:1360:8001::17 80]

E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
"""

Then in cloud-init-output.log:

"""
Err:1 http://security.ubuntu.com/ubuntu xenial-security InRelease
  Temporary failure resolving 'security.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu xenial InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://archive.ubuntu.com/ubuntu xenial-backports InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Reading package lists...
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-backports/InRelease Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/xenial-security/InRelease Temporary failure resolving 'security.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Reading package lists...
"""

I still think we should short-circuit the removal of containers, but the more important issue here is to stop apt flaking out.
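
One common way to make transient apt failures less fatal is to retry the command with a growing delay. The following is a minimal sketch under that assumption, not what Juju does; `run_with_retries` and its parameters are hypothetical, and the attempt count and delays are arbitrary.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 5,           # assumed retry budget
    initial_delay: float = 2.0,  # assumed first delay in seconds, doubled each time
    run: Callable[[Sequence[str]], int] = lambda c: subprocess.run(c).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run cmd until it exits 0; return False if every attempt fails."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return True
        if attempt < attempts:
            # Back off before the next try, e.g. while DNS is still coming up.
            sleep(delay)
            delay *= 2
    return False
```

For example, `run_with_retries(["apt-get", "update"])` could be called before the package install. The 404 above may be the result of a stale package index left behind by the failed update, so retrying the update first seems the more useful place to start.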
