failed add-machine ssh: leaves behind garbage in state

Bug #1356886 reported by Kapil Thangavelu
This bug affects 2 people
Affects: juju-core
Status: Fix Released
Importance: High
Assigned to: Andrew Wilkins

Bug Description

While debugging some other issues with the manual provider and ssh (add-machine doesn't use ssh-agent, even though bootstrap does), I came across a bunch of garbage left behind in state even though the command failed. Is there any reason juju can't clean up after itself when it knows the add-machine failed?

$ juju status
environment: ocean
machines:
  "0":
    agent-state: started
    agent-version: 1.21-alpha1.1
    dns-name: 192.241.202.97
    instance-id: 'manual:'
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=3953M
    state-server-member-status: has-vote
  "1":
    agent-state: pending
    dns-name: 192.241.217.243
    instance-id: manual:192.241.217.243
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "2":
    agent-state: pending
    dns-name: 162.243.153.228
    instance-id: manual:162.243.153.228
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "3":
    agent-state: pending
    dns-name: 107.170.196.104
    instance-id: manual:107.170.196.104
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "4":
    agent-state: pending
    dns-name: 192.241.201.218
    instance-id: manual:192.241.201.218
    series: precise
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "5":
    agent-state: pending
    dns-name: 192.241.236.187
    instance-id: manual:192.241.236.187
    series: precise
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "6":
    agent-state: pending
    dns-name: 192.241.213.226
    instance-id: manual:192.241.213.226
    series: precise
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "7":
    agent-state: pending
    dns-name: 192.241.201.218
    instance-id: manual:192.241.201.218
    series: precise
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "8":
    agent-state: pending
    dns-name: 192.241.217.243
    instance-id: manual:192.241.217.243
    series: precise
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "9":
    agent-state: pending
    dns-name: 192.241.236.187
    instance-id: manual:192.241.236.187
    series: precise
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "10":
    agent-state: pending
    dns-name: 192.241.217.243
    instance-id: manual:192.241.217.243
    series: precise
    hardware: arch=amd64 cpu-cores=2 mem=2001M
  "11":
    agent-state: pending
    dns-name: 192.241.236.187
    instance-id: manual:192.241.236.187
    series: precise
    hardware: arch=amd64 cpu-cores=2 mem=2001M
services: {}

Kapil Thangavelu (hazmat) wrote:

Also unfortunate: juju will deploy units to these ghost machines, and destroy-machine by itself won't work on them. A user has to know to use --force.

tags: added: manual-provider
Curtis Hovey (sinzui) wrote:

This behaviour is present in 1.18.x and 1.20.x. Juju CI experiences this problem when provisioning machines on restricted networks.
The Juju QA team uses
    http://bazaar.launchpad.net/~juju-qa/juju-ci-tools/trunk/view/head:/add-remote-machine.bash
to check, and abort, before adding a machine that juju won't properly remove.

Changed in juju-core:
status: New → Triaged
tags: added: add-machine
Changed in juju-core:
importance: Undecided → High
milestone: none → next-stable
Ian Booth (wallyworld)
tags: added: 14.10
Andrew Wilkins (axwalk) wrote:

I'll change the manual provisioning code to do the equivalent of "destroy-machine --force" if add-machine fails. It's not perfect, as that only schedules a cleanup. I think we should also change the server side to remove the machine from state immediately when that's possible (i.e. it has no containers and no units assigned).

Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
milestone: next-stable → 1.21-alpha2
Andrew Wilkins (axwalk) wrote:

I've landed the change to force-destroy machines on failure.

Ian Booth (wallyworld)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released