cannot remove or destroy machine in pending state

Bug #1607971 reported by Larry Michel
This bug affects 7 people
Affects: Canonical Juju
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

I am not able to get rid of a machine that's in pending state. This is similar to bug 1089291, where you can read the reasoning for why we'd want this. Does that mean I can no longer deploy to that system with the same controller? Or if I can, do I end up with duplicate machines?

I can remove the machines that are started, but for the pending machine the command keeps returning and nothing happens.

I can't seem to find a kill-machine or some other option to force destroy that machine.

jenkins@s9-lmic-trusty:~$ juju status
MODEL CONTROLLER CLOUD/REGION VERSION
default vmwarecontroller larry 2.0-beta13

APP VERSION STATUS EXPOSED ORIGIN CHARM REV OS

UNIT WORKLOAD AGENT MACHINE PORTS PUBLIC-ADDRESS MESSAGE

MACHINE STATE DNS INS-ID SERIES AZ
0 pending 10.245.0.201 4y3hew trusty Production

jenkins@s9-lmic-trusty:~$ juju status
MODEL CONTROLLER CLOUD/REGION VERSION
default vmwarecontroller larry 2.0-beta13

APP VERSION STATUS EXPOSED ORIGIN CHARM REV OS

UNIT WORKLOAD AGENT MACHINE PORTS PUBLIC-ADDRESS MESSAGE

MACHINE STATE DNS INS-ID SERIES AZ
0 pending 10.245.0.201 4y3hew trusty Production

jenkins@s9-lmic-trusty:~$ juju remove-machine 0
jenkins@s9-lmic-trusty:~$ juju remove-machine 1
ERROR no machines were destroyed: machine 1 does not exist

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0-beta14
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta14 → 2.0-beta15
Changed in juju-core:
milestone: 2.0-beta15 → 2.0.0
affects: juju-core → juju
Changed in juju:
milestone: 2.0.0 → none
milestone: none → 2.0.0
Alexis Bruemmer (alexis-bruemmer) wrote :

Larry, are you still seeing this issue? I am not able to reproduce locally with rc1.

Changed in juju:
status: Triaged → Incomplete
Larry Michel (lmic) wrote :

Alexis, yes I am still seeing this with rc1:

jenkins@lmic-s9-instance:~$ juju status machine 0
MODEL CONTROLLER CLOUD/REGION VERSION
default vspherecontroller-beta18 vsphere/dc0 2.0-rc1

APP VERSION STATUS SCALE CHARM STORE REV OS NOTES

UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE

MACHINE STATE DNS INS-ID SERIES AZ
0 pending pending xenial

jenkins@lmic-s9-instance:~$ juju remove-machine 0
jenkins@lmic-s9-instance:~$ sleep 20
jenkins@lmic-s9-instance:~$ juju status machine 0
MODEL CONTROLLER CLOUD/REGION VERSION
default vspherecontroller-beta18 vsphere/dc0 2.0-rc1

APP VERSION STATUS SCALE CHARM STORE REV OS NOTES

UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE

MACHINE STATE DNS INS-ID SERIES AZ
0 pending pending xenial

jenkins@lmic-s9-instance:~$
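
For anyone re-running this check, here is a minimal sketch that pulls the IDs of pending machines out of the tabular status output. `pending_machines` is a hypothetical helper name, not a juju command. It assumes the machine table starts with a MACHINE/STATE header row and ends at the next blank line, as in the output above.

```shell
# Hypothetical helper: reads `juju status` tabular output on stdin and
# prints the ID of every machine whose STATE column is "pending".
pending_machines() {
    awk '
        # The machine table begins at its header row (upper or mixed case).
        toupper($1) == "MACHINE" && toupper($2) == "STATE" { in_machines = 1; next }
        # A blank line ends the current table.
        NF == 0 { in_machines = 0 }
        # Inside the table, column 1 is the machine ID and column 2 its state.
        in_machines && $2 == "pending" { print $1 }
    '
}
```

Piping the status output through it (`juju status | pending_machines`) lists one machine ID per line, which makes it easier to confirm whether `remove-machine` had any effect after the sleep.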

Heather Lanigan (hmlanigan) wrote :

I believe that I'm hitting this error as well. During charm development we end up with a charm in a pending state and are unable to remove it. In various states I'm unable to remove-unit, remove-model, remove-machine or remove-application...

heather@mitaka:~$ juju status
MODEL CONTROLLER CLOUD/REGION VERSION
test minnow localhost/localhost 2.0-rc1

APP VERSION STATUS SCALE CHARM STORE REV OS NOTES
trove waiting 0/2 trove local 0 ubuntu

UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE
trove/0 unknown lost 0 10.172.45.133 agent lost, see 'juju show-status-log trove/0'
trove/1 waiting allocating 1 waiting for machine

MACHINE STATE DNS INS-ID SERIES AZ
0 down 10.172.45.133 juju-aec564-0 xenial
1 pending pending xenial

RELATION PROVIDES CONSUMES TYPE
cluster trove trove peer

heather@mitaka:~$ juju remove-machine 1
heather@mitaka:~$ sleep 30
heather@mitaka:~$ juju status
MODEL CONTROLLER CLOUD/REGION VERSION
test minnow localhost/localhost 2.0-rc1

APP VERSION STATUS SCALE CHARM STORE REV OS NOTES
trove waiting 0/2 trove local 0 ubuntu

UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE
trove/0 unknown lost 0 10.172.45.133 agent lost, see 'juju show-status-log trove/0'
trove/1 waiting allocating 1 waiting for machine

MACHINE STATE DNS INS-ID SERIES AZ
0 down 10.172.45.133 juju-aec564-0 xenial
1 pending pending xenial

RELATION PROVIDES CONSUMES TYPE
cluster trove trove peer

Machine 0 is down because I tried to remove the lxc machine; the removal happened, but juju still won't let it go.

This can be reliably reproduced with the trove charm: https://github.com/TransCirrus/charm-trove
Please note the charm is under development so that may change in time once some of the errors are resolved.

Larry Michel (lmic)
Changed in juju:
status: Incomplete → New
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-rc3 → 2.0.0
Larry Michel (lmic) wrote :

I am able to recreate with rc2.

I had the following scenario:

jenkins@lmic-s9-instance:~$ juju status
MODEL CONTROLLER CLOUD/REGION VERSION
default mycontroller2 larry2 2.0-rc2

APP VERSION STATUS SCALE CHARM STORE REV OS NOTES
cinder active 1 cinder jujucharms 255 ubuntu
glance active 1 glance jujucharms 251 ubuntu
keystone active 1 keystone jujucharms 256 ubuntu
mysql active 1 percona-cluster jujucharms 244 ubuntu
neutron-api active 1 neutron-api jujucharms 244 ubuntu
neutron-gateway active 1 neutron-gateway jujucharms 230 ubuntu
nova-cloud-controller active 1 nova-cloud-controller jujucharms 290 ubuntu
nova-hyperv waiting 1/2 nova-hyperv jujucharms 24 windows
openstack-dashboard active 1 openstack-dashboard jujucharms 241 ubuntu
rabbitmq-server active 1 rabbitmq-server jujucharms 50 ubuntu
swift-proxy active 1 swift-proxy jujucharms 54 ubuntu
swift-storage active 1 swift-storage jujucharms 230 ubuntu

UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE
cinder/0 active idle 3 10.244.241.93 8776/tcp Unit is ready
glance/0 active idle 4/lxd/0 10.244.241.64 9292/tcp Unit is ready
keystone/0 active idle 3/lxd/0 10.244.241.63 5000/tcp Unit is ready
mysql/0 active idle 5/lxd/0 10.244.241.133 Unit is ready
neutron-api/0 active idle 2/lxd/0 10.244.241.73 9696/tcp Unit is ready
neutron-gateway/0 active idle 5 10.244.241.72 Unit is ready
nova-cloud-controller/0 active idle 2 10.244.241.70 8774/tcp Unit is ready
nova-hyperv/0 waiting allocating 0 10.244.241.0 waiting for machine
nova-hyperv/1 a...

Changed in juju:
status: New → Triaged
Larry Michel (lmic) wrote :

I'm able to recreate with 2.0.0~20161011~4472~60da3f0-20161011+4472+60da3f0~14.04.

Changed in juju:
milestone: 2.0.0 → 2.0.1
Changed in juju:
status: Triaged → Invalid
Larry Michel (lmic) wrote :

@Anastasia, this is marked as invalid but I don't see any comment as to why the status is changed to that. Is this the correct status?

Anastasia (anastasia-macmood) wrote :

@Larry,
Oh shoot, I've read your comment as "not able to recreate" \o/ One of these days...
I'm reverting its status - thank you for following up!

Changed in juju:
status: Invalid → Triaged
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0.1 → none
Spyderdyne (spyderdyne) wrote :

I have the same issue on: 2.2-beta1-xenial-amd64

juju remove-machine <machine>
juju remove-unit <unit>

Machines are in various states (some are even deleted on LXC) and the commands succeed, but the units/machines are still listed.

root@ayana-angel:~# juju status
Model Controller Cloud/Region Version
ayana-angel juju-lxd-0 ayana-angel 2.1-beta4

App Version Status Scale Charm Store Rev OS Notes
assaultcube unknown 0/1 assaultcube jujucharms 4 ubuntu exposed
minecraft active 0/1 minecraft jujucharms 3 ubuntu exposed
plex active 0/1 plex jujucharms 0 ubuntu exposed
ubuntu active 0/1 ubuntu jujucharms 10 ubuntu exposed
ubuntu-repository-cache waiting 0 ubuntu-repository-cache jujucharms 20 ubuntu

Unit Workload Agent Machine Public address Ports Message
assaultcube/0 unknown lost 17 2603:3001:3301:65f0:216:3eff:fe5f:ab55 28763/udp,28764/udp agent lost, see 'juju show-status-log assaultcube/0'
minecraft/0* unknown lost 2 2603:3001:3301:65f0:216:3eff:fe69:96e0 25565/tcp agent lost, see 'juju show-status-log minecraft/0'
plex/0 unknown lost 12 2603:3001:3301:65f0:216:3eff:fe3e:509a 32400/tcp agent lost, see 'juju show-status-log plex/0'
ubuntu/1 unknown lost 16 2603:3001:3301:65f0:216:3eff:fe5c:ca70 agent lost, see 'juju show-status-log ubuntu/1'

Machine State DNS Inst id Series AZ Message
2 down 2603:3001:3301:65f0:216:3eff:fe69:96e0 juju-df482e-2 trusty Running
12 down 2603:3001:3301:65f0:216:3eff:fe3e:509a juju-df482e-12 trusty Running
16 down 2603:3001:3301:65f0:216:3eff:fe5c:ca70 juju-df482e-16 xenial Running
17 down 2603:3001:3301:65f0:216:3eff:fe5f:ab55 juju-df482e-17 precise Stopped

Relation Provides Consumes Type
cluster ubuntu-repository-cache ubuntu-repository-cache peer

root@ayana-angel:~# juju remove-unit minecraft/0
root@ayana-angel:~# juju remove-unit assaultcube/0

root@ayana-angel:~# juju status
Model Controller Cloud/Region Version
ayana-angel juju-lxd-0 ayana-angel 2.1-beta4

App Version Status Scale Charm Store Rev OS Notes
assaultcube unknown 0/1 assaultcube jujucharms 4 ubuntu exposed
minecraft active 0/1 minecraft jujucharms 3 ubuntu exposed
plex active 0/1 plex jujucharms 0 ubuntu exposed
ubuntu active 0/1 ubuntu jujucharms 10 ubuntu exposed
ubuntu-repository-cache waiting 0 ubuntu-repository-cache jujucharms 20 ubuntu

Unit ...


no longer affects: ubuntu
Nick Moffitt (nick-moffitt) wrote :

I'm seeing the same thing with units in error state in juju 2.1.1

Changed in juju:
milestone: none → 2.2-rc1
Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Tim Penhey (thumper) wrote :

Removing the milestone as we have no immediate resource to fix this, and just pushing to the next release all the time seems wrong.

Will add a milestone again when we have someone starting on it.

summary: - [juju 2.0 ] cannot remove or destroy machine in pending state
+ cannot remove or destroy machine in pending state
Changed in juju:
importance: High → Medium
milestone: 2.2-rc1 → none
tags: added: provisioner usability
Ian Booth (wallyworld)
tags: added: teardown
Erik Lönroth (erik-lonroth) wrote :

I'm still having problems removing machines in pending state, which blocks me from even getting new machines into my model.

```
Model Controller Cloud/Region Version SLA Timestamp
t2 aws-controller aws/eu-west-1 2.5.1 unsupported 20:23:54+01:00

App Version Status Scale Charm Store Rev OS Notes
nfs active 1 nfs jujucharms 9 ubuntu
nfs-client active 2 nfs-client local 2 ubuntu
noop-centos waiting 0 noop-centos jujucharms 0 centos
tiny-bash waiting 0/1 tiny-bash jujucharms 4 ubuntu
tiny-python active 1 tiny-python jujucharms 6 ubuntu
ubuntu 18.04 active 1 ubuntu jujucharms 12 ubuntu

Unit Workload Agent Machine Public address Ports Message
nfs/0* active idle 0 34.242.252.38 NFS ready
tiny-bash/1 waiting allocating 5 waiting for machine
tiny-python/0* active idle 3 52.31.233.76 update-status ran: 19:20
  nfs-client/2 active idle 52.31.233.76 172.31.43.207:/srv/data/nfs-client -> /tmp/data
ubuntu/0* active idle 1 34.252.33.1 ready
  nfs-client/0* active idle 34.252.33.1 172.31.43.207:/srv/data/nfs-client -> /tmp/data

Machine State DNS Inst id Series AZ Message
0 started 34.242.252.38 i-0a1dab0183a67a38e bionic eu-west-1a running
1 started 34.252.33.1 i-0074c048129cc0b2b bionic eu-west-1b running
2 stopped 34.245.215.173 i-017f25b8940e2ab45 trusty eu-west-1c running
3 started 52.31.233.76 i-044188aec4114cf6d xenial eu-west-1a running
4 pending pending centos7 failed to start machine 4 in zone "eu-west-1b"
, retrying in 10s with new availability zone: cannot run instances:
The provided credentials could not be validated and
may not be authorized to carry out the request.
Ensure that your account is authorized to use the Amazon EC2 service and
that you are using the correct access keys.
These keys are obtained via the "Security Credentials"
page in the AWS console.
: Not authorized for images: [ami-7abd0209] (AuthFailure)
5 pending pending trusty
```

Above, machine 4 is blocking the deployment of machine 5, which is stuck in pending.

Anastasia (anastasia-macmood) wrote :

I think that this is a duplicate of bug #1814271. I'll mark it as such.

As per my comment in bug #1814271, as of Juju 2.6 I can force-remove a machine in an error state (status pending), as well as destroy a model with a machine in an error state.
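
A sketch of what that might look like on the command line, assuming a 2.6 or later client where `remove-machine` and `destroy-model` accept `--force`. The `JUJU` variable and the two function names are hypothetical. They are added only so the commands can be previewed (for example with `JUJU=echo`) before running them against a real controller.

```shell
# JUJU is a hypothetical override for previewing; juju itself does not read it.
JUJU="${JUJU:-juju}"

# Force-remove one machine by ID, e.g. a machine stuck in pending.
force_remove_machine() {
    "$JUJU" remove-machine "$1" --force
}

# Force-destroy a model that still contains a machine in an error state.
# With the real client this is expected to ask for confirmation first.
force_destroy_model() {
    "$JUJU" destroy-model "$1" --force
}
```

For the status output earlier in this thread, `force_remove_machine 4` would target the machine that failed to start. Forcing skips the normal teardown, so check afterwards that no instance is left behind in the cloud.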
