machines can be half-added and thereby unable to be removed

Bug #1933812 reported by Christian Ehrhardt 
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  Medium      Unassigned

Bug Description

Hi,
I was facing an issue around adding/removing machines.
My setup is a local juju (client), a canonistack juju controller, and a canonistack machine that I wanted to add via add-machine.

I managed to get there and had the machine added:

$ juju status
Model    Controller                        Cloud/Region                      Version  SLA          Timestamp
default  manual-canonistack-cloud-default  manual-canonistack-cloud/default  2.9.5    unsupported  14:26:24+02:00

Machine  State    DNS            Inst id               Series  AZ  Message
0        started  10.48.131.170  manual:10.48.131.170  impish      Manually provisioned machine

Then I wanted to deploy a charm and forgot that it is only a subordinate.
$ juju deploy /tmp/charm-builds/ntp/ --to 0
Located local charm "ntp", revision 0
ERROR cannot use --num-units or --to with subordinate application

Fine, you'd think, then use it differently.

I thought to deploy something else that isn't a subordinate:

$ juju deploy ubuntu --to 0
Located charm "ubuntu" in charm-hub, revision 19
Deploying "ubuntu" from charm-hub charm "ubuntu", revision 19 in channel stable
ERROR cannot deploy "ubuntu" to machine 0: machine 0 not found

But the machine was gone.
No ID 0 anymore ??

Well ok let us add it again
$ juju add-machine ssh:ubuntu@10.48.131.170
ERROR machine is already provisioned

Hmm, ok then let us remove it to re-add cleanly
$ juju remove-machine 0
removing machine 0 failed: machine 0 not found
$ juju machines
Machine State DNS Inst id Series AZ Message

So my machine is gone and I can't use it, but I also can't re-add it.
This is locking me out of everything, and all I can do right now is purge all configuration and retry.

Christian Ehrhardt  (paelzer) wrote :

I found this on the target:

4 0 149497 1 20 0 9068 3592 - Ss ? 0:00 bash /etc/systemd/system/jujud-machine-0-exec-start.sh
4 0 149502 149497 20 0 846588 92660 - SLl ? 1:41 \_ /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug

So I cleared things via:
  $ sudo /usr/sbin/remove-juju-services

That allowed me to add it again.
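
For anyone hitting the same state, here is a minimal sketch of how leftover agent artifacts on the target could be listed before running the cleanup. The function name find_juju_leftovers is hypothetical, and the default paths are assumptions taken from the process listing above (the jujud-machine-* files under /etc/systemd/system and the per-machine directories under /var/lib/juju/tools); they may differ on other Juju versions or init systems.

```shell
#!/bin/sh
# Sketch only: list files that suggest a machine agent is still installed.
# find_juju_leftovers is a hypothetical helper, not part of Juju.
# The default paths are assumptions based on the process listing in this report.

find_juju_leftovers() {
    systemd_dir="${1:-/etc/systemd/system}"
    tools_dir="${2:-/var/lib/juju/tools}"
    found=1
    for f in "$systemd_dir"/jujud-machine-* "$tools_dir"/machine-*; do
        # An unmatched glob stays literal, so check that the path exists.
        if [ -e "$f" ]; then
            echo "$f"
            found=0
        fi
    done
    # Exit status 0 if something was found, 1 otherwise.
    return "$found"
}
```

Calling find_juju_leftovers without arguments checks the default paths. Any output there, possibly together with a running jujud process as in the listing above, would be consistent with the "machine is already provisioned" error even though the controller no longer knows the machine.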

IMHO this is a situation that can be detected and handled much better.

I'd ask to:
a) At least offer the user a better error message than "ERROR machine is already provisioned", based on the metadata found on the machine, for example:
"ERROR machine is already provisioned - for controller X on IP Y, at data Z"
b) It would be very helpful to then offer "do you want to clean and re-add the machine?", which would then call remove-juju-services for the user.

Right now I realize that bug 1933819 and this one come down to almost the same root cause, just once for machine-add and once for controller-bootstrap. If you want to implement/fix this in one go, then feel free to dup the two together.

Ian Booth (wallyworld)
tags: added: manual-provider
Changed in juju:
importance: Undecided → Medium
status: New → Triaged
milestone: none → 2.9-next
Canonical Juju QA Bot (juju-qa-bot) wrote :

This Medium-priority bug has not been updated in 60 days, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot
Harry Pidcock (hpidcock)
Changed in juju:
importance: Low → Medium
milestone: 2.9-next → 3.2-beta1
Changed in juju:
milestone: 3.2-beta1 → 3.2-rc1
Changed in juju:
milestone: 3.2-rc1 → 3.2.0
Changed in juju:
milestone: 3.2.0 → 3.2.1
Changed in juju:
milestone: 3.2.1 → 3.2.2
Changed in juju:
milestone: 3.2.2 → 3.2.3
Changed in juju:
milestone: 3.2.3 → 3.2.4