We found another environment that had several machines duplicated (one in "failed deployment" state and one deployed OK); the workaround used was:
0) Switch to the model where there are duplicated machines
$ juju switch SOME_MODEL
1) Identify the number(s) of the machine(s) to be *removed* (e.g. 33)
2) Set the harvest mode to none
$ juju model-config provisioner-harvest-mode=none
3) Remove the machine previously identified
$ juju remove-machine --force MACHINE_NUMBER
4) Monitor the progress in juju status and in the MAAS web UI
- juju status should stop displaying the machine that failed to deploy in MAAS
- the MAAS web UI should *not* show any change in the system; at this level juju shouldn't be making any changes
- Wait a few minutes before proceeding to the next step, to make sure any background task has completed
5) Repeat steps 3 and 4 for each duplicated machine marked as "failed deployment"
6) Restore the default value of the harvest mode
$ juju model-config --reset provisioner-harvest-mode
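The steps above can be consolidated into a small script. This is only a sketch: SOME_MODEL and the machine list are placeholders you must replace, the 120-second pause is an arbitrary stand-in for "wait a few minutes", and DRY_RUN=echo makes it print the commands instead of running them (clear the variable to execute for real):

```shell
#!/bin/sh
# Sketch of the workaround: remove duplicated "failed deployment" machines
# with harvesting disabled, so MAAS nodes are not released/wiped.
set -e

DRY_RUN=echo             # set to "" to actually run the juju commands
MODEL=SOME_MODEL         # placeholder: model containing the duplicates
FAILED_MACHINES="33"     # placeholder: space-separated machine numbers

# 0) Switch to the affected model
$DRY_RUN juju switch "$MODEL"

# 2) Disable harvesting so removals don't touch MAAS
$DRY_RUN juju model-config provisioner-harvest-mode=none

# 3-5) Remove each failed machine, pausing between removals so any
# background task can complete before the next one
for m in $FAILED_MACHINES; do
    $DRY_RUN juju remove-machine --force "$m"
    $DRY_RUN sleep 120
done

# 6) Restore the default harvest mode
$DRY_RUN juju model-config --reset provisioner-harvest-mode
```

Between removals, still verify in the MAAS web UI that nothing changed, as described in step 4.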
Luckily, we found this during a live session and asked the operator NOT to remove the machines in the failed state; an uninformed user could naively run "juju remove-machine --force X" and end up losing data. Given that risk of data loss, I think this bug should be marked as "Critical".