Removing a lost unit (after its server was redeployed as a new machine) releases the server in MAAS, bringing down the new machine.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | High | Unassigned |
Bug Description
Juju: 2.8.7
MAAS: 2.9.0~rc3
It looks like juju can map two different machines to the same physical hardware in MAAS, so actions on one machine (or on units in that machine) affect the other, with major impact.
Steps to reproduce:
1-Deploy a juju unit U/1 to a new server in MAAS (deploys the server and creates juju machine N1)
2-The server HW fails; all juju agents on N1 show as lost in juju status.
3-The server HW is repaired (with a new motherboard, for example) and is re-commissioned without issues in MAAS.
4-Another unit U/2 is deployed in juju; MAAS deploys the repaired server again and juju creates a new machine N2 on it.
5-Remove the stale unit U/1; --force is needed because otherwise nothing happens (the full command sequence is sketched after this step):
juju remove-unit --force U/1
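A minimal sketch of the steps as commands, assuming an application named U (so its units are U/1 and U/2, matching the names above); all names and machine numbers are placeholders:

juju deploy U                  # step 1: MAAS deploys a fresh server; juju creates machine N1
# step 2: the server's HW fails; agents on N1 go lost in juju status
# step 3: the server is repaired and re-commissioned in MAAS
juju add-unit U                # step 4: MAAS redeploys the same server; juju creates machine N2
juju remove-unit --force U/1   # step 5: releases the server in MAAS, taking down N2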
Outcome:
* U/1 disappears from juju status, and the server is released in MAAS
* Internally the two juju machines apparently had the same provider instance ID (the lost N1 and the working N2 both pointed at the same MAAS node), so MAAS shuts down the hardware (potentially wiping its storage); a sketch for detecting this shared-ID state follows the list.
* All units on juju machine N2 become lost as well. This can have a major impact on the services running there, is hard to recover from, and can cause data loss depending on the MAAS configuration.
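One way to confirm the shared-ID condition after step 4 (a hedged sketch: it assumes jq and GNU uniq are available, and that juju status exposes the MAAS node as instance-id):

juju machines --format=json \
  | jq -r '.machines | to_entries[] | "\(.key) \(.value."instance-id")"' \
  | sort -k2 | uniq -f1 -D   # prints any juju machines that share one instance-id

An empty output means every machine in the model maps to a distinct MAAS node.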
Changed in juju:
importance: Undecided → High
status: New → Triaged
tags: added: maas-provider remove-unit
(thinking aloud) once you have two machines with different ids in the same model pointing at the same node, it's already too late. Could juju perhaps notice the attempt to redeploy an already-existing machine under a new id and refuse to proceed?
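A rough operator-side approximation of that guard, as a sketch only: SYSTEM_ID is a placeholder, and it assumes the MAAS provider records the node's system ID as juju's instance-id. Run before redeploying (step 4):

SYSTEM_ID=abc123   # placeholder: MAAS system ID of the re-commissioned server
if juju machines --format=json \
     | jq -e --arg id "$SYSTEM_ID" '.machines[] | select(."instance-id" == $id)' >/dev/null
then
  echo "refusing: MAAS node $SYSTEM_ID already backs a machine in this model" >&2
  exit 1
fi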