I think we just need a better definition of what it means to "provision".
Conceptually, I would use the following definition:
provisioning = <matching a machine by constraints & other criteria> + <successfully deploying once and installing a machine agent>
At least for MAAS it is intuitive in my view.
If I have to reconfigure a machine, retry-provisioning also makes sense, but with the following logic:
1. a machine ID is obtained;
2. the deployment fails, either automatically or via a manual action, before the machine/unit agents have started;
3. a user releases the machine in MAAS;
4. the user reconfigures the machine, swaps out hardware, etc.;
5. a manual retry-provisioning detects that the given ID is no longer allocated and tries to allocate a new ID.
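To make step 5 concrete, here is a rough sketch of the decision I have in mind. It is a pure in-memory model, not how Juju behaves today: `MachineRecord`, `retry_provisioning`, `allocated_ids` and the `allocate` callback are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class MachineRecord:
    """Hypothetical stand-in for a Juju machine entry (not a real Juju type)."""
    juju_machine: str            # Juju-side machine number, e.g. "3"
    instance_id: Optional[str]   # provider (MAAS) ID recorded at allocation
    agent_started: bool = False  # True once the machine agent has come up


def retry_provisioning(record: MachineRecord,
                       allocated_ids: Set[str],
                       allocate: Callable[[], str]) -> str:
    """Return the instance ID to deploy to, allocating a new one if needed.

    allocated_ids is assumed to be the set of IDs the provider still has
    allocated to us; allocate() is assumed to match a machine by
    constraints and return its ID.
    """
    if record.agent_started:
        # By the definition above, provisioning has already succeeded.
        raise ValueError("machine is already provisioned")
    if record.instance_id not in allocated_ids:
        # Step 5: the old ID was released, so ask for a new match.
        record.instance_id = allocate()
    return record.instance_id
```

The point is that the Juju-side machine number stays the same while the provider-side ID underneath it is allowed to change.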
The idea is that one could write an orchestrator/automation that talks to Juju, sees whether a deployment has failed, checks MAAS to determine whether the failure is recoverable, and then runs retry-provisioning without affecting the Juju model unit-wise or application-wise.
If a node is not suitable, the orchestrator would mark it as broken in MAAS and a different node would be picked, without having to go through remove-machine --force && add-unit steps.
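A minimal sketch of what such an orchestrator step could look like, assuming retry-provisioning behaved as proposed above. The `Juju` and `Maas` interfaces are hypothetical wrappers that someone would have to write around the real clients; none of the method names below are real Juju or MAAS API calls.

```python
from typing import Protocol


class Juju(Protocol):
    """Hypothetical Juju wrapper; method names are illustrative only."""
    def provisioning_failed(self, machine: str) -> bool: ...
    def instance_id(self, machine: str) -> str: ...
    def retry_provisioning(self, machine: str) -> None: ...


class Maas(Protocol):
    """Hypothetical MAAS wrapper; method names are illustrative only."""
    def is_recoverable(self, node_id: str) -> bool: ...
    def mark_broken(self, node_id: str) -> None: ...


def recover(machine: str, juju: Juju, maas: Maas) -> str:
    """Try to recover one failed machine without touching units or applications."""
    if not juju.provisioning_failed(machine):
        return "healthy"
    node_id = juju.instance_id(machine)
    if maas.is_recoverable(node_id):
        # Same node is worth another attempt.
        juju.retry_provisioning(machine)
        return "retried-same-node"
    # Node is not suitable: take it out of the pool so the retry
    # has to match a different one.
    maas.mark_broken(node_id)
    juju.retry_provisioning(machine)
    return "retried-new-node"
```

In both failure branches the Juju machine entry is kept, so nothing changes in the model from the unit or application point of view.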