Comment 8 for bug 1645422

Dmitrii Shcherbakov (dmitriis) wrote on 2017-11-08:

I think we just need to define more precisely what it means to "provision".

Conceptually, I would use the following definition:

provisioning = <matching a machine by constraints & other criteria> + <successfully deploying once and installing a machine agent>

At least for MAAS, this definition is intuitive in my view.

If I have to reconfigure a machine, running retry-provisioning also makes sense, but with the following logic:

1. a machine ID is obtained;
2. the deployment fails, either automatically or via a manual action, before the machine/unit agents have started;
3. the user releases the machine in MAAS;
4. the user reconfigures the machine, swaps out hardware, etc.;
5. a manual retry-provisioning detects that the given ID is no longer allocated and tries to allocate a new ID.
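
To make step 5 concrete, here is a minimal Python sketch of the check I have in mind. It is not Juju code: `Machine`, `retry_provisioning`, `allocated_ids` and `allocate` are hypothetical names used only for illustration, and the set of allocated IDs stands in for whatever MAAS would report.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class Machine:
    """Hypothetical view of a Juju machine; not a real Juju type."""
    juju_id: str                 # machine number in the model, e.g. "3"
    instance_id: Optional[str]   # provider-side ID (a MAAS node), if any
    agent_started: bool = False  # True once the machine agent is running


def retry_provisioning(
    machine: Machine,
    allocated_ids: Set[str],
    allocate: Callable[[], str],
) -> bool:
    """Re-allocate an instance only if the old one is gone.

    Returns True when a new instance ID was assigned to the machine.
    """
    if machine.agent_started:
        # Provisioning already succeeded once; nothing to retry.
        return False
    if machine.instance_id in allocated_ids:
        # The old ID is still allocated, so there is nothing to replace.
        return False
    # The old ID was released (or never existed): ask for a new one.
    machine.instance_id = allocate()
    return True
```

The point of the sketch is that the Juju-side machine ID stays the same while only the provider-side ID changes, so units assigned to that machine are untouched.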

The goal would be that one could write an orchestrator or other automation that talks to Juju, detects a failed deployment, checks MAAS to determine whether the failure is recoverable, and then runs retry-provisioning without affecting the Juju model at the unit or application level.

If a node is not suitable, the orchestrator would mark it as broken in MAAS and a different node would be picked, without requiring the remove-machine --force && add-unit steps.
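
The decision such an orchestrator would make could look roughly like the Python sketch below. It only plans the steps and calls neither Juju nor MAAS; `Step` and `plan_recovery` are hypothetical names, and the step descriptions are illustrative labels rather than exact commands.

```python
from enum import Enum
from typing import List


class Step(Enum):
    """Illustrative recovery steps; the values are labels, not commands."""
    MARK_BROKEN = "mark the node as broken in MAAS"
    RELEASE = "release the node in MAAS"
    RETRY = "run retry-provisioning for the machine"


def plan_recovery(
    deploy_failed: bool,
    agent_started: bool,
    node_suitable: bool,
) -> List[Step]:
    """Return the ordered steps an orchestrator might take."""
    if not deploy_failed or agent_started:
        # Either nothing failed, or the agent is already up: leave it alone.
        return []
    if node_suitable:
        # The node can be fixed: release it, then retry provisioning.
        return [Step.RELEASE, Step.RETRY]
    # The node cannot be used: mark it broken so a different one is picked.
    return [Step.MARK_BROKEN, Step.RETRY]
```

In both failure branches the plan ends with retry-provisioning, so the machine entry and its units stay in the model.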