juju deploy doesn't always pick the optimal machine
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
Bug Description
To cope with slow VM provisioning on MAAS, I provision many machines ahead of time (`juju add-machine`). This way I don't have to wait long when deploying units, as Juju usually picks a machine that's already "deployed".
Sometimes, however, Juju puts the newly deployed unit on a machine that is still provisioning (machine 100 here):
$ juju deploy ./lxd_ubuntu-
Located local charm "lxd", revision 35
Deploying "lxd" from local charm "lxd", revision 35
$ juju status
Model Controller Cloud/Region Version SLA Timestamp
test overlord maas/default 2.9.14 unsupported 18:07:05Z
App Version Status Scale Charm Store Channel Rev OS Message
https-client active 1 https-client local 35 ubuntu
lxd waiting 0/1 lxd local 35 ubuntu waiting for machine
Unit Workload Agent Machine Public address Ports Message
https-client/45* active idle 81 2602:fc62:
lxd/26 waiting allocating 100 2602:fc62:
Machine State DNS Inst id Series AZ Message
81 started 2602:fc62:
83 started 2602:fc62:
86 started 2602:fc62:
89 started 2602:fc62:
90 started 2602:fc62:
91 started 2602:fc62:
92 started 2602:fc62:
94 started 2602:fc62:
95 started 2602:fc62:
96 started 2602:fc62:
97 started 2602:fc62:
98 started 2602:fc62:
99 pending 2602:fc62:
100 pending 2602:fc62:
101 pending 2602:fc62:
Juju should always pick an available machine that's already deployed, not one that is still deploying. It usually gets this right but not always, or maybe it's random and I am usually lucky ;)
Additional information:
$ juju --version
2.9.15-ubuntu-amd64
Juju's methodology is to set up the desired model, and then work asynchronously to make reality match the model. Juju doesn't block on machines being provisioned when assigning units - it picks a machine which does not yet have anything assigned to it. That machine may well still be provisioning, but once it becomes ready, the Juju agent on the machine will install any units that have been allocated to it. Large bundles, for example, benefit from this behaviour.
Selection of available machines is not deterministic, so you may well have been "lucky" previously. Note that Juju doesn't just pick any unused machine - it makes sure that the machine's memory, CPU, disk, etc. match any constraints used when deploying the app/unit.
Should Juju prefer fully provisioned machines, all other things being equal? That approach would lower the chance of a failed deployment - a non-provisioned machine might fail to come up, in which case the unit would never get deployed. If the unit were instead placed preferentially on an unused provisioned machine, at least it would be running even if the other machine failed.
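To make the question concrete, here is a rough Python sketch of "prefer provisioned machines, all other things being equal". It is not Juju's actual placement code (Juju is written in Go and its real logic is not shown in this report). The `Machine` class, `eligible`, `pick_machine`, and the memory/core figures are all hypothetical; only the machine ids and the "started"/"pending" states are taken from the status output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    # Hypothetical, simplified view of a machine in the model.
    id: str
    state: str          # "started" or "pending", as shown in `juju status`
    mem_mb: int         # illustrative value, not taken from the report
    cores: int          # illustrative value, not taken from the report
    has_units: bool = False


def eligible(machines, min_mem_mb=0, min_cores=0):
    """Return unused machines that satisfy the (simplified) constraints."""
    return [
        m for m in machines
        if not m.has_units and m.mem_mb >= min_mem_mb and m.cores >= min_cores
    ]


def pick_machine(machines, min_mem_mb=0, min_cores=0):
    """Prefer a started machine, fall back to a pending one, else None."""
    candidates = eligible(machines, min_mem_mb, min_cores)
    # sorted() is stable: machines in the same state keep their input order,
    # so the provisioning state only acts as a tie-breaker among eligible ones.
    candidates = sorted(candidates, key=lambda m: m.state != "started")
    return candidates[0] if candidates else None


machines = [
    Machine("100", "pending", 4096, 2),
    Machine("81", "started", 4096, 2, has_units=True),  # hosts https-client/45
    Machine("98", "started", 4096, 2),
    Machine("99", "pending", 8192, 4),
]

chosen = pick_machine(machines)
print(chosen.id if chosen else "no eligible machine")
```

With this ordering, an unused started machine wins whenever one satisfies the constraints, and a pending machine is still used when it is the only match, so the asynchronous behaviour that large bundles rely on is preserved.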