juju tries to acquire machines in specific zones even when no zone placement directive is specified
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
MAAS | Invalid | Undecided | Unassigned |
Bug Description
This causes MAAS to create new machines in the zone requested, rather than finding a machine outside of that zone that already exists. See the comments from bug 1706196.
When you tell juju to deploy, bootstrap, or do anything else that acquires a machine, it asks MAAS for a list of zones, then asks MAAS for a machine zone by zone, with the expectation that if one zone doesn't have a machine, MAAS will say no machines are available and juju will move on to the next zone and try it.
That worked fine (albeit inefficiently) until MAAS added pod support. Now, instead of saying no machines are available in that zone, MAAS will create a new machine and return it to juju.
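The zone-by-zone loop described above can be sketched roughly as follows. This is a toy model, not Juju's or MAAS's actual code: `FakeMAAS`, `acquire`, and `acquire_zone_by_zone` are hypothetical names, and the `pods` flag merely stands in for MAAS composing a new machine in a pod.

```python
class NoMachinesAvailable(Exception):
    """Raised by the toy MAAS when a zone has no free machine."""


class FakeMAAS:
    """Hypothetical stand-in for a MAAS server; not the real MAAS API."""

    def __init__(self, free_machines, pods=False):
        # free_machines maps zone name -> list of existing free machine names.
        self.free_machines = {z: list(m) for z, m in free_machines.items()}
        self.pods = pods

    def zones(self):
        return sorted(self.free_machines)

    def acquire(self, zone):
        if self.free_machines[zone]:
            return self.free_machines[zone].pop(0)
        if self.pods:
            # With pod support, a new machine is composed in the requested
            # zone instead of reporting that none are available.
            return f"composed-in-{zone}"
        raise NoMachinesAvailable(zone)


def acquire_zone_by_zone(maas):
    """Try each zone in turn, moving on when a zone reports no machines."""
    for zone in maas.zones():
        try:
            return maas.acquire(zone)
        except NoMachinesAvailable:
            continue
    raise NoMachinesAvailable("all zones")
```

With an inventory of `{"zone-a": [], "zone-b": ["node-1"]}`, the loop falls through to `node-1` when `pods=False`, but with `pods=True` it never gets past `zone-a`, so the existing machine in `zone-b` is left unused.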
Perhaps juju should not include a zone constraint when acquiring machines, when no zone constraint has been supplied by the user.
This is with juju 2.2.2.
Changed in juju: | |
milestone: | none → 2.2.3 |
status: | New → Triaged |
importance: | Undecided → High |
tags: | added: foundations-engine |
Changed in juju: | |
milestone: | 2.2.3 → 2.3.0 |
Changed in maas: | |
status: | New → Invalid |
tags: | removed: foundations-engine |
tags: | added: foundations-engine |
Changed in juju: | |
assignee: | nobody → Eric Claude Jones (ecjones) |
Changed in juju: | |
milestone: | 2.3.0 → 2.3-rc1 |
Changed in juju: | |
assignee: | Eric Claude Jones (ecjones) → nobody |
Changed in juju: | |
assignee: | nobody → Eric Claude Jones (ecjones) |
summary: |
- juju tries to acquire machines in specific zones even when no zone constraint is specified
+ juju tries to acquire machines in specific zones even when no zone placement directive is specified
Changed in juju: | |
milestone: | 2.3-rc1 → 2.3.1 |
Changed in juju: | |
status: | Triaged → In Progress |
status: | In Progress → Triaged |
Changed in juju: | |
milestone: | 2.3.1 → none |
Changed in juju: | |
milestone: | none → 2.3.2 |
Changed in juju: | |
milestone: | 2.3.2 → none |
assignee: | Eric Claude Jones (ecjones) → nobody |
This happens because Juju has a default policy of spreading instances across Availability Zones, since that's what the "Availability" portion is supposed to mean.
As an example, deploying 3 units of an application on AWS tries to put the first unit on zone a, then the second unit on zone b, and the third in zone c, so that if any zone fails, then you still have availability for your application.
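A minimal sketch of that spreading policy, assuming a plain round-robin over the zone list. The function name and zone names are illustrative only, and Juju's real distribution logic is not shown here.

```python
from itertools import cycle


def spread_units(unit_count, zones):
    """Assign each unit to a zone, cycling through the zones in order."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(unit_count)]


# Three units over three zones land in three different zones.
placement = spread_units(3, ["a", "b", "c"])
```

Losing any single zone in this placement takes out only one of the three units.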
If we defaulted to just never supplying zones, then all instances would end up in the same zone, which means that a hardware failure would kill all instances.
It seems that MAAS is mixing AZ meaning "failure domain" with AZ meaning "a group of machines collected together". So while there may be domains that we should spread units across, there are other domains that should be considered 'off limits' for provisioning.
I don't think we can just "not pass a zone if the user didn't specify it", because the default path behavior for all providers that I've seen is to use a single zone for instances if you don't ask for something else.
We may need a way to flag a zone as 'off limits' for automatic provisioning, which would need a fair bit of work to track which zones exist and whether each is usable.
Another possibility would be to add a constraint to applications, where users could specify something like:
juju deploy application --constraints "zones=a,b,c"
And then rather than listing all Availability Zones, and round-robinning across them, we would only round-robin across the explicit set that was passed.
And if it was really useful, we could potentially support negated zones, so something like:
juju deploy application --constraints "zones=^d,^e"
(If there are no positive zones listed, then the set of valid zones is all zones minus the negated ones.)
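The resolution rule for this proposed constraint could look roughly like the sketch below. The `zones=` syntax with `^` negation is only the proposal from this comment, and `resolve_zones` is a hypothetical helper, not an existing Juju function. It does not validate that positive zones actually exist.

```python
def resolve_zones(all_zones, constraint=""):
    """Return the zones to round-robin across for a proposed zones value.

    constraint is the comma-separated value, e.g. "a,b,c" or "^d,^e".
    """
    requested = [z.strip() for z in constraint.split(",") if z.strip()]
    positive = [z for z in requested if not z.startswith("^")]
    negated = {z[1:] for z in requested if z.startswith("^")}
    # With no positive zones, start from every known zone.
    candidates = positive if positive else list(all_zones)
    return [z for z in candidates if z not in negated]
```

Under these assumptions, both `resolve_zones(["a", "b", "c", "d", "e"], "a,b,c")` and `resolve_zones(["a", "b", "c", "d", "e"], "^d,^e")` select zones a, b, and c, and an empty constraint keeps today's behavior of using every zone.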
It feels like MAAS needs something other than "Availability Zones" as a mechanism for grouping machines, though. Because AZ is something that you *should* spread across, while a collection of machines used for a specific purpose is something that you *shouldn't* use for anything but that purpose. It might be a *Zone* but it isn't an *Availability* Zone.