juju add-unit performance degrades in large environments
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Expired | Medium | Unassigned |
juju-core | Invalid | Undecided | Unassigned |
juju-core (Ubuntu) | Triaged | Medium | Unassigned |
Bug Description
Adding units to a large, complex MAAS environment is extremely slow - for example:
juju add-unit -n 63 nova-compute-b8
takes several tens of minutes to complete.
Environment has 381 existing service units spread across a number of services with subordinates as well (see status.json).
jujud on machine 0 is spinning at about 200% cpu with load average: 2.88, 2.86, 2.71
Some errors in machine-0.log:
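2014-05-09 12:29:39 WARNING juju.provider.maas environ.go:233 picked arbitrary tools &{"1.18.2-trusty-amd64" "https://streams.canonical.com/juju/tools/releases/juju-1.18.2-trusty-amd64.tgz" "1214b581d86b8795f5add552c9023a8ef83751c415da77c6021b79321af16c85" %!q(int64=7382418)}
2014-05-09 12:31:36 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/16 cannot get assigned machine: unit "nova-compute-b8/16" is not assigned to a machine
2014-05-09 12:31:36 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/16 cannot get assigned machine: unit "nova-compute-b8/16" is not assigned to a machine
2014-05-09 12:32:03 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/17 cannot get assigned machine: unit "nova-compute-b8/17" is not assigned to a machine
2014-05-09 12:32:03 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/17 cannot get assigned machine: unit "nova-compute-b8/17" is not assigned to a machine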
2014-05-09 12:32:13 WARNING juju.provider.maas environ.go:233 picked arbitrary tools &{"1.18.
2014-05-09 12:32:36 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/18 cannot get assigned machine: unit "nova-compute-b8/18" is not assigned to a machine
2014-05-09 12:32:36 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/18 cannot get assigned machine: unit "nova-compute-b8/18" is not assigned to a machine
2014-05-09 12:33:32 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/20 cannot get assigned machine: unit "nova-compute-b8/20" is not assigned to a machine
2014-05-09 12:33:32 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/20 cannot get assigned machine: unit "nova-compute-b8/20" is not assigned to a machine
2014-05-09 12:34:31 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/22 cannot get assigned machine: unit "nova-compute-b8/22" is not assigned to a machine
2014-05-09 12:34:31 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/22 cannot get assigned machine: unit "nova-compute-b8/22" is not assigned to a machine
2014-05-09 12:34:47 WARNING juju.provider.maas environ.go:233 picked arbitrary tools &{"1.18.
2014-05-09 12:35:07 WARNING juju.provider.maas environ.go:233 picked arbitrary tools &{"1.18.
2014-05-09 12:35:21 WARNING juju.provider.maas environ.go:233 picked arbitrary tools &{"1.18.
2014-05-09 12:38:18 WARNING juju.provider.maas environ.go:233 picked arbitrary tools &{"1.18.
2014-05-09 12:38:24 ERROR juju.state.unit unit.go:523 unit nova-compute-b8/1 cannot get assigned machine: unit "nova-compute-b8/1" is not assigned to a machine
2014-05-09 12:38:45 WARNING juju.provider.maas environ.go:233 picked arbitrary tools &{"1.18.
tags: added: sm15k
Changed in juju-core (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Changed in juju-core:
status: Triaged → Invalid
importance: Medium → Undecided
I thought this might be a dupe of:
https://bugs.launchpad.net/juju-core/+bug/1245649
But I think that doesn't come into play until you have many thousands of units
of a single service (around 4k in my testing).
I believe the internals of add-unit do each one-by-one, and from the above
logs it looks like it doesn't reuse any of the information lookups. (It
appears that it has to uniquely query the provider for all the information
for each node, as well as do a full tools lookup.)
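
To illustrate the point about reusing lookups, here is a minimal Go sketch of memoizing an expensive lookup for the duration of one batch. It is not juju code: `Tools`, `lookupCache`, and `addUnits` are hypothetical names, and whether the real tools lookup runs in add-unit or in the Provisioner is exactly what is unclear above.

```go
package main

import "fmt"

// Tools is a hypothetical stand-in for the result of a tools lookup.
type Tools struct {
	Version string
}

// lookupCache memoizes an expensive lookup by key for one batch of work.
type lookupCache struct {
	fetch func(series string) (Tools, error) // the expensive call being wrapped
	seen  map[string]Tools                   // results already fetched in this batch
	calls int                                // how many times fetch actually ran
}

func newLookupCache(fetch func(string) (Tools, error)) *lookupCache {
	return &lookupCache{fetch: fetch, seen: make(map[string]Tools)}
}

// get returns the cached result for series, calling fetch only on a miss.
func (c *lookupCache) get(series string) (Tools, error) {
	if t, ok := c.seen[series]; ok {
		return t, nil
	}
	c.calls++
	t, err := c.fetch(series)
	if err != nil {
		return Tools{}, err // failures are not cached, so a later call retries
	}
	c.seen[series] = t
	return t, nil
}

// addUnits simulates adding n units that all need the same lookup.
func addUnits(c *lookupCache, series string, n int) error {
	for i := 0; i < n; i++ {
		if _, err := c.get(series); err != nil {
			return fmt.Errorf("unit %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	// The fetch function here is a placeholder for a real provider query.
	cache := newLookupCache(func(series string) (Tools, error) {
		return Tools{Version: "1.18.2-" + series + "-amd64"}, nil
	})
	if err := addUnits(cache, "trusty", 63); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("lookups performed:", cache.calls)
}
```

With a cache like this, 63 units of the same series would trigger one lookup instead of 63; per-node provider queries would need a different key, and some of them may not be cacheable at all.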
I thought tools lookup was actually done in the Provisioner side, which
should be asynchronous from add-unit doing its work. (I have seen that
add-unit doesn't return as early as I would expect it to.) I don't know if
it is just accidental interlocking (Provisioner is busy rewriting the same
docs that AddUnit is trying to write as it tries to actually bring up new
instances).
All of those errors in the log appear to be on the Agent/Provisioner side,
not directly in AddUnit.
Those specific errors are actually just the InstancePoller trying to update
IP addresses for various units, but there is no machine started for the
unit yet, so it just returns an error for its IP address. Arguably it isn't
actually an *error* yet. I'm guessing something is running "juju status"
while the machines are coming up, and it is causing us to log messages
about not actually having an instance for given machines yet.
I filed https://bugs.launchpad.net/juju-core/+bug/1318148 arguing that this
shouldn't actually be considered an ERROR (it is expected that there will
be a short period of time where a Unit doesn't have an IP address because
its associated machine hasn't actually been brought up yet).
So there probably is still a performance bug that add-unit is doing a bit
too much work before returning, but it isn't related to the above error
messages (I believe).
John
=:->
On Fri, May 9, 2014 at 5:16 PM, James Page <email address hidden> wrote:
> Public bug reported:
>
> Adding units to a large, complex MAAS environment is extremely slow -
> for example:
>
> juju add-unit -n 63 nova-compute-b8
>
> takes several 10's of minutes to complete.
>
> Environment has 381 existing service units spread across a number of
> services with subordinates as well (see status.json).
>
> jujud on machine 0 is spinning at about 200% cpu with load average:
> 2.88, 2.86, 2.71
>
> Some errors in machine-0.log:
>
> 2014-05-09 12:29:39 WARNING juju.provider.maas environ.go:233 picked
> arbitrary tools &{"1.18.2-trusty-amd64"
> "https://streams.canonical.com/juju/tools/releases/juju-1.18.2-trusty-amd64.tgz"
> "1214b581d86b8795f5add552c9023a8ef83751c415da77c6021b79321af16c85"
> %!q(int64=7382418)}
> 2014-05-09 12:31:36 ERROR juju.state.unit unit.go:523 unit
> nova-compute-b8/16 cannot get assigned machine: unit "nova-compute-b8/16"
> is not assigned to a machine
> 2014-05-09 12:31:36 ERROR juju.state.unit unit.go:523 unit
> nova-compute-b8/16 cannot get assigned machine: unit "nova-compute-b8/16"
> is not assigned to a machine
> 2014-05-09 12:32:03 ERROR juju.state.unit unit.go:523 unit
> nova-compute-b8/17 cannot get assigned machine: unit "nova-compute-b8/17"
> is not assigned to a machine
> 2014-05-09 12:32:03 ...