juju deploy bundle model comparison doesn't take into account still-deploying units
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
Bug Description
I am deploying an OpenStack cloud with the MAAS provider. I have successfully deployed 8 of 12 servers.
Two had failures. I cleaned up one that had several container placements on it, and then ran juju deploy ./bundle.yaml again. It correctly calculated the missing machine and containers.
My model currently shows the following in juju machines:
23 pending 10.216.5.88 mg7msp xenial rack-2 Deploying
23/lxd/0 pending pending xenial
23/lxd/1 pending pending xenial
23/lxd/2 pending pending xenial
23/lxd/3 pending pending xenial
23/lxd/4 pending pending xenial
23/lxd/5 pending pending xenial
and these services:
ceph-mon/3 waiting allocating 23/lxd/0 waiting for machine
ceph-osd/9 waiting allocating 23 10.216.5.88 waiting for machine
ceph-radosgw/3 waiting allocating 23/lxd/1 waiting for machine
cinder/3 waiting allocating 23/lxd/2 waiting for machine
designate/3 waiting allocating 23/lxd/3 waiting for machine
mysql/3 waiting allocating 23/lxd/4 waiting for machine
nova-compute-kvm/7 waiting allocating 23 10.216.5.88 waiting for machine
openstack-
I then deleted the other failed machine, a VM hosting a rabbitmq-server instance.
I then re-ran juju deploy bundle.yaml while machine 23 above was still pending MAAS deployment, expecting it to rebuild only the rabbitmq-server VM, which was neither in the model nor pending in it.
The bug is that it allocated a new machine with the same services as machine 23 above, which then failed because no matching hardware was available to deploy on:
23 pending 10.216.5.88 mg7msp xenial rack-2 Deploying: 'curtin' configuring partition: nvme0n1-part3
23/lxd/0 pending pending xenial
23/lxd/1 pending pending xenial
23/lxd/2 pending pending xenial
23/lxd/3 pending pending xenial
23/lxd/4 pending pending xenial
23/lxd/5 pending pending xenial
24 pending pending xenial failed to start machine 24 in zone "default", retrying in 10s with new availability zone: failed to acquire node: No available machine matches constraints: [('agent_name', ['38815777-
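The comparison the reporter expected can be sketched as follows. This is a hypothetical illustration, not Juju's actual bundle-diff code: the function missing_units, its input shapes, and the unit counts in the example are all assumptions. A unit whose agent status is still allocating (as in the status output above) is counted as already present, so only the genuinely absent rabbitmq-server unit is reported missing; passing count_pending=False reproduces the behaviour described in this bug, where pending units are ignored and get requested again.

```python
# Hypothetical, simplified model of a bundle comparison (NOT Juju's real code).
# bundle: application name -> desired unit count
# model:  application name -> list of agent statuses of its existing units

def missing_units(bundle: dict[str, int],
                  model: dict[str, list[str]],
                  count_pending: bool = True) -> dict[str, int]:
    """Return how many units of each application the bundle still needs."""
    missing: dict[str, int] = {}
    for app, wanted in bundle.items():
        statuses = model.get(app, [])
        if count_pending:
            # Units still "allocating" already exist in the model; count them.
            present = len(statuses)
        else:
            # Buggy behaviour: ignore units that have not finished deploying.
            present = sum(1 for s in statuses if s != "allocating")
        if wanted > present:
            missing[app] = wanted - present
    return missing


# Hypothetical counts, for illustration only.
bundle = {"ceph-mon": 3, "rabbitmq-server": 3}
model = {
    "ceph-mon": ["active", "active", "allocating"],
    "rabbitmq-server": ["active", "active"],
}
print(missing_units(bundle, model))                       # expected behaviour
print(missing_units(bundle, model, count_pending=False))  # behaviour in this bug
```

With pending units counted, only rabbitmq-server needs a new unit; without them, ceph-mon is also re-requested, which is what produced the extra machine 24.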
I then tried to delete that machine with juju remove-machine --force 24. It remained stuck in the model, although the containers on it were removed from the model.
tags: | added: bundles |
affects: | juju-core → juju |
Changed in juju: | |
status: | New → Triaged |
importance: | Undecided → Medium |
Machine 24 did eventually clear out of the model.