When removing an Octavia unit, the corresponding health manager Neutron port is not removed.
Implication:
- Amphora heartbeats may be sent to a non-existent address (corresponding to a no-longer-existent Octavia unit), resulting in missed heartbeats and amphorae being failed over unnecessarily.
Steps to reproduce:
- Deploy an OpenStack environment with Octavia (or in HA, 3 units with the hacluster charm)
- If running an HA environment, add a unit
- Delete an Octavia unit
- Check `openstack port list`; the "octavia-health-manager-octavia-N-listen-port" port is still present
- Check `juju run --unit octavia/leader leader-get controller-ip-port-list`; the old IPs are still listed. This value is populated by querying Neutron and is used to configure "[health_manager]/controller_ip_port_list" in /etc/octavia/octavia.conf.
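The two checks above can be scripted. The following is only a minimal sketch, assuming the port naming pattern quoted above and an application named `octavia`. `find_orphaned_ports` and its inputs are hypothetical: in practice the port names would come from `openstack port list` and the unit names from `juju status`.

```python
import re

# Naming pattern taken from the port name quoted in this report; the
# application name "octavia" is an assumption.
PORT_RE = re.compile(r"^octavia-health-manager-octavia-(\d+)-listen-port$")


def find_orphaned_ports(port_names, unit_names):
    """Return health manager ports whose Octavia unit no longer exists."""
    # Unit names look like "octavia/3"; keep only the unit number.
    live = {int(name.split("/", 1)[1]) for name in unit_names}
    orphaned = []
    for port in port_names:
        match = PORT_RE.match(port)
        # Ports that do not follow the health manager pattern are ignored.
        if match and int(match.group(1)) not in live:
            orphaned.append(port)
    return orphaned
```

Any port this returns belongs to a unit that Juju no longer knows about, which is the situation described in this report.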
Workaround:
- Remove the unit
- Delete the port *after* removing the unit. Otherwise, Juju may recreate the port.
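The ordering in the workaround matters, so here is a small sketch that builds the two commands in that order. `workaround_commands` is a hypothetical helper, and the port name is derived from the pattern quoted in this report, assuming the application is called `octavia`.

```python
def workaround_commands(unit_number, application="octavia"):
    """Return the cleanup commands in the order the workaround requires."""
    # Port name follows the pattern quoted in this report (assumption).
    port = f"octavia-health-manager-{application}-{unit_number}-listen-port"
    return [
        # Remove the unit first, so that Juju cannot recreate the port.
        ["juju", "remove-unit", f"{application}/{unit_number}"],
        # Only then delete the leftover health manager port.
        ["openstack", "port", "delete", port],
    ]
```

Each entry is an argument list suitable for `subprocess.run`. Since `juju remove-unit` may return before the unit is actually gone, it is probably safest to confirm the removal in `juju status` before running the second command.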
I don't see any code that performs this removal (unless I'm mistaken) so if this is intentional behaviour, I guess this is a feature request.
The side effect of this behavior is really serious: a newly created amphora gets the list of IPs of the controller units, and since some of those IPs are "ghost IPs", the heartbeat messages sent to them never arrive. The control plane then decides to kill the (innocent) amphora, causing continuous failover.