octavia loadbalancer refuses to manually failover
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
octavia (Ubuntu) | New | Undecided | Unassigned |
Bug Description
On 2020-04-09, we discovered that the failover command creates a new LB instance, attempts to kill the old instance (which no longer exists), and then fails to plug a network port because an OVS port for it already exists on the host, for a reason we have not identified.
This was discovered during a failed amphora image migration, after which we attempted the failover command as a remedy. We run the Octavia loadbalancer with only a single amphora (standalone topology).
Some of the octavia units were missing the RabbitMQ configuration, most likely since October 2019; this was fixed when we discovered it.
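For reference, the failover was triggered with the standard Octavia failover call. Below is a minimal sketch of the equivalent openstacksdk invocation; the cloud name and loadbalancer name are placeholders (not values from this deployment), and availability of `failover_load_balancer` depends on the openstacksdk version installed.

```python
# Sketch: trigger an Octavia loadbalancer failover via openstacksdk.
# "mycloud" and "my-lb" are placeholders, not values from this bug report.
import time

import openstack

conn = openstack.connect(cloud="mycloud")

lb = conn.load_balancer.find_load_balancer("my-lb")

# Equivalent to `openstack loadbalancer failover <lb>`; requires an
# openstacksdk version that exposes failover_load_balancer.
conn.load_balancer.failover_load_balancer(lb.id)

# Poll until Octavia finishes (or fails) the failover.
while True:
    lb = conn.load_balancer.get_load_balancer(lb.id)
    if lb.provisioning_status in ("ACTIVE", "ERROR"):
        break
    time.sleep(10)

print(lb.provisioning_status)
```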
SEG escalation found the following:
- A duplicate port-binding entry in the database causes a `loadbalancer failover` to fail in the same way as reported here (a query sketch for spotting such duplicates follows this list)
- Removing the additional `ml2_port_bindings` entry does not fix the issue
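For anyone checking whether they are in the same state, here is a hedged sketch of how duplicate bindings could be listed from the Neutron database. The connection details are placeholders; the `ml2_port_bindings` table is the standard ML2 plugin table, but verify against your own schema before relying on it.

```python
# Sketch: list port IDs that have more than one row in ml2_port_bindings.
# Host, user, password and database are placeholders for your Neutron DB.
import pymysql

conn = pymysql.connect(
    host="neutron-db.example.com",
    user="neutron",
    password="secret",
    database="neutron",
)

query = """
    SELECT port_id, COUNT(*) AS bindings
    FROM ml2_port_bindings
    GROUP BY port_id
    HAVING COUNT(*) > 1
"""

with conn.cursor() as cur:
    cur.execute(query)
    for port_id, bindings in cur.fetchall():
        print(f"{port_id}: {bindings} bindings")

conn.close()
```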
affects: charm-octavia → octavia (Ubuntu)
SEG comment (from 2020-04-21):
The OpenStack CLI cannot be used to remedy the database state. Any attempt to operate on the VRRP port fails because Neutron does not expect two entries with identical port IDs.
Manual removal of the additional entry in the database does not resolve the problem. During a failover, Octavia starts a new amphora VM and tries to add the VRRP port to the new VM. This step fails because the database is still in an inconsistent state. We did not find a way to update the databases to allow for a regular loadbalancer failover, but assume that several tables would potentially have to be changed self-consistently.
While such further database updates are possible, the risks of creating an inconsistent database state with potential impact beyond the loadbalancer are significant. It might be more acceptable, and would certainly be safer, to instead delete the failed loadbalancer and re-create it.
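If that route is taken, a minimal sketch of the delete-and-recreate approach via openstacksdk is below. The cloud name, loadbalancer name and VIP subnet are placeholders, and `cascade=True` removes listeners, pools, members and health monitors along with the loadbalancer, so the existing configuration has to be re-applied afterwards.

```python
# Sketch: delete the failed loadbalancer and re-create it.
# Cloud name, LB name and VIP subnet ID are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")

old_lb = conn.load_balancer.find_load_balancer("my-lb")

# cascade=True also removes listeners, pools, members and health monitors.
conn.load_balancer.delete_load_balancer(old_lb.id, cascade=True)

# Re-create the loadbalancer; listeners and pools must be re-created separately.
new_lb = conn.load_balancer.create_load_balancer(
    name="my-lb",
    vip_subnet_id="REPLACE-WITH-VIP-SUBNET-UUID",
)
print(new_lb.id, new_lb.provisioning_status)
```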