Amphorae stuck in ERROR
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
octavia | New | Undecided | Unassigned | |
Bug Description
When the connection to the amphora is lost for some time, the amphora will be marked as provisioning_state ERROR. Even once the connection is back up, the amphora stays stuck in that state.
That's surprising, as the health checks are coming in and the external traffic was never disrupted. If I issue an `amphora configure --wait`, I can even see the successful completion of the request in the worker log. Unfortunately, the CLI still returns an error (`The resource did not successfully reach ACTIVE status`) and the state of the amphora does not change. I also tried getting the amphora stats, and those update just fine.
The only way to get the LB back is to initiate a failover, which is not desirable in some cases and has to be done as admin.
> Hi,
>
> FYI the octavia project no longer uses storyboard and moved back to launchpad: https://bugs.launchpad.net/octavia/
>
> 1. If an amphora is in provisioning_state ERROR, it means that an attempt to update the load balancer failed. So even if the connection is back, the amphora is probably incorrectly configured. Only a failover would fix it.
>
> 2. "amphora configure" is used to propagate changes from the Controller's octavia.conf file to the configuration file in the amphora. It doesn't update the load balancer's resources. I think that there's a bug with the --wait flag: because the configure API doesn't update the provisioning_status of the amphora/LB, it cannot wait for a specific status (a --wait option in this CLI doesn't make sense).
>
> I don't see any other alternative than a failover here.
> <footer>Gregory Thiemonge on 2023-07-17 at 14:51:41</footer>
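
The failover recommended in the comment above can be scripted for several amphorae at once. The following is a minimal sketch, not part of Octavia: it assumes amphora records are available as plain dictionaries with `status` and `loadbalancer_id` keys (the field names the amphora API is understood to return), and the helper names `loadbalancers_needing_failover` and `failover_commands` are hypothetical. It only builds the `openstack loadbalancer failover` command lines, which still have to be run with admin credentials.

```python
from typing import Iterable, List, Mapping


def loadbalancers_needing_failover(amphorae: Iterable[Mapping[str, str]]) -> List[str]:
    """Return IDs of load balancers that own at least one amphora in ERROR.

    Assumption: each record carries "status" and "loadbalancer_id" keys.
    Order of first appearance is preserved and duplicates are dropped.
    """
    lb_ids: List[str] = []
    for amp in amphorae:
        lb_id = amp.get("loadbalancer_id")
        if amp.get("status") == "ERROR" and lb_id and lb_id not in lb_ids:
            lb_ids.append(lb_id)
    return lb_ids


def failover_commands(lb_ids: Iterable[str]) -> List[str]:
    """Build one admin CLI command per load balancer; nothing is executed."""
    return [f"openstack loadbalancer failover {lb_id}" for lb_id in lb_ids]


# Hypothetical sample data for illustration only.
sample_amphorae = [
    {"id": "amp-a", "loadbalancer_id": "lb-1", "status": "ERROR"},
    {"id": "amp-b", "loadbalancer_id": "lb-1", "status": "ALLOCATED"},
    {"id": "amp-c", "loadbalancer_id": "lb-2", "status": "ALLOCATED"},
]

for command in failover_commands(loadbalancers_needing_failover(sample_amphorae)):
    print(command)
```

Printing the commands instead of running them keeps the sketch safe to try, since a failover replaces the amphora and is not always desirable.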