Loadbalancers in PENDING_UPDATE state become immutable
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
octavia | Incomplete | Undecided | Unassigned |
Bug Description
We have observed that when an LB loses all of its members, it drops into a PENDING_UPDATE state and becomes immutable. The issue here is that the LB remains immutable until it returns to an ACTIVE state, which is usually caused by one or more members coming back online. However, there can be cases where LB members are completely removed (rather than simply being offline), and Octavia falsely labels them as OFFLINE. This state will last indefinitely. To recover from this state, operators have to perform database surgery to force Octavia to delete the LB in question.
A more desirable behavior would be to keep the LB mutable, but perhaps warn the user about modifying an LB that's in PENDING_UPDATE. This would allow operators and users to rapidly correct any LB that gets into this bad state without accidentally modifying an LB that should remain untouched.
Members going offline should not cause a transition to PENDING_UPDATE. That is not normal.
Member status should only affect the operating status field and never the provisioning status.
Also note, you should never edit the statuses in the database. PENDING_* means one of your controller processes has ownership of the load balancer object and is actively working on it. Some actions have very long retry timeouts (retrying nova failures, for example). After those timeouts expire, the object should go back to ACTIVE or ERROR, depending on whether we could resolve the failure.
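As a rough illustration of waiting out a PENDING_* state rather than touching the database, here is a minimal Python sketch. The helper names `is_pending` and `wait_until_settled`, the `get_status` callable, and the default timeout and interval are all hypothetical and not part of Octavia; only the PENDING_* status names come from the Octavia API.

```python
import time

# Provisioning statuses that mean a controller process currently owns
# the object (names as used by the Octavia API).
PENDING_STATES = {"PENDING_CREATE", "PENDING_UPDATE", "PENDING_DELETE"}


def is_pending(provisioning_status):
    """Return True while the object is in a PENDING_* provisioning state."""
    return provisioning_status in PENDING_STATES


def wait_until_settled(get_status, timeout=600.0, interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the provisioning status leaves PENDING_*.

    get_status is a hypothetical caller-supplied callable that returns the
    current provisioning_status string. The timeout default is an assumption;
    real controller retry timeouts can be much longer.
    Returns the settled status (for example ACTIVE or ERROR), or raises
    TimeoutError if the object is still pending when the timeout expires.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if not is_pending(status):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(interval)
```

With openstacksdk, and assuming an already configured connection `conn` and a known `lb_id`, `get_status` could be something like `lambda: conn.load_balancer.get_load_balancer(lb_id).provisioning_status`. If the status never settles, that is the point at which the worker and health manager logs are worth collecting.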
Can you provide the configuration settings of this load balancer?
Listeners, protocols, health monitor settings, etc?
Also, can you provide the worker and health manager logs?