Failover backend should recover automatically
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical SSO provider | Fix Released | Medium | Anthony Lenton | | |
Bug Description
The failover backend currently requires a manual reset once it is triggered. This causes problems whenever the database is temporarily unavailable (restarts, intermittent network outages, etc.), so IS have asked us to implement an automatic recovery mechanism.
There should be "some kind of exponential backoff" to help avoid flapping; after repeated failed recoveries the backend should eventually fail permanently and require manual recovery.
We should also be able to force a state where manual recovery is required so we can manually switch to read-only mode for maintenance.
All failover state changes should be logged for diagnosis (OOPS), and we should be notified (Nagios) of each state change.
The various failover and recovery conditions should be configurable including the ability to disable automatic recovery.
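The requirements above could be sketched roughly as the small state machine below. This is only an illustration under stated assumptions, not the code that shipped: `FailoverController`, `FailoverConfig`, the state names, the default values, and the `stable_period` setting are all hypothetical, and the OOPS/Nagios hooks are reduced to a single log call.

```python
import logging
from dataclasses import dataclass
from enum import Enum

log = logging.getLogger("failover")


class State(Enum):
    NORMAL = "normal"            # primary backend in use
    FAILED_OVER = "failed_over"  # read-only, automatic recovery scheduled
    MANUAL = "manual"            # read-only until an operator calls reset()


@dataclass
class FailoverConfig:
    # All names and defaults here are illustrative assumptions.
    auto_recovery: bool = True    # False: every failover needs a manual reset
    base_delay: float = 5.0       # seconds before the first recovery attempt
    factor: float = 2.0           # delay multiplier per consecutive failure
    max_failures: int = 4         # failures tolerated before giving up
    stable_period: float = 600.0  # healthy seconds that clear the failure count


class FailoverController:
    def __init__(self, config, probe, clock):
        self.config = config
        self.probe = probe  # callable: True if the primary DB is usable
        self.clock = clock  # callable: current time in seconds
        self.state = State.NORMAL
        self.failures = 0
        self.next_attempt_at = None
        self.recovered_at = None

    def _set_state(self, new_state, reason):
        if new_state is not self.state:
            # A real deployment would also file an OOPS and alert Nagios here.
            log.warning("failover: %s -> %s (%s)",
                        self.state.value, new_state.value, reason)
            self.state = new_state

    def _record_failure(self, reason):
        self.failures += 1
        if (not self.config.auto_recovery
                or self.failures > self.config.max_failures):
            self._set_state(State.MANUAL, reason)
            return
        # Exponential backoff: base_delay, then base_delay * factor, ...
        delay = self.config.base_delay * self.config.factor ** (self.failures - 1)
        self.next_attempt_at = self.clock() + delay
        self._set_state(State.FAILED_OVER, reason)

    def trigger(self, reason="primary backend error"):
        """Report a primary backend failure during normal operation."""
        if self.state is not State.NORMAL:
            return
        now = self.clock()
        if (self.recovered_at is None
                or now - self.recovered_at >= self.config.stable_period):
            self.failures = 0  # long enough since the last incident
        self._record_failure(reason)

    def tick(self):
        """Call periodically; probes once the backoff delay has elapsed."""
        if (self.state is not State.FAILED_OVER
                or self.clock() < self.next_attempt_at):
            return
        if self.probe():
            self.recovered_at = self.clock()
            self._set_state(State.NORMAL, "primary backend recovered")
        else:
            self._record_failure("recovery attempt failed")

    def force_manual(self, reason="maintenance"):
        """Operator switch to read-only mode with no automatic recovery."""
        self._set_state(State.MANUAL, reason)

    def reset(self):
        """Operator reset: return to normal and forget past failures."""
        self.failures = 0
        self.recovered_at = None
        self._set_state(State.NORMAL, "manual reset")
```

In this sketch the failure count survives a successful recovery until the backend has stayed healthy for `stable_period` seconds, so a flapping database keeps lengthening the delay and eventually lands in the manual state instead of bouncing indefinitely. Setting `auto_recovery=False` sends every failover straight to the manual state.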
IS will assist us with setting up failure conditions on staging for testing.
Current behaviour is documented here: https:/
Testcase ISD_161
summary: | Staging login.ubuntu.com service doesn't deal gracefully with staging DB updates → Failover backend should recover automatically |
description: | updated |
Changed in canonical-identity-provider: | |
milestone: | none → 2.5.0 |
Changed in canonical-identity-provider: | |
status: | New → Confirmed |
importance: | Undecided → Medium |
Changed in canonical-identity-provider: | |
milestone: | 2.5.0 → 2.6.0 |
tags: | added: canonical-losa-isd |
Changed in canonical-identity-provider: | |
milestone: | 2.6.0 → 2.7.0 |
tags: | added: 2-sp |
Changed in canonical-identity-provider: | |
assignee: | nobody → Anthony Lenton (elachuni) |
status: | Confirmed → In Progress |
Changed in canonical-identity-provider: | |
status: | In Progress → Fix Committed |
Changed in canonical-isd-qa: | |
milestone: | none → canonical-identity-provider+2.7.0 |
Changed in canonical-identity-provider: | |
status: | Fix Committed → Fix Released |
Passes on EC2
Anthony created some scripts to break the DB. Restoring the DB ended read-only mode.