2010-03-29 09:55:27 |
Tom Haddon |
bug |
|
|
added bug |
2010-03-29 09:55:49 |
Tom Haddon |
removed subscriber Tom Haddon |
|
|
|
2010-03-31 13:57:15 |
Stuart Metcalfe |
summary |
Staging login.ubuntu.com service doesn't deal gracefully with staging DB updates |
Failover backend should recover automatically |
|
2010-03-31 13:59:13 |
Stuart Metcalfe |
description |
The DB(s) for staging get periodically updated (as much as once a day). Currently the staging login.ubuntu.com service on cherimoya detects this, fails over, and then never recovers without manual intervention. Recovery requires removing /tmp/{db.master.failed,db.readonly}, which are owned by the www-data user. |
The failover backend currently requires a manual reset once it is triggered. This is causing a problem when we see temporary non-availability of the database (restarts, intermittent network outages, etc) so IS have asked us to implement an automatic recovery mechanism.
There should be "some kind of exponential backoff" to help avoid flapping; after repeated failed recoveries it should eventually fail permanently and require manual recovery.
We should also be able to force a state where manual recovery is required so we can manually switch to read-only mode for maintenance.
All failover state changes should be logged for diagnosis (oops) and we should be notified (nagios) of state change.
The various failover and recovery conditions should be configurable including the ability to disable automatic recovery.
IS will assist us with setting up failure conditions on staging for testing.
Current behaviour is documented here: https://wiki.canonical.com/InformationInfrastructure/ISD/Docs/SSO/Failover . That page should be updated with the new behaviours before this bug is closed.
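
The requirements above could be sketched roughly as follows. This is only an illustration of the requested behaviour, not the code that was committed for this bug: every name here (State, RecoveryConfig, FailoverController, notify, max_strikes) is hypothetical, the logging call stands in for the oops report, and the notify callback stands in for whatever would feed nagios.

```python
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger("failover_sketch")


class State(Enum):
    NORMAL = "normal"
    FAILED_OVER = "failed-over"   # read-only, automatic recovery pending
    MANUAL = "manual-recovery"    # stays put until an operator resets it


@dataclass
class RecoveryConfig:
    auto_recovery: bool = True    # False: every failover needs a manual reset
    base_delay: float = 5.0       # seconds before the first recovery probe
    backoff_factor: float = 2.0   # delay multiplier per consecutive strike
    max_strikes: int = 4          # strikes tolerated before giving up


class FailoverController:
    def __init__(self, config, notify=None):
        self.config = config
        self.notify = notify      # hypothetical hook, e.g. for a monitoring check
        self.state = State.NORMAL
        self.strikes = 0          # failovers plus failed probes since last reset
        self.next_probe_at = None

    def _set_state(self, new_state, reason):
        if new_state is self.state:
            return
        old, self.state = self.state, new_state
        # Every state change is logged and, if a hook is set, reported.
        logger.warning("failover: %s -> %s (%s)", old.value, new_state.value, reason)
        if self.notify is not None:
            self.notify(old, new_state, reason)

    def _strike(self, now, reason):
        """Count one failure, then schedule a probe or give up."""
        self.strikes += 1
        if not self.config.auto_recovery:
            self._set_state(State.MANUAL, reason + "; automatic recovery disabled")
        elif self.strikes > self.config.max_strikes:
            self._set_state(State.MANUAL, reason + "; too many failures")
        else:
            delay = self.config.base_delay * self.config.backoff_factor ** (self.strikes - 1)
            self.next_probe_at = now + delay
            self._set_state(State.FAILED_OVER, reason)

    def master_failed(self, now, reason="master unavailable"):
        if self.state is State.NORMAL:
            self._strike(now, reason)

    def probe(self, now, master_is_up):
        """Call periodically; acts only once the backoff delay has elapsed."""
        if self.state is not State.FAILED_OVER or now < self.next_probe_at:
            return
        if master_is_up:
            self._set_state(State.NORMAL, "master reachable again")
        else:
            self._strike(now, "recovery probe failed")

    def force_manual(self, reason="maintenance"):
        self._set_state(State.MANUAL, reason)

    def manual_reset(self):
        self.strikes = 0
        self.next_probe_at = None
        self._set_state(State.NORMAL, "manual reset")
```

In this sketch the strike counter is cleared only by manual_reset(), so a database that keeps flapping waits longer before each probe and ends up in the manual-recovery state. A real implementation would probably also clear the counter after the master has stayed healthy for some configurable period.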
|
|
2010-04-22 10:21:48 |
Stuart Metcalfe |
canonical-identity-provider: milestone |
|
2.5.0 |
|
2010-05-03 11:23:39 |
Anthony Lenton |
canonical-identity-provider: status |
New |
Confirmed |
|
2010-05-03 11:23:43 |
Anthony Lenton |
canonical-identity-provider: importance |
Undecided |
Medium |
|
2010-05-04 16:56:14 |
Anthony Lenton |
canonical-identity-provider: milestone |
2.5.0 |
2.6.0 |
|
2010-05-28 15:24:31 |
Tom Haddon |
tags |
|
canonical-losa-isd |
|
2010-06-03 18:56:20 |
Stuart Metcalfe |
canonical-identity-provider: milestone |
2.6.0 |
2.7.0 |
|
2010-06-17 16:08:18 |
Anthony Lenton |
tags |
canonical-losa-isd |
2-sp canonical-losa-isd |
|
2010-07-08 14:29:07 |
Anthony Lenton |
canonical-identity-provider: assignee |
|
Anthony Lenton (elachuni) |
|
2010-07-08 14:29:12 |
Anthony Lenton |
canonical-identity-provider: status |
Confirmed |
In Progress |
|
2010-07-14 11:33:23 |
Anthony Lenton |
canonical-identity-provider: status |
In Progress |
Fix Committed |
|
2010-07-19 15:23:14 |
Dave Morley |
bug task added |
|
canonical-isd-qa |
|
2010-07-19 15:23:25 |
Dave Morley |
canonical-isd-qa: status |
New |
Confirmed |
|
2010-07-19 15:23:28 |
Dave Morley |
canonical-isd-qa: assignee |
|
Dave Morley (davmor2) |
|
2010-07-29 13:15:28 |
Dave Morley |
description |
The failover backend currently requires a manual reset once it is triggered. This is causing a problem when we see temporary non-availability of the database (restarts, intermittent network outages, etc) so IS have asked us to implement an automatic recovery mechanism.
There should be "some kind of exponential backoff" to help avoid flapping; after repeated failed recoveries it should eventually fail permanently and require manual recovery.
We should also be able to force a state where manual recovery is required so we can manually switch to read-only mode for maintenance.
All failover state changes should be logged for diagnosis (oops) and we should be notified (nagios) of state change.
The various failover and recovery conditions should be configurable including the ability to disable automatic recovery.
IS will assist us with setting up failure conditions on staging for testing.
Current behaviour is documented here: https://wiki.canonical.com/InformationInfrastructure/ISD/Docs/SSO/Failover . That page should be updated with the new behaviours before this bug is closed.
|
The failover backend currently requires a manual reset once it is triggered. This is causing a problem when we see temporary non-availability of the database (restarts, intermittent network outages, etc) so IS have asked us to implement an automatic recovery mechanism.
There should be "some kind of exponential backoff" to help avoid flapping; after repeated failed recoveries it should eventually fail permanently and require manual recovery.
We should also be able to force a state where manual recovery is required so we can manually switch to read-only mode for maintenance.
All failover state changes should be logged for diagnosis (oops) and we should be notified (nagios) of state change.
The various failover and recovery conditions should be configurable including the ability to disable automatic recovery.
IS will assist us with setting up failure conditions on staging for testing.
Current behaviour is documented here: https://wiki.canonical.com/InformationInfrastructure/ISD/Docs/SSO/Failover . That page should be updated with the new behaviours before this bug is closed.
Testcase ISD_161
|
|
2010-07-29 13:15:34 |
Dave Morley |
canonical-isd-qa: status |
Confirmed |
Fix Committed |
|
2010-08-03 14:04:08 |
Danny Tamez |
canonical-isd-qa: milestone |
|
canonical-identity-provider+2.7.0 |
|
2010-08-03 15:57:43 |
Ricardo Kirkner |
canonical-identity-provider: status |
Fix Committed |
Fix Released |
|