Activity log for bug #550792

Date Who What changed Old value New value Message
2010-03-29 09:55:27 Tom Haddon bug added bug
2010-03-29 09:55:49 Tom Haddon removed subscriber Tom Haddon
2010-03-31 13:57:15 Stuart Metcalfe summary Old value: Staging login.ubuntu.com service doesn't deal gracefully with staging DB updates New value: Failover backend should recover automatically
2010-03-31 13:59:13 Stuart Metcalfe description Old value: The DB(s) for staging get periodically updated (as much as once a day). Currently the staging login.ubuntu.com service on cherimoya detects this, fails over, and then never recovers without manual intervention. This requires removal of /tmp/{db.master.failed,db.readonly} which is owned by the www-data user. New value: The failover backend currently requires a manual reset once it is triggered. This is causing a problem when we see temporary non-availability of the database (restarts, intermittent network outages, etc) so IS have asked us to implement an automatic recovery mechanism. There should be "some kind of exponential backoff" to help avoid flapping which will eventually fail permanently and require manual recovery. We should also be able to force a state where manual recovery is required so we can manually switch to read-only mode for maintenance. All failover state changes should be logged for diagnosis (oops) and we should be notified (nagios) of state change. The various failover and recovery conditions should be configurable including the ability to disable automatic recovery. IS will assist us with setting up failure conditions on staging for testing. Current behaviour is documented here: https://wiki.canonical.com/InformationInfrastructure/ISD/Docs/SSO/Failover . That page should be updated with the new behaviours before this bug is closed.
2010-04-22 10:21:48 Stuart Metcalfe canonical-identity-provider: milestone 2.5.0
2010-05-03 11:23:39 Anthony Lenton canonical-identity-provider: status New Confirmed
2010-05-03 11:23:43 Anthony Lenton canonical-identity-provider: importance Undecided Medium
2010-05-04 16:56:14 Anthony Lenton canonical-identity-provider: milestone 2.5.0 2.6.0
2010-05-28 15:24:31 Tom Haddon tags canonical-losa-isd
2010-06-03 18:56:20 Stuart Metcalfe canonical-identity-provider: milestone 2.6.0 2.7.0
2010-06-17 16:08:18 Anthony Lenton tags Old value: canonical-losa-isd New value: 2-sp canonical-losa-isd
2010-07-08 14:29:07 Anthony Lenton canonical-identity-provider: assignee Anthony Lenton (elachuni)
2010-07-08 14:29:12 Anthony Lenton canonical-identity-provider: status Confirmed In Progress
2010-07-14 11:33:23 Anthony Lenton canonical-identity-provider: status In Progress Fix Committed
2010-07-19 15:23:14 Dave Morley bug task added canonical-isd-qa
2010-07-19 15:23:25 Dave Morley canonical-isd-qa: status New Confirmed
2010-07-19 15:23:28 Dave Morley canonical-isd-qa: assignee Dave Morley (davmor2)
2010-07-29 13:15:28 Dave Morley description Old value: The failover backend currently requires a manual reset once it is triggered. This is causing a problem when we see temporary non-availability of the database (restarts, intermittent network outages, etc) so IS have asked us to implement an automatic recovery mechanism. There should be "some kind of exponential backoff" to help avoid flapping which will eventually fail permanently and require manual recovery. We should also be able to force a state where manual recovery is required so we can manually switch to read-only mode for maintenance. All failover state changes should be logged for diagnosis (oops) and we should be notified (nagios) of state change. The various failover and recovery conditions should be configurable including the ability to disable automatic recovery. IS will assist us with setting up failure conditions on staging for testing. Current behaviour is documented here: https://wiki.canonical.com/InformationInfrastructure/ISD/Docs/SSO/Failover . That page should be updated with the new behaviours before this bug is closed. New value: The failover backend currently requires a manual reset once it is triggered. This is causing a problem when we see temporary non-availability of the database (restarts, intermittent network outages, etc) so IS have asked us to implement an automatic recovery mechanism. There should be "some kind of exponential backoff" to help avoid flapping which will eventually fail permanently and require manual recovery. We should also be able to force a state where manual recovery is required so we can manually switch to read-only mode for maintenance. All failover state changes should be logged for diagnosis (oops) and we should be notified (nagios) of state change. The various failover and recovery conditions should be configurable including the ability to disable automatic recovery. IS will assist us with setting up failure conditions on staging for testing. Current behaviour is documented here: https://wiki.canonical.com/InformationInfrastructure/ISD/Docs/SSO/Failover . That page should be updated with the new behaviours before this bug is closed. Testcase ISD_161
2010-07-29 13:15:34 Dave Morley canonical-isd-qa: status Confirmed Fix Committed
2010-08-03 14:04:08 Danny Tamez canonical-isd-qa: milestone canonical-identity-provider+2.7.0
2010-08-03 15:57:43 Ricardo Kirkner canonical-identity-provider: status Fix Committed Fix Released
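
The recovery behaviour requested in the bug description (exponential backoff between retries, a permanent failure that needs manual recovery, a forced manual state for maintenance, logged state changes, and a switch to disable automatic recovery) might be sketched roughly as follows. This is only an illustration and not the code that was committed for this bug: the names RecoveryConfig, FailoverState and every method, state name and default value are hypothetical, and the nagios notification is left out.

```python
import logging
import time
from dataclasses import dataclass

log = logging.getLogger("failover")  # hypothetical logger name


@dataclass
class RecoveryConfig:
    """Hypothetical settings; the log does not list the real options."""
    auto_recover: bool = True   # False: every failure needs a manual reset
    base_delay: float = 5.0     # seconds to wait before the first retry
    factor: float = 2.0         # multiplier applied for each further failure
    max_attempts: int = 5       # failures tolerated before giving up


class FailoverState:
    """Tracks the states 'ok', 'failed' (will retry) and 'manual'."""

    def __init__(self, config, clock=time.monotonic):
        self.config = config
        self.clock = clock
        self.state = "ok"
        self.attempts = 0       # only cleared by manual_reset()
        self.next_retry = None

    def _set_state(self, new_state, reason):
        # Log every state change, as the description asks.
        if new_state != self.state:
            log.warning("failover: %s -> %s (%s)", self.state, new_state, reason)
            self.state = new_state

    def record_failure(self):
        """Call when the master database cannot be reached."""
        if self.state == "manual":
            return
        cfg = self.config
        if not cfg.auto_recover or self.attempts >= cfg.max_attempts:
            self.next_retry = None
            self._set_state("manual", "automatic recovery disabled or exhausted")
            return
        # Exponential backoff: base_delay * factor ** (failures so far).
        delay = cfg.base_delay * cfg.factor ** self.attempts
        self.attempts += 1
        self.next_retry = self.clock() + delay
        self._set_state("failed", "master unavailable")

    def should_retry(self):
        """True once the backoff delay for the current failure has passed."""
        return (self.state == "failed" and self.next_retry is not None
                and self.clock() >= self.next_retry)

    def record_success(self):
        """Call when a retry reaches the master again."""
        if self.state == "failed":
            self.next_retry = None
            self._set_state("ok", "master reachable again")

    def force_manual(self):
        """Enter the manual state on purpose, e.g. for maintenance."""
        self.next_retry = None
        self._set_state("manual", "forced by operator")

    def manual_reset(self):
        """Operator action that clears the failure history."""
        self.attempts = 0
        self.next_retry = None
        self._set_state("ok", "manual reset")
```

In this sketch the failure counter is not cleared by a successful recovery, so a backend that keeps flapping eventually lands in the manual state. A real implementation would probably also let the counter decay after a long stable period, which is not shown here.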