slave database should never be used when lag is too great

Bug #307407 reported by Stuart Bishop
Affects           Status        Importance  Assigned to    Milestone
Launchpad itself  Fix Released  High        Stuart Bishop

Bug Description

When replication lag gets large, people start noticing oddities, such as the branchscanner appearing to take forever, or a bug they just reported through the email interface not yet existing in the web UI.

Launchpad can already detect how far the slave database is lagging. If the lag exceeds X seconds, the slave database should not be used at all, not even for anonymous connections or for connections that have not made changes recently.
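The rule described above can be sketched as a small routing function. This is a minimal sketch, not Launchpad's actual code: `choose_database`, `MAX_SLAVE_LAG` and the 30-second value are hypothetical stand-ins for "X seconds", and treating an unknown lag as "too lagged" is an assumption.

```python
from datetime import timedelta

# Hypothetical stand-in for the "X seconds" threshold in the bug description.
MAX_SLAVE_LAG = timedelta(seconds=30)


def choose_database(slave_lag, recently_wrote, max_lag=MAX_SLAVE_LAG):
    """Return "master" or "slave" for a request (hypothetical helper).

    slave_lag: how far the slave is behind the master (timedelta),
               or None if the lag could not be determined.
    recently_wrote: True if this connection made changes recently.
    """
    if recently_wrote:
        # Clients that just wrote must be able to read their own changes.
        return "master"
    if slave_lag is None or slave_lag > max_lag:
        # Lag too great (or unknown, by assumption): even anonymous,
        # read-only traffic goes to the master.
        return "master"
    return "slave"
```

For example, `choose_database(timedelta(seconds=5), False)` picks the slave, while `choose_database(timedelta(seconds=31), False)` falls back to the master. Sending lagged-period reads to the master is also what relieves the slave so replication can catch up.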

This should also automatically lessen the load on the slave, freeing up resources so replication catches up sooner.


Stuart Bishop (stub)
Changed in launchpad-foundations:
assignee: nobody → stub
importance: Undecided → High
status: New → Triaged
Francis J. Lacoste (flacoste) wrote:

Can we fix this before the release?

Stuart Bishop (stub) wrote:

I think it is too late for this cycle, and it doesn't warrant a cherry-pick: we have to try pretty hard to trigger noticeable replication lag, and if we do need to do something that triggers it, we can work around it by configuring the appservers not to use the slave db.

Tom Haddon (mthaddon) wrote:

It would be ideal if this was something that was configurable without having to restart an application server.

Would it be possible to have a table in the master DB that controlled which DB each app server (probably identified by whichever LPCONFIG variable it's running as) would use for its slave DB connection? At some interval (perhaps every 5 minutes, or more often), each app server would check its entry in this table and use the slave DB assigned there.

In this setup, if replication_lag is found to be longer than desired, we would just update the table that controlled which app servers connected to which slave DBs, and everything would revert to connecting to the master DB for all connections.
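The polling scheme proposed above could look roughly like the sketch below. Everything here is hypothetical: the table name `appserver_db_assignment`, its columns, the DSN string and the `SlaveAssignment` class are illustrations of the proposal, not anything Launchpad implemented (the fix that shipped used a lag threshold instead). SQLite stands in for the master DB so the example runs on its own.

```python
import sqlite3
import time


class SlaveAssignment:
    """Cache one app server's slave DB assignment, re-reading it at most
    every `interval` seconds (hypothetical design from the proposal)."""

    def __init__(self, conn, lpconfig, interval=300.0, clock=time.monotonic):
        self.conn = conn            # connection to the master holding the table
        self.lpconfig = lpconfig    # identifies this app server, e.g. its LPCONFIG
        self.interval = interval    # 300 s = "every 5 minutes"
        self.clock = clock
        self._dsn = None
        self._checked_at = None

    def slave_dsn(self):
        """Return the slave DSN, or None meaning "use the master for everything"."""
        now = self.clock()
        if self._checked_at is None or now - self._checked_at >= self.interval:
            row = self.conn.execute(
                "SELECT slave_dsn FROM appserver_db_assignment WHERE lpconfig = ?",
                (self.lpconfig,),
            ).fetchone()
            # A missing row or a NULL slave_dsn both mean "no slave".
            self._dsn = row[0] if row else None
            self._checked_at = now
        return self._dsn


# Demo: in-memory SQLite plays the master; a fake clock makes refreshes visible.
master = sqlite3.connect(":memory:")
master.execute(
    "CREATE TABLE appserver_db_assignment (lpconfig TEXT PRIMARY KEY, slave_dsn TEXT)")
master.execute(
    "INSERT INTO appserver_db_assignment VALUES ('lpnet1', 'dbname=launchpad_slave')")

fake_now = [0.0]
assignment = SlaveAssignment(master, "lpnet1", clock=lambda: fake_now[0])
before = assignment.slave_dsn()   # 'dbname=launchpad_slave'

# Replication is lagging: an admin points lpnet1 back at the master.
master.execute(
    "UPDATE appserver_db_assignment SET slave_dsn = NULL WHERE lpconfig = 'lpnet1'")
cached = assignment.slave_dsn()   # still the old value until the interval passes
fake_now[0] = 301.0
after = assignment.slave_dsn()    # None: use the master for all connections
```

The trade-off is visible in `cached`: a change to the table takes up to one polling interval to reach each app server, which is why a shorter interval was suggested as an option.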

This would also make things easier if in the future we have multiple slave DBs and want to distribute load amongst them, and/or reconfigure that load based on maintenance work or other issues.

Stuart Bishop (stub) wrote:

For this issue, no configuration is needed apart from specifying the threshold: the appservers will switch over to master-only mode when this threshold is reached.

I think it would be useful to be able to easily change which database is used as the master and which as the slave. However, I think this is better done using a connection pooler like pgbouncer or pgpool rather than attempting to code this logic ourselves. That approach would work for batch processes as well as the web app.

Stuart Bishop (stub)
Changed in launchpad-foundations:
milestone: none → 2.2.1
Stuart Bishop (stub)
Changed in launchpad-foundations:
status: Triaged → Fix Released