Launchpad itself

slave database should never be used when lag is too great

Bug #307407 reported by Stuart Bishop on 2008-12-12

Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Stuart Bishop
Milestone: Launchpad itself 2.2.1

Bug Description

When replication lag gets large, people start noticing oddities, such as the branchscanner appearing to take forever, or a bug they just reported through the email interface not yet showing up in the web UI.

Launchpad can already detect how lagged the slave database is. If the lag exceeds X seconds, the slave database should not be used at all, not even for anonymous connections or for connections that have not made changes recently.

This should also automatically lessen the load on the slave, freeing up resources so replication catches up sooner.
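The policy described above can be sketched as a small selection function. This is a minimal illustration, not Launchpad's actual code: the `StoreSelector` class, its parameters, the `MASTER`/`SLAVE` labels, and the choice to treat an unknown lag as "too great" are all assumptions made for the example. The lag measurement is passed in as a callable because the report only says that Launchpad can already detect it, not how.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

MASTER = "master"
SLAVE = "slave"


@dataclass
class StoreSelector:
    """Hypothetical helper deciding which database a request should use."""

    # The "X seconds" threshold from the bug report (value is deployment-specific).
    max_lag_seconds: float
    # Hypothetical: how long a client stays on the master after making a change.
    recent_write_window: float
    # Returns the current replication lag in seconds, or None if it is unknown.
    get_replication_lag: Callable[[], Optional[float]]
    clock: Callable[[], float] = time.time

    def choose(self, last_write_time: Optional[float]) -> str:
        lag = self.get_replication_lag()
        # Lag too great (or unknown, by assumption): never use the slave, not even
        # for anonymous users or clients that have not written anything recently.
        if lag is None or lag > self.max_lag_seconds:
            return MASTER
        # Clients that just made changes must see them, so keep them on the master.
        if (last_write_time is not None
                and self.clock() - last_write_time < self.recent_write_window):
            return MASTER
        # Otherwise the slave is fresh enough to serve this request.
        return SLAVE
```

For example, `StoreSelector(30.0, 300.0, lambda: 120.0).choose(None)` returns `"master"` because 120 seconds of lag is over the 30-second threshold, even though the client has never written anything. Once every request is routed this way, a badly lagged slave stops receiving read traffic, which is the load reduction the report expects.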


Related branches

lp://staging/~stub/launchpad/replication

Merged into lp://staging/launchpad

Abel Deuring (community): Approve (code) on 2012-09-21

Merged into lp://staging/launchpad/db-devel

Aaron Bentley (community): Approve on 2010-02-08

Stuart Bishop (stub) on 2008-12-12

Changed in launchpad-foundations:
assignee:	nobody → stub
importance:	Undecided → High
status:	New → Triaged


Francis J. Lacoste (flacoste) wrote on 2008-12-12:

Can we fix this before the release?


Stuart Bishop (stub) wrote on 2008-12-12:

I think it is too late for this cycle, and it doesn't warrant a cherry-pick: we have to try pretty hard to trigger noticeable replication lag, and if we do need to do something that triggers it, we can work around it by configuring the appservers not to use the slave DB.


Tom Haddon (mthaddon) wrote on 2008-12-13:

It would be ideal if this were configurable without having to restart an application server.

Would it be possible to have a table in the master DB that controls which DB each app server (probably identified by whichever LPCONFIG variable it's running as) uses for its slave DB connection? At some interval (perhaps every 5 minutes or more often), each app server would check its entry in this table and use the slave DB assigned there.

In this setup, if replication_lag is found to be longer than desired, we would just update the table that controlled which app servers connected to which slave DBs, and everything would revert to connecting to the master DB for all connections.
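Tom's proposal can be sketched as a small cache that re-reads the assignment table at a fixed interval and falls back to the master when no slave is assigned. Everything named here is hypothetical: the `appserver_slave_db` table, its `config_name`/`slave_dsn` columns, the `SlaveAssignment` class, the `lpnet1` config name, and the DSN strings. The in-memory `sqlite3` database only stands in for the master DB so the example runs on its own; the fetch helper uses plain DB-API cursor calls.

```python
import sqlite3
import time
from typing import Callable, Dict, Optional


def table_fetcher(conn) -> Callable[[], Dict[str, Optional[str]]]:
    """Build a function that reads the (hypothetical) assignment table."""
    def fetch() -> Dict[str, Optional[str]]:
        cur = conn.cursor()
        cur.execute("SELECT config_name, slave_dsn FROM appserver_slave_db")
        return dict(cur.fetchall())
    return fetch


class SlaveAssignment:
    """Caches one app server's slave DB assignment, refreshing it periodically."""

    def __init__(self, config_name: str,
                 fetch_assignments: Callable[[], Dict[str, Optional[str]]],
                 refresh_interval: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.config_name = config_name  # e.g. the app server's LPCONFIG value
        self._fetch = fetch_assignments
        self._interval = refresh_interval  # "every 5 minutes or more often"
        self._clock = clock
        self._dsn: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def slave_dsn(self, master_dsn: str) -> str:
        now = self._clock()
        # Re-read the table only when the cached entry is older than the interval.
        if self._fetched_at is None or now - self._fetched_at >= self._interval:
            self._dsn = self._fetch().get(self.config_name)
            self._fetched_at = now
        # No assignment (e.g. cleared by ops because of lag): use the master.
        return self._dsn or master_dsn


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE appserver_slave_db (config_name TEXT PRIMARY KEY, slave_dsn TEXT)")
    db.execute("INSERT INTO appserver_slave_db VALUES ('lpnet1', 'dbname=lp_slave1')")
    assignment = SlaveAssignment("lpnet1", table_fetcher(db))
    print(assignment.slave_dsn("dbname=lp_master"))
```

Clearing an app server's `slave_dsn` row sends it back to the master within one refresh interval, without a restart. The trade-off is that a change can take up to `refresh_interval` seconds to be picked up.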

This would also make things easier if in the future we have multiple slave DBs and want to distribute load amongst them, and/or reconfigure that load based on maintenance work or other issues.


Stuart Bishop (stub) wrote on 2008-12-15:

For this issue, there is no need for configuration apart from specifying the threshold: the appservers will switch over to master-only mode when the lag reaches it.

I think it would be useful to be able to easily change which databases are used as the master and the slave. However, I think this is better done with a connection pooler such as pgbouncer or pgpool than by coding this logic ourselves. That approach will work for batch processes as well as for the web app.

Stuart Bishop (stub) on 2008-12-17

Changed in launchpad-foundations:
milestone:	none → 2.2.1

Stuart Bishop (stub) on 2008-12-19

Changed in launchpad-foundations:
status:	Triaged → Fix Released
