Make PostgreSQL replicable on Open Library
Bug #600018 reported by George
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Open Library | New | Critical | Jim Shankland |
Bug Description
Our database is a single point of failure today. We need to address this promptly.
Jim, please do the first round of analysis of our options here, and come up with a recommended course of action. Ideally, within a week.
Changed in openlibrary:
importance: Undecided → Critical
milestone: none → stability
Changed in openlibrary:
assignee: nobody → Jim Shankland (jim-archive)
We are now running the openlibrary database on a pair of SSDs, which has massively improved its performance and allowed us to take nightly backups. As things stand, after a hard failure we would bring up a replacement machine and restore the database from the nightly backup, which would take several hours and lose some or all of the database updates made since the last backup.
Work is in progress to bring up a "warm standby" server, which will reduce the maximum data loss in case of a hard failure to 5 minutes' worth, and should reduce downtime to under an hour. Development work on the warm standby server will be completed this week. The standby server will also require its own pair of SSDs before it can be put into full production. I've ordered these, but I'm not sure when they'll get delivered.
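For illustration only: the comment above does not say which mechanism the warm standby uses. One common way to get a bounded data-loss window in PostgreSQL is WAL archiving (log shipping), where `archive_timeout` forces a segment switch at a fixed interval. The sketch below assumes that approach. The host name `standby` and the archive path are hypothetical, and newer PostgreSQL versions need additional settings (such as `wal_level`) that are not shown.

```conf
# postgresql.conf on the primary (a sketch under the assumptions above, not the deployed config)

# Enable archiving of completed WAL segments.
archive_mode = on

# Copy each completed segment to the standby.
# %p is the path of the segment to archive, %f is its file name.
# The host "standby" and the target directory are hypothetical.
archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'

# Force a segment switch at least every 300 seconds, so that no more than
# roughly 5 minutes of updates sit unarchived on the primary.
archive_timeout = 300
```

With a setup like this, the standby continuously replays the shipped segments, so failover means finishing replay and promoting the standby rather than restoring a full nightly backup.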