Charm should indicate whether replication is paused on slaves

Bug #1967033 reported by Paul Goins
Affects: PostgreSQL Charm

Bug Description

We hit a situation where a standby Postgres server accumulated roughly 60 GiB of WAL files in the pg_wal directory.

This appears to have been caused by replication being paused via the replication-pause action and never resumed afterward. With replay paused, the WAL files could not be applied to the running database and started to build up; at least, that is my theory. I verified that pg_is_wal_replay_paused() returned true, and confirmed from the Juju audit logs that the replication-pause action had been run.
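For reference, the manual check described above could be scripted roughly as follows. This is only a sketch: the helper name is_wal_replay_paused is hypothetical, and it assumes the caller supplies a DB-API 2.0 cursor (for example from psycopg2) connected to the unit's PostgreSQL instance. The only server-side functions used are pg_is_in_recovery() and pg_is_wal_replay_paused().

```python
def is_wal_replay_paused(cursor):
    """Return True if the server is a standby whose WAL replay is paused.

    `cursor` is assumed to be a DB-API 2.0 cursor connected to the server.
    The helper name is hypothetical, not part of the charm.
    """
    # Replay-control functions are only meaningful during recovery (and may
    # raise an error on a primary), so check the recovery state first.
    cursor.execute("SELECT pg_is_in_recovery()")
    (in_recovery,) = cursor.fetchone()
    if not in_recovery:
        return False

    # This function has this name from PostgreSQL 10 onward.
    cursor.execute("SELECT pg_is_wal_replay_paused()")
    (paused,) = cursor.fetchone()
    return bool(paused)
```

A True result on a standby means WAL is not being replayed, so pg_wal can keep growing until replay is resumed.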

We spent hours trying to figure out why this was happening. Not being Postgres experts, we also considered running WAL archive trimming tools on the /var/lib/postgres/10/main/pg_wal directory, which would likely have been very bad.

The charm status very helpfully indicates whether units are masters or standbys. It would be just as helpful if it also indicated in some way whether replication on a standby is paused: if a standby is left in this state for too long, it can run out of disk space, as nearly happened in our case.

TL;DR: Please provide a way to indicate via "juju status" if replication is paused on standby servers.
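
To illustrate the request, here is a minimal sketch of how a status message could carry the paused state. The function name, the role strings, and the message wording are all assumptions made for illustration; they are not taken from the charm's actual code.

```python
def status_message(role, replay_paused=False):
    """Build a hypothetical workload status message for `juju status`.

    `role` is assumed to be a short string such as "master" or "standby";
    the wording below is illustrative only.
    """
    message = "Live {}".format(role)
    # Only standbys replay WAL, so the marker is never shown on a master.
    if role != "master" and replay_paused:
        message += " (replication paused)"
    return message
```

With something like this, a paused standby would show up in "juju status" as "Live standby (replication paused)" rather than looking identical to a healthy one.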
