Failover failing with 3+ units, diverged timeline
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
PostgreSQL Charm | Triaged | High | Unassigned |
Bug Description
Failover can fail when there are three or more units.
To fail over, the leader should pause xlog replay on all units, see which unit has received the most WAL, and declare it the new master. The new master should then promote itself, which switches timelines. Standby units should ensure replication settings are in place for the new master before restarting.
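The selection step described above can be sketched as follows. This is only an illustration, not the charm's actual code: the helper names `parse_lsn` and `pick_new_master` and the unit names are hypothetical, and the positions are assumed to be the text values each standby reports from `pg_last_xlog_receive_location()` (PostgreSQL 9.x naming) after replay has been paused with `pg_xlog_replay_pause()`.

```python
def parse_lsn(lsn):
    """Convert a WAL position such as '0/3000060' to an integer byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_new_master(received):
    """Return the unit whose received WAL position is furthest ahead.

    `received` maps a unit name to the LSN string it reported.
    Ties go to the lexicographically smallest unit name, so every
    caller that sees the same data picks the same unit.
    """
    if not received:
        raise ValueError("no standby positions reported")
    return max(sorted(received), key=lambda unit: parse_lsn(received[unit]))


# Hypothetical positions reported by three standbys.
positions = {
    "postgresql/1": "0/3000060",
    "postgresql/2": "0/30000A8",
    "postgresql/3": "0/2FFFF00",
}
print(pick_new_master(positions))  # prints postgresql/2
```

Comparing the positions as integers rather than as strings matters, because the hexadecimal halves are not zero-padded and a plain string comparison can order them incorrectly.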
Something is failing: although the new master is promoted successfully, the remaining standbys may fail to replicate from it because their replay point is more advanced than the timeline switch. Either the check that determines which unit is most advanced is bogus, or standbys are receiving or replaying more of the old timeline after the check. Note that the old master may still be active.
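The failure condition can be expressed as a simple comparison: a standby can only follow the new timeline if it has not replayed WAL beyond the point where the timeline switched. The sketch below is a hypothetical diagnostic, not part of the charm. It assumes the switch point is known (for example, read from the new timeline's history file) and that each standby's replay position comes from `pg_last_xlog_replay_location()`. The function and unit names are made up for illustration.

```python
def lsn_to_int(lsn):
    """Convert a WAL position such as '0/3000060' to an integer byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def diverged_standbys(switch_lsn, replayed):
    """Return the units that replayed old-timeline WAL past the switch point.

    `switch_lsn` is the position at which the new master left the old
    timeline; `replayed` maps a unit name to its replay position.
    """
    limit = lsn_to_int(switch_lsn)
    return sorted(
        unit for unit, lsn in replayed.items() if lsn_to_int(lsn) > limit
    )


# Hypothetical values: the timeline switched at 0/30000A8.
replay_positions = {
    "postgresql/1": "0/30000A8",  # exactly at the switch point, can follow
    "postgresql/3": "0/3000110",  # past the switch point, has diverged
}
print(diverged_standbys("0/30000A8", replay_positions))  # prints ['postgresql/3']
```

Any unit this returns would need to be re-cloned or rewound before it could replicate from the new master, which matches the symptom described here.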
Changed in postgresql-charm:
status: New → Triaged
importance: Undecided → High
I have the same problem. Below are the related unit logs. The switchover is not triggered, even though I shut down the master machine.
root@server:~# juju debug-log --include postgres-ha/6
unit-postgres-ha-6: 20:50:49 WARNING unit.postgres-ha/6.juju-log Falling back to comma separated extra_pg_auth
unit-postgres-ha-6: 20:50:50 DEBUG unit.postgres-ha/6.juju-log Setting hot_standby to True
unit-postgres-ha-6: 20:50:50 DEBUG unit.postgres-ha/6.juju-log Setting wal_level to logical
unit-postgres-ha-6: 20:50:50 DEBUG unit.postgres-ha/6.juju-log Setting wal_keep_segments to 500
unit-postgres-ha-6: 20:50:50 INFO unit.postgres-ha/6.juju-log PostgreSQL has been configured
unit-postgres-ha-6: 20:50:51 DEBUG unit.postgres-ha/6.juju-log postgresql.conf settings unchanged
unit-postgres-ha-6: 20:50:51 INFO unit.postgres-ha/6.juju-log Invoking reactive handler: reactive/postgresql/service.py:851:set_active
unit-postgres-ha-6: 20:50:51 INFO unit.postgres-ha/6.juju-log active: Live secondary (9.5.10)
unit-postgres-ha-6: 20:50:52 DEBUG unit.postgres-ha/6.juju-log Coordinator: Leader handling coordinator requests
unit-postgres-ha-6: 20:50:52 DEBUG unit.postgres-ha/6.juju-log Coordinator: Publishing state
unit-postgres-ha-6: 20:50:33 INFO unit.postgres-ha/6.juju-log Reactive main running for hook update-status
unit-postgres-ha-6: 20:50:33 DEBUG unit.postgres-ha/6.juju-log Coordinator: Using charms.coordinator.SimpleCoordinator coordinator
unit-postgres-ha-6: 20:50:33 INFO unit.postgres-ha/6.juju-log Initializing Snap Layer
unit-postgres-ha-6: 20:50:34 DEBUG unit.postgres-ha/6.update-status none
unit-postgres-ha-6: 20:50:34 INFO unit.postgres-ha/6.juju-log Initializing Apt Layer
unit-postgres-ha-6: 20:50:34 DEBUG unit.postgres-ha/6.juju-log Coordinator: Loading state
unit-postgres-ha-6: 20:50:35 DEBUG unit.postgres-ha/6.juju-log Coordinator: Leader handling coordinator requests
unit-postgres-ha-6: 20:50:35 INFO unit.postgres-ha/6.juju-log Coordinator: Initializing coordinator layer
unit-postgres-ha-6: 20:50:36 INFO unit.postgres-ha/6.juju-log Initializing Leadership Layer (is leader)
unit-postgres-ha-6: 20:50:36 INFO unit.postgres-ha/6.juju-log preflight handler: reactive/workloadstatus.py:57:initialize_workloadstatus_state
unit-postgres-ha-6: 20:50:36 INFO unit.postgres-ha/6.juju-log preflight handler: reactive/postgresql/preflight.py:25:block_on_bad_juju
unit-postgres-ha-6: 20:50:37 INFO unit.postgres-ha/6.juju-log preflight handler: reactive/postgresql/preflight.py:33:block_on_invalid_config
unit-postgres-ha-6: 20:50:37 INFO unit.postgres-ha/6.juju-log Invoking reactive handler: reactive/postgresql/service.py:45:main
unit-postgres-ha-6: 20:50:37 DEBUG unit.postgres-ha/6.juju-log Reactive state: leadership.is_leader
unit-postgres-ha-6: 20:50:37 DEBUG unit.postgres-ha/6.juju-log Reactive state: leadership.set.coordinator
unit-postgres-ha-6: 20:50:37 DEBUG unit.postgres-ha/6.juju-log Reactive state: leadership.set.master
unit-postgres-ha-6: 20:50:37 DEBUG unit.postgres-ha/6.juju-log Reactive state: leadership.set.replication_password
unit-postgres-ha-6: 20:50:38 DEBUG unit.postgres-ha/6.juju-log Reactive state: postgresql.client.passwords_set
unit-postgres-ha-6: 20:50:38 DEB...