Postgres cannot start up after a crash
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
postgresql-common (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Ubuntu 15.10
Postgresql 9.5+175.pgdg15.10+1
postgresql-common 175.pgdg15.10+1
# How to reproduce
Execute 'echo b > /proc/sysrq-
After the machine restarts, systemd tries to start the cluster through pg_ctlcluster, and the start fails.
Log messages:
2016-10-18 15:22:50 MSK [5513-1] LOG: database system was interrupted; last known up at: 2016-10-18 15:08:50 MSK
2016-10-18 15:22:50 MSK [5513-2] LOG: database system was not properly shut down; automatic recovery in progress
2016-10-18 15:22:50 MSK [5513-3] LOG: redo starts at A/ED186BA0
2016-10-18 15:22:50 MSK [5530-1] [н/д]@[н/д] LOG: incomplete startup packet
2016-10-18 15:22:51 MSK [5547-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:51 MSK [5550-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:52 MSK [5553-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:52 MSK [5556-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:53 MSK [5559-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:53 MSK [5562-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:54 MSK [5565-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:54 MSK [5570-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:55 MSK [5573-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:55 MSK [5576-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:56 MSK [5579-1] postgres@postgres FATAL: the database system is starting up
2016-10-18 15:22:56 MSK [5508-1] LOG: received smart shutdown request
2016-10-18 15:22:56 MSK [5580-1] LOG: shutting down
2016-10-18 15:22:56 MSK [5580-2] LOG: database system is shut down
# Why it happens
pg_ctlcluster checks whether the cluster is running by connecting with psql.
It contains a function named cluster_port_ready that performs this check:
    while ($n < ($result ? 10 : 3)) {
        select undef, undef, undef, 0.5;
        $out = `$psql -h '$sd' --port $p -l 2>&1 > /dev/null`;
        print STDERR "PSQL res: $out $?\n";
        if ($? == $result) {
            $n++;
        } else {
            $n = 0;
        }
        $result = $?;
    }
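The function checks the exit code of psql every 0.5 s and stops once the same result has repeated: 3 times in a row for success, 10 times in a row for failure. Ten consecutive failures take roughly 5 s, so the postmaster has at most about 5 s to finish recovery after a crash. After that, pg_ctlcluster returns exit code 1 and systemd sends SIGTERM to postgres, which appears in the log as "received smart shutdown request".
However, the postmaster cannot accept any connection while recovery is in progress; every attempt is rejected with "FATAL: the database system is starting up" (see postmaster.c:2164).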
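To make the timing concrete, here is a minimal Python sketch that replays the loop logic with a fake probe instead of psql. It is not the pg_ctlcluster code (which is Perl). The function name `simulate_port_ready`, the starting values `n = 0` and `result = 0` (not shown in the excerpt above), and the failure status `2` are assumptions made for illustration. Sleeping is only counted, not performed.

```python
from typing import Callable, Tuple


def simulate_port_ready(probe: Callable[[], int],
                        interval: float = 0.5) -> Tuple[int, int, float]:
    """Replay the polling loop and report how long it would wait.

    `probe` stands in for the psql call and returns an exit status.
    Returns (final status, number of probes, seconds spent sleeping).
    """
    n = 0
    result = 0  # assumption: initial values are not shown in the excerpt
    probes = 0
    # Same stop condition as the excerpt: 10 repeats of a failure,
    # 3 repeats of a success.
    while n < (10 if result else 3):
        status = probe()  # the real loop sleeps `interval` before this
        probes += 1
        if status == result:
            n += 1
        else:
            n = 0
        result = status
    return result, probes, probes * interval


FAILED = 2  # illustrative non-zero status for "connection refused"

# A server that is still recovering fails every probe.
print(simulate_port_ready(lambda: FAILED))
# A healthy server succeeds every probe.
print(simulate_port_ready(lambda: 0))
```

Under these assumptions, a server that keeps rejecting connections is probed 11 times (one probe to switch `result` to the failure status, then ten repeats), which matches the eleven "starting up" lines in the log above, and the loop gives up after about 5.5 s of sleeping plus the time psql itself takes.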
# How to fix
Increase the timeout?
Check for the message returned on connect ("FATAL: the database system is starting up")?
Determine the recovery state and wait until recovery is done?
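The second and third ideas could be combined roughly as follows. This is a Python sketch of the approach, not a patch for pg_ctlcluster (which is Perl). The names `psql_probe` and `wait_until_ready`, the 300 s `recovery_timeout`, and the `failure_limit` of 10 are hypothetical choices. The psql arguments mirror the ones in the excerpt above.

```python
import subprocess
import time
from typing import Callable, Tuple

STARTING_UP = "the database system is starting up"


def psql_probe(socket_dir: str, port: int) -> Tuple[int, str]:
    """Run `psql -l` against the cluster; return (exit status, stderr)."""
    proc = subprocess.run(
        ["psql", "-h", socket_dir, "--port", str(port), "-l"],
        stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True,
    )
    return proc.returncode, proc.stderr


def wait_until_ready(probe: Callable[[], Tuple[int, str]],
                     recovery_timeout: float = 300.0,  # hypothetical value
                     interval: float = 0.5,
                     failure_limit: int = 10,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Wait for the server, treating "starting up" as "still recovering"."""
    deadline = clock() + recovery_timeout
    failures = 0
    while clock() < deadline:
        sleep(interval)
        status, err = probe()
        if status == 0:
            return True
        if STARTING_UP in err:
            # The postmaster is alive and replaying WAL: keep waiting
            # instead of counting this as a failed start.
            failures = 0
            continue
        failures += 1
        if failures >= failure_limit:
            return False
    return False
```

A caller would pass something like `lambda: psql_probe(sd, p)` as the probe. One caveat: matching on the message text is fragile, because the server may emit a localized message depending on its lc_messages setting, so checking the recovery state directly would likely be more robust.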