"systemctl stop postgresql" fails to stop postgresql

Bug #1862138 reported by Mikko Rantalainen
This bug affects 1 person
Affects                      Status     Importance  Assigned to  Milestone
postgresql-common (Debian)   Confirmed  Unknown
postgresql-common (Ubuntu)   Triaged    Low         Unassigned

Bug Description

Steps to reproduce (drop "pgbouncer" from the commands below if you are only testing bare PostgreSQL):

systemctl start postgresql pgbouncer && \
systemctl stop pgbouncer postgresql && \
ps auxw | grep ^postgre

No output is expected. However, in reality you get up to 6 running processes because /lib/systemd/system/postgresql.service is incorrectly implemented and does not follow the documentation:

man -P cat systemctl | grep -A4 no-block
       --no-block
           Do not synchronously wait for the requested operation to finish. If
           this is not specified, the job will be verified, enqueued and
           systemctl will wait until the unit's start-up is completed. By
           passing this argument, it is only verified and enqueued.

As such, because --no-block was not used, the "systemctl stop" should wait until the operation has finished.

The command "systemctl stop postgresql" should be running something along the lines of

systemctl list-units 'postgresql*' | grep ^postgresql@ | awk '{print $1}' | xargs -r systemctl stop

before returning unless --no-block is given.
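Until the unit itself waits, a caller could approximate the expected behavior by polling for leftover processes after the stop returns. The following is only a minimal sketch: it assumes `pgrep` is installed and that all server processes run as the `postgres` user; the function name `wait_for_procs_gone` and the default timeout are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of postgresql-common):
# wait until no process owned by USER remains.
# Returns 0 once the processes are gone, 1 if TIMEOUT seconds pass first.
# Note: a pgrep failure (e.g. unknown user) is treated as "no processes".
wait_for_procs_gone() {
    user=$1
    timeout=${2:-30}
    while pgrep -u "$user" >/dev/null 2>&1; do
        [ "$timeout" -gt 0 ] || return 1
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}
```

It could be used as `systemctl stop postgresql && wait_for_procs_gone postgres 60` in automation that must not continue while the database is still shutting down.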

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: postgresql-common 173ubuntu0.2
ProcVersionSignature: Ubuntu 4.15.0-47.50~16.04.1-generic 4.15.18
Uname: Linux 4.15.0-47-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
Date: Wed Feb 5 16:46:04 2020
InstallationDate: Installed on 2017-04-25 (1016 days ago)
InstallationMedia: Ubuntu-Server 16.04.2 LTS "Xenial Xerus" - Release amd64 (20170215.8)
PackageArchitecture: all
SourcePackage: postgresql-common
UpgradeStatus: No upgrade log present (probably fresh install)

Mikko Rantalainen (mira) wrote :

Maybe somewhat related to bug 870379?

Christian Ehrhardt  (paelzer) wrote :

To be clear - the services are stopped eventually; the issue is that `systemctl stop postgresql` returns while some processes are still up.

I can confirm that behavior and agree that it might break e.g. automation that wants to do stop/change/start for changes known to break with the service running.

Changed in postgresql-common (Ubuntu):
status: New → Confirmed
Christian Ehrhardt  (paelzer) wrote :

Bug 870379 seems unrelated to me: there the processes were never stopped, while here we only have to wait a bit.

Christian Ehrhardt  (paelzer) wrote :

Start by default seems synchronous, but stop isn't:
root@f:~# systemctl start postgresql; echo STARTED; ps auxw | grep ^postgre; systemctl stop postgresql; echo STOPPED; ps auxw | grep ^postgre; sleep 5s; echo LATER; ps auxw | grep ^postgre;
STARTED
postgres 31731 0.0 0.0 216444 24764 ? Ss 09:23 0:00 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c config_file=/etc/postgresql/12/main/postgresql.conf
postgres 31733 0.0 0.0 216444 4024 ? Ss 09:23 0:00 postgres: 12/main: checkpointer
postgres 31734 0.0 0.0 216444 5336 ? Ss 09:23 0:00 postgres: 12/main: background writer
postgres 31735 0.0 0.0 216444 5372 ? Ss 09:23 0:00 postgres: 12/main: walwriter
postgres 31736 0.0 0.0 216848 7440 ? Ss 09:23 0:00 postgres: 12/main: autovacuum launcher
postgres 31737 0.0 0.0 70824 4432 ? Ss 09:23 0:00 postgres: 12/main: stats collector
postgres 31738 0.0 0.0 216832 5988 ? Ss 09:23 0:00 postgres: 12/main: logical replication launcher
STOPPED
postgres 31731 0.0 0.0 216444 24764 ? Ss 09:23 0:00 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c config_file=/etc/postgresql/12/main/postgresql.conf
postgres 31733 0.0 0.0 216444 4024 ? Ss 09:23 0:00 postgres: 12/main: checkpointer
postgres 31734 0.0 0.0 216444 5336 ? Ss 09:23 0:00 postgres: 12/main: background writer
postgres 31735 0.0 0.0 216444 5372 ? Ss 09:23 0:00 postgres: 12/main: walwriter
postgres 31736 0.0 0.0 216848 7440 ? Ss 09:23 0:00 postgres: 12/main: autovacuum launcher
postgres 31737 0.0 0.0 70824 4432 ? Ss 09:23 0:00 postgres: 12/main: stats collector
postgres 31738 0.0 0.0 216832 5988 ? Ss 09:23 0:00 postgres: 12/main: logical replication launcher
LATER

Some commands are documented to wait by default, but only the "mode entering" ones such as default, rescue, and emergency. For those the man page says "This operation is blocking by default, use --no-block to request asynchronous behavior".
I wondered whether everything else might default to non-blocking and could be switched with `--wait`.
But I got: --wait may only be used with the 'start' or 'restart' commands.

Christian Ehrhardt  (paelzer) wrote :

By the time stop finishes, the unit already counts as inactive, which seems wrong.
While the processes were still up I got:
$ systemctl is-active postgresql
inactive
$ systemctl status postgresql
● postgresql.service - PostgreSQL RDBMS
     Loaded: loaded (/lib/systemd/system/postgresql.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Fri 2020-02-07 09:26:45 UTC; 28ms ago
    Process: 31910 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
   Main PID: 31910 (code=exited, status=0/SUCCESS)

Feb 07 09:26:45 f systemd[1]: Starting PostgreSQL RDBMS...
Feb 07 09:26:45 f systemd[1]: Started PostgreSQL RDBMS.
Feb 07 09:26:45 f systemd[1]: postgresql.service: Succeeded.
Feb 07 09:26:45 f systemd[1]: Stopped PostgreSQL RDBMS

Christian Ehrhardt  (paelzer) wrote :

So far, so odd, but we have to remember that:

root@f:~# systemctl cat postgresql
# /lib/systemd/system/postgresql.service
# systemd service for managing all PostgreSQL clusters on the system. This
# service is actually a systemd target, but we are using a service since
# targets cannot be reloaded.
...

Hence the "actual" database(s) are run by other services, usually created by the generator in /lib/systemd/system-generators/postgresql-generator.
So on a default installation there is also the actual service which "owns" the processes.

root@f:~# systemctl status postgresql@12-main.service
● postgresql@12-main.service - PostgreSQL Cluster 12-main
     Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
     Active: active (running) since Fri 2020-02-07 09:37:47 UTC; 1min 24s ago
    Process: 31971 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 12-main start (code=exited, status=0/SUCCESS)
   Main PID: 31976 (postgres)
      Tasks: 7 (limit: 4915)
     Memory: 18.7M
     CGroup: /system.slice/system-postgresql.slice/postgresql@12-main.service
             ├─31976 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c config_file=/etc/postgresql/12/main/postgresql.conf
             ├─31978 postgres: 12/main: checkpointer
             ├─31979 postgres: 12/main: background writer
             ├─31980 postgres: 12/main: walwriter
             ├─31981 postgres: 12/main: autovacuum launcher
             ├─31982 postgres: 12/main: stats collector
             └─31983 postgres: 12/main: logical replication launcher

Feb 07 09:37:44 f systemd[1]: Starting PostgreSQL Cluster 12-main...
Feb 07 09:37:47 f systemd[1]: Started PostgreSQL Cluster 12-main.

Shutting that one down should be synchronous IMHO.
And indeed:
root@f:~# systemctl start postgresql; echo STARTED; ps auxw | grep ^postgre; systemctl stop postgresql@12-main.service; echo STOPPED; ps auxw | grep ^postgre;
STARTED
postgres 31976 0.0 0.0 216444 24980 ? Ss 09:37 0:00 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c config_file=/etc/postgresql/12/main/postgresql.conf
postgres 31978 0.0 0.0 216444 4096 ? Ss 09:37 0:00 postgres: 12/main: checkpointer
postgres 31979 0.0 0.0 216444 5556 ? Ss 09:37 0:00 postgres: 12/main: background writer
postgres 31980 0.0 0.0 216444 9780 ? Ss 09:37 0:00 postgres: 12/main: walwriter
postgres 31981 0.0 0.0 216848 7572 ? Ss 09:37 0:00 postgres: 12/main: autovacuum launcher
postgres 31982 0.0 0.0 70824 4616 ? Ss 09:37 0:00 postgres: 12/main: stats collector
postgres 31983 0.0 0.0 216832 6088 ? Ss 09:37 0:00 postgres: 12/main: logical replication launcher
STOPPED

So the individual real postgresql service instances actually do stop synchronously as expected.
The problem comes down to the fake target in /lib/systemd/system/postgresql.service, which triggers the stop of these instances but does not wait for them to fully shut down.
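Since `systemctl is-active postgresql` reports the umbrella unit rather than the clusters, a script that needs the real state could look at the per-cluster units instead. A small sketch under stated assumptions: the filter name `active_cluster_units` is hypothetical, and it expects input in the column layout printed by `systemctl list-units --all --plain --no-legend` (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION).

```shell
#!/bin/sh
# Hypothetical filter: print the names of postgresql@ cluster units that are
# neither "inactive" nor "failed" (so "active" and "deactivating" are kept).
# Reads list-units lines from stdin: UNIT LOAD ACTIVE SUB DESCRIPTION...
active_cluster_units() {
    awk '$1 ~ /^postgresql@/ && $3 != "inactive" && $3 != "failed" { print $1 }'
}
```

For example, `systemctl list-units --all --plain --no-legend 'postgresql@*' | active_cluster_units` should print nothing once every cluster has really stopped.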

Christian Ehrhardt  (paelzer) wrote :

Currently, ordering is only done by the per-cluster services listing
  Before=postgresql.service
Further the generator adds symlinks in /run/systemd/generator/postgresql.service.wants/.

That makes it work on startup in terms of ordering.

There is: /usr/share/doc/postgresql-common/README.systemd
It explains that, furthermore, the parent service will only start/stop/restart/reload individual services that are configured as "auto" in
/etc/postgresql/*/*/start.conf.
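To see which clusters the parent service would act on, one could scan those start.conf files. This is a rough sketch, assuming each file holds the start mode as its first non-comment, non-blank line; the function name `auto_clusters` is made up, and clusters without a start.conf file are simply skipped here.

```shell
#!/bin/sh
# Hypothetical helper: print "VERSION-CLUSTER" for every cluster whose
# start.conf selects "auto". The config root defaults to /etc/postgresql.
auto_clusters() {
    conf_root=${1:-/etc/postgresql}
    for f in "$conf_root"/*/*/start.conf; do
        [ -f "$f" ] || continue
        # Strip comments and whitespace, drop empty lines, keep the first
        # remaining line as the start mode (assumed file format).
        mode=$(sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$f" | head -n 1)
        if [ "$mode" = "auto" ]; then
            cluster_dir=$(dirname "$f")
            version_dir=$(dirname "$cluster_dir")
            echo "$(basename "$version_dir")-$(basename "$cluster_dir")"
        fi
    done
}
```

Each printed name corresponds to a unit of the form postgresql@VERSION-CLUSTER.service, such as the postgresql@12-main.service shown earlier.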

That means one can't rely on postgresql.service to "shut down everything" anyway.

I guess the answer is that:
- postgresql.service is meant as an overarching helper and is not meant to shut down synchronously
- if you need synchronous behavior, please issue a stop to the individual cluster, like
  $ systemctl stop postgresql@12-main.service
- that is actually beneficial, since most tasks do this for backup or similar.
  This way you can stop/process/start the Clusters individually (if you have many)
- if you need a "one command stops all" you can use
  $ systemctl stop postgresql "postgresql@*"
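The stop/process/start pattern for a single cluster could be wrapped roughly as follows. This is only a sketch: the function name `with_cluster_stopped` is hypothetical, and it relies on the behavior observed above, namely that stopping a postgresql@ unit returns only after its processes are gone.

```shell
#!/bin/sh
# Hypothetical wrapper: stop one cluster, run a command, start the cluster again.
# Usage: with_cluster_stopped 12-main COMMAND [ARGS...]
with_cluster_stopped() {
    unit="postgresql@$1.service"
    shift
    # Stop the per-cluster unit directly instead of postgresql.service.
    systemctl stop "$unit" || return 1
    "$@"
    status=$?
    # Start the cluster again even if the command failed,
    # then report the command's exit status.
    systemctl start "$unit" || return 1
    return "$status"
}
```

A backup job might then call something like `with_cluster_stopped 12-main tar -czf /tmp/pg12.tar.gz /var/lib/postgresql/12/main`, where the archive path is just an example.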

I'm linking Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=759725 where this has been discussed for quite some time.
Due to the existing workarounds and the low number of affected users (this is the first time in years), this is prio medium/low - a classic "yes, it would be nice, but I'm not sure anyone will get to work on it" case.

To be clear, if anyone has a great suggestion for how to make the shutdown synchronous as well, I guess we will add it to Debian and Ubuntu rather quickly - but right now we are missing that idea.

Changed in postgresql-common (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Low
Changed in postgresql-common (Debian):
status: Unknown → Confirmed
Mikko Rantalainen (mira) wrote :

Maybe the file `/lib/systemd/system/postgresql.service` should just include the following:

ExecStop=systemctl stop "postgresql@*"

?
