Comment 3 for bug 1907782

Jan (janhn) wrote :

After using a ZFS-on-root system for some time, several issues have appeared that produce the error message described here and also severely degrade the performance of file browsers (e.g. the Save File dialog opened from a web browser).

The service zsys-gc.service is permanently in a failed state:

$ systemctl list-units --failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
  zsys-gc.service loaded failed failed Clean up old snapshots to free space

Running apt commands produces the error:

ERROR Service took too long to respond. Disconnecting client.

Eventually bpool ran out of space for new kernels, and system upgrades started reporting dpkg errors related to the installation of newer linux-image packages.

---

These may be several separate issues caused by a large number of datasets on rpool (e.g. ~1000) due to intensive use of Docker, and/or by a large number of snapshots (e.g. ~1300, ~400 of which are autozsys snapshots).
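
For anyone wanting to compare their own system, the counts above can be gathered roughly as follows. This is only a sketch: the helper names count_names and count_autozsys are made up for illustration, and it assumes that the pool is named rpool and that snapshots created by zsys are named <dataset>@autozsys_<id>.

```shell
#!/bin/sh
# Count non-empty lines on stdin (one dataset or snapshot name per line).
# "|| true" keeps the exit status 0 when grep finds nothing; it still prints 0.
count_names() {
    grep -c . || true
}

# Count snapshot names on stdin that contain "@autozsys_".
# Assumption: zsys-created snapshots are named <dataset>@autozsys_<id>.
count_autozsys() {
    grep -c '@autozsys_' || true
}

# Only query ZFS when the zfs tool is installed.
if command -v zfs >/dev/null 2>&1; then
    # -H drops the header line, -r recurses into child datasets, -o name prints names only.
    echo "datasets on rpool:  $(zfs list -H -r -o name -t filesystem,volume rpool | count_names)"
    echo "snapshots on rpool: $(zfs list -H -r -o name -t snapshot rpool | count_names)"
    echo "autozsys snapshots: $(zfs list -H -r -o name -t snapshot rpool | count_autozsys)"
fi
```

Replacing rpool with bpool in the zfs list calls gives the same counts for the boot pool.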

Running garbage collection manually does remove some of them:
$ sudo zsysctl -vvv service gc

Yet restarting the zsys-gc.service still fails:
$ sudo journalctl -f -u zsys-gc.service
systemd[1]: Starting Clean up old snapshots to free space...
zsysctl[1327202]: level=error msg="Service took too long to respond. Disconnecting client."
systemd[1]: zsys-gc.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: zsys-gc.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Clean up old snapshots to free space.

Is there a way to increase the timeout, and would it be meaningful to do so?
Are there other ways to tune it so that it works, such as reducing the number of snapshots that garbage collection aims to keep?
How can the performance be improved?
What is the right way to clean old kernel images out of bpool?

Thanks for any hints.