multiple problems with undo for 'snap remove'

Bug #1899614 reported by Paweł Stołowski
This bug affects 1 person
Affects: snapd
Status: In Progress
Importance: High
Assigned to: Paweł Stołowski

Bug Description

Undo for 'snap remove' is not fully implemented and leaves an inconsistent/broken snap that can be neither removed nor refreshed. This becomes an issue if the snap fails to remove, e.g. due to a problem with removal of its data. Easy to reproduce with the lxd snap:

snap install lxd
snap stop lxd
snap start lxd
snap refresh --edge lxd
snap remove lxd --purge

The last step fails:

error: cannot perform the following tasks:
- Stop snap "lxd" services ([is-enabled snap.lxd.activate.service] failed with exit status 1: Failed to get unit file state for snap.lxd.activate.service: No such file or directory
)
- Remove security profile for snap "lxd" (17605) (cannot find installed snap "lxd" at revision 17605: missing file /snap/lxd/17605/meta/snap.yaml)
- Remove data for snap "lxd" (17597) (unlinkat /var/snap/lxd/common/ns/mntns: device or resource busy)
- Disconnect lxd:lxd-support from core:lxd-support (snap "lxd" has no "lxd-support" plug)
... (remaining plugs listed)

The failing change:

Status  Spawn               Ready               Summary
Error   today at 11:18 UTC  today at 11:18 UTC  Stop snap "lxd" services
Undone  today at 11:18 UTC  today at 11:18 UTC  Run remove hook of "lxd" snap if present
Done    today at 11:18 UTC  today at 11:18 UTC  Disconnect interfaces of snap "lxd"
Undone  today at 11:18 UTC  today at 11:18 UTC  Remove aliases for snap "lxd"
Done    today at 11:18 UTC  today at 11:18 UTC  Make snap "lxd" unavailable to the system
Error   today at 11:18 UTC  today at 11:18 UTC  Remove security profile for snap "lxd" (17605)
Done    today at 11:18 UTC  today at 11:18 UTC  Remove data for snap "lxd" (17605)
Done    today at 11:18 UTC  today at 11:18 UTC  Remove snap "lxd" (17605) from the system
Error   today at 11:18 UTC  today at 11:18 UTC  Remove data for snap "lxd" (17597)
Hold    today at 11:18 UTC  today at 11:18 UTC  Remove snap "lxd" (17597) from the system
Error   today at 11:18 UTC  today at 11:18 UTC  Disconnect lxd:lxd-support from core:lxd-support
Error   today at 11:18 UTC  today at 11:18 UTC  Disconnect lxd:system-observe from core:system-observe
Error   today at 11:18 UTC  today at 11:18 UTC  Disconnect lxd:network-bind from core:network-bind
Error   today at 11:18 UTC  today at 11:18 UTC  Disconnect lxd:network from core:network

......................................................................
Stop snap "lxd" services

2020-10-06T11:18:51Z INFO While trying to stop previously started service "snap.lxd.activate.service": [stop snap.lxd.activate.service] failed with exit status 5: Failed to stop snap.lxd.activate.service: Unit snap.lxd.activate.service not loaded.

2020-10-06T11:18:51Z ERROR [is-enabled snap.lxd.activate.service] failed with exit status 1: Failed to get unit file state for snap.lxd.activate.service: No such file or directory

......................................................................
Remove security profile for snap "lxd" (17605)

2020-10-06T11:18:50Z ERROR cannot find installed snap "lxd" at revision 17605: missing file /snap/lxd/17605/meta/snap.yaml

......................................................................
Remove data for snap "lxd" (17597)

2020-10-06T11:18:50Z ERROR unlinkat /var/snap/lxd/common/ns/mntns: device or resource busy

......................................................................
Disconnect lxd:lxd-support from core:lxd-support

2020-10-06T11:18:50Z ERROR snap "lxd" has no "lxd-support" plug

......................................................................
Disconnect lxd:system-observe from core:system-observe

2020-10-06T11:18:50Z ERROR snap "lxd" has no "system-observe" plug

......................................................................
Disconnect lxd:network-bind from core:network-bind

2020-10-06T11:18:50Z ERROR snap "lxd" has no "network-bind" plug

......................................................................
Disconnect lxd:network from core:network

2020-10-06T11:18:50Z ERROR snap "lxd" has no "network" plug

I've identified the following fundamental problems with undo for remove:

1. The clear-snap data task (Remove data for snap "lxd"...) can fail if it cannot remove a file that belongs to the snap. In this case it fails on /var/snap/lxd/common/ns/mntns, which triggers the undo and leads to all the other problems.

2. The unlink-snap task (Make snap "lxd" unavailable to the system) doesn't have an undo handler, so the "current" symlink is not restored even when it could be (i.e. if the snap itself wasn't removed).

3. When we remove all the revisions on snap remove, we don't pay attention to the order, and as far as I can tell the current revision appears first. In the above example, 17605 was the current revision and it was successfully and completely removed; we then failed on removing the snap data of an inactive old revision, 17597 (before removing that revision's snap itself). This means that 17597 becomes "current" in a sense, but the task's snap-setup doesn't reflect it, and the existing undo handlers (such as undo for setup-profiles) don't expect it when we roll everything back: on the task we still remember the old (now completely gone) revision 17605.
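
To make problem 3 concrete, here is a toy simulation in Python. It is not snapd code (snapd is written in Go); the helper names `removal_order` and `simulate_remove` are made up for illustration, and the model assumes removal simply stops at the first revision whose data cannot be removed.

```python
def removal_order(revisions, current):
    """Return the revisions with the current one moved to the end."""
    others = [rev for rev in revisions if rev != current]
    if current in revisions:
        return others + [current]
    return others


def simulate_remove(order, fails_on):
    """Remove revisions in the given order, stopping at the first failure.

    Returns the revisions that are still installed afterwards.
    """
    installed = list(order)
    for rev in order:
        if rev == fails_on:
            break  # data removal failed; later revisions are untouched
        installed.remove(rev)
    return installed


revisions = [17605, 17597]  # order seen in the failing change
current = 17605

# Current first: 17605 is already gone by the time 17597 fails.
print(simulate_remove(revisions, fails_on=17597))  # [17597]

# Current last: the failure happens while 17605 is still intact.
print(simulate_remove(removal_order(revisions, current), fails_on=17597))  # [17597, 17605]
```

In the first ordering the only revision left is one that the recorded state does not consider current, which is the inconsistency described above.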

In general, undoing remove is tricky and not always possible, but we should strive to keep things consistent and not leave a snap in a state in which it is "broken" and nothing can be done with it, even if its snap data was already removed.

I think the following could be done to rectify:
- make clear-snap data robust and ignore errors when removing snap data.
- implement undo for unlink-snap, so if we fail to remove some revisions, we restore the "current" symlink properly. Perhaps set a 'broken' flag on a snap whose data we have already removed.
- reorder tasks for removing all revisions so that current revision is last. This should fix the 3rd problem.
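
As a rough illustration of the first suggestion, the Python sketch below removes a data directory on a best-effort basis, recording failures instead of aborting on the first one. It is only a sketch under assumptions, not snapd's implementation: the function name `remove_data_best_effort` and its injectable `unlink`/`rmdir` parameters are hypothetical and exist only so the behaviour can be exercised without a real busy mount point.

```python
import os


def remove_data_best_effort(root, unlink=os.unlink, rmdir=os.rmdir):
    """Remove everything under root, collecting errors instead of raising.

    Returns a list of (path, error) pairs for entries that could not be removed.
    """
    failures = []
    # Walk bottom-up so each directory is visited after its contents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        # Symlinks to directories show up in dirnames but must be unlinked.
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in filenames + links:
            path = os.path.join(dirpath, name)
            try:
                unlink(path)
            except OSError as err:
                failures.append((path, err))
        try:
            rmdir(dirpath)
        except OSError as err:
            # Typically "directory not empty" because something inside failed.
            failures.append((dirpath, err))
    return failures
```

A caller could log the returned failures and still let the change proceed, so a single busy file such as /var/snap/lxd/common/ns/mntns would no longer trigger the undo path.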

Changed in snapd:
assignee: nobody → Paweł Stołowski (stolowski)
importance: Undecided → High
Changed in snapd:
status: New → In Progress