snap refresh while command is running may cause issues

Bug #1616650 reported by Jamie Strandboge
This bug affects 45 people
Affects: snapd
Status: In Progress
Importance: Medium
Assigned to: Zygmunt Krynicki
Milestone: none

Bug Description

In testing a desktop snap that saves state in $HOME on close, I noticed that if I run snap refresh on the snap while the command is running, the command will try to save its state to the previous snap version's data directory. For the snap I was testing (a browser), this resulted in a very poor user experience (on restart, the browser complained about an improper shutdown).

What is happening is that:
1. On launch, the snap's HOME is set to SNAP_USER_DATA, which is something like /home/user/snap/foo/x1. The security policy correctly allows writes to SNAP_USER_DATA.
2. On snap refresh to 'x2', the security policy for the snap is updated for the running process such that /home/user/snap/foo/x1 is read-only and /home/user/snap/foo/x2 is read/write.
3. The environment of the command from step 1 is not changed, and HOME (as well as SNAP_USER_DATA and SNAP_DATA) still uses 'x1' in the path.
4. The command tries to shut down gracefully and save state to the 'x1' HOME, and the security policy blocks it.
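
The four steps above can be illustrated with a small simulation. This is only a sketch: `policy_allows_write` is a hypothetical stand-in for the security policy (the real check is enforced by the sandbox, not by Python code), and the paths follow the /home/user/snap/foo/x1 example from the description.

```python
from pathlib import PurePosixPath


def policy_allows_write(path: str, snap_dir: str, current_revision: str) -> bool:
    """Hypothetical model of the policy: only the current revision's
    data directory (and anything below it) is writable."""
    writable_root = PurePosixPath(snap_dir) / current_revision
    target = PurePosixPath(path)
    return target == writable_root or writable_root in target.parents


snap_dir = "/home/user/snap/foo"

# Step 1: the command starts while x1 is current; HOME is fixed at launch.
launch_env = {"HOME": f"{snap_dir}/x1"}
state_file = launch_env["HOME"] + "/session-state"  # hypothetical file name
allowed_before = policy_allows_write(state_file, snap_dir, "x1")

# Step 2: the refresh makes x2 current and the policy is updated.
# Step 3: the running command's environment still points at x1.
# Step 4: saving state on shutdown is now denied.
allowed_after = policy_allows_write(state_file, snap_dir, "x2")

print(allowed_before, allowed_after)  # prints: True False
```

The point of the sketch is that nothing in the running process changes between the two checks; only the policy does.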

Snappy's design for rollbacks relies on the previous SNAP_DATA and SNAP_USER_DATA directories not being writable, and IMHO we should not change the policy to make other snap versions' data dirs writable.

The design of the snappy state engine ensures (among other things) that there is only ever one security policy in place for the snap. In snappy 15.04 this problem was (intentionally) avoided because we used snap security policy that was versioned such that the new policy would not apply until the next app invocation.

Gustavo and Zygmunt, you both advocated strongly for only one version of the policy on disk and loaded in the kernel. I recall bringing up this type of bug as a counter-argument, and IIRC for daemons we said that snapd could simply restart them (makes perfect sense). Have you thought of the mechanism for restarting non-daemons?

Jamie Strandboge (jdstrand) wrote :

Perhaps this was already in the design of the state engine? E.g., the state engine will periodically try to uninstall an app if it is unable to unmount the squashfs. I wonder if the security policy load (or even install) could be deferred until there are no processes running under that security label (easy to determine by examining /proc; see ps -Z).
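
A minimal sketch of the /proc check suggested above. It reads /proc/<pid>/attr/current, the per-process file where the kernel exposes the current security label. The function name and the idea of matching on a label prefix such as "snap.foo." are assumptions made for illustration; this is not how snapd implements anything.

```python
import os
from typing import List


def pids_with_label_prefix(prefix: str, proc_root: str = "/proc") -> List[int]:
    """Return the PIDs whose security label starts with `prefix`.

    `proc_root` is a parameter only so the function can be pointed at a
    fake directory tree when testing.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "attr", "current")) as f:
                label = f.read().strip().rstrip("\x00")
        except OSError:
            continue  # the process exited, or its label is not readable
        if label.startswith(prefix):
            pids.append(int(entry))
    return sorted(pids)
```

Under these assumptions, an empty result for the snap's label prefix would mean that no process is still running under the old policy, so loading the new one could go ahead.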

description: updated
Jamie Strandboge (jdstrand) wrote :

I had this happen again. The snap refreshed while I was offline and I ended up with 890531 sandbox denials and syslog at 288M before I came back online and stopped the snap in question. Marking confirmed.

Changed in snappy:
status: New → Confirmed
Michael Vogt (mvo) wrote :

This should be fixed now. We do a lazy unmount of mounted snaps now, so anything still running will get removed in a delayed fashion.

Changed in snappy:
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Snappy because there has been no activity for 60 days.]

Changed in snappy:
status: Incomplete → Expired
Jamie Strandboge (jdstrand) wrote :

@mvo - you marked this as Incomplete stating "This should be fixed now. We do a lazy unmount of mounted snaps now, so anything still running will get removed in a delayed fashion." but that is not what this bug is talking about. This bug is about the fact that a running application's environment points to versioned data directories that the application can write to when it starts, but not after a refresh, because the application is not restarted after the refresh.

See https://forum.snapcraft.io/t/bug-saves-are-blocked-to-snap-user-data-if-snap-updates-when-it-is-already-running/3226/1

What we should probably do during a refresh is look in the freezer cgroup to see if there are any running non-daemon processes (daemons are already handled due to systemd unit restarts). If so, delay the refresh (perhaps with a pop-up allowing the user to stop the application).

affects: snappy → snapd
Changed in snapd:
status: Expired → Confirmed
Michael Vogt (mvo)
Changed in snapd:
status: Confirmed → Triaged
importance: Undecided → Medium
Ernst Sjöstrand (ernstp) wrote :

My workstation usually has many weeks of uptime, and I leave my editors open so I can just pick up where I left off.
My editors happen to be snaps like Atom, IntelliJ, PyCharm etc.
So I run into this bug very frequently.

Zygmunt Krynicki (zyga)
Changed in snapd:
assignee: nobody → Zygmunt Krynicki (zyga)
Olivier Tilloy (osomon) wrote :

This is especially visible with the chromium snap. As described by Jamie, if the chromium snap is refreshed while running, at the next restart the application will complain that it wasn't shut down properly. Also, the GNOME dock loses track of the running application when the refresh is done.

This is regularly being reported by users on forums.

Zygmunt Krynicki (zyga) wrote :

I'm starting to work on this feature.

The general idea is that a package will only refresh when there are no processes running. I will document the feature separately on the forum and link it back here when ready. The feature will be based on the existing freezer cgroup that snapd manages for each snap.

Changed in snapd:
status: Triaged → In Progress
Zygmunt Krynicki (zyga) wrote :

We've re-designed the feature to be much simpler to understand.

<<<__DOC__

Simple approach first, covers the most common desktop case:

1. When the user requests a refresh interactively, we perform a synchronous, server-side soft verification that the request can go forward (for details about the verification, see below). If verification fails we return a synchronous error response; otherwise we create the usual change and return the async response.
2. In the refresh chain, before stopping services, we perform another soft verification. If the verification fails but we are still within the grace period of postponed refreshes, we fail the change (due to lanes, only the affected snaps will fail to refresh). If the last refresh time is no longer in the grace period, we remember to kill all processes and carry on.
3. After stopping services we perform hard verification. If that fails but we are still within the grace period, we restart the services we’ve stopped and fail the change, as above. If the grace period has elapsed, we kill all processes belonging to the snap and proceed with the refresh as usual.

Soft verification - the set of processes belonging to non-service applications is non-empty
Hard verification - the set of processes belonging to a given snap is non-empty

We can compute those sets by examining our freezer cgroup process list (/sys/fs/cgroup/freezer/snap.$SNAP_NAME/cgroup.procs) and set of processes belonging to all the services that exist in the snap (by looking at /sys/fs/cgroup/systemd/system.slice/snap.$SNAP_NAME.*.service/cgroup.procs)
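
A rough sketch of how those sets could be computed from the two cgroup.procs locations named above. The function names and the `cgroup_root` parameter are assumptions for illustration, and the sketch reads the proposal as "the check blocks the refresh when its set is non-empty". It is not snapd's implementation.

```python
import glob
import os
from typing import Set, Tuple


def read_pids(path: str) -> Set[int]:
    """Read one cgroup.procs file; a missing file counts as an empty set."""
    try:
        with open(path) as f:
            return {int(line) for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def snap_pid_sets(
    snap_name: str, cgroup_root: str = "/sys/fs/cgroup"
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Return (all_pids, service_pids, non_service_pids) for one snap."""
    # Every process of the snap is tracked in its freezer cgroup.
    all_pids = read_pids(
        os.path.join(cgroup_root, "freezer", f"snap.{snap_name}", "cgroup.procs")
    )
    # Service processes live in per-service systemd cgroups.
    slice_dir = glob.escape(os.path.join(cgroup_root, "systemd", "system.slice"))
    pattern = os.path.join(
        slice_dir, f"snap.{glob.escape(snap_name)}.*.service", "cgroup.procs"
    )
    service_pids: Set[int] = set()
    for procs_file in glob.glob(pattern):
        service_pids |= read_pids(procs_file)
    return all_pids, service_pids, all_pids - service_pids
```

With these sets, the soft check would look at `non_service_pids` and the hard check at `all_pids`.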

Once the simple approach is implemented, we can consider several improvements.

- We can initiate the refresh process instantly after the last application process terminates, using the cgroup v1 or v2 notification mechanism.
- We can introduce new hooks that notify an application about a pending update
- We can introduce session-level hooks via snapd and snap-userd to deliver messages to the session of users that have logged in
- We can pre-download the snap and perform the update in a special boot mode, matching similar work on recent desktop and server systems.
- A mechanism that allows applications to grab refresh inhibit locks for critical operations for a bound amount of time (independent of the logic above)

__DOC__
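
The decision made at each verification point in the proposal above can be sketched as follows. This is an illustrative reading of the three steps, not snapd code: the names are hypothetical, the length of the grace period is left as a parameter because the proposal does not state it, and measuring the grace period from the last refresh time follows the wording of the second step.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Set


class Decision(Enum):
    PROCEED = "proceed with the refresh"
    FAIL_CHANGE = "fail the change and try again later"
    KILL_AND_PROCEED = "kill remaining processes, then proceed"


def refresh_decision(
    blocking_pids: Set[int],
    last_refresh: datetime,
    now: datetime,
    grace_period: timedelta,
) -> Decision:
    """Decide what to do at a soft or hard verification point.

    For the soft check, `blocking_pids` is the set of non-service
    processes; for the hard check, it is every process of the snap.
    """
    if not blocking_pids:
        return Decision.PROCEED
    if now - last_refresh < grace_period:
        # Still within the grace period. For the hard check, the caller
        # would also restart the services it had already stopped.
        return Decision.FAIL_CHANGE
    return Decision.KILL_AND_PROCEED
```

The same function serves both checks; only the set of PIDs passed in and the clean-up on failure differ.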

Zygmunt Krynicki (zyga) wrote :

I've targeted this to 2.39 where it should be available behind a feature flag. It may be available earlier but 2.38 is likely to branch for release soon so I think that's unrealistic.

Changed in snapd:
milestone: none → 2.39
Zygmunt Krynicki (zyga) wrote :

This is progressing nicely. I think 2.39 will have it behind a feature flag with 80% certainty now.

Olivier Tilloy (osomon) wrote :

Zygmunt, what's the latest status on this bug?

I'm running snapd 2.39.3 and still seeing this issue when the chromium snap is refreshed under my feet.

Olivier Tilloy (osomon) wrote :

Ah, never mind, I just found https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736.
Going to test the feature flag right away.

Efthimios Chaskaris (echaskaris) wrote :

I experienced this issue with chrome again after enabling the flag with the console command from the snapcraft forum. I checked, and the flag is "true".

Simon Déziel (sdeziel) wrote :

This happened to me (again!) with Chromium, which updated/vanished on me while I was working on an important document. I'm running snapd 2.40+18.04.

Simon Déziel (sdeziel) wrote :

@echaskaris, fortunately with Chromium (maybe Chrome too?) there is a way to recover your session that vanished after the unexpected refresh. You need to revert to the previous snap (sudo snap revert chromium), start Chromium, go to the "hamburger" menu, "More tools", then "Task manager". There, you can double-click on one of your tabs to bring back the old window/session.

Efthimios Chaskaris (echaskaris) wrote :

I'm not losing my session, just some aspects get weird (losing bookmark thumbnails, password functionality, it's like the session never happened)

Zygmunt Krynicki (zyga) wrote :

This is still in progress and is not released yet. I'll focus my next two weeks on advancing a solution to this issue immensely.

Changed in snapd:
milestone: 2.39 → none
Haw Loeung (hloeung) wrote :

With experimental.refresh-app-awareness enabled, it still updated:

| $ sudo snap get core experimental
| Key                                 Value
| experimental.refresh-app-awareness  true

| $ snap changes
| ID   Status  Spawn                    Ready                    Summary
| 401  Done    yesterday at 07:40 AEDT  yesterday at 07:40 AEDT  Auto-refresh snap "openstackclients"
| 402  Done    yesterday at 16:55 AEDT  yesterday at 16:55 AEDT  Auto-refresh snap "openstackclients"
| 403  Done    yesterday at 18:35 AEDT  yesterday at 18:35 AEDT  Auto-refresh snap "openstackclients"
| 404  Done    today at 13:27 AEDT      today at 13:28 AEDT      Auto-refresh snaps "chromium", "openstackclients"

| $ snap info chromium
| ...
| channels:
| latest/stable: 80.0.3987.122 2020-02-26 (1040) 160MB -
| installed: 80.0.3987.122 (1040) 160MB -

Olivier Tilloy (osomon) wrote :

Haw, the current implementation forces a refresh after 7 days (see https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736/23), could it be that you had had chromium running for longer than that?

Alexis Wilke (alexis-m2osw) wrote :

Last time I rebooted, I noticed that Chromium again did not save the latest set of tabs. In my case, I can say that it had been running for a very long time (way more than 7 days, probably around 20 to 30 days). That being said, the little dot under the icon in Gnome did not go away. So there is that.

Haw Loeung (hloeung) wrote : Re: [Bug 1616650] Re: snap refresh while command is running may cause issues

On Thu, Mar 05, 2020 at 08:06:11AM -0000, Olivier Tilloy wrote:
> Haw, the current implementation forces a refresh after 7 days (see
> https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736/23), could
> it be that you had had chromium running for longer than that?
>

Ah yes, that would be it. Didn't realise that even with
refresh-app-awareness, there's a forced refresh after 7 days.

Avamander (avamander) wrote :

I enabled "refresh-app-awareness"; still, without warning, Chromium suddenly stops recording history and keeping state and sessions. Incredibly annoying, and it causes unavoidable data loss.

It is frankly inane to roll out new snaps when it has been known for FOUR. YEARS. that there's a major bug that causes data loss.

Haw Loeung (hloeung) wrote :

Some options:

- add option to disable automated snap refreshes.

- allow configuring the forced refresh of 7 days.

- have a channel for chromium that updates less frequently.

Efthimios Chaskaris (echaskaris) wrote :

Solution(?):
Check for snap updates when you start the PC and every 6 hours I guess.
If chromium is updated, restart the system.
Be careful: new tabs won't reappear or be saved as bookmarks, so you should restart to avoid having to deal with that.

Of course, this bug needs more attention from the developers than it currently gets.

Olivier Tilloy (osomon) wrote :

No, there is a much better (albeit not perfect) solution: enable experimental.refresh-app-awareness, so automatic refreshes will be held for a maximum of 7 days. So closing chromium (or any other affected snap) and applying updates is needed only once a week.

Hopefully this will become configurable, or a notification system will be implemented for snaps to alert the user that a restart is needed.

Steve Langasek (vorlon) wrote :

On Mon, Apr 06, 2020 at 12:43:20PM -0000, Olivier Tilloy wrote:
> No, there is a much better (albeit not perfect) solution: enable
> experimental.refresh-app-awareness, so automatic refreshes will be held
> for a maximum of 7 days. So closing chromium (or any other affected
> snap) and applying updates is needed only once a week.

That's really not a satisfactory solution. I don't restart my browser
weekly unless I'm forced to, it doesn't matter if this happens one day or
seven days after the new snap has become available on the channel.

Why is the chromium snap using per-version state directories? I think
fixing that would be much more important in terms of usability.

Olivier Tilloy (osomon) wrote :

> Why is the chromium snap using per-version state directories? I think
> fixing that would be much more important in terms of usability.

That's because the file format for the profile directory can change with new versions, and is not guaranteed to be backward-compatible. Reverting to an old revision shouldn't leave users with an unusable profile directory.

Steve Langasek (vorlon) wrote :

On Mon, Apr 06, 2020 at 04:11:03PM -0000, Olivier Tilloy wrote:
> > Why is the chromium snap using per-version state directories? I think
> > fixing that would be much more important in terms of usability.

> That's because the file format for the profile directory can change with
> new versions, and is not guaranteed to be backward-compatible. Reverting
> to an old revision shouldn't leave users with an unusable profile
> directory.

As a user, that is far less important to me than the fact that *every single
time* there is a snap refresh, Google Meet stops working, I have to restart
Chromium, it reports that Chromium did not shut down cleanly, and part of my
session state has been lost with no chance to preserve it.

I think the state should all be moved to the common dir until you have a way
to handle the rollback case without making the *upgrade* case terrible.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in chromium-browser (Ubuntu):
status: New → Confirmed
Haw Loeung (hloeung) wrote :

On Mon, Apr 06, 2020 at 06:04:27PM -0000, Steve Langasek wrote:
> On Mon, Apr 06, 2020 at 04:11:03PM -0000, Olivier Tilloy wrote:
> > > Why is the chromium snap using per-version state directories? I think
> > > fixing that would be much more important in terms of usability.
>
> > That's because the file format for the profile directory can change with
> > new versions, and is not guaranteed to be backward-compatible. Reverting
> > to an old revision shouldn't leave users with an unusable profile
> > directory.
>
> As a user, that is far less important to me than the fact that *every single
> time* there is a snap refresh, Google Meet stops working, I have to restart
> Chromium, it reports that Chromium did not shut down cleanly, and part of my
> session state has been lost with no chance to preserve it.
>
> I think the state should all be moved to the common dir until you have a way
> to handle the rollback case without making the *upgrade* case terrible.

+1 to having the state moved into a common directory.

Does the file format for the profile directory change between minor
releases? Even if so, it isn't any different to users using the .deb
and reverting or downgrading. Google Chrome users download in either
.deb or .rpm.

Thinking more about it, experimental.refresh-app-awareness, if
configurable, will likely just force users to disable automatic
refreshes completely. Which means no updates for other snaps
installed.

Avamander (avamander) wrote :

@osomon

No, as already said, refresh-app-awareness doesn't work as a solution. I still get unannounced data loss.

Olivier Tilloy (osomon) wrote :

Zygmunt mentioned to me on IRC a few days ago that he is working on snapctl APIs for checking and getting information about updates, which he hopes will be available in a few weeks' time.

This would allow snaps such as chromium to integrate nicely with available updates and prompt the user *before* they are automatically installed.

Ricardo N Feliciano (felicianotech) wrote :

I have the exact problem with Chromium that Olivier described in #7.

I see this issue, as it affects Chromium, as especially important in April 2020 for two reasons:

1. With the global pandemic and the rise of working from home, a stable and reliable browser is more important than ever.
2. Ubuntu 20.04 has been released. This is the first LTS with Chromium as a snap. The number of people running into this issue will greatly increase.

Darko Veberic (darko-veberic-kit) wrote :

this has become quite a nightmare. for many years my workflow with chromium browser was to regularly have a two-digit number of tabs open for a VERY LONG time, basically the browser was closed only when a kernel update required a reboot. now my workflow has changed drastically: a couple of times per day i anxiously check "menu|settings|passwords" to see if the list of saved passwords is empty or not, since afaik there is no other way to detect this problem (continuing to use the browser also started to crash it recently). then i save all the open tabs in a special backup folder of my bookmarks since the "restore open tabs" on restart restores them to some outdated state that existed at the time of the snap refresh and are basically useless. then i pin the tabs i have always pinned. as you can see this is hugely annoying and i would never use chromium from snap except that i don't have this choice anymore. imho this is a very severe regression and i am starting to think about switching to firefox instead. the 7 day thing is, sorry to say, not really a solution.

Efthimios Chaskaris (echaskaris) wrote :

Suggestion: If there is an update, don't just update, show a notification if the app is running. After the user presses yes or something, the app closes (or they close it), the update happens and once it is finished you can open your app again.

Avamander (avamander) wrote :

This is still an issue. The notification is a nice rub of salt in the wound, but doesn't fix the issue. Absolutely inane, I hope whoever decided to turn chromium-browser into a snap gets a kernel panic when they're updating their system.

Olivier Tilloy (osomon) wrote :

I'm working on using $SNAP_USER_COMMON to store chromium profiles, to mitigate the issue.
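
As a rough illustration of the idea (not the actual change made to the chromium snap), a launcher could keep the profile under the unversioned $SNAP_USER_COMMON directory and seed it once from the versioned $SNAP_USER_DATA copy. The function name and the "profile" subdirectory are hypothetical.

```python
import os
import shutil
from typing import Mapping


def ensure_common_profile(env: Mapping[str, str], name: str = "profile") -> str:
    """Return a profile path under $SNAP_USER_COMMON.

    On first use the directory is seeded from the versioned copy under
    $SNAP_USER_DATA if one exists; later calls leave it untouched.
    """
    common = os.path.join(env["SNAP_USER_COMMON"], name)
    versioned = os.path.join(env["SNAP_USER_DATA"], name)
    if not os.path.exists(common):
        if os.path.isdir(versioned):
            shutil.copytree(versioned, common)  # one-time migration
        else:
            os.makedirs(common)
    return common
```

Because the returned path does not contain a revision, it does not change when the snap is refreshed. The cost, as noted earlier in the thread, is that a revert to an older revision would see a profile written by the newer one.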

Changed in chromium-browser (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Olivier Tilloy (osomon)
importance: Undecided → High
Roman Odaisky (to-roma-from-lp) wrote :

This way each application has to be specifically modified to use $SNAP_USER_COMMON instead of $HOME? Why not let the applications keep using $HOME but add an option to snapcraft.yaml that would specify what exactly to set $HOME to, with options like “real home”, “versioned snap directory”, “common snap directory”? (So none of the REALHOME=$(getent passwd $(id -u) | cut -d ':' -f 6) nonsense.)
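
To make the suggestion concrete, here is a hypothetical sketch of how such a setting could map to a directory. No such snapcraft.yaml option exists in this thread; the mode names "real", "versioned" and "common" are invented for illustration. The passwd lookup does the same job as the getent pipeline quoted above.

```python
import os
import pwd
from typing import Mapping


def real_home() -> str:
    """Home directory from the passwd database, ignoring $HOME."""
    return pwd.getpwuid(os.getuid()).pw_dir


def resolve_home(mode: str, env: Mapping[str, str]) -> str:
    """Map a hypothetical home mode to the directory HOME would be set to."""
    if mode == "real":
        return real_home()
    if mode == "versioned":
        return env["SNAP_USER_DATA"]    # per-revision, e.g. .../snap/foo/x1
    if mode == "common":
        return env["SNAP_USER_COMMON"]  # shared across revisions
    raise ValueError(f"unknown home mode: {mode!r}")
```

With something like this, the application itself would keep using $HOME and only the launcher would need to know which of the three directories to pick.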

Avamander (avamander) wrote :

I opened https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1887804 to address the fact that chromium-browser-snap is not following the XDG base directory specification, which breaks backing up the snap's configuration and wiping its cache properly.

Avamander (avamander) wrote :

This issue is related to it and probably both could be fixed at the same time.

Olivier Tilloy (osomon)
Changed in chromium-browser (Ubuntu):
status: In Progress → Fix Released
Chris Guiver (guiverc) wrote :

I note that chromium-browser is reported as Fix Released.

I'm getting this exactly as per https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1913027 (marked duplicate of this) with chromium on hirsute. On attempts to open a new page it's just "Aw, Snap!"

and appearing in `dmesg` is

[582526.544815] traps: chrome[1187465] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]
[582535.636141] traps: chrome[1187681] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]
[582539.122134] traps: chrome[1187717] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]
[582580.221715] traps: chrome[1187796] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]
[582649.806733] traps: chrome[1187857] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]

(one line per attempt)

Olivier Tilloy (osomon) wrote :

Chris, indeed the status for chromium is misleading. Since this potentially affects every single snap, I'll just remove the chromium task altogether (a year or so ago the profile directory for chromium was moved to a non-versioned place in an attempt to mitigate this problem, hence the status, but it's clearly not good enough).

If you haven't done that already, I highly recommend enabling the experimental.refresh-app-awareness configuration flag (https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736).

no longer affects: chromium-browser (Ubuntu)
hackel (hackel) wrote :

This issue also causes massive data loss when using the signal-desktop (Electron/Chromium) snap. I just leave it running all the time. This time it was running for 20 days. (Why would I ever quit it?) At some point the snap refreshed in the background. The next time I opened Signal, it popped up a huge dialogue with a bunch of unicode squares I obviously could not read and the app became non-functional. I quit it and restarted, and none of the messages I had received since the refresh were saved any longer and all of my user preferences were gone.

I activated the refresh-app-awareness option after discovering it just now, but there's no way I'm going to magically remember when the last time I restarted the app was, to guess whether it had been 7 days or not and I need to manually restart it.

Snapd MUST add an option to allow manual updates, along with a warning to users to restart their apps before updating.

ghomem (gustavo) wrote :

Quoting the comment above

"Snapd MUST add an option to allow manual updates..."

Not having this option is just WRONG design because it assumes that people need to be disciplined into doing the updates whether the moment is right or not.

For example, now LXD is delivered as a snap and unless workarounds are put in place it will update while an LXD production server is running, putting at risk whatever number (hundreds? thousands?) of containers and VMs are inside. However low you think the probability of a broken update may be, the impact could be huge and therefore the risk (probability x impact) is considerably high.

The same goes for corporate desktops running snap based apps.

Please do not assume that people in general or management teams are not capable of planning updates. Automatic updates are a good thing in *some* contexts. This is only a good feature if it can be turned on and off, depending on the case.

Essential literature:

https://popey.com/blog/2021/05/disabling-snap-autorefresh/

johannesjo (johannesjo) wrote :

My app (https://github.com/johannesjo/super-productivity) is also suffering from this. Every time the refresh is triggered, the IndexedDB connection fails without any chance of recovery (apart from restarting the app). It would be great if there was some way to prevent updates to running apps.

Seth Arnold (seth-arnold) wrote :

johannesjo, there's a conversation at https://discourse.ubuntu.com/t/feature-freeze-exception-seeding-the-official-firefox-snap-in-ubuntu-desktop/24210 with a bunch of suggestions on how to make desktop snaps friendlier for users. I hope it helps you.

johannesjo (johannesjo) wrote :

Thanks @seth-arnold !

Darko Veberic (darko-veberic-kit) wrote :

status half a year later: this annoyance actually forced me to switch completely from chromium to firefox. in fact i am grateful that this snap bug(s) exist. the work with chromium became so unbearable i had to move to firefox, which in itself has only positive sides.

nevertheless, what really got me worried today is that ubuntu plans to transition firefox to the snap-only distribution model. in 21.10 i still managed to replace the snap version of firefox with the deb package but in 22.04 lts this is not going to be possible any more.

i hope firefox developers are aware of this issue and will implement some clever detection and actually notify the user to close gracefully...?
