Bug #1616650 “snap refresh while command is running may cause issues” : Bugs : snapd

Jamie Strandboge (jdstrand) wrote on 2016-08-24:

#1

Perhaps this was already in the design of the state engine? E.g., the state engine will periodically try to uninstall an app if it is unable to unmount the squashfs. I wonder if the security policy load (or even install) could be deferred until there were no processes running under that security label (easy to determine by examining /proc; see ps -Z).
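A minimal sketch of the deferral check described above, assuming AppArmor labels of the form `snap.<snap>.<app> (enforce)` (as shown by `ps -Z`) read from `/proc/<pid>/attr/current`; some newer kernels may also expose the label under `attr/apparmor/current`. The helper names are hypothetical and not part of snapd.

```python
import os
from typing import Dict, Optional, Set


def parse_label(raw: str) -> str:
    """Drop the mode suffix: 'snap.foo.bar (enforce)' -> 'snap.foo.bar'."""
    cleaned = raw.replace("\x00", "").strip()
    return cleaned.split(" ", 1)[0]


def pids_with_snap_label(labels: Dict[int, str], snap_name: str) -> Set[int]:
    """PIDs whose label belongs to the snap (apps 'snap.<name>.<app>' and hooks)."""
    prefix = f"snap.{snap_name}."
    return {pid for pid, raw in labels.items() if parse_label(raw).startswith(prefix)}


def read_labels(proc_root: str = "/proc") -> Dict[int, str]:
    """Collect the AppArmor label of every visible process, similar to `ps -Z`."""
    labels: Dict[int, str] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "attr", "current")) as f:
                labels[int(entry)] = f.read()
        except OSError:
            # The process exited meanwhile, or the attribute is unreadable.
            continue
    return labels


def can_load_new_policy(snap_name: str, labels: Optional[Dict[int, str]] = None) -> bool:
    """True once no process runs under the snap's label, so the load can proceed."""
    if labels is None:
        labels = read_labels()
    return not pids_with_snap_label(labels, snap_name)
```

Matching on the `snap.<name>.` prefix (with the trailing dot) avoids confusing a snap with another whose name merely starts the same way, e.g. `chromium` and `chromium-ffmpeg`.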

description:	updated

Jamie Strandboge (jdstrand) wrote on 2016-09-30:

#2

I had this happen again. The snap refreshed while I was offline and I ended up with 890531 sandbox denials and syslog at 288M before I came back online and stopped the snap in question. Marking confirmed.

Changed in snappy:
status:	New → Confirmed

Michael Vogt (mvo) wrote on 2016-11-29:

#3

This should be fixed now. We do a lazy unmount of mounted snaps now, so anything still running will get removed in a delayed fashion.

Changed in snappy:
status:	Confirmed → Incomplete

Launchpad Janitor (janitor) wrote on 2017-01-29:

#4

[Expired for Snappy because there has been no activity for 60 days.]

Changed in snappy:
status:	Incomplete → Expired

Jamie Strandboge (jdstrand) wrote on 2017-12-18:

#5

@mvo - you marked this as Incomplete stating "This should be fixed now. We do a lazy unmount of mounted snaps now, so anything still running will get removed in a delayed fashion." but that is not what this bug is about. This bug is about the fact that a running application's environment points to versioned data directories: the application has write access to them when it starts, but loses that access after a refresh because it is not restarted after the refresh.

See https://forum.snapcraft.io/t/bug-saves-are-blocked-to-snap-user-data-if-snap-updates-when-it-is-already-running/3226/1
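To make the failure mode concrete: a snap's `SNAP_USER_DATA` is `$HOME/snap/<name>/<revision>`, while `SNAP_USER_COMMON` is `$HOME/snap/<name>/common`. The sketch below (hypothetical helper names, arbitrary example revision numbers) shows that a path captured in the environment at launch keeps pointing at the old revision after a refresh, whereas the common directory stays the same.

```python
from pathlib import PurePosixPath


def snap_user_dirs(home: str, snap_name: str, revision: str) -> dict:
    """Compute the per-user data directories exposed to a snap's apps."""
    base = PurePosixPath(home) / "snap" / snap_name
    return {
        # Versioned: changes on every refresh (and revert).
        "SNAP_USER_DATA": str(base / revision),
        # Shared by all revisions of the snap.
        "SNAP_USER_COMMON": str(base / "common"),
    }


def stale_after_refresh(env_at_launch: dict, current: dict) -> list:
    """List variables whose launch-time value no longer matches the current revision."""
    return sorted(k for k, v in env_at_launch.items() if current.get(k) != v)


launched = snap_user_dirs("/home/user", "chromium", "1143")
refreshed = snap_user_dirs("/home/user", "chromium", "1165")
print(stale_after_refresh(launched, refreshed))  # ['SNAP_USER_DATA']
```

A process started before the refresh keeps writing to the old revision's directory, which the refreshed security policy no longer covers; that is the mismatch this bug describes.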

What we should probably do during a refresh is look in the freezer cgroup to see if there are any non-daemon processes running (daemons are already handled because their systemd units are restarted). If so, delay the refresh (perhaps with a pop-up allowing the user to stop the application).
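A sketch of that check, assuming the cgroup v1 layout given later in this thread (comment #10): the snap's freezer cgroup at `/sys/fs/cgroup/freezer/snap.$SNAP_NAME/cgroup.procs`, and one systemd cgroup per service at `/sys/fs/cgroup/systemd/system.slice/snap.$SNAP_NAME.*.service/cgroup.procs`. The function names are hypothetical, and the `root` parameter exists only so the logic can be exercised against a fake tree.

```python
import glob
import os
from typing import Set


def read_pids(path: str) -> Set[int]:
    """Read a cgroup.procs file; a missing cgroup means no processes."""
    try:
        with open(path) as f:
            return {int(line) for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def snap_pids(snap_name: str, root: str = "/sys/fs/cgroup") -> Set[int]:
    """All processes in the snap's freezer cgroup (apps, services, hooks)."""
    return read_pids(os.path.join(root, "freezer", f"snap.{snap_name}", "cgroup.procs"))


def service_pids(snap_name: str, root: str = "/sys/fs/cgroup") -> Set[int]:
    """Processes belonging to any of the snap's systemd services."""
    pattern = os.path.join(
        root, "systemd", "system.slice", f"snap.{snap_name}.*.service", "cgroup.procs"
    )
    pids: Set[int] = set()
    for path in glob.glob(pattern):
        pids |= read_pids(path)
    return pids


def non_daemon_pids(snap_name: str, root: str = "/sys/fs/cgroup") -> Set[int]:
    """Processes that systemd will not restart for us, e.g. a desktop app."""
    return snap_pids(snap_name, root) - service_pids(snap_name, root)


def should_delay_refresh(snap_name: str, root: str = "/sys/fs/cgroup") -> bool:
    # Delay while any non-daemon process of the snap is still running.
    return bool(non_daemon_pids(snap_name, root))
```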

affects:	snappy → snapd
Changed in snapd:
status:	Expired → Confirmed

Michael Vogt (mvo) on 2018-01-02

Changed in snapd:
status:	Confirmed → Triaged
importance:	Undecided → Medium

Ernst Sjöstrand (ernstp) wrote on 2018-05-21:

#6

My workstation usually has many weeks of uptime, and I leave my editors open so I can just pick up where I left off.
My editors happen to be snaps like Atom, IntelliJ, PyCharm etc.
So I run into this bug very frequently.

Zygmunt Krynicki (zyga) on 2018-07-19

Changed in snapd:
assignee:	nobody → Zygmunt Krynicki (zyga)

Olivier Tilloy (osomon) wrote on 2018-09-17:

#7

This is especially visible with the chromium snap. As described by Jamie, if the chromium snap is refreshed while running, at the next restart the application will complain that it wasn't shut down properly. Also, the GNOME dock loses track of the running application when the refresh is done.

This is regularly being reported by users on forums.

Zygmunt Krynicki (zyga) wrote on 2018-12-07:

#8

I'm starting to work on this feature.

The general idea is that a package will only refresh when there are no processes running. I will document the feature separately on the forum and link it back here when ready. The feature will be based on the existing freezer cgroup that snapd manages for each snap.

Changed in snapd:
status:	Triaged → In Progress

Zygmunt Krynicki (zyga) wrote on 2018-12-11:

#9

I've shared a draft idea on how to implement this feature on the forum: https://forum.snapcraft.io/t/bug-saves-are-blocked-to-snap-user-data-if-snap-updates-when-it-is-already-running/3226/19

Zygmunt Krynicki (zyga) wrote on 2019-01-30:

#10

We've re-designed the feature to be much simpler to understand.

<<<__DOC__

Simple approach first; it covers the most common desktop case:

When the user requests a refresh interactively, we perform a synchronous, server-side soft verification that the request can go forward (see below for details about the verification). If verification fails, we return a synchronous error response; otherwise we create the usual change and return the async response.
In the refresh chain, before stopping services, we perform another soft verification. If the verification fails but we are still within the grace period for postponed refreshes, we fail the change (due to lanes, only the affected snaps will fail to refresh). If the last refresh time is no longer within the grace period, we remember to kill all processes and carry on.
After stopping services we perform a hard verification. If that fails but we are still within the grace period, we restart the services we’ve stopped and fail the change, as above. If the grace period has elapsed, we kill all processes belonging to the snap and proceed with the refresh as usual.

Soft verification - the set of processes belonging to non-service applications is non-empty
Hard verification - the set of processes belonging to a given snap is non-empty

We can compute those sets by examining our freezer cgroup process list (/sys/fs/cgroup/freezer/snap.$SNAP_NAME/cgroup.procs) and the set of processes belonging to all the services that exist in the snap (by looking at /sys/fs/cgroup/systemd/system.slice/snap.$SNAP_NAME.*.service/cgroup.procs).

Once the simple approach is implemented, we can consider several improvements:

- We can initiate the refresh process as soon as the last application process terminates, using the cgroup v1 or v2 notification mechanisms.
- We can introduce new hooks that notify an application about a pending update.
- We can introduce session-level hooks via snapd and snap-userd to deliver messages to the sessions of users who have logged in.
- We can pre-download the snap and perform the update in a special boot mode, matching similar work on recent desktop and server systems.
- We can introduce a mechanism that allows applications to grab refresh inhibit locks for critical operations for a bounded amount of time (independent of the logic above).

__DOC__
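The three-step flow in that design can be sketched as side-effect-free functions over the two process sets. This is a hedged illustration, not snapd's implementation: the names, the `Outcome` values and the 7-day default grace period (the value later reported in this thread for the experimental flag, comment #21) are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Set

GRACE_PERIOD = timedelta(days=7)  # assumption, see comment #21


class Outcome(Enum):
    PROCEED = "proceed"
    FAIL = "fail"  # postpone: only this snap's lane fails
    KILL_AND_PROCEED = "kill-and-proceed"


@dataclass
class SnapProcesses:
    all_pids: Set[int]      # from the snap's freezer cgroup
    service_pids: Set[int]  # from the snap's systemd service cgroups

    def soft_check_fails(self) -> bool:
        # Soft verification: processes of non-service apps are still running.
        return bool(self.all_pids - self.service_pids)

    def hard_check_fails(self) -> bool:
        # Hard verification: any process of the snap is still running.
        return bool(self.all_pids)


def in_grace(last_refresh: datetime, now: datetime, grace: timedelta = GRACE_PERIOD) -> bool:
    return now - last_refresh < grace


def interactive_request(procs: SnapProcesses) -> Outcome:
    # Step 1: synchronous check; on failure the client gets an error right away.
    return Outcome.FAIL if procs.soft_check_fails() else Outcome.PROCEED


def before_stopping_services(procs: SnapProcesses, last_refresh: datetime, now: datetime) -> Outcome:
    # Step 2: same soft check, but an expired grace period overrides it.
    if not procs.soft_check_fails():
        return Outcome.PROCEED
    return Outcome.FAIL if in_grace(last_refresh, now) else Outcome.KILL_AND_PROCEED


def after_stopping_services(procs: SnapProcesses, last_refresh: datetime, now: datetime) -> Outcome:
    # Step 3: hard check; on FAIL the caller restarts the stopped services first.
    if not procs.hard_check_fails():
        return Outcome.PROCEED
    return Outcome.FAIL if in_grace(last_refresh, now) else Outcome.KILL_AND_PROCEED
```

Keeping the checks free of side effects makes the grace-period policy easy to test; killing processes and restarting services would live in the caller.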

Zygmunt Krynicki (zyga) wrote on 2019-02-14:

#11

I've targeted this to 2.39 where it should be available behind a feature flag. It may be available earlier but 2.38 is likely to branch for release soon so I think that's unrealistic.

Changed in snapd:
milestone:	none → 2.39

Zygmunt Krynicki (zyga) wrote on 2019-03-25:

#12

This is progressing nicely. I think 2.39 will have it behind a feature flag with 80% certainty now.

Olivier Tilloy (osomon) wrote on 2019-07-31:

#13

Zygmunt, what's the latest status on this bug?

I'm running snapd 2.39.3 and still seeing this issue when the chromium snap is refreshed under my feet.

Olivier Tilloy (osomon) wrote on 2019-07-31:

#14

Ah, never mind, I just found https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736.
Going to test the feature flag right away.

Efthimios Chaskaris (echaskaris) wrote on 2019-09-12:

#15

I experienced this issue with Chrome again, after running the command from the snapcraft forum to enable the flag. I checked, and the flag is set to "true".

Simon Déziel (sdeziel) wrote on 2019-10-11:

#16

This happened to me (again!) with Chromium, which updated/vanished on me while I was working on an important document. I'm running snapd 2.40+18.04.

Simon Déziel (sdeziel) wrote on 2019-10-11:

#17

@echaskaris, fortunately with Chromium (maybe Chrome too?) there is a way to recover your session that vanished after the unexpected refresh. You need to revert to the previous snap (sudo snap revert chromium), start Chromium, and go to the "hamburger" menu, "More tools", then "Task manager". There, you can double-click on one of your tabs to bring back the old window/session.

Efthimios Chaskaris (echaskaris) wrote on 2019-10-11:

#18

I'm not losing my session; just some aspects get weird (losing bookmark thumbnails and password functionality; it's as if the session never happened).

Zygmunt Krynicki (zyga) wrote on 2019-10-29:

#19

This is still in progress and has not been released yet. I'll focus the next two weeks on substantially advancing a solution to this issue.

Changed in snapd:
milestone:	2.39 → none

Haw Loeung (hloeung) wrote on 2020-03-05:

#20

With experimental.refresh-app-awareness enabled, it still updated:

| $ sudo snap get core experimental
| Key                                 Value
| experimental.refresh-app-awareness  true

| $ snap changes
| ID   Status  Spawn                    Ready                    Summary
| 401  Done    yesterday at 07:40 AEDT  yesterday at 07:40 AEDT  Auto-refresh snap "openstackclients"
| 402  Done    yesterday at 16:55 AEDT  yesterday at 16:55 AEDT  Auto-refresh snap "openstackclients"
| 403  Done    yesterday at 18:35 AEDT  yesterday at 18:35 AEDT  Auto-refresh snap "openstackclients"
| 404  Done    today at 13:27 AEDT      today at 13:28 AEDT      Auto-refresh snaps "chromium", "openstackclients"

Olivier Tilloy (osomon) wrote on 2020-03-05:

#21

Haw, the current implementation forces a refresh after 7 days (see https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736/23). Could it be that chromium had been running for longer than that?
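One way to answer that question on Linux is to compute a process's age from `/proc`: field 22 of `/proc/<pid>/stat` is the start time in clock ticks since boot, and the first field of `/proc/uptime` is the number of seconds since boot. A minimal sketch; the helper names are hypothetical, and the 7-day threshold is the one quoted in the comment above.

```python
import os

SEVEN_DAYS = 7 * 24 * 3600  # seconds


def process_age_seconds(stat_line: str, uptime_seconds: float, clk_tck: int) -> float:
    """Age of a process, given the contents of /proc/<pid>/stat."""
    # comm (field 2) may contain spaces or ')', so split after the LAST ')'.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 22 (starttime) is index 19.
    start_ticks = int(after_comm[19])
    return uptime_seconds - start_ticks / clk_tck


def age_of_pid(pid: int) -> float:
    with open(f"/proc/{pid}/stat") as f:
        stat_line = f.read()
    with open("/proc/uptime") as f:
        uptime = float(f.read().split()[0])
    return process_age_seconds(stat_line, uptime, os.sysconf("SC_CLK_TCK"))


def past_grace(age_seconds: float) -> bool:
    # True if the process has been running longer than the forced-refresh window.
    return age_seconds > SEVEN_DAYS
```

For example, `past_grace(age_of_pid(pid))` for the main chromium process tells whether a forced refresh would be expected under that policy.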

Alexis Wilke (alexis-m2osw) wrote on 2020-03-05:

#22

Last time I rebooted, I noticed that Chromium again did not save the latest set of tabs. In my case, I can say that it had been running for a very long time (way more than 7 days, probably around 20 to 30 days). That being said, the little dot under the icon in Gnome did not go away. So there is that.

Haw Loeung (hloeung) wrote on 2020-03-07: Re: [Bug 1616650] Re: snap refresh while command is running may cause issues

#23

On Thu, Mar 05, 2020 at 08:06:11AM -0000, Olivier Tilloy wrote:
> Haw, the current implementations forces a refresh after 7 days (see
> https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736/23), could
> it be that you had had chromium running for longer than that?
>

Ah yes, that would be it. Didn't realise that even with
refresh-app-awareness, there's a forced refresh after 7 days.

Avamander (avamander) wrote on 2020-04-05:

#24

I enabled "refresh-app-awareness", but Chromium still suddenly stops recording history and keeping state and sessions, without any warning. Incredibly annoying, and it causes unavoidable data loss.

It is frankly inane to roll out new snaps when it has been known for FOUR. YEARS. that there's a major bug that causes data loss.

Haw Loeung (hloeung) wrote on 2020-04-06:

#25

Some options:

- add option to disable automated snap refreshes.

- allow configuring the forced refresh of 7 days.

- have a channel for chromium that updates less frequently.

Efthimios Chaskaris (echaskaris) wrote on 2020-04-06:

#26

Solution(?):
Check for snap updates when you start the PC and every 6 hours or so.
If chromium has been updated, restart the system.
Be careful: after such a refresh, new tabs won't reappear or be saved as bookmarks, so you should restart promptly rather than have to deal with that.

Of course, this bug needs more attention than it currently has by the developers.

Olivier Tilloy (osomon) wrote on 2020-04-06:

#27

No, there is a much better (albeit not perfect) solution: enable experimental.refresh-app-awareness, so automatic refreshes will be held for a maximum of 7 days. So closing chromium (or any other affected snap) and applying updates is needed only once a week.

Hopefully this will become configurable, or a notification system will be implemented for snaps to alert the user that a restart is needed.

Steve Langasek (vorlon) wrote on 2020-04-06:

#28

On Mon, Apr 06, 2020 at 12:43:20PM -0000, Olivier Tilloy wrote:
> No, there is a much better (albeit not perfect) solution: enable
> experimental.refresh-app-awareness, so automatic refreshes will be held
> for a maximum of 7 days. So closing chromium (or any other affected
> snap) and applying updates is needed only once a week.

That's really not a satisfactory solution. I don't restart my browser
weekly unless I'm forced to, it doesn't matter if this happens one day or
seven days after the new snap has become available on the channel.

Why is the chromium snap using per-version state directories? I think
fixing that would be much more important in terms of usability.

Olivier Tilloy (osomon) wrote on 2020-04-06:

#29

> Why is the chromium snap using per-version state directories? I think
> fixing that would be much more important in terms of usability.

That's because the file format for the profile directory can change with new versions, and is not guaranteed to be backward-compatible. Reverting to an old revision shouldn't leave users with an unusable profile directory.

Steve Langasek (vorlon) wrote on 2020-04-06:

#30

On Mon, Apr 06, 2020 at 04:11:03PM -0000, Olivier Tilloy wrote:
> > Why is the chromium snap using per-version state directories? I think
> > fixing that would be much more important in terms of usability.

> That's because the file format for the profile directory can change with
> new versions, and is not guaranteed to be backward-compatible. Reverting
> to an old revision shouldn't leave users with an unusable profile
> directory.

As a user, that is far less important to me than the fact that *every single
time* there is a snap refresh, Google Meet stops working, I have to restart
Chromium, it reports that Chromium did not shut down cleanly, and part of my
session state has been lost with no chance to preserve it.

I think the state should all be moved to the common dir until you have a way
to handle the rollback case without making the *upgrade* case terrible.

Launchpad Janitor (janitor) wrote on 2020-04-07:

#31

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in chromium-browser (Ubuntu):
status:	New → Confirmed

Haw Loeung (hloeung) wrote on 2020-04-08:

#32

On Mon, Apr 06, 2020 at 06:04:27PM -0000, Steve Langasek wrote:
> On Mon, Apr 06, 2020 at 04:11:03PM -0000, Olivier Tilloy wrote:
> > > Why is the chromium snap using per-version state directories? I think
> > > fixing that would be much more important in terms of usability.
>
> > That's because the file format for the profile directory can change with
> > new versions, and is not guaranteed to be backward-compatible. Reverting
> > to an old revision shouldn't leave users with an unusable profile
> > directory.
>
> As a user, that is far less important to me than the fact that *every single
> time* there is a snap refresh, Google Meet stops working, I have to restart
> Chromium, it reports that Chromium did not shut down cleanly, and part of my
> session state has been lost with no chance to preserve it.
>
> I think the state should all be moved to the common dir until you have a way
> to handle the rollback case without making the *upgrade* case terrible.

+1 to having the state moved into a common directory.

Does the file format for the profile directory change between minor
releases? Even if so, it isn't any different to users using the .deb
and reverting or downgrading. Google Chrome users download in either
.deb or .rpm.

Thinking more about it, experimental.refresh-app-awareness, if
configurable, will likely just force users to disable automatic
refreshes completely, which means no updates for the other snaps
installed.

Avamander (avamander) wrote on 2020-04-10:

#33

@osomon

No, as already said, refresh-app-awareness doesn't work as a solution. I still get unannounced data loss.

Olivier Tilloy (osomon) wrote on 2020-04-22:

#34

Zygmunt mentioned to me on IRC a few days ago that he is working on snapctl APIs for checking and getting information about updates, which he hopes will be available in a few weeks' time.

This would allow snaps such as chromium to integrate nicely with available updates and prompt the user *before* they are automatically installed.

Ricardo N Feliciano (felicianotech) wrote on 2020-04-29:

#35

I have the exact problem with Chromium that Olivier described in #7.

I consider this issue, as it affects Chromium, especially important in April 2020 for two reasons:

1. With the global pandemic and the rise of working from home, a stable and reliable browser is more important than ever.
2. Ubuntu 20.04 has been released. This is the first LTS with Chromium as a snap. The number of people running into this issue will greatly increase.

Darko Veberic (darko-veberic-kit) wrote on 2020-04-30:

#36

This has become quite a nightmare. For many years my workflow with the Chromium browser was to regularly have a two-digit number of tabs open for a VERY LONG time; basically, the browser was closed only when a kernel update required a reboot.

Now my workflow has changed drastically. A couple of times per day I anxiously check "menu|settings|passwords" to see whether the list of saved passwords is empty, since AFAIK there is no other way to detect this problem (continuing to use the browser has also recently started to crash it). Then I save all the open tabs in a special backup folder of my bookmarks, since "restore open tabs" on restart restores them to an outdated state from the time of the snap refresh, which is basically useless. Then I pin the tabs I have always pinned.

As you can see, this is hugely annoying, and I would never use Chromium from a snap except that I no longer have the choice. IMHO this is a very severe regression, and I am starting to think about switching to Firefox instead. The 7-day thing is, sorry to say, not really a solution.

Efthimios Chaskaris (echaskaris) wrote on 2020-06-11:

#37

Suggestion: If there is an update, don't just update, show a notification if the app is running. After the user presses yes or something, the app closes (or they close it), the update happens and once it is finished you can open your app again.

Avamander (avamander) wrote on 2020-06-14:

#38

This is still an issue. The notification just rubs salt in the wound but doesn't fix the issue. Absolutely inane; I hope whoever decided to turn chromium-browser into a snap gets a kernel panic when they're updating their system.

Olivier Tilloy (osomon) wrote on 2020-06-17:

#39

I'm working on using $SNAP_USER_COMMON to store chromium profiles, to mitigate the issue.

Changed in chromium-browser (Ubuntu):
status:	Confirmed → In Progress
assignee:	nobody → Olivier Tilloy (osomon)
importance:	Undecided → High

Roman Odaisky (to-roma-from-lp) wrote on 2020-06-18:

#40

So with this approach, each application has to be specifically modified to use $SNAP_USER_COMMON instead of $HOME? Why not let applications keep using $HOME, but add an option to snapcraft.yaml that specifies exactly what $HOME is set to, with options like “real home”, “versioned snap directory” and “common snap directory”? (Then there would be no need for the REALHOME=$(getent passwd $(id -u) | cut -d ':' -f 6) nonsense.)
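For reference, the shell pipeline in that comment takes the sixth colon-separated field (the home directory) of the current user's passwd entry. A rough Python equivalent, assuming the standard `name:passwd:uid:gid:gecos:home:shell` layout; the function names are illustrative only.

```python
import os
import pwd


def home_from_passwd_line(line: str) -> str:
    # Same as `cut -d ':' -f 6`: cut counts fields from 1, Python from 0.
    return line.rstrip("\n").split(":")[5]


def real_home() -> str:
    # Like `getent passwd $(id -u)`: asks the user database (via NSS) instead of
    # trusting $HOME, which inside a strictly confined snap points at the
    # versioned snap directory.
    return pwd.getpwuid(os.getuid()).pw_dir
```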

Avamander (avamander) wrote on 2020-07-16:

#41

I opened https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1887804 to address the fact that chromium-browser-snap does not follow the XDG base directory specification, which breaks backing up the snap's configuration and wiping its cache properly.

Avamander (avamander) wrote on 2020-07-16:

#42

This issue is related to it and probably both could be fixed at the same time.

Olivier Tilloy (osomon) on 2020-10-07

Changed in chromium-browser (Ubuntu):
status:	In Progress → Fix Released

Chris Guiver (guiverc) wrote on 2021-03-24:

#43

I note that chromium-browser is reported as Fix Released.

I'm getting this exactly as per https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1913027 (marked duplicate of this) with chromium on hirsute. On attempts to open a new page it's just "Aw, Snap!"

and appearing in `dmesg` is

[582526.544815] traps: chrome[1187465] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]
[582535.636141] traps: chrome[1187681] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]
[582539.122134] traps: chrome[1187717] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]
[582580.221715] traps: chrome[1187796] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]
[582649.806733] traps: chrome[1187857] trap int3 ip:5600a3f10a9e sp:7fff6b3c48d0 error:0 in chrome[56009f28d000+781d000]

(one line per attempt)

Olivier Tilloy (osomon) wrote on 2021-03-25:

#44

Chris, indeed the status for chromium is misleading. Since this potentially affects every single snap, I'll just remove the chromium task altogether (a year or so ago the profile directory for chromium was moved to a non-versioned location in an attempt to mitigate this problem, hence the status, but it's clearly not good enough).

If you haven't done that already, I highly recommend enabling the experimental.refresh-app-awareness configuration flag (https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736).

no longer affects:	chromium-browser (Ubuntu)

hackel (hackel) wrote on 2021-06-24:

#45

This issue also causes massive data loss when using the signal-desktop (Electron/Chromium) snap. I just leave it running all the time. This time it was running for 20 days. (Why would I ever quit it?) At some point the snap refreshed in the background. The next time I opened Signal, it popped up a huge dialogue with a bunch of unicode squares I obviously could not read and the app became non-functional. I quit it and restarted, and none of the messages I had received since the refresh were saved any longer and all of my user preferences were gone.

I activated the refresh-app-awareness option after discovering it just now, but there's no way I'm going to magically remember when the last time I restarted the app was, to guess whether it had been 7 days or not and I need to manually restart it.

Snapd MUST add an option to allow manual updates, along with a warning to users to restart their apps before updating.

ghomem (gustavo) wrote on 2021-08-18 (last edit on 2021-08-18):

#46

Quoting the comment above

"Snapd MUST add an option to allow manual updates..."

Not having this option is just WRONG design because it assumes that people need to be disciplined into doing the updates whether the moment is right or not.

For example, now LXD is delivered as a snap and unless workarounds are put in place it will update while an LXD production server is running, putting at risk whatever number (hundreds? thousands?) of containers and VMs are inside. However low you think the probability of a broken update may be, the impact could be huge and therefore the risk (probability x impact) is considerably high.

The same goes for corporate desktops running snap based apps.

Please do not assume that people in general or management teams are not capable of planning updates. Automatic updates are a good thing in *some* contexts. This is only a good feature if it can be turned on and off, depending on the case.

Essential literature:

https://popey.com/blog/2021/05/disabling-snap-autorefresh/

johannesjo (johannesjo) wrote on 2021-10-13:

#47

My app (https://github.com/johannesjo/super-productivity) is also suffering from this. Every time the refresh is triggered, the IndexedDB connection fails without any chance of recovery (apart from restarting the app). It would be great if there were some way to prevent updates to running apps.

Seth Arnold (seth-arnold) wrote on 2021-10-13:

#48

johannesjo, there's a conversation at https://discourse.ubuntu.com/t/feature-freeze-exception-seeding-the-official-firefox-snap-in-ubuntu-desktop/24210 with a bunch of suggestions on how to make desktop snaps friendlier for users. I hope it helps you.

johannesjo (johannesjo) wrote on 2021-10-15:

#49

Thanks @seth-arnold !

Darko Veberic (darko-veberic-kit) wrote on 2021-10-25:

#50

Status half a year later: this annoyance actually forced me to switch completely from Chromium to Firefox. In fact, I am grateful that these snap bugs exist: working with Chromium became so unbearable that I had to move to Firefox, which in itself has only positive sides.

Nevertheless, what really got me worried today is that Ubuntu plans to transition Firefox to the snap-only distribution model. In 21.10 I still managed to replace the snap version of Firefox with the deb package, but in 22.04 LTS this is not going to be possible any more.

I hope the Firefox developers are aware of this issue and will implement some clever detection that actually notifies the user to close the app gracefully...?

snapd

snap refresh while command is running may cause issues
