Comment 58 for bug 1328727

In , matheusfillipeag (matheusfillipeag-linux-kernel-bugs) wrote :

Created attachment 282117
boot-sequence

Okay, I found a way to get it working. There was also a huge mistake
in my last boot config: the resume line was commented out :P
I basically followed this: https://askubuntu.com/a/1064114
but changed it to:
resume=/dev/disk/by-uuid/70d967e6-ad52-4c21-baf0-01a813ccc6ac
(just the UUID wouldn't work), and this is probably the most important
thing to do. It worked!
I also set the resume variable in initramfs to my swap partition, but
this might not be so important anyway since it's automatically
detected.

I tested both systemctl hibernate and pm-hibernate; I guess they call
the same thing anyway. I attached a screenshot. It seems to be working
fine without uswsusp and with the proprietary nvidia drivers!

On Wed, Apr 3, 2019 at 2:55 PM Rainer Fiebig <email address hidden> wrote:
>
> On 03.04.19 at 18:59, Matheus Fillipe wrote:
> > Yes I can sorta confirm the bug is in uswsusp. I removed the package
> > and pm-utils
>
> Matheus,
>
> there is no need to uninstall pm-utils. You actually need this to have
> comfortable suspend/hibernate.
>
> The only additional option you will get from uswsusp is true s2both
> (which is nice, imo).
>
> pm-utils provides something similar called "suspend-hybrid" which means
> that the computer suspends and after a configurable time wakes up again
> to go into hibernation.
>
> > and used both "systemctl hibernate" and
> > "echo disk > /sys/power/state" to hibernate. It seems to succeed and
> > shuts down, I am just not able to resume from it, which seems to be a
> > classical problem solved just by setting the resume swap
> > file/partition on grub.
> > (which i tried and didn't work even with nvidia disabled)
> >
> > Anyway uswsusp is still necessary because the default kernel
> > hibernation doesn't work with the proprietary nvidia drivers as far
> > as I know and tested.
>
> What doesn't work: hibernating or resuming?
> And /var/log/pm-suspend.log might give you a clue what causes the problem.
>
> >
> > Is there any way I could get a workaround for this bug on my current
> > OS by the way?
>
> *I* don't know, I don't use Ubuntu. But what I would do now is
> re-install pm-utils *without* uswsusp and make sure that you have got
> the swap-partition/file right in grub.cfg or menu.lst (grub legacy).
>
> Then do a few pm-hibernate/resume and tell us what happened.
>
> So long!
>
> >
> > On Wed, Apr 3, 2019 at 7:04 AM Rainer Fiebig <email address hidden> wrote:
> >>
> >> On 03.04.19 at 11:34, Jan Kara wrote:
> >>> On Tue 02-04-19 16:25:00, Andrew Morton wrote:
> >>>>
> >>>> I cc'ed a bunch of people from bugzilla.
> >>>>
> >>>> Folks, please please please remember to reply via emailed
> >>>> reply-to-all. Don't use the bugzilla interface!
> >>>>
> >>>> On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <email address hidden> wrote:
> >>>>
> >>>>> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
> >>>>>> On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> >>>>>>> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >>>>>>>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>>>>>>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>>>>>>>> Hi Oliver,
> >>>>>>>>>>
> >>>>>>>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>>>>>>>> Hello,
> >>>>>>>>>>>
> >>>>>>>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>>>>>>>> attached.
> >>>>>>>>>>>
> >>>>>>>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail
> >>>>>>>>>>> Probably more efficient when one of you guys looks directly.
> >>>>>>>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the
> >>>>>>>>>> bdi_wq workqueue as it should:
> >>>>>>>>>>
> >>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
> >>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
> >>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
> >>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> >>>>>>>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
> >>>>>>>>>> but the worker wakeup doesn't actually do anything:
> >>>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch <-__schedule
> >>>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq <-worker_thread
> >>>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker <-worker_thread
> >>>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle <-worker_thread
> >>>>>>>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers <-worker_enter_idle
> >>>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread
> >>>>>>>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule <-worker_thread
> >>>>>>>>>>
> >>>>>>>>>> My suspicion is that this fails because the bdi_wq is frozen at this
> >>>>>>>>>> point and so the flush work never runs until resume, whereas before my
> >>>>>>>>>> patch the effective dirty limit was high enough so that image could be
> >>>>>>>>>> written in one go without being throttled; followed by an fsync() that
> >>>>>>>>>> then writes the pages in the context of the unfrozen s2disk.
> >>>>>>>>>>
> >>>>>>>>>> Does this make sense? Rafael? Tejun?
> >>>>>>>>> Well, it does seem to make sense to me.
> >>>>>>>> From what I see, this is a deadlock in the userspace suspend model and
> >>>>>>>> just happened to work by chance in the past.
> >>>>>>> Well, it had been working for quite a while, so it was a rather large
> >>>>>>> opportunity window it seems. :-)
> >>>>>> No doubt about that, and I feel bad that it broke. But it's still a
> >>>>>> deadlock that can't reasonably be accommodated from dirty throttling.
> >>>>>>
> >>>>>> It can't just put the flushers to sleep and then issue a large amount
> >>>>>> of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
> >>>>>> the messenger, this bug needs to be addressed, not get papered over.
> >>>>>>
> >>>>>>>> Can we patch suspend-utils as follows?
> >>>>>>> Perhaps we can. Let's ask the new maintainer.
> >>>>>>>
> >>>>>>> Rodolfo, do you think you can apply the patch below to suspend-utils?
> >>>>>>>
> >>>>>>>> Alternatively, suspend-utils
> >>>>>>>> could clear the dirty limits before it starts writing and restore them
> >>>>>>>> post-resume.
> >>>>>>> That (and the patch too) doesn't seem to address the problem with existing
> >>>>>>> suspend-utils binaries, however.
> >>>>>> It's userspace that freezes the system before issuing buffered IO, so
> >>>>>> my conclusion was that the bug is in there. This is arguable. I also
> >>>>>> wouldn't be opposed to a patch that sets the dirty limits to infinity
> >>>>>> from the ioctl that freezes the system or creates the image.
> >>>>>
> >>>>> OK, that sounds like a workable plan.
> >>>>>
> >>>>> How do I set those limits to infinity?
> >>>>
> >>>> Five years have passed and people are still hitting this.
> >>>>
> >>>> Killian described the workaround in comment 14 at
> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=75101.
> >>>>
> >>>> People can use this workaround manually by hand or in scripts. But we
> >>>> really should find a proper solution. Maybe special-case the freezing
> >>>> of the flusher threads until all the writeout has completed. Or
> >>>> something else.
> >>>
> >>> I've refreshed my memory wrt this bug and I believe the bug is really on
> >>> the side of suspend-utils (uswsusp or however it is called). They are low
> >>> level system tools, they ask the kernel to freeze all processes
> >>> (SNAPSHOT_FREEZE ioctl), and then they rely on buffered writeback (which is
> >>> relatively heavyweight infrastructure) to work. That is wrong in my
> >>> opinion.
> >>>
> >>> I can see Johannes was suggesting in comment 11 to use O_SYNC in
> >>> suspend-utils which worked but was too slow. Indeed O_SYNC is a rather big
> >>> hammer but using O_DIRECT should be what they need and get better
> >>> performance - no additional buffering in the kernel, no dirty throttling,
> >>> etc. They only need their buffer & device offsets sector aligned - they
> >>> seem to be even page aligned in suspend-utils so they should be fine. And
> >>> if the performance still sucks (currently they appear to do mostly random
> >>> 4k writes so it probably would for rotating disks), they could use AIO DIO
> >>> to get multiple pages in flight (as many as they dare to allocate buffers)
> >>> and then the IO scheduler will reorder things as good as it can and they
> >>> should get reasonable performance.
> >>>
> >>> Is there someone who works on suspend-utils these days? Because the repo
> >>> I've found on kernel.org seems to be long dead (last commit in 2012).
> >>>
> >>> Honza
> >>>
> >>
> >> Whether it's suspend-utils (or uswsusp) or not could be answered quickly
> >> by de-installing this package and using the kernel-methods instead.
> >>
> >>
>
>
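
As a side note on Jan Kara's O_DIRECT suggestion quoted above: this is not
what suspend-utils does today, just a rough sketch of what an aligned,
unbuffered write looks like. The 4096-byte block size and all function names
are my own assumptions; real code should query the device's logical block
size, and O_DIRECT is Linux-specific and not supported on every filesystem.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment, not queried from the device


def align_up(n: int, block: int = BLOCK) -> int:
    """Round n up to the next multiple of block."""
    return (n + block - 1) // block * block


def aligned_buffer(data: bytes, block: int = BLOCK) -> mmap.mmap:
    """Copy data into a zero-padded anonymous mapping.

    Anonymous mappings start on a page boundary, which satisfies the
    buffer-alignment requirement as long as block does not exceed the page size.
    """
    size = align_up(max(len(data), 1), block)
    buf = mmap.mmap(-1, size)
    buf[:len(data)] = data
    return buf


def write_direct(path: str, data: bytes, block: int = BLOCK) -> int:
    """Write data (padded to a block multiple) to path, bypassing the page cache."""
    buf = aligned_buffer(data, block)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    try:
        # No dirty pages are created, so dirty throttling is not involved.
        return os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```

Because the write bypasses the page cache, it does not depend on the flusher
threads that are frozen at that point, which is the whole idea behind the
suggestion. Getting several writes in flight at once (the AIO part) is left
out of this sketch.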