Folks, please please please remember to reply via emailed
reply-to-all. Don't use the bugzilla interface!
On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <email address hidden> wrote:
> On 6/13/2014 6:55 AM, Johannes Weiner wrote:
> > On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> >> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>>> Hi Oliver,
> >>>>>
> >>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>>> attached.
> >>>>>>
> >>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail
> >>>>>> Probably more efficient when one of you guys looks directly.
> >>>>> Thanks, this looks interesting. balance_dirty_pages() wakes up the
> >>>>> bdi_wq workqueue as it should:
> >>>>>
> >>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits
> <-balance_dirty_pages_ratelimited
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> global_dirtyable_memory <-global_dirty_limits
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> writeback_in_progress <-balance_dirty_pages_ratelimited
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on
> <-balance_dirty_pages_ratelimited
> >>>>> but the worker wakeup doesn't actually do anything:
> >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch
> <-__schedule
> >>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2d... 48550431us :
> need_to_create_worker <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers
> <-worker_enter_idle
> >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule
> <-worker_thread
> >>>>>
> >>>>> My suspicion is that this fails because the bdi_wq is frozen at this
> >>>>> point and so the flush work never runs until resume, whereas before my
> >>>>> patch the effective dirty limit was high enough so that image could be
> >>>>> written in one go without being throttled; followed by an fsync() that
> >>>>> then writes the pages in the context of the unfrozen s2disk.
> >>>>>
> >>>>> Does this make sense? Rafael? Tejun?
> >>>> Well, it does seem to make sense to me.
> >>> From what I see, this is a deadlock in the userspace suspend model and
> >>> just happened to work by chance in the past.
> >> Well, it had been working for quite a while, so it was a rather large
> >> opportunity
> >> window it seems. :-)
> > No doubt about that, and I feel bad that it broke. But it's still a
> > deadlock that can't reasonably be accommodated from dirty throttling.
> >
> > It can't just put the flushers to sleep and then issue a large amount
> > of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
> > the messenger, this bug needs to be addressed, not get papered over.
> >
> >>> Can we patch suspend-utils as follows?
> >> Perhaps we can. Let's ask the new maintainer.
> >>
> >> Rodolfo, do you think you can apply the patch below to suspend-utils?
> >>
> >>> Alternatively, suspend-utils
> >>> could clear the dirty limits before it starts writing and restore them
> >>> post-resume.
> >> That (and the patch too) doesn't seem to address the problem with existing
> >> suspend-utils
> >> binaries, however.
> > It's userspace that freezes the system before issuing buffered IO, so
> > my conclusion was that the bug is in there. This is arguable. I also
> > wouldn't be opposed to a patch that sets the dirty limits to infinity
> > from the ioctl that freezes the system or creates the image.
>
> OK, that sounds like a workable plan.
>
> How do I set those limits to infinity?
Five years have passed and people are still hitting this.
People can use this workaround manually by hand or in scripts. But we
really should find a proper solution. Maybe special-case the freezing
of the flusher threads until all the writeout has completed. Or
something else.
I cc'ed a bunch of people from bugzilla.
Folks, please please please remember to reply via emailed
reply-to-all. Don't use the bugzilla interface!
On Mon, 16 Jun 2014 18:29:26 +0200 "Rafael J. Wysocki" <email address hidden> wrote:
> On 6/13/2014 6:55 AM, Johannes Weiner wrote: dirty_pages( ) wakes up the dirty_pages_ ratelimited dirtyable_ memory <-global_ dirty_limits in_progress <-balance_ dirty_pages_ ratelimited background_ writeback <-balance_ dirty_pages_ ratelimited dirty_pages_ ratelimited create_ worker <-worker_thread
> > On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> >> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >>> On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>>> On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>>> Hi Oliver,
> >>>>>
> >>>>> On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> 1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>>> attached.
> >>>>>>
> >>>>>> I saw bdi_...() calls in the s2disk paths, but didn't check in detail
> >>>>>> Probably more efficient when one of you guys looks directly.
> >>>>> Thanks, this looks interesting. balance_
> >>>>> bdi_wq workqueue as it should:
> >>>>>
> >>>>> [ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits
> <-balance_
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> global_
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> writeback_
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us :
> bdi_start_
> >>>>> [ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on
> <-balance_
> >>>>> but the worker wakeup doesn't actually do anything:
> >>>>> [ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch
> <-__schedule
> >>>>> [ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2d... 48550431us :
> need_to_
> >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers
> <-worker_enter_idle
> >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : schedule
> <-worker_thread
> >>>>> [ 249.148009] kworker/-3466 2.... 48550432us : __schedule
> <-worker_thread
> >>>>>
> >>>>> My suspicion is that this fails because the bdi_wq is frozen at this
> >>>>> point and so the flush work never runs until resume, whereas before my
> >>>>> patch the effective dirty limit was high enough so that image could be
> >>>>> written in one go without being throttled; followed by an fsync() that
> >>>>> then writes the pages in the context of the unfrozen s2disk.
> >>>>>
> >>>>> Does this make sense? Rafael? Tejun?
> >>>> Well, it does seem to make sense to me.
> >>> From what I see, this is a deadlock in the userspace suspend model and
> >>> just happened to work by chance in the past.
> >> Well, it had been working for quite a while, so it was a rather large
> >> opportunity
> >> window it seems. :-)
> > No doubt about that, and I feel bad that it broke. But it's still a
> > deadlock that can't reasonably be accommodated from dirty throttling.
> >
> > It can't just put the flushers to sleep and then issue a large amount
> > of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot
> > the messenger, this bug needs to be addressed, not get papered over.
> >
> >>> Can we patch suspend-utils as follows?
> >> Perhaps we can. Let's ask the new maintainer.
> >>
> >> Rodolfo, do you think you can apply the patch below to suspend-utils?
> >>
> >>> Alternatively, suspend-utils
> >>> could clear the dirty limits before it starts writing and restore them
> >>> post-resume.
> >> That (and the patch too) doesn't seem to address the problem with existing
> >> suspend-utils
> >> binaries, however.
> > It's userspace that freezes the system before issuing buffered IO, so
> > my conclusion was that the bug is in there. This is arguable. I also
> > wouldn't be opposed to a patch that sets the dirty limits to infinity
> > from the ioctl that freezes the system or creates the image.
>
> OK, that sounds like a workable plan.
>
> How do I set those limits to infinity?
Five years have passed and people are still hitting this.
Killian described the workaround in comment 14 at /bugzilla. kernel. org/show_ bug.cgi? id=75101.
https:/
People can use this workaround manually by hand or in scripts. But we
really should find a proper solution. Maybe special-case the freezing
of the flusher threads until all the writeout has completed. Or
something else.