On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote:
> On 6/13/2014 12:02 AM, Johannes Weiner wrote:
> >On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote:
> >>On 5/6/2014 1:33 AM, Johannes Weiner wrote:
> >>>Hi Oliver,
> >>>
> >>>On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote:
> >>>>Hello,
> >>>>
> >>>>1) Attached a full function-trace log + other SysRq outputs, see [1]
> >>>>attached.
> >>>>
> >>>>I saw bdi_...() calls in the s2disk paths, but didn't check in detail.
> >>>>It's probably more efficient if one of you guys looks at it directly.
> >>>Thanks, this looks interesting. balance_dirty_pages() wakes up the
> >>>bdi_wq workqueue as it should:
> >>>
> >>>[ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory <-global_dirty_limits
> >>>[ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327 2.... 48550414us : bdi_start_background_writeback <-balance_dirty_pages_ratelimited
> >>>[ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on <-balance_dirty_pages_ratelimited
> >>>
> >>>but the worker wakeup doesn't actually do anything:
> >>>
> >>>[ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch <-__schedule
> >>>[ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq <-worker_thread
> >>>[ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker <-worker_thread
> >>>[ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle <-worker_thread
> >>>[ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers <-worker_enter_idle
> >>>[ 249.148009] kworker/-3466 2.... 48550432us : schedule <-worker_thread
> >>>[ 249.148009] kworker/-3466 2.... 48550432us : __schedule <-worker_thread
> >>>
> >>>My suspicion is that this fails because the bdi_wq is frozen at this
> >>>point and so the flush work never runs until resume, whereas before my
> >>>patch the effective dirty limit was high enough so that the image could be
> >>>written in one go without being throttled; followed by an fsync() that
> >>>then writes the pages in the context of the unfrozen s2disk.
> >>>
> >>>Does this make sense? Rafael? Tejun?
> >>Well, it does seem to make sense to me.
> > From what I see, this is a deadlock in the userspace suspend model and
> >just happened to work by chance in the past.
>
> Well, it had been working for quite a while, so it was a rather large
> opportunity window it seems. :-)
No doubt about that, and I feel bad that it broke. But it's still a
deadlock that can't reasonably be accommodated by dirty throttling.
Userspace can't just put the flushers to sleep and then issue a large
amount of buffered IO, hoping it doesn't hit the dirty limits. Don't
shoot the messenger: this bug needs to be addressed, not papered over.
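To make the ordering concrete, here is a rough sketch of the sequence
s2disk goes through. This is not the actual suspend-utils code; the
setup ioctls, the image header, the resume path and most error handling
are left out, and the buffer handling is simplified.

/*
 * Rough sketch of the s2disk write-out ordering -- not the actual
 * suspend-utils code.
 */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/suspend_ioctls.h>

static int write_image(const char *swap_dev)
{
        int snap_fd = open("/dev/snapshot", O_RDONLY);
        int swap_fd = open(swap_dev, O_WRONLY);  /* buffered, no O_SYNC */
        int in_suspend = 0;
        char buf[4096];
        ssize_t n;

        if (snap_fd < 0 || swap_fd < 0)
                return -1;

        /* Freeze the system before anything is written. */
        if (ioctl(snap_fd, SNAPSHOT_FREEZE, 0))
                return -1;

        /* Create the hibernation image in memory. */
        if (ioctl(snap_fd, SNAPSHOT_CREATE_IMAGE, &in_suspend))
                return -1;

        /*
         * Write the image with ordinary buffered IO.  By this point the
         * freezable bdi_wq flush workers are frozen as well, so once the
         * dirty limit is crossed, balance_dirty_pages() throttles this
         * loop waiting for a flusher that can never run.
         */
        while ((n = read(snap_fd, buf, sizeof(buf))) > 0)
                if (write(swap_fd, buf, n) != n)
                        return -1;

        /*
         * Previously the writes were never throttled and only this
         * fsync() pushed them out, in the context of the unfrozen s2disk.
         */
        return fsync(swap_fd);
}

The point is only the ordering: the flushers are already frozen by the
time the throttled buffered writes begin.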
> >Can we patch suspend-utils as follows?
>
> Perhaps we can. Let's ask the new maintainer.
>
> Rodolfo, do you think you can apply the patch below to suspend-utils?
>
> >Alternatively, suspend-utils
> >could clear the dirty limits before it starts writing and restore them
> >post-resume.
>
> That (and the patch too) doesn't seem to address the problem with
> existing suspend-utils binaries, however.
It's userspace that freezes the system before issuing buffered IO, so
my conclusion was that the bug is in there. This is arguable. I also
wouldn't be opposed to a patch that sets the dirty limits to infinity
from the ioctl that freezes the system or creates the image.
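FWIW, the "clear the dirty limits and restore them post-resume" variant
could look roughly like the sketch below. The helper names are made up,
and a complete version would also have to save and restore vm.dirty_bytes
and the vm.dirty_background_* knobs; the idea is just to make throttling
a no-op for the duration of the image write.

/*
 * Hypothetical sketch only -- not suspend-utils code.  Raise the dirty
 * limit before the image is written and restore it after resume (or on
 * failure).
 */
#include <stdio.h>

static char saved_ratio[32];

static int read_sysctl(const char *path, char *buf, size_t len)
{
        FILE *f = fopen(path, "r");

        if (!f || !fgets(buf, len, f)) {
                if (f)
                        fclose(f);
                return -1;
        }
        return fclose(f);
}

static int write_sysctl(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fputs(val, f);
        return fclose(f);
}

/* Call before SNAPSHOT_FREEZE, i.e. before the image write starts. */
int dirty_limits_raise(void)
{
        if (read_sysctl("/proc/sys/vm/dirty_ratio", saved_ratio,
                        sizeof(saved_ratio)))
                return -1;
        /* 100% of memory may be dirty: the image writer is not throttled. */
        return write_sysctl("/proc/sys/vm/dirty_ratio", "100\n");
}

/* Call after resume, or when writing the image failed. */
int dirty_limits_restore(void)
{
        return write_sysctl("/proc/sys/vm/dirty_ratio", saved_ratio);
}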