On Fri, Jun 13, 2014 at 01:50:47AM +0200, Rafael J. Wysocki wrote: > On 6/13/2014 12:02 AM, Johannes Weiner wrote: > >On Tue, May 06, 2014 at 01:45:01AM +0200, Rafael J. Wysocki wrote: > >>On 5/6/2014 1:33 AM, Johannes Weiner wrote: > >>>Hi Oliver, > >>> > >>>On Mon, May 05, 2014 at 11:00:13PM +0200, Oliver Winker wrote: > >>>>Hello, > >>>> > >>>>1) Attached a full function-trace log + other SysRq outputs, see [1] > >>>>attached. > >>>> > >>>>I saw bdi_...() calls in the s2disk paths, but didn't check in detail > >>>>Probably more efficient when one of you guys looks directly. > >>>Thanks, this looks interesting. balance_dirty_pages() wakes up the > >>>bdi_wq workqueue as it should: > >>> > >>>[ 249.148009] s2disk-3327 2.... 48550413us : global_dirty_limits > <-balance_dirty_pages_ratelimited > >>>[ 249.148009] s2disk-3327 2.... 48550414us : global_dirtyable_memory > <-global_dirty_limits > >>>[ 249.148009] s2disk-3327 2.... 48550414us : writeback_in_progress > <-balance_dirty_pages_ratelimited > >>>[ 249.148009] s2disk-3327 2.... 48550414us : > bdi_start_background_writeback <-balance_dirty_pages_ratelimited > >>>[ 249.148009] s2disk-3327 2.... 48550414us : mod_delayed_work_on > <-balance_dirty_pages_ratelimited > >>>but the worker wakeup doesn't actually do anything: > >>>[ 249.148009] kworker/-3466 2d... 48550431us : finish_task_switch > <-__schedule > >>>[ 249.148009] kworker/-3466 2.... 48550431us : _raw_spin_lock_irq > <-worker_thread > >>>[ 249.148009] kworker/-3466 2d... 48550431us : need_to_create_worker > <-worker_thread > >>>[ 249.148009] kworker/-3466 2d... 48550432us : worker_enter_idle > <-worker_thread > >>>[ 249.148009] kworker/-3466 2d... 48550432us : too_many_workers > <-worker_enter_idle > >>>[ 249.148009] kworker/-3466 2.... 48550432us : schedule > <-worker_thread > >>>[ 249.148009] kworker/-3466 2.... 48550432us : __schedule > <-worker_thread > >>> > >>>My suspicion is that this fails because the bdi_wq is frozen at this > >>>point and so the flush work never runs until resume, whereas before my > >>>patch the effective dirty limit was high enough so that image could be > >>>written in one go without being throttled; followed by an fsync() that > >>>then writes the pages in the context of the unfrozen s2disk. > >>> > >>>Does this make sense? Rafael? Tejun? > >>Well, it does seem to make sense to me. > > From what I see, this is a deadlock in the userspace suspend model and > >just happened to work by chance in the past. > > Well, it had been working for quite a while, so it was a rather large > opportunity > window it seems. :-) No doubt about that, and I feel bad that it broke. But it's still a deadlock that can't reasonably be accommodated from dirty throttling. It can't just put the flushers to sleep and then issue a large amount of buffered IO, hoping it doesn't hit the dirty limits. Don't shoot the messenger, this bug needs to be addressed, not get papered over. > >Can we patch suspend-utils as follows? > > Perhaps we can. Let's ask the new maintainer. > > Rodolfo, do you think you can apply the patch below to suspend-utils? > > >Alternatively, suspend-utils > >could clear the dirty limits before it starts writing and restore them > >post-resume. > > That (and the patch too) doesn't seem to address the problem with existing > suspend-utils > binaries, however. It's userspace that freezes the system before issuing buffered IO, so my conclusion was that the bug is in there. This is arguable. I also wouldn't be opposed to a patch that sets the dirty limits to infinity from the ioctl that freezes the system or creates the image.