linux-aws-5.19 hibernation tasks sometimes fail to freeze
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-aws (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Hibernation on AWS instances with jammy/5.
Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.247854] PM: hibernation: hibernation entry
Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.347353] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Feb 1 01:09:05 ip-172-31-54-178 kernel: [ 443.347355] sched_clock: Marking unstable (442909362062, 1007864825)
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.940489] Filesystems sync: 0.022 seconds
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.940492] Freezing user space processes ... (elapsed 0.001 seconds) done.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.941611] OOM killer disabled.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943036] PM: hibernation: Marking nosave pages: [mem 0x00000000-
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943039] PM: hibernation: Marking nosave pages: [mem 0x0009f000-
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943041] PM: hibernation: Marking nosave pages: [mem 0xbffe8000-
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943950] PM: hibernation: Basic memory bitmaps created
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 443.943961] PM: hibernation: Preallocating image memory
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782421] PM: hibernation: Allocated 9655951 pages for snapshot
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782424] PM: hibernation: Allocated 38623804 kbytes in 186.83 seconds (206.73 MB/s)
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 630.782426] Freezing remaining freezable tasks ...
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.789826] Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792830] task:kswapd0 state:D stack: 0 pid: 328 ppid: 2 flags:0x00004000
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792833] Call Trace:
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792835] <TASK>
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792837] __schedule+
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792842] schedule+0x58/0x100
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792844] io_schedule+
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792846] blk_mq_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792852] ? destroy_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792857] __blk_mq_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792859] blk_mq_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792861] blk_mq_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792864] __submit_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792866] submit_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792869] submit_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792871] ? sio_write_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792875] submit_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792877] __swap_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792879] swap_writepage+
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792880] pageout+0xe2/0x2f0
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792883] shrink_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792885] shrink_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792886] shrink_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792888] shrink_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792890] shrink_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792891] ? __schedule+
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792893] balance_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792894] ? zone_watermark_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792899] ? balance_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792900] kswapd+0x10c/0x1c0
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792901] ? balance_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792903] kthread+0xd1/0xf0
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792906] ? kthread_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792909] ret_from_
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792913] </TASK>
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792921]
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 650.792922] Restarting kernel threads ... done.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516499] PM: hibernation: Basic memory bitmaps freed
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516502] OOM killer enabled.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516502] Restarting tasks ... done.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.516740] sched_clock: Marking stable (650508475881, 1007864825)
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.626777] PM: hibernation: hibernation exit
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.670368] systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.672610] systemd[1]: snapd.service: Killing process 986 (snapd) with signal SIGABRT.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.719887] systemd[1]: snapd.service: Main process exited, code=exited, status=
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.719895] systemd[1]: snapd.service: Failed with result 'watchdog'.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.720923] systemd[1]: snapd.service: Consumed 1.714s CPU time.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.796678] systemd[1]: Starting /usr/lib/
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.797487] systemd[1]: systemd-
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.797650] systemd[1]: systemd-
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.798075] systemd[1]: Failed to start Hibernate.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800047] systemd[1]: Dependency failed for System Hibernation.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800082] systemd[1]: hibernate.target: Job hibernate.
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.800130] systemd[1]: systemd-
Feb 1 01:12:33 ip-172-31-54-178 kernel: [ 651.806905] systemd[1]: Stopped target Sleep.
Hibernation testing was performed across 93 instance types with 10 runs each. Each run consists of two hibernation and resume cycles while running a memory allocator. This issue was seen in about 25 of those 930 runs (roughly 2.7%). It was observed on c5.12xlarge, c5d.12xlarge, m5a.large, m5a.xlarge, m5a.2xlarge, m5ad.xlarge, m5ad.2xlarge, r5a.xlarge, r5a.2xlarge, t3a.medium, t3a.xlarge, and t3a.2xlarge.
Here is the full syslog from which the portion in the bug description was extracted.
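For anyone triaging a batch of these syslogs, a small script can pick out the freeze failures and the tasks that refused to freeze. The following is only a sketch: the function name `find_freeze_failures` is hypothetical, the regular expressions assume the exact message wording shown in the excerpt above ("Freezing of tasks failed after ... seconds" followed by `task:<name> state:<state>` lines), and the log path is supplied by the caller.

```python
import re
import sys

# Patterns assume the kernel message format seen in the excerpt above.
FAIL_RE = re.compile(
    r"Freezing of tasks failed after (?P<secs>[\d.]+) seconds "
    r"\((?P<count>\d+) tasks refusing to freeze, wq_busy=(?P<wq>\d+)\)"
)
TASK_RE = re.compile(r"task:(?P<name>\S+)\s+state:(?P<state>\S+)")


def find_freeze_failures(lines):
    """Return one dict per freeze failure, with the tasks listed after it."""
    failures = []
    current = None  # failure record currently collecting task lines
    for line in lines:
        fail = FAIL_RE.search(line)
        if fail:
            current = {
                "seconds": float(fail["secs"]),
                "count": int(fail["count"]),
                "wq_busy": int(fail["wq"]),
                "tasks": [],
            }
            failures.append(current)
            continue
        task = TASK_RE.search(line)
        if task and current is not None:
            current["tasks"].append((task["name"], task["state"]))
        elif "Restarting kernel threads" in line:
            # The task dump for this failure has ended.
            current = None
    return failures


if __name__ == "__main__":
    # Usage (path is whatever syslog file is being inspected):
    #   python3 scan_freeze.py /path/to/syslog
    with open(sys.argv[1], errors="replace") as fh:
        for failure in find_freeze_failures(fh):
            names = ", ".join(f"{n} (state {s})" for n, s in failure["tasks"])
            print(f"freeze failed after {failure['seconds']}s: {names}")
```

Run against the excerpt in the bug description, this would report a single failure with kswapd0 in state D. Grouping the results by instance type would be a natural next step, but that depends on how the test logs are organized.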