We build and deploy custom AMIs based on Ubuntu Jammy. Since jammy-20230428, instances launched from these AMIs randomly fail during the boot process. Destroying the instance and deploying it again works around the problem. The stack trace is always the same:
```
[  849.765218] INFO: task swapper/0:1 blocked for more than 727 seconds.
[  849.774999]       Not tainted 5.19.0-1025-aws #26~22.04.1-Ubuntu
[  849.787081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  849.811223] task:swapper/0       state:D stack:    0 pid:    1 ppid:     0 flags:0x00004000
[  849.883494] Call Trace:
[  849.891369]  <TASK>
[  849.899306]  __schedule+0x254/0x5a0
[  849.907878]  schedule+0x5d/0x100
[  849.917136]  io_schedule+0x46/0x80
[  849.970890]  blk_mq_get_tag+0x117/0x300
[  849.976136]  ? destroy_sched_domains_rcu+0x40/0x40
[  849.981442]  __blk_mq_alloc_requests+0xc4/0x1e0
[  849.986750]  blk_mq_get_new_requests+0xcc/0x190
[  849.992185]  blk_mq_submit_bio+0x1eb/0x450
[  850.070689]  __submit_bio+0xf6/0x190
[  850.075545]  submit_bio_noacct_nocheck+0xc2/0x120
[  850.080841]  submit_bio_noacct+0x209/0x560
[  850.085654]  submit_bio+0x40/0xf0
[  850.090361]  submit_bh_wbc+0x134/0x170
[  850.094905]  ll_rw_block+0xbc/0xd0
[  850.175198]  do_readahead.isra.0+0x126/0x1e0
[  850.183531]  jread+0xeb/0x100
[  850.189648]  do_one_pass+0xbb/0xb90
[  850.193917]  ? crypto_create_tfm_node+0x9a/0x120
[  850.207511]  ? crc_43+0x1e/0x1e
[  850.211887]  jbd2_journal_recover+0x8d/0x150
[  850.272927]  jbd2_journal_load+0x130/0x1f0
[  850.280601]  ext4_load_journal+0x271/0x5d0
[  850.288540]  __ext4_fill_super+0x2aa1/0x2e10
[  850.296290]  ? pointer+0x36f/0x500
[  850.304910]  ext4_fill_super+0xd3/0x280
[  850.372470]  ? ext4_fill_super+0xd3/0x280
[  850.380637]  get_tree_bdev+0x189/0x280
[  850.384398]  ? __ext4_fill_super+0x2e10/0x2e10
[  850.388490]  ext4_get_tree+0x15/0x20
[  850.392123]  vfs_get_tree+0x2a/0xd0
[  850.395859]  do_new_mount+0x184/0x2e0
[  850.468151]  path_mount+0x1f3/0x890
[  850.471804]  ? putname+0x5f/0x80
[  850.475341]  init_mount+0x5e/0x9f
[  850.478976]  do_mount_root+0x8d/0x124
[  850.482626]  mount_block_root+0xd8/0x1ea
[  850.486368]  mount_root+0x62/0x6e
[  850.568079]  prepare_namespace+0x13f/0x19e
[  850.571984]  kernel_init_freeable+0x120/0x139
[  850.575930]  ? rest_init+0xe0/0xe0
[  850.579511]  kernel_init+0x1b/0x170
[  850.583084]  ? rest_init+0xe0/0xe0
[  850.586642]  ret_from_fork+0x22/0x30
[  850.668205]  </TASK>
```
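The trace shows PID 1 stuck in D state while jbd2 replays the ext4 journal of the root filesystem during the initial mount, so the hang occurs while reading from the root volume. One way to inspect the volume off the stuck instance, as a sketch only, assuming the root EBS volume has been detached and attached to a healthy helper instance (the /dev/xvdf device name and partition number below are assumptions; check what lsblk actually reports):

```
lsblk                       # confirm the device name the attached volume received
sudo e2fsck -f /dev/xvdf1   # -f forces a full check and replays any dirty ext4 journal
```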
This has been happening since 5.19.0-1024-aws; I have now rolled back to 5.19.0-1022-aws.
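To keep the fleet on the known-good kernel until this is fixed, the packages can be held, assuming they follow Ubuntu's usual linux-*-aws naming scheme (verify the exact names with `dpkg -l 'linux-*-aws'` first):

```
# Prevent unattended-upgrades from reinstalling the 1024/1025 kernels
sudo apt-mark hold linux-image-5.19.0-1022-aws linux-modules-5.19.0-1022-aws
apt-mark showhold   # verify the holds took effect
```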
I cannot debug further because the instance never becomes reachable; I was only able to copy the trace above thanks to the virtual serial console. If there are any boot options I could add in order to gather more information, I would be happy to do that.
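For reference, these are standard kernel command-line parameters (my own suggestion, not specific to this AMI) that would make more of the boot visible on the serial console; they go into /etc/default/grub, followed by `sudo update-grub` and rebuilding the AMI:

```
# loglevel=7 ignore_loglevel  -> print all kernel messages to the console
# printk.devkmsg=on           -> do not rate-limit userspace writes to /dev/kmsg
# hung_task_panic=1           -> panic on the first hung task, forcing a full report
GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0 loglevel=7 ignore_loglevel printk.devkmsg=on hung_task_panic=1"
```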