We're seeing the following stack traces on several production machines running Natty (2.6.38-8-server). The affected machines have to be rebooted to recover. According to http://oss.sgi.com/archives/xfs/2011-11/msg00401.html, this bug is fixed in 3.0. Can that fix be backported to the Natty kernel? Upgrading to Oneiric is not an option at the moment.
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299177] INFO: task xfssyncd/dm-4:739 blocked for more than 120 seconds.
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299235] xfssyncd/dm-4 D 000000000000000e 0 739 2 0x00000000
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299241] ffff880bdd211d00 0000000000000046 ffff880bdd211fd8 ffff880bdd210000
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299245] 0000000000013d00 ffff880bddd8df38 ffff880bdd211fd8 0000000000013d00
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299250] ffff881745c496e0 ffff880bddd8db80 0000000000000282 ffff8817de4e2800
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299254] Call Trace:
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299297] [<ffffffffa010b2d8>] xlog_grant_log_space+0x4a8/0x500 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299304] [<ffffffff8105f6f0>] ? default_wake_function+0x0/0x20
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299328] [<ffffffffa010d1ff>] xfs_log_reserve+0xff/0x140 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299352] [<ffffffffa01191fc>] xfs_trans_reserve+0x9c/0x200 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299373] [<ffffffffa00fd383>] xfs_fs_log_dummy+0x43/0x90 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299397] [<ffffffffa01303c1>] xfs_sync_worker+0x81/0x90 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299421] [<ffffffffa012f0f3>] xfssyncd+0x183/0x230 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299444] [<ffffffffa012ef70>] ? xfssyncd+0x0/0x230 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299450] [<ffffffff810871f6>] kthread+0x96/0xa0
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299456] [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299460] [<ffffffff81087160>] ? kthread+0x0/0xa0
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299463] [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299562] INFO: task mysqld:16377 blocked for more than 120 seconds.
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299586] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299615] mysqld D 000000000000000e 0 16377 1 0x00000000
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299619] ffff88146e6ffaa8 0000000000000082 ffff88146e6fffd8 ffff88146e6fe000
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299623] 0000000000013d00 ffff8817375303b8 ffff88146e6fffd8 0000000000013d00
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299628] ffff880bdf352dc0 ffff881737530000 0000000000000286 ffff8817de4e2800
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299632] Call Trace:
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299655] [<ffffffffa010b2d8>] xlog_grant_log_space+0x4a8/0x500 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299659] [<ffffffff8105f6f0>] ? default_wake_function+0x0/0x20
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299682] [<ffffffffa010d1ff>] xfs_log_reserve+0xff/0x140 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299705] [<ffffffffa01191fc>] xfs_trans_reserve+0x9c/0x200 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299729] [<ffffffffa0119071>] ? xfs_trans_alloc+0xa1/0xb0 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299752] [<ffffffffa011ef4f>] xfs_create+0x17f/0x660 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299776] [<ffffffffa012c07a>] xfs_vn_mknod+0xaa/0x1c0 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299799] [<ffffffffa012c1c0>] xfs_vn_create+0x10/0x20 [xfs]
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299804] [<ffffffff811705c1>] vfs_create+0xb1/0x110
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299809] [<ffffffff81173bd6>] do_last+0x346/0x410
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299812] [<ffffffff81174032>] do_filp_open+0x392/0x7c0
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299816] [<ffffffff81172d02>] ? user_path_at+0x62/0xa0
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299822] [<ffffffff811810f7>] ? alloc_fd+0xf7/0x150
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299826] [<ffffffff8116474a>] do_sys_open+0x6a/0x150
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299830] [<ffffffff81164850>] sys_open+0x20/0x30
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299833] [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299866] INFO: task cron:15652 blocked for more than 120 seconds.
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299918] cron D 0000000000000013 0 15652 1053 0x00000000
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299922] ffff88096bac7cb8 0000000000000082 ffff88096bac7fd8 ffff88096bac6000
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299926] 0000000000013d00 ffff880b6e555f38 ffff88096bac7fd8 0000000000013d00
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299930] ffff880bdf3eadc0 ffff880b6e555b80 ffff88096bac7cf8 ffff8817ddaf75b8
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299935] Call Trace:
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299941] [<ffffffff815d6537>] __mutex_lock_slowpath+0xf7/0x180
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299946] [<ffffffff812797d0>] ? security_inode_exec_permission+0x30/0x40
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299950] [<ffffffff815d5f23>] mutex_lock+0x23/0x50
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299954] [<ffffffff811739a8>] do_last+0x118/0x410
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299957] [<ffffffff81174032>] do_filp_open+0x392/0x7c0
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299962] [<ffffffff8113135d>] ? handle_mm_fault+0x16d/0x250
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299967] [<ffffffff811810f7>] ? alloc_fd+0xf7/0x150
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299971] [<ffffffff8116474a>] do_sys_open+0x6a/0x150
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299974] [<ffffffff81164850>] sys_open+0x20/0x30
Apr 10 23:31:14 nv-aw2az1-database0001 kernel: [2693078.299977] [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
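For anyone comparing these reports across machines, a small script can condense each hung-task block to the task name, PID, and the first reliable stack frame. The following is only a rough sketch: the function name `summarize_hung_tasks` and both regular expressions are illustrative assumptions based on the log format shown above, not part of any kernel or Ubuntu tool.

```python
import re

# Header of a hung-task report, e.g.
#   "... INFO: task mysqld:16377 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# A stack frame, e.g.
#   "[<ffffffffa010b2d8>] xlog_grant_log_space+0x4a8/0x500 [xfs]"
# Frames marked with "? " are unreliable guesses by the unwinder.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?P<guess>\? )?(?P<sym>[\w.]+)\+0x")


def summarize_hung_tasks(lines):
    """Return a list of (task, pid, first_reliable_frame) tuples.

    Hypothetical helper: first_reliable_frame is None if no frame
    without a "?" marker follows the header.
    """
    results = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = [header.group("name"), int(header.group("pid")), None]
            results.append(current)
            continue
        frame = FRAME_RE.search(line)
        if (
            frame
            and current is not None
            and current[2] is None
            and not frame.group("guess")
        ):
            current[2] = frame.group("sym")
    return [tuple(entry) for entry in results]


if __name__ == "__main__":
    import sys

    # Usage (assumed): python summarize.py < /var/log/kern.log
    for task, pid, frame in summarize_hung_tasks(sys.stdin):
        print(f"{task} (pid {pid}) blocked in {frame}")
```

Run against the excerpt above, this should report xfssyncd/dm-4 and mysqld as blocked in xlog_grant_log_space, and cron as blocked in __mutex_lock_slowpath, which makes it easier to see that the first two tasks are waiting on XFS log space while cron is waiting on a mutex.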
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 979498
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.