Ubuntu Server should not suppress console output
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
procps (Ubuntu) | New | Undecided | Unassigned | | |
Bug Description
Hello,
Ubuntu Server (12.04 LTS, 12.10, 13.04 and 13.10) currently uses the following /proc/sys/
ubuntu@
4 4 1 7
This configuration suppresses the register dump and stack trace that the lockup detector and khungtaskd would otherwise print on the console.
For example:
lockup detector:
[41812904.107136] BUG: soft lockup - CPU#1 stuck for 22s! [insmod:1003]
[41812904.107213] Stack:
[41812904.107230] Call Trace:
[41812904.107269] Code: 90 41 89 fe 65 44 8b 2c 25 10 da 00 00 66 66 90 0f ae e8 e8 f9 58 d0 ff 66 90 41 89 c4 eb 11 66 90 f3 90 65 8b 1c 25 10 da 00 00 <41> 39 dd 75 20 66 66 90 0f ae e8 e8 d6 58 d0 ff 66 90 89 c2 44
[41812908.227029] INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)
khungtaskd:
[41818362.469083] INFO: task swapon:1032 blocked for more than 120 seconds.
[41818362.469096] "echo 0 > /proc/sys/
[41818482.409149] INFO: task swapon:1032 blocked for more than 120 seconds.
[41818482.409161] "echo 0 > /proc/sys/
This output lacks the information that matters most when debugging a lockup or hung task, which makes such kernel issues difficult to debug.
Other distros, on the other hand, generally use a different default configuration, as shown below. I believe that Ubuntu Server should use it instead of the current one.
[ec2-user@
7 4 1 7
This does not suppress those important console messages, so we can see at a glance where a problem occurred.
For example:
[41814106.447145] BUG: soft lockup - CPU#1 stuck for 23s! [insmod:1066]
[41814106.447160] Modules linked in: stallmod(O+) isofs acpiphp
[41814106.447174] CPU 1
[41814106.447178] Modules linked in: stallmod(O+) isofs acpiphp
[41814106.447191]
[41814106.447198] Pid: 1066, comm: insmod Tainted: G O 3.2.0-52-virtual #78-Ubuntu
[41814106.447209] RIP: e030:[<
[41814106.447225] RSP: e02b:ffff8801d0
[41814106.447230] RAX: 000000004c5a957c RBX: 0000000000000001 RCX: 000000004c5a954c
...snip...
[41814106.447296] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[41814106.447304] Process insmod (pid: 1066, threadinfo ffff8801d0b4c000, task ffff8801d09b96e0)
[41814106.447311] Stack:
[41814106.447315] ffff8801d0b4ded8 ffffffff8131693a 0000000000001bbd 0000000000000000
[41814106.447326] ffffffffa000c000 0000000001351010 ffff8801d0b4dee8 ffffffff8131686c
[41814106.447336] ffff8801d0b4df08 ffffffffa000702a 0000000000000000 ffffffffa0009020
[41814106.447347] Call Trace:
[41814106.447357] [<ffffffff81316
[41814106.447365] [<ffffffffa000c
[41814106.447371] [<ffffffff81316
[41814106.447380] [<ffffffffa0007
[41814106.447387] [<ffffffffa000c
[41814106.447397] [<ffffffff81002
...snip...
[41814106.447541] [<ffffffff81661
[41814112.951787] INFO: rcu_sched detected stall on CPU 1 (t=15000 jiffies)
[41814112.951801] sending NMI to all CPUs:
[41814123.412289] sched: RT throttling activated
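The difference between the two settings comes down to the first field of kernel.printk, the console log level: a message is printed on the console only if its level is numerically lower than that value. The following is a minimal Python sketch of that rule, not kernel code; the class and function names are hypothetical. Presumably the register dump and trace lines are logged at level 4 or higher, which would explain why they disappear under "4 4 1 7", but I have not verified the level of each line against the kernel source.

```python
from typing import List, NamedTuple

# Standard kernel log level names, indexed by numeric level.
LEVEL_NAMES = ["emerg", "alert", "crit", "err",
               "warning", "notice", "info", "debug"]


class PrintkLevels(NamedTuple):
    """The four fields of kernel.printk, in order."""
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(value: str) -> PrintkLevels:
    """Parse a string such as "4 4 1 7" into its four fields."""
    fields = [int(tok) for tok in value.split()]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return PrintkLevels(*fields)


def visible_levels(levels: PrintkLevels) -> List[int]:
    """Levels that reach the console: those below console_loglevel."""
    return [lvl for lvl in range(len(LEVEL_NAMES))
            if lvl < levels.console_loglevel]


for setting in ("4 4 1 7", "7 4 1 7"):
    shown = visible_levels(parse_printk(setting))
    print(setting, "->", ", ".join(LEVEL_NAMES[lvl] for lvl in shown))
```

With "4 4 1 7" only emerg through err are listed; with "7 4 1 7" everything except debug is. Messages printed without an explicit level take the second field (4 in both settings), so they are dropped by the first setting and shown by the second.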
On Ubuntu server, the default configuration is in /etc/sysctl.
ubuntu@
# the following stops low-level messages on console
kernel.printk = 4 4 1 7
ubuntu@
Here is a possible fix.
--- /etc/sysctl.
+++ /etc/sysctl.
@@ -1,3 +1,6 @@
-# the following stops low-level messages on console
-kernel.printk = 4 4 1 7
+# To stop low-level messages on console, use the following configuration instead.
+#
+# kernel.printk = 4 4 1 7
+#
+kernel.printk = 7 4 1 7
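To double-check which value a file like this ends up setting, note that the commented-out line is ignored and only the uncommented assignment counts. A small Python sketch under stated assumptions: the helper name is hypothetical, the file content is passed in as a string instead of being read from a real path, and "last assignment wins" is assumed for repeated keys.

```python
from typing import Optional


def effective_printk(conf_text: str) -> Optional[str]:
    """Return the last uncommented kernel.printk value in sysctl-style text."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' or ';' are skipped.
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.printk":
            value = " ".join(val.split())  # normalise whitespace
    return value


# Content of the file after the proposed change.
PROPOSED = """\
# To stop low-level messages on console, use the following configuration instead.
#
# kernel.printk = 4 4 1 7
#
kernel.printk = 7 4 1 7
"""

print(effective_printk(PROPOSED))
```

For the proposed content this prints 7 4 1 7; a file containing only the commented-out line yields None.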
As far as I can see, this setting was originally taken from Debian[1], but I don't think the rationale of the original report still holds today. I believe that suppressing the register dump and stack trace on the console by default is not a good idea, especially on a server-class OS.
Of course, end users can easily change this themselves, but most of them will never touch this setting unless someone asks them to after a problem has already happened.
Do you have any concerns about this? Any comments would be appreciated.