Ubuntu 16.10: fadump fails with a kernel panic while dumping, and the console hangs, on a 32TB Brazos system (kdump)
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Triaged | High | Canonical Kernel Team | | |
Bug Description
== Comment: #0 - Praveen K. Pandey <email address hidden> - 2016-07-17 02:37:31 ==
Hi
On Ubuntu 16.10 I tried fadump on a Brazos system (32TB memory and 192 cores). When I trigger a panic, a kernel panic occurs while dumping and the console hangs.
Steps to reproduce:
1- Install Ubuntu 16.10.
2- Boot the system with 31TB of memory and 192 cores.
3- Configure fadump on the system.
4- Verify that fadump is running on the system.
5- Trigger a panic on the system.
Actual result
Unable to capture the fadump; the kernel panics and the console hangs.
Expected result
The fadump is captured.
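Step 4 (verifying that fadump is active) could be checked with something like the sketch below. It assumes the powerpc sysfs flags `/sys/kernel/fadump_enabled` and `/sys/kernel/fadump_registered` are present, as on 4.4-era kernels; the function name and the `sysfs_root` parameter are illustrative and not part of this report.

```python
from pathlib import Path


def fadump_status(sysfs_root="/sys/kernel"):
    """Return (enabled, registered) as booleans from the fadump sysfs flags.

    Assumption: the flags live at <sysfs_root>/fadump_enabled and
    <sysfs_root>/fadump_registered and contain "1" when set.
    A missing or unreadable file is reported as False.
    """
    def read_flag(name):
        try:
            return (Path(sysfs_root) / name).read_text().strip() == "1"
        except OSError:
            return False

    return read_flag("fadump_enabled"), read_flag("fadump_registered")


if __name__ == "__main__":
    enabled, registered = fadump_status()
    print(f"fadump enabled: {enabled}, registered: {registered}")
```

Both values should be true before the panic in step 5 is triggered; if either is false, the dump would not be expected to be captured regardless of this bug.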
Log:
root@ltc-brazos1:~# kdump-config show
DUMP_MODE: fadump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.
KDUMP_COREDIR: /var/crash
/var/
kdump initrd:
/var/
current state: ready to fadump
root@ltc-brazos1:~#
root@ltc-brazos1:~# cat /proc/cmdline
BOOT_IMAGE=
root@ltc-brazos1:~#
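The `kdump-config show` output above can also be checked by a script before the crash is triggered. The following is only a sketch: the helper names are hypothetical, and the sample text is copied from the log above with its truncated values left as they are.

```python
def parse_kdump_show(text):
    """Split 'key: value' lines of `kdump-config show` output into a dict.

    Lines without a colon (for example bare paths) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def ready_for_fadump(text):
    """True when the output reports fadump mode and a ready state."""
    fields = parse_kdump_show(text)
    return (fields.get("DUMP_MODE") == "fadump"
            and fields.get("current state") == "ready to fadump")


# Sample copied from the log in this report (truncated lines kept as-is).
SAMPLE = """\
DUMP_MODE: fadump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.
KDUMP_COREDIR: /var/crash
/var/
kdump initrd:
/var/
current state: ready to fadump
"""

if __name__ == "__main__":
    print(ready_for_fadump(SAMPLE))
```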
ltc-brazos1 login: [ 442.749993] sysrq: SysRq : Trigger a crash
[ 442.750031] Unable to handle kernel paging request for data at address 0x00000000
[ 442.750037] Faulting instruction address: 0xc000000000670014
[ 442.750043] Oops: Kernel access of bad area, sig: 11 [#1]
[ 442.750047] SMP NR_CPUS=2048 NUMA pSeries
[ 442.750053] Modules linked in: pseries_rng btrfs xor raid6_pq rtc_generic sunrpc autofs4 ses enclosure ipr
[ 442.750068] CPU: 157 PID: 403890 Comm: bash Not tainted 4.4.0-30-generic #49-Ubuntu
[ 442.750074] task: c00003f97b0af640 ti: c00003f97b104000 task.ti: c00003f97b104000
[ 442.750079] NIP: c000000000670014 LR: c0000000006710c8 CTR: c00000000066ffe0
[ 442.750083] REGS: c00003f97b107990 TRAP: 0300 Not tainted (4.4.0-30-generic)
[ 442.750088] MSR: 8000000000009033 <SF,EE,
[ 442.750100] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000006710c8 c00003f97b107c10 c0000000015b5d00 0000000000000063
GPR04: c00001faba749c50 c00001faba75b4e0 c0001f3efe7c0000 0000000000000313
GPR08: 0000000000000007 0000000000000001 0000000000000000 c0001f3efe7cecb8
GPR12: c00000000066ffe0 c00000000bc9d380 ffffffffffffffff 0000000022000000
GPR16: 0000000010170dc8 000001001ef401d8 0000000010140f58 00000000100c7570
GPR20: 0000000000000000 000000001017dd58 0000000010153618 000000001017b608
GPR24: 00003ffff7c9e7b4 0000000000000001 c0000000014f8e58 0000000000000004
GPR28: c0000000014f9218 0000000000000063 c0000000014b11dc 0000000000000000
[ 442.750165] NIP [c000000000670014] sysrq_handle_
[ 442.750170] LR [c0000000006710c8] __handle_
[ 442.750174] Call Trace:
[ 442.750179] [c00003f97b107c10] [c000000000e08f28] _fw_tigon_
[ 442.750186] [c00003f97b107c30] [c0000000006710c8] __handle_
[ 442.750192] [c00003f97b107cd0] [c000000000671868] write_sysrq_
[ 442.750199] [c00003f97b107d00] [c00000000037ae30] proc_reg_
[ 442.750205] [c00003f97b107d50] [c0000000002e186c] __vfs_write+
[ 442.750210] [c00003f97b107d90] [c0000000002e25a0] vfs_write+
[ 442.750216] [c00003f97b107de0] [c0000000002e35dc] SyS_write+
[ 442.750222] [c00003f97b107e30] [c000000000009204] system_
[ 442.750226] Instruction dump:
[ 442.750229] 38425d20 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d220019 394931e4
[ 442.750238] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
[ 442.750248] ---[ end trace ff61e1bc4dd59a42 ]---
[ 442.752585]
Loading Linux 4.4.0-30-generic ...
Loading initial ramdisk ...
OF stdout device is: /vdevice/
Preparing to boot Linux version 4.4.0-30-generic (buildd@
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-
command line: BOOT_IMAGE=
Ignoring mem=00000001000
memory layout at init:
memory_limit : 0000000000000000 (16 MB aligned)
alloc_bottom : 000000000e020000
alloc_top : 0000000010000000
alloc_top_hi : 0000000010000000
rmo_top : 0000000010000000
ram_top : 0000000010000000
instantiating rtas at 0x000000000e9e0
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000e030000 -> 0x000000000e0319a4
Device tree struct 0x000000000e040000 -> 0x000000000e640000
Quiescing Open Firmware ...
Booting Linux via __start() ...
-> smp_release_cpus()
spinning_
<- smp_release_cpus()
<- setup_system()
[ 0.000000] Kernel panic - not syncing: memblock_
[ 0.000000]
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.0-30-generic #49-Ubuntu
[ 0.000000] Call Trace:
[ 0.000000] [c0000000015b39d0] [c000000000af955c] dump_stack+
[ 0.000000] [c0000000015b3a10] [c000000000af5790] panic+0x100/0x2c0
[ 0.000000] [c0000000015b3aa0] [c000000000ed238c] memblock_
[ 0.000000] [c0000000015b3b30] [c0000000002db69c] __earlyonly_
[ 0.000000] [c0000000015b3b70] [c000000000afc5fc] vmemmap_
[ 0.000000] [c0000000015b3c40] [c000000000afdfa8] sparse_
[ 0.000000] [c0000000015b3c70] [c000000000ed4234] sparse_
[ 0.000000] [c0000000015b3d30] [c000000000eb3604] initmem_
[ 0.000000] [c0000000015b3e50] [c000000000eab418] setup_arch+
[ 0.000000] [c0000000015b3f00] [c000000000ea3ae4] start_kernel+
[ 0.000000] [c0000000015b3f90] [c000000000008c6c] start_here_
[ 0.000000] ---[ end Kernel panic - not syncing: memblock_
[ 0.000000]
Regards
Praveen
== Comment: #1 - Praveen K. Pandey <email address hidden> - 2016-07-17 02:40:23 ==
== Comment: #14 - SRIKAR DRONAMRAJU <email address hidden> - 2016-08-31 11:02:28 ==
V3 was posted upstream at http://<email address hidden>.
That should at least solve the problem (at least it wouldn't panic/hang on triggering fadump).
The patches posted were on top of 4.8-rc3 and apply cleanly on v4.4.
I am not sure which kernel is targeted for 16.10. I hear it's going to be based on v4.8.
Once we know which kernel version Ubuntu is targeting, we can backport the patchset accordingly.
== Comment: #18 - Gary M. Gaydos <email address hidden> - 2016-09-14 16:56:11 ==
Hi Canonical: Per this comment with the patch set link, this bug appears to be fixed using the 4.4.0-34 kernel. Of course, the 16.10 release will use a newer kernel.
V3 was posted upstream at http://<email address hidden>.
That should at least solve the problem (at least it wouldn't panic/hang on triggering fadump).
The patches posted were on top of 4.8-rc3 and apply cleanly on v4.4.
I am not sure which kernel is targeted for 16.10. I hear it's going to be based on v4.8.
Once we know which kernel version Ubuntu is targeting, we can backport the patchset accordingly.
Exposing a comment from test that was previously private:
(In reply to comment #16)
> Hi Praveen,
>
> I have applied the patches to the Yakkety kernel source and built the *.deb
> files. I have kept them on powerdev.
> details over email
Hi Latha,
Thanks. I tried the patched kernel and the issue seems to be fixed; I am able to capture the fadump.
Log:
root@ltc-brazos1:~# cat /proc/cmdline
BOOT_IMAGE=
root@ltc-brazos1:~#
root@ltc-
201609140950 kexec_cmd linux-image-
root@ltc-
root@ltc-
dmesg.201609140950 dump.201609140950
root@ltc-
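A quick way to confirm that a dump was actually written, as in the listing above, is to look for a non-empty `dump.<timestamp>` file under the crash directory. The sketch below assumes the layout `<KDUMP_COREDIR>/<timestamp>/dump.<timestamp>`, which is inferred from the (truncated) listing and from `KDUMP_COREDIR: /var/crash`; the function name is illustrative.

```python
from pathlib import Path


def captured_dumps(coredir="/var/crash"):
    """Return the timestamps of subdirectories holding a non-empty dump file.

    Assumed layout (inferred from the listing in this report):
    <coredir>/<timestamp>/dump.<timestamp>
    """
    found = []
    root = Path(coredir)
    if not root.is_dir():
        return found
    for entry in sorted(root.iterdir()):
        # Only all-digit directory names are treated as dump timestamps.
        if not (entry.is_dir() and entry.name.isdigit()):
            continue
        dump = entry / f"dump.{entry.name}"
        if dump.is_file() and dump.stat().st_size > 0:
            found.append(entry.name)
    return found


if __name__ == "__main__":
    print(captured_dumps())
```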
Regards
Praveen
== Comment: #20 - Hari Krishna Bathini <email address hidden> - 2016-09-23 03:49:36 ==
Mirroring the bug so Canonical can pick up the fix patches.
Srikar, can you please provide the upstream commit IDs of the fix patches?
Thanks
Hari
== Comment: #21 - Hari Krishna Bathini <email address hidden> - 2016-09-23 03:59:17 ==
(In reply to comment #14)
> V3 was posted upstream at
> http://<email address hidden>.
> ibm.com.
>
> That should at least solve the problem (at least it wouldn't panic/hang on
> triggering fadump).
>
> The patches posted were on top of 4.8-rc3 and apply cleanly on v4.4.
> I am not sure which kernel is targeted for 16.10. I hear it's going to be
> based on v4.8.
Yeah. 16.10 -proposed now has a v4.8-based kernel.
Thanks
Hari
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged