830 TI on Tuleta during IPL of Linux - bad xisr passed to PHYP

Bug #1499357 reported by bugproxy
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   High         Tim Gardner
  Vivid          Fix Released   Undecided    Tim Gardner
  Wily           Fix Released   High         Tim Gardner

Bug Description

I looked at the dump and the assert is due to a bad xisr. From the VIO trace the xisr was 0000000001000A00.

> ~d4a3/phypmacro/vio -globals -fr
+-----------------------------------------+
| HvVioGlobals (address=8000000807621100) |
+-----------------------------------------+
 BitBucket: 0x07F000001A2D2000
 AssertFr: 0x07F000080F394C80
 AssertEnabled: True
 VlanMap: 0x07F0000807621000
 +------------------------------------+
 | HvVioFr (address=07F000001A282800) |
 +------------------------------------+
  [ 0] HvVioInterruptAssertBadXisr [TB] 0000002C2067AB1A 0000000001000A00 0000000000000000 0000000000000000 0000000000000000

Here is the trace along with some Linux output that followed:

Token = 34, timebase = 0x24222848
h_hypervisor_esw_call(0x504c) rc = 0xfffffffc (-4)
175: b=
    0100 0A00 0000 0000 0000 0000 0000 0001 [................]
105: get_parms_ptr=
    0100 0A00 0000 0000 0000 0000 0000 0001 [................]
GET XIVE ERROR hcall rc=fffffffc buff_rc=1

[ 0.000517] irq: (null) didn't like hwirq-0x1000a00 to VIRQ16 mapping (rc=-22)
[ 0.000578] hvsi_console_init: couldn't create irq mapping for 0x1000a00
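
The two return codes above are easier to compare once both are written as signed values. A minimal sketch in Python follows; the helper name `to_signed32` is hypothetical and not part of any tool used in this report. In the Linux hvcall.h header, -4 corresponds to H_PARAMETER, which is consistent with the hypervisor rejecting the interrupt number it was given.

```python
import errno

def to_signed32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

hcall_rc = to_signed32(0xFFFFFFFC)  # rc printed by h_hypervisor_esw_call
linux_rc = -22                       # rc printed by the irq mapping message

print(hcall_rc)                      # -4, matching the "(-4)" in the trace
print(errno.errorcode[-linux_rc])    # symbolic name of errno 22 (EINVAL)
```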

---------------------------------

I then dumped the interrupt-related properties that PFW communicates to Linux
via the device tree, as follows:

1) Here are all the 'interrupt-ranges' properties found:
0 > showprops -i interrupt-ranges
/ibm,platform-facilities 00090400 00000400
/event-sources 00090000 00000008
/interrupt-controller@800000025000010 000037f8 00000004
/interrupt-controller@800000025000013 00003ff8 00000004
/interrupt-controller@800000025000014
/interrupt-controller@800000025000015
/interrupt-controller@800000025000018 000017f8 00000004
/interrupt-controller@80000002500001b
/interrupt-controller@80000002500001d 00001ff8 00000004
/interrupt-controller@80000002500001e
/interrupt-controller@80000002500001f
/interrupt-controller@800000025000021 00000ff8 00000004
/interrupt-controller@800000025000028 00002ff8 00000004
/interrupt-controller@800000025000029 000027f8 00000004
/vdevice 000a0000 000000c7 000b0000 0000007f

2) Here are all the 'ibm,msi-ranges' properties found:
0 > showprops -i ibm,msi-ranges
/pci@800000020000014/ethernet@0 00003be0 00000001
/pci@800000020000014/ethernet@0,1 00003be1 00000001
/pci@800000020000014/ethernet@0,2 00003be2 00000001
/pci@800000020000014/ethernet@0,3 00003be3 00000001
/pci@800000020000015/pci1014,034A@0 00003820 00000001
/pci@800000020000018/pci@0/pci@2/fibre-channel@0 00001000 00000001
/pci@800000020000018/pci@0/pci@2/fibre-channel@0,1 00001001 00000001
/pci@800000020000018/pci@0/pci@3/fibre-channel@0 00001002 00000001
/pci@800000020000018/pci@0/pci@3/fibre-channel@0,1 00001003 00000001
/pci@80000002000001b/usb@0 00001fa0 00000001
/pci@80000002000001e/ethernet@0 00001ce0 00000001
/pci@80000002000001e/ethernet@0,1 00001ce1 00000001
/pci@80000002000001e/ethernet@0,2 00001ce2 00000001
/pci@80000002000001e/ethernet@0,3 00001ce3 00000001
/pci@800000020000029/pci@0/pci@2/fibre-channel@0 00002000 00000001
/pci@800000020000029/pci@0/pci@2/fibre-channel@0,1 00002001 00000001
/pci@800000020000029/pci@0/pci@3/fibre-channel@0 00002002 00000001
/pci@800000020000029/pci@0/pci@3/fibre-channel@0,1 00002003 00000001

3) Here are all the 'interrupts' properties found:
0 > showprops -i interrupts
/event-sources/epow-events 00090001 00000000
/vdevice/vty@30000000 000a0000 00000000
/vdevice/vty@30000001 000a0001 00000000
/vdevice/ibm,vmc@30000002 000a0002 00000000

----------------------------------

PFW did not provide interrupt 01000A00 to the OS, so I don't think either PFW or
PHYP (who provides PFW with the int values) is at fault here. This needs to go
to Linux to determine where the 01000A00 comes from.
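
The check behind that conclusion can be sketched as follows. This is only an illustration, assuming each pair of cells in an 'interrupt-ranges' property is (first interrupt number, count); the table is copied from the non-empty entries in listing 1 above, and the function name `owners` is hypothetical. The 'ibm,msi-ranges' entries are left out because they all lie below 00004000.

```python
# Non-empty 'interrupt-ranges' entries from listing 1.
# ASSUMPTION: each pair of cells means (first interrupt number, count).
INTERRUPT_RANGES = {
    "/ibm,platform-facilities": [(0x00090400, 0x400)],
    "/event-sources": [(0x00090000, 0x8)],
    "/interrupt-controller@800000025000010": [(0x000037F8, 0x4)],
    "/interrupt-controller@800000025000013": [(0x00003FF8, 0x4)],
    "/interrupt-controller@800000025000018": [(0x000017F8, 0x4)],
    "/interrupt-controller@80000002500001d": [(0x00001FF8, 0x4)],
    "/interrupt-controller@800000025000021": [(0x00000FF8, 0x4)],
    "/interrupt-controller@800000025000028": [(0x00002FF8, 0x4)],
    "/interrupt-controller@800000025000029": [(0x000027F8, 0x4)],
    "/vdevice": [(0x000A0000, 0xC7), (0x000B0000, 0x7F)],
}

def owners(irq: int) -> list:
    """Return the nodes whose advertised ranges contain the given interrupt."""
    return [
        node
        for node, ranges in INTERRUPT_RANGES.items()
        if any(base <= irq < base + count for base, count in ranges)
    ]

print(owners(0x01000A00))  # [] : no node advertises this interrupt
print(owners(0x000A0001))  # ['/vdevice']
```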

My guess is that interrupt 000A0001, provided for the virtual console device /vdevice/vty@30000001,
is the source of the issue. Perhaps Linux is passing RTAS a byte-swapped version of that
interrupt: 01000A00 is 000A0001 with its four bytes reversed.
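
The byte-swap relationship between the two values can be confirmed with a few lines of Python. This is only an arithmetic check, not driver code; the variable names are illustrative.

```python
# The four bytes of the 'interrupts' cell as they are stored in the device tree.
dt_cell = bytes.fromhex("000A0001")

as_big_endian = int.from_bytes(dt_cell, "big")        # intended reading
as_little_endian = int.from_bytes(dt_cell, "little")  # reading without a swap on an LE host

print(f"{as_big_endian:08X}")     # 000A0001
print(f"{as_little_endian:08X}")  # 01000A00
```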

I think there is an endianness issue in hvsi_console_init where irq, as well as vtermno, are not byte swapped when fetched from the DT.

However, I tried to fix it on my LPAR, but I can't reach that code since no such device is configured there.
How can I get this device (serial hvterm-protocol) set up?

I confirm that hvsi_console_init() assumes big endian, which is wrong. That explains the swapped irq value.

This patch fixes all the endianness issues I found by reading the HVSI driver's code.
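
The patch itself is not reproduced in this report. As a rough illustration of the idea only, the sketch below decodes a device-tree property as big-endian 32-bit cells regardless of host byte order, which is the role that helpers such as be32_to_cpup() play in kernel code. The function name `read_be32_cells` and the name `second_cell` are hypothetical.

```python
import struct

def read_be32_cells(raw: bytes) -> tuple:
    """Decode a property as consecutive big-endian 32-bit cells."""
    if len(raw) % 4:
        raise ValueError("property length is not a multiple of 4 bytes")
    # ">" forces big-endian decoding, so the result is the same on any host.
    return struct.unpack(f">{len(raw) // 4}I", raw)

# 'interrupts' property of /vdevice/vty@30000001 from listing 3.
interrupts = bytes.fromhex("000a0001" "00000000")
irq, second_cell = read_be32_cells(interrupts)

print(hex(irq))  # 0xa0001
```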

When booting the system, no error messages are displayed any more and the tty driver appears to be configured correctly. However, I can't say that the driver is fully functional, since I don't know how to access the other side of the configured TTY.

The patch has been accepted upstream in the powerpc/next branch:
https://git.kernel.org/powerpc/c/480798044eb268a31f6b

Hi,

This patch should be applied to Ubuntu 15.04 and 15.10.

Thanks,
Laurent.

bugproxy (bugproxy) wrote : patch on Ubuntu-vivid latest kernel

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-121364 severity-high targetmilestone-inin1504
Luciano Chavez (lnx1138)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Wily):
assignee: Canonical Kernel Team (canonical-kernel-team) → Tim Gardner (timg-tpi)
status: Triaged → Fix Committed
Changed in linux (Ubuntu Vivid):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Brad Figg (brad-figg)
Changed in linux (Ubuntu Vivid):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.2.0-12.14

---------------
linux (4.2.0-12.14) wily; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1499712

  [ Ben Pope ]

  * SAUCE: drivers/net/ethernet/atheros/alx: Add Killer E2400 device ID
    - LP: #1498633

  [ Knuth Posern ]

  * SAUCE: thunderbolt: Allow loading of module on recent Apple MacBooks
    with thunderbolt 2 controller
    - LP: #1497321

  [ Laurent Dufour ]

  * SAUCE: powerpc/hvsi: Fix endianness issues in the HVSI driver
    - LP: #1499357

  [ Upstream Kernel Changes ]

  * x86/hyperv: Mark the Hyper-V TSC as unstable
    - LP: #1498206
  * intel_pstate: fix PCT_TO_HWP macro
    - LP: #1499040
  * perf/x86/intel/rapl: Add support for Knights Landing (KNL)
    - LP: #1461370
  * drm/i915: Add audio pin sense / ELD callback
    - LP: #1398277
  * drm/i915: Call audio pin/ELD notify function
    - LP: #1398277
  * ALSA: hda - allow codecs to access the i915 pin/ELD callback
    - LP: #1398277
  * ALSA: hda - Wake the codec up on pin/ELD notify events
    - LP: #1398277
  * drm/i915: Add locks around audio component bind/unbind
    - LP: #1398277
  * drm/i915: Drop port_mst_index parameter from pin/eld callback
    - LP: #1398277

 -- Tim Gardner <email address hidden> Thu, 24 Sep 2015 09:19:23 -0600

Changed in linux (Ubuntu Wily):
status: Fix Committed → Fix Released
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-vivid' to 'verification-done-vivid'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-vivid
Laurent Dufour (ldufour) wrote :

Please provide the exact package release that includes the patch to be tested.

Tim Gardner (timg-tpi) wrote :

Laurent: Looks like 3.19.0-31.36

Laurent Dufour (ldufour)
tags: added: verification-done-vivid
removed: verification-needed-vivid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-31.36

---------------
linux (3.19.0-31.36) vivid; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1503703

  [ Andy Whitcroft ]

  * Revert "SAUCE: aufs3: mmap: Fix races in madvise_remove() and
    sys_msync()"
    - LP: #1503655

  [ Ben Hutchings ]

  * SAUCE: aufs3: mmap: Fix races in madvise_remove() and sys_msync()
    - LP: #1503655
    - CVE-2015-7312

linux (3.19.0-31.35) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1503005

  [ Ben Hutchings ]

  * SAUCE: aufs3: mmap: Fix races in madvise_remove() and sys_msync()
    - CVE-2015-7312

  [ Craig Magina ]

  * [Config] Add XGENE_EDAC, EDAC_SUPPORT and EDAC_ATOMIC_SCRUB
    - LP: #1494357

  [ John Johansen ]

  * SAUCE: (no-up) apparmor: fix mount not handling disconnected paths
    - LP: #1496430

  [ Laurent Dufour ]

  * SAUCE: powerpc/hvsi: Fix endianness issues in the HVSI driver
    - LP: #1499357

  [ Tim Gardner ]

  * [Config] CONFIG_RTC_DRV_XGENE=y for only arm64
    - LP: #1499869

  [ Upstream Kernel Changes ]

  * Revert "sit: Add gro callbacks to sit_offload"
    - LP: #1500493
  * ipmi/powernv: Fix minor locking bug
    - LP: #1493017
  * mmc: sdhci-pci: set the clear transfer mode register quirk for O2Micro
    - LP: #1472843
  * perf probe ppc: Fix symbol fixup issues due to ELF type
    - LP: #1485528
  * perf probe ppc: Use the right prefix when ignoring SyS symbols on ppc
    - LP: #1485528
  * perf probe ppc: Enable matching against dot symbols automatically
    - LP: #1485528
  * perf probe ppc64le: Fix ppc64 ABIv2 symbol decoding
    - LP: #1485528
  * perf probe ppc64le: Prefer symbol table lookup over DWARF
    - LP: #1485528
  * perf probe ppc64le: Fixup function entry if using kallsyms lookup
    - LP: #1485528
  * perf probe: Improve detection of file/function name in the probe
    pattern
    - LP: #1485528
  * perf probe: Ignore tail calls to probed functions
    - LP: #1485528
  * seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO
    - LP: #1496073
  * EDAC: Cleanup atomic_scrub mess
    - LP: #1494357
  * arm64: Enable EDAC on ARM64
    - LP: #1494357
  * MAINTAINERS: Add entry for APM X-Gene SoC EDAC driver
    - LP: #1494357
  * Documentation: Add documentation for the APM X-Gene SoC EDAC DTS
    binding
    - LP: #1494357
  * EDAC: Add APM X-Gene SoC EDAC driver
    - LP: #1494357
  * arm64: Add APM X-Gene SoC EDAC DTS entries
    - LP: #1494357
  * EDAC, edac_stub: Drop arch-specific include
    - LP: #1494357
  * NVMe: Fix blk-mq hot cpu notification
    - LP: #1498778
  * blk-mq: Shared tag enhancements
    - LP: #1498778
  * blk-mq: avoid access hctx->tags->cpumask before allocation
    - LP: #1498778
  * x86/ldt: Make modify_ldt synchronous
    - LP: #1500493
  * x86/ldt: Correct LDT access in single stepping logic
    - LP: #1500493
  * x86/ldt: Correct FPU emulation access to LDT
    - LP: #1500493
  * md: flush ->event_work before stopping array.
    - LP: #1500493
  * ipv6: addrconf: validate new MTU before applying it
    - LP: #1500493
  * virtio-net: drop NETIF_F_FRAGLIST
    - LP: #1500493
  * RDS: verify the underlying transport exists bef...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released