Cavium ThunderX lacks power settings after enlistment apparently due to missing kernel

Bug #1702976 reported by dann frazier
Affects      Status        Importance  Assigned to  Milestone
MAAS         Invalid       Undecided   Unassigned
maas-images  Fix Released  Undecided   Unassigned

Bug Description

This appears to be due to the 'i2c_thunderx' module being unavailable in the enlistment environment. The IPMI SSIF interface on these systems uses a bus driven by this driver.
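A quick way to confirm this from inside the enlistment environment is to look for the module file in the running kernel's module tree. The sketch below is only an illustration: `module_in_tree` is a hypothetical helper name, and it assumes modules live under /lib/modules/<version> as .ko files (optionally compressed). modprobe treats '-' and '_' in module names as equivalent, so the pattern accepts both.

```shell
# Hypothetical helper (not part of MAAS): succeed if a module file for
# the given module name exists anywhere under the given module tree.
module_in_tree() {
    moddir=$1
    # Accept either '-' or '_' wherever the name contains one of them,
    # since modprobe treats the two as equivalent.
    pattern=$(printf '%s' "$2" | sed 's/[-_]/[-_]/g')
    [ -n "$(find "$moddir" -type f -name "${pattern}.ko*" 2>/dev/null | head -n 1)" ]
}
```

For example, `module_in_tree "/lib/modules/$(uname -r)" i2c_thunderx && echo present || echo missing` would be expected to print "missing" in an environment affected by this bug.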

Tags: kernel maas


Revision history for this message
Andres Rodriguez (andreserl) wrote :

Please provide either:

1. console logs
2. /var/log/maas/rsyslog/maas-enlisting-node/<date>/messages file section that includes the enlistment of this machine
3. What MAAS version?

Changed in maas:
status: New → Invalid
milestone: none → 2.3.0
status: Invalid → Incomplete
Revision history for this message
dann frazier (dannf) wrote : Re: [Bug 1702976] Re: Cavium ThunderX nodes fail to auto-enlist

On Fri, Jul 7, 2017 at 12:01 PM, Andres Rodriguez
<email address hidden> wrote:
> Please provide either:
>
> 1. console logs

Attached. You'll see that I added debugging to the enlistment
userdata to suss out the problem:

[ 85.226096] cloud-init[1777]: + modprobe i2c_thunderx
[ 85.231335] cloud-init[1777]: modprobe: FATAL: Module i2c_thunderx not found in directory /lib/modules/4.4.0-83-generic

Normally, this driver would be autoloaded by udev. Later, when
ipmi_ssif loads, we *should* see:
[ 550.294806] IPMI SSIF Interface driver
[ 550.319453] ipmi_ssif: Trying SMBIOS-specified SSIF interface at i2c address 0x12, adapter Cavium ThunderX i2c adapter at 0000:01:09.4, slave address 0x0
[ 551.258872] ipmi_ssif 5-0012: Found new BMC (man_id: 0x000000, prod_id: 0xaabb, dev_id: 0x20)

But, instead we get just:
[ 86.366886] IPMI SSIF Interface driver

> 2. /var/log/maas/rsyslog/maas-enlisting-node/<date>/messages file section that includes the enlistment of this machine
> 3. What MAAS version?

MAAS version: 2.2.0 (bzr6054-0ubuntu1~16.04.1)

Revision history for this message
Andres Rodriguez (andreserl) wrote : Re: Cavium ThunderX nodes fail to auto-enlist

@Dann,

I see the machine enlisted successfully, but your bug says it hasn't?

[ 87.017776] cloud-init[1777]: + cat /tmp/enlist.out
[ 87.018993] cloud-init[1777]: {
[ 87.019501] cloud-init[1777]: "status_message": null,
[ 87.020099] cloud-init[1777]: "architecture": "arm64/generic",
[ 87.020611] cloud-init[1777]: "system_id": "abqxqk",
[ 87.021054] cloud-init[1777]: "status_name": "New",
[ 87.021493] cloud-init[1777]: "resource_uri": "/MAAS/api/2.0/machines/abqxqk/",
[ 87.021932] cloud-init[1777]: "status": 0,
[ 87.022371] cloud-init[1777]: "status_action": null,
[ 87.022814] cloud-init[1777]: "domain": {
[ 87.023255] cloud-init[1777]: "resource_record_count": 7,
[ 87.023694] cloud-init[1777]: "name": "maas",
[ 87.024153] cloud-init[1777]: "id": 0,
[ 87.024601] cloud-init[1777]: "ttl": null,
[ 87.025042] cloud-init[1777]: "authoritative": true
[ 87.025484] cloud-init[1777]: },
[ 87.025923] cloud-init[1777]: "fqdn": "crack-horse.maas",
[ 87.026363] cloud-init[1777]: "zone": {
[ 87.026800] cloud-init[1777]: "description": "",
[ 87.027242] cloud-init[1777]: "name": "default",
[ 87.027680] cloud-init[1777]: "id": 1
[ 87.028140] cloud-init[1777]: },
[ 87.028588] cloud-init[1777]: "power_type": "",
[ 87.029032] cloud-init[1777]: "hostname": "crack-horse",
[ 87.029477] cloud-init[1777]: "power_state": "unknown",
[ 87.029918] cloud-init[1777]: "node_type": 0
[ 87.030380] cloud-init[1777]: }+ echo =============================================
[ 87.030821] cloud-init[1777]: =============================================
[ 87.031263] cloud-init[1777]: + sleep 10

Revision history for this message
Andres Rodriguez (andreserl) wrote :

That said, the failure to load the module seems to be a bug in the kernel, or rather udev, and not in MAAS, right? Or maybe the kernel doesn't even have the module available.

But based on what I see there:

1. Your machine enlisted.
2. Your machine doesn't have power management, because it was unable to load the kernel module.
3. If it was unable to load the module, I'd normally say this is a bug with the kernel for not having the module, which is what it appears to be.

Revision history for this message
dann frazier (dannf) wrote : Re: [Bug 1702976] Re: Cavium ThunderX nodes fail to auto-enlist

On Fri, Jul 7, 2017 at 1:12 PM, Andres Rodriguez
<email address hidden> wrote:
> @Dann,
>
> I see the machine enlisted successfully, but your bug says it hasn't?

Sorry, I should've said failed to discover the power configuration. It
did technically enlist, yes, but requires manual power control
configuration before it can commission.
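The symptom can be spotted in the enlistment output quoted earlier: "power_type" comes back as an empty string. A minimal sketch for checking this, assuming the enlistment JSON (or a console log containing it) has been saved to a file; `power_type_unset` is a hypothetical helper name, and the match is a plain text search rather than a real JSON parse.

```shell
# Hypothetical helper: succeed if the given file (enlistment JSON or a
# console log that contains it) shows an empty "power_type" value.
power_type_unset() {
    grep -Eq '"power_type":[[:space:]]*""' "$1"
}
```

For example, `power_type_unset /tmp/enlist.out && echo "power settings need manual configuration"`.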

On Fri, Jul 7, 2017 at 1:14 PM, Andres Rodriguez
<email address hidden> wrote:
> That said, the failure to load the module seems to be a bug in the
> kernel, or rather udev, and not in MAAS, right?

The project that needs a fix for this is presumably whichever one
chooses the modules that go into the ephemeral images. I'm not sure
what that project is - MAAS is impacted by it, and is the closest
project to it, so I filed it here. If we want end-users to further
triage bugs seen with MAAS to specific underlying projects, I'd
suggest adding a triage guide to the report-a-bug page on launchpad.

> Or maybe, the kernel
> doesn't even have the module available.
>
> But based on what I see there:
>
> 1. Your machine enlisted.
> 2. Your machine doesn't have power management, because it was unable to load the kernel module.
> 3. If it was unable to load the module, I'd normally say this is a bug with the kernel for not having the module, which is what it appears to be.

The kernel build does have the module. If I deploy a xenial system
with the exact same kernel build enlistment is using, IPMI works
correctly. It just seems this module was omitted from the ephemerals.

  -dann

Revision history for this message
Andres Rodriguez (andreserl) wrote : Re: Cavium ThunderX lacks power settings after enlistment

Dann,

Could that be because the driver you are talking about is in linux-image-extra* instead of the main kernel package?

summary: - Cavium ThunderX nodes fail to auto-enlist
+ Cavium ThunderX lacks power settings after enlistment
summary: - Cavium ThunderX lacks power settings after enlistment
+ Cavium ThunderX lacks power settings after enlistment apparently due to
+ missing kernel
tags: added: kernel maas
Revision history for this message
Andres Rodriguez (andreserl) wrote :

@Dann,

So I did some testing and these are my findings.

MAAS uses the Ubuntu kernel as available in the archives. I ran an ephemeral environment with that kernel, and this is what I found:

ubuntu@good-emu:~$ uname -a
Linux good-emu 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:58:57 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux

ubuntu@good-emu:~$ modinfo i2c_thunderx
modinfo: ERROR: Module i2c_thunderx not found.

ubuntu@good-emu:~$ lsmod | grep thunder
thunder_bgx 24576 1 nicpf
mdio_thunder 16384 0
mdio_cavium 16384 1 mdio_thunder

So, by the looks of it, the module is not available in the main kernel. Could it be available in an HWE kernel but not in the Xenial kernel?

Revision history for this message
dann frazier (dannf) wrote : Re: [Bug 1702976] Re: Cavium ThunderX lacks power settings after enlistment apparently due to missing kernel

On Mon, Jul 10, 2017 at 8:59 PM, Andres Rodriguez
<email address hidden> wrote:
> @Dann,
>
> So I did some testing and these are my findings.
>
> MAAS uses the Ubuntu kernel as available in the archives. I ran an
> ephemeral environment with that kernel, and this is what I found:
>
> ubuntu@good-emu:~$ uname -a
> Linux good-emu 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:58:57 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux
>
> ubuntu@good-emu:~$ modinfo i2c_thunderx
> modinfo: ERROR: Module i2c_thunderx not found.
>
> ubuntu@good-emu:~$ lsmod | grep thunder
> thunder_bgx 24576 1 nicpf
> mdio_thunder 16384 0
> mdio_cavium 16384 1 mdio_thunder
>
>
> So, by the looks of it, the module is not available in the main kernel. Could it be available in an HWE kernel but not in the Xenial kernel?

$ dpkg -c linux-image-4.4.0-83-generic_4.4.0-83.106_arm64.deb | grep thunderx
-rw-r--r-- root/root 31358 2017-06-26 15:35 ./lib/modules/4.4.0-83-generic/kernel/drivers/i2c/busses/i2c-thunderx.ko
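The same listing can be used to find every module that the kernel package ships but the running environment lacks. The sketch below is an illustration only: `missing_modules` is a hypothetical helper, and it assumes `dpkg -c` prints the file path as the last field of each line, as in the output above.

```shell
# Hypothetical helper: read `dpkg -c <kernel .deb>` output on stdin and
# print each .ko path from the package that is absent under the given root.
missing_modules() {
    root=$1
    awk '{ print $NF }' | grep '\.ko$' | while IFS= read -r path; do
        # Package paths start with "./"; strip it before joining to root.
        [ -e "$root/${path#./}" ] || printf '%s\n' "$path"
    done
}
```

Run inside the ephemeral environment, `dpkg -c linux-image-4.4.0-83-generic_4.4.0-83.106_arm64.deb | missing_modules /` would be expected to list i2c-thunderx.ko if modules were dropped from the image.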

Revision history for this message
dann frazier (dannf) wrote : Re: [Bug 1702976] Re: Cavium ThunderX lacks power settings after enlistment

On Mon, Jul 10, 2017 at 8:01 PM, Andres Rodriguez
<email address hidden> wrote:
> Dann,
>
> Could that be because the driver you are talking about is in
> linux-image-extra* instead of the main kernel package?

hey Andres,

  The linux-image-extra split didn't occur for arm64 until after xenial.

  -dann

Revision history for this message
Andres Rodriguez (andreserl) wrote :

@Dann,

I tested the freeipmi-tools from -proposed and this does not solve the issue. I did this:

root@ideal-filly:~# sudo bmc-config --checkout
Unable to get Number of Users
root@ideal-filly:~# find / -name *thunder*
/lib/modules/4.4.0-83-generic/kernel/drivers/net/phy/mdio-thunder.ko
/lib/modules/4.4.0-83-generic/kernel/drivers/net/ethernet/cavium/thunder
/lib/modules/4.4.0-83-generic/kernel/drivers/net/ethernet/cavium/thunder/thunder_bgx.ko
/media/root-ro/usr/share/doc/git/contrib/thunderbird-patch-inline
/sys/bus/pci/drivers/thunder-BGX
/sys/bus/pci/drivers/thunder-nic
/sys/bus/pci/drivers/thunder-nicvf
/sys/bus/pci/drivers/mdio_thunder
/sys/bus/event_source/devices/armv8_cavium_thunder
/sys/bus/platform/drivers/pci_thunder_pem
/sys/bus/platform/drivers/pci_thunder_ecam
/sys/devices/armv8_cavium_thunder
/sys/module/nicpf/drivers/pci:thunder-nic
/sys/module/nicvf/drivers/pci:thunder-nicvf
/sys/module/mdio_cavium/holders/mdio_thunder
/sys/module/thunder_bgx
/sys/module/thunder_bgx/drivers/pci:thunder-BGX
/sys/module/mdio_thunder
/sys/module/mdio_thunder/drivers/pci:mdio_thunder
/usr/share/doc/git/contrib/thunderbird-patch-inline

As you can see, there's no 'i2c-thunderx' driver.

It seems the driver is missing from the boot kernel. We should raise this with the kernel team.

Revision history for this message
dann frazier (dannf) wrote : Re: [Bug 1702976] Re: Cavium ThunderX lacks power settings after enlistment apparently due to missing kernel

On Tue, Jul 11, 2017 at 9:03 PM, Andres Rodriguez
<email address hidden> wrote:
> @Dann,
>
> I tested the freeipmi-tools from -proposed and this does not solve the
> issue. I did this:

Thanks for checking - but I think we've conflated two issues here.
The issue freeipmi-tools fixes is unrelated to this bug. When I
mentioned that on IRC, it was a request to see if we could get that
into MAAS CI for regression testing across other platforms. It fixes
an issue for us, and I've regression tested it on 1 x86 system, but
more platform exposure before it hits -updates wouldn't hurt.

> As you can see, there's no 'i2c-thunderx' driver.
>
> It seems the driver is missing from the boot kernel. We should
> raise this with the kernel team.

I've shown that the kernel debs have this driver in Comment #8, and I
validated that the driver DTRT when I verified the kernel SRU that
added it, so I'm not sure what more to ask of the kernel team. Again,
my theory is that modules are being stripped out during the ephemeral
image build, so I'd suggest forwarding on to whatever team generates
those these days.

  -dann

Lee Trager (ltrager)
Changed in maas-images:
status: New → Fix Released
Changed in maas:
status: Incomplete → Invalid
Changed in maas:
milestone: 2.3.0 → none