MAAS deploys fail if host has NIC w/ random MAC
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Triaged | Medium | Unassigned |
3.3 | Triaged | Medium | Unassigned |
cloud-init | Expired | Undecided | Unassigned |
curtin | New | Undecided | Unassigned |
Bug Description
The Nvidia DGX A100 server includes a USB Redfish Host Interface NIC. This NIC apparently provides no MAC address of its own, so the driver generates a random MAC for it:
./drivers/
static int usbnet_
{
int status = usbnet_
if (!status && (dev->net-
return status;
}
This causes a problem with MAAS because, during deployment, MAAS sees this as a normal NIC and records the MAC. Since the MAC is random, the recorded address presumably no longer matches after the post-install reboot, which then fails:
[ 43.652573] cloud-init[3761]: init.apply_
[ 43.700516] cloud-init[3761]: File "/usr/lib/
[ 43.724496] cloud-init[3761]: self.distro.
[ 43.740509] cloud-init[3761]: File "/usr/lib/
[ 43.764523] cloud-init[3761]: raise RuntimeError(msg)
[ 43.780511] cloud-init[3761]: RuntimeError: Not all expected physical devices present: {'fe:b8:
I'm not sure what the best answer for MAAS is here, but here are some thoughts:
1) Ignore all Redfish system interfaces. These are a connection between the host and the BMC, so they don't really have a use case in the MAAS model AFAICT. These devices can be identified using the SMBIOS as described in the Redfish Host Interface Specification, section 8:
https:/
This SMBIOS data can be read from within Linux using dmidecode.
2) Ignore (or specially handle) all NICs with randomly generated MAC addresses. While this is the only time I've seen a random MAC on production server hardware, it is something I've seen on e.g. ARM development boards. The problem is, I don't know how to detect a generated MAC. I'd hoped the permanent MAC (ethtool -P) would be NULL, but it seems to also be set to the generated MAC :(
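One possible way to flag a generated MAC is sketched below. It rests on assumptions not confirmed in this report: the kernel exposes an `addr_assign_type` attribute in sysfs for each interface (0 = permanent, 1 = randomly generated, 2 = stolen from another device, 3 = set by userspace), and randomly generated addresses have the locally administered bit (0x02 of the first octet) set. Whether the Redfish NIC's driver actually reports type 1 has not been verified here. The helper names and the `sysfs_root` parameter are hypothetical.

```python
from pathlib import Path

# Value the kernel is assumed to report for a randomly generated address.
NET_ADDR_RANDOM = 1


def is_locally_administered(mac: str) -> bool:
    """Return True if the locally administered bit (0x02) of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def has_random_mac(ifname: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if sysfs reports the interface's MAC as randomly generated.

    sysfs_root is a hypothetical parameter, added only so the check can be
    exercised against a fake directory tree.
    """
    path = Path(sysfs_root) / ifname / "addr_assign_type"
    try:
        return int(path.read_text().strip()) == NET_ADDR_RANDOM
    except (OSError, ValueError):
        # Attribute missing or unreadable: treat as "not known to be random".
        return False
```

The locally administered bit on its own is only a weak hint, since bonds, bridges and virtual NICs set it too. The address in the cloud-init error above starts with `fe`, which does have that bit set.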
FYI, two workarounds for this seem to work:
1) Delete the NIC from the MAAS model in the MAAS UI after every commissioning.
2) Use a tag's kernel_opts field to modprobe.blacklist the driver used for the Redfish NIC.
Related branches
- dann frazier (community): Disapprove
- Adam Collard (community): Needs Fixing
- MAAS Lander: Needs Fixing
- Diff: 17 lines (+6/-0), 1 file modified: src/metadataserver/builtin_scripts/network.py (+6/-0)
Changed in maas:
importance: Undecided → High
assignee: nobody → Björn Tillenius (bjornt)
Changed in maas:
assignee: Björn Tillenius (bjornt) → nobody
milestone: none → 3.2.0
Changed in maas:
assignee: nobody → Alberto Donato (ack)
Changed in maas:
milestone: 3.3.0 → 3.4.0
Changed in maas:
milestone: 3.4.0 → 3.4.x
It seems the Ampere Mt. Jade platform is also affected by this issue.
Its dmidecode output (which unfortunately does not seem to provide much useful information):
ubuntu@howzit:~$ sudo dmidecode -t bios -t 42; sudo dmidecode -t 42 -u
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.
# SMBIOS implementations newer than version 3.2.0 are not
# fully supported by this version of dmidecode.
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: Ampere(R)
Version: 1.6.20210526 (SCP: 1.06.20210526)
Release Date: 2021/05/26
ROM Size: 7680 kB
Characteristics:
PCI is supported
BIOS is upgradeable
Boot from CD is supported
Selectable boot is supported
ACPI is supported
UEFI is supported
BIOS Revision: 5.15
Firmware Revision: 1.6
Handle 0x0029, DMI type 13, 22 bytes
BIOS Language Information
Language Description Format: Long
Installable Languages: 1
en|US|iso8859-1
Currently Installed Language: en|US|iso8859-1
Handle 0x0055, DMI type 42, 17 bytes
Management Controller Host Interface
Host Interface Type: OEM
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.
# SMBIOS implementations newer than version 3.2.0 are not
# fully supported by this version of dmidecode.
Handle 0x0055, DMI type 42, 17 bytes
Header and Data:
2A 11 55 00 F0 04 FF 00 00 00 01 02 04 FF FF FF
FF
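The raw record above can be decoded by hand. The sketch below assumes the SMBIOS type 42 layout: interface type at offset 4, a length-prefixed block of interface-specific data, then a count of length-prefixed protocol records. It also assumes the values 0x40 for a network host interface and 0x04 for the Redfish over IP protocol. The function, class and constant names are illustrative only and are not taken from MAAS or dmidecode.

```python
from typing import List, NamedTuple, Tuple

NETWORK_HOST_INTERFACE = 0x40  # assumed interface type for a network host interface
REDFISH_OVER_IP = 0x04         # assumed protocol type for Redfish over IP


class HostInterface(NamedTuple):
    interface_type: int
    interface_data: bytes
    protocols: List[Tuple[int, bytes]]


def parse_type42(hex_dump: str) -> HostInterface:
    """Decode the formatted area of a type 42 record given as a hex dump."""
    raw = bytes.fromhex(hex_dump)
    if len(raw) < 7 or raw[0] != 42 or raw[1] > len(raw):
        raise ValueError("not a complete SMBIOS type 42 record")
    interface_type = raw[4]
    data_len = raw[5]
    interface_data = raw[6:6 + data_len]
    pos = 6 + data_len
    count = raw[pos]
    pos += 1
    protocols = []
    for _ in range(count):
        # Each protocol record: type byte, length byte, then that many data bytes.
        proto_type, proto_len = raw[pos], raw[pos + 1]
        protocols.append((proto_type, raw[pos + 2:pos + 2 + proto_len]))
        pos += 2 + proto_len
    return HostInterface(interface_type, interface_data, protocols)


def is_redfish_host_interface(record: HostInterface) -> bool:
    """Return True if the record looks like a Redfish network host interface."""
    return record.interface_type == NETWORK_HOST_INTERFACE and any(
        proto_type == REDFISH_OVER_IP for proto_type, _ in record.protocols
    )


record = parse_type42("2A 11 55 00 F0 04 FF 00 00 00 01 02 04 FF FF FF FF")
print(hex(record.interface_type), record.protocols, is_redfish_host_interface(record))
```

Under these assumptions, the bytes above decode to interface type 0xF0 (OEM, matching the dmidecode text) with a single protocol record of type 0x02, so this record would not identify the device as a Redfish host interface.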