# ENVIRONMENT
MAAS version (SNAP):
maas 2.8.2-8577-g.a3e674063 8980 2.8/stable canonical✓ -
MAAS was cleanly installed. KVM POD setup works.
MAAS status:
bind9 RUNNING pid 9258, uptime 15:13:02
dhcpd RUNNING pid 26173, uptime 15:09:30
dhcpd6 STOPPED Not started
http RUNNING pid 19526, uptime 15:10:49
ntp RUNNING pid 27147, uptime 14:02:18
proxy RUNNING pid 25909, uptime 15:09:33
rackd RUNNING pid 7219, uptime 15:13:20
regiond RUNNING pid 7221, uptime 15:13:20
syslog RUNNING pid 19634, uptime 15:10:48
Machine:
HPE DL380 Gen10
Storage - commissioning output:
"NAME": "sda", (virtual install drive)
"MODEL": "LUN 00 Media 0",
/devices/pci0000:00/0000:00:14.0/usb2/2-3/2-3.1/2-3.1:1.0/host0/target0:0:0/0:0:0:0/block/sda
"SIZE": "536870912",
"NAME": "sdb", Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1
"MODEL": "LOGICAL VOLUME",
"PATH": "/dev/sdb",
"DEVPATH": "/devices/pci0000:5b/0000:5b:00.0/0000:5c:00.0/host1/target1:1:0/1:1:0:0/block/sdb",
"SIZE": "960163569664",
"NAME": "sdc", (HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2)
"MODEL": "LOGICAL VOLUME",
"PATH": "/dev/sdc",
"DEVPATH": "/devices/pci0000:5b/0000:5b:00.0/0000:5c:00.0/host1/target1:1:0/1:1:0:1/block/sdc",
"SIZE": "480070426624",
# PROBLEM DESCRIPTION
After deployment, the machine fails to reboot into the deployed OS. The "Local" menu entry in the MAAS-provided grub.cfg fails to find a bootloader on the local drives, so the machine falls back to the EFI boot order.
Root cause
0) identify install device:
2020-10-20T06:56:37+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb({'grub_device': True, 'id': 'sdb', 'model': 'LOGICAL VOLUME', 'name': 'sdb', 'ptable': 'gpt', 'serial': '600508b1001cade9268ac61a1c3cee4b', 'type': 'disk', 'wipe': 'superblock'})
Grub is configured not to touch NVRAM:
2020-10-20T06:57:01+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Transferred {'grub2': 'grub2 grub2/update_nvram boolean false',
1) MAAS installs grub on the machine:
2020-10-20T06:57:02+00:00 cmp3az2cz20300kv8 cloud-init[2459]: start: cmd-install/stage-curthooks/builtin/cmd-curthooks: Installing packages on target system: ['efibootmgr', 'grub-efi-amd64', 'grub-efi-amd64-signed', 'shim-signed']
2020-10-20T06:57:09+00:00 cmp3az2cz20300kv8 cloud-init[2459]: finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: SUCCESS: Installing packages on target system: ['efibootmgr', 'grub-efi-amd64', 'grub-efi-amd64-signed', 'shim-signed']
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: start: cmd-install/stage-curthooks/builtin/cmd-curthooks/install-grub: installing grub to target devices
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: setup grub on target /tmp/tmpxf91lob9/target
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Found primary UEFI ESP: sdb-part1
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Found UEFI ESP(s) for grub install: ['sdb-part1']
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb-part1({'device': 'sdb', 'flag': 'boot', 'id': 'sdb-part1', 'name': 'sdb-part1', 'number': 1, 'offset': '4194304B', 'size': '536870912B', 'type': 'partition', 'uuid': '17649a3f-6e9a-445c-a20a-74914d4c5f88', 'wipe': 'superblock'})
2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb({'grub_device': True, 'id': 'sdb', 'model': 'LOGICAL VOLUME', 'name': 'sdb', 'ptable': 'gpt', 'serial': '600508b1001cade9268ac61a1c3cee4b', 'type': 'disk', 'wipe': 'superblock'})
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Applying grub debconf_selections config:
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: {'debconf_selections': {'grub': 'grub-pc grub-efi/install_devices multiselect /dev/disk/by-id/scsi-3600508b1001cade9268ac61a1c3cee4b-part1'}}
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: installing grub to target=/tmp/tmpxf91lob9/target devices=['/dev/sdb1'] [replace_defaults=None]
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpxf91lob9/target', 'dpkg', '--print-architecture'] with allowed return codes [0] (capture=True)
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: grub: moved /tmp/tmpxf91lob9/target/etc/default/grub.d/50-cloudimg-settings.cfg out of the way
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: updated /tmp/tmpxf91lob9/target/etc/default/grub to set: GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8 nvme_core.multipath=0"
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Using grub install command: grub-install
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Grub install cmds:
2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: [['efibootmgr', '-v'], ['dpkg-reconfigure', 'grub-efi-amd64'], ['update-grub'], ['grub-install', '--target=x86_64-efi', '--efi-directory=/boot/efi', '--bootloader-id=ubuntu', '--recheck', '--no-nvram'], ['efibootmgr', '-v']]
2020-10-20T06:57:46+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpxf91lob9/target', 'grub-install', '--target=x86_64-efi', '--efi-directory=/boot/efi', '--bootloader-id=ubuntu', '--recheck', '--no-nvram'] with allowed return codes [0] (capture=True)
2) MAAS sets up the boot order to ensure PXE boot:
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Setting currently booted 0016 as the first UEFI loader.
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: New UEFI boot order: 0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009
Note that the boot order set is:
0016 - NIC (PXE IPv4)
0000 - fail to system utilities
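The device where the OS is installed is further down in the boot order. The install disk is sdb (894.2 GiB, RAID1 Logical Drive 1), which by its description corresponds to Boot000C rather than Boot000B.
See the log below: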
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpxf91lob9/target', 'efibootmgr', '-o', '0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009'] with allowed return codes [0] (capture=False)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: BootCurrent: 0016
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Timeout: 0 seconds
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: BootOrder: 0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0000* System Utilities
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0001 Embedded UEFI Shell
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0002 Diagnose Error
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0003 Intelligent Provisioning
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0004 Boot Menu
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0005 Network Boot
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0006 View Integrated Management Log
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0007 HTTP Boot
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0008 PXE Boot
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0009 Embedded Diagnostics
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000A* Generic USB Boot
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000B* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2(Target:0, Lun:1)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000C* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1(Target:0, Lun:0)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0010* Slot 1 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0012* Slot 1 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0014* Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (HTTP(S) IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0016* Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (PXE IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0018* Slot 4 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001A* Slot 4 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001C* Slot 3 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001E* Slot 3 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0020 Trigger ready-to-boot event
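To make the boot order problem easier to see, here is a minimal Python sketch that matches the install disk to a UEFI boot entry by its size and then computes an order with the local disk directly after the PXE entry. This is only an illustration, not what Curtin does: the helper names (`to_gib`, `parse_entries`, `find_entry_by_size`, `prefer_local_after_pxe`) are hypothetical, the size match is a plain substring heuristic, and the input lines are a shortened copy of the efibootmgr output above.

```python
import re

GIB = 1024 ** 3
# Matches efibootmgr entry lines such as "Boot000C* <description>".
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def to_gib(size_bytes):
    """Convert a byte count to GiB, rounded to one decimal place."""
    return round(size_bytes / GIB, 1)


def parse_entries(lines):
    """Return {boot_id: description} for efibootmgr-style 'BootXXXX' lines."""
    entries = {}
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if match:
            entries[match.group(1).upper()] = match.group(3)
    return entries


def find_entry_by_size(entries, size_bytes):
    """Heuristic: IDs of entries whose description mentions the size in GiB."""
    needle = f"{to_gib(size_bytes)} GiB"
    return [boot_id for boot_id, desc in entries.items() if needle in desc]


def prefer_local_after_pxe(order, pxe_id, local_id):
    """Return a new order: PXE entry first, local disk second, rest unchanged."""
    rest = [b for b in order if b not in (pxe_id, local_id)]
    return [pxe_id, local_id] + rest


# Shortened sample of the efibootmgr output from the log above.
lines = [
    "BootCurrent: 0016",
    "Boot0000* System Utilities",
    "Boot000B* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2(Target:0, Lun:1)",
    "Boot000C* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1(Target:0, Lun:0)",
    "Boot0016* Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (PXE IPv4)",
]
entries = parse_entries(lines)
sdb_size = 960163569664  # "SIZE" of sdb from the commissioning output
local_ids = find_entry_by_size(entries, sdb_size)
order = "0016,0000,000B,000C".split(",")  # truncated BootOrder, for brevity
new_order = prefer_local_after_pxe(order, "0016", local_ids[0])
print(local_ids, ",".join(new_order))
```

With an order like this, a failed "Local" lookup in GRUB would hand over to the install disk instead of System Utilities. The resulting string is the kind of value that `efibootmgr -o` takes, as seen in the log.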
3) Finalize configuration:
2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: finish: cmd-install/stage-late/maas: SUCCESS: running 'wget --no-proxy http://10-216-240-0--23.maas-internal:5248/MAAS/metadata/latest/by-id/dfkxqh/ --post-data op=netboot_off -O /dev/null'
4) The server is instructed to reboot. During the reboot it uses the MAAS-provided grub.cfg:
2020-10-20 06:59:36 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-d4:f5:ef:02:28:94 requested by 10.216.240.106
MAAS provides grub configuration as follows:
ubuntu@inf1az1cz202904rz:~$ curl tftp://10.216.240.1/grub/grub.cfg-d4:f5:ef:02:28:94
set default="0"
set timeout=0
menuentry 'Local' {
echo 'Booting local disk...'
for bootloader in \
boot/bootx64.efi \
ubuntu/shimx64.efi \
ubuntu/grubx64.efi \
centos/shimx64.efi \
centos/grubx64.efi \
redhat/shimx64.efi \
redhat/grubx64.efi \
rhel/shimx64.efi \
rhel/grubx64.efi \
red/grubx64.efi \
Microsoft/Boot/bootmgfw.efi; do
search --set=root --file /efi/$bootloader
if [ $? -eq 0 ]; then
chainloader /efi/$bootloader
boot
fi
done
# If no bootloader is found exit and allow the next device to boot.
exit
}
Unfortunately, this configuration fails to find a bootloader, so GRUB exits and the firmware moves on to the next boot entry, which is Boot0000* System Utilities.
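The lookup logic of the 'Local' menu entry can be modelled roughly as follows. This is a simplified Python sketch, not GRUB code: `find_bootloader` and the `visible_filesystems` mapping are hypothetical names, paths are compared as exact strings, and the second example uses made-up file listings. It only illustrates that the loop can succeed solely on filesystems GRUB is actually able to read.

```python
# Bootloader candidates, in the order used by the MAAS-provided grub.cfg.
BOOTLOADERS = [
    "boot/bootx64.efi",
    "ubuntu/shimx64.efi",
    "ubuntu/grubx64.efi",
    "centos/shimx64.efi",
    "centos/grubx64.efi",
    "redhat/shimx64.efi",
    "redhat/grubx64.efi",
    "rhel/shimx64.efi",
    "rhel/grubx64.efi",
    "red/grubx64.efi",
    "Microsoft/Boot/bootmgfw.efi",
]


def find_bootloader(visible_filesystems):
    """Mimic the 'Local' entry: return (device, path) for the first hit.

    visible_filesystems maps a GRUB device name to the set of file paths
    GRUB can read on it. Returns None when nothing matches, which stands
    in for the final 'exit' in grub.cfg.
    """
    for bootloader in BOOTLOADERS:
        path = "/efi/" + bootloader
        for device, files in visible_filesystems.items():
            if path in files:
                return device, path
    return None


# Devices are present but their filesystems are unreadable: nothing is found.
unreadable = {"hd0": set(), "hd0,gpt1": set()}
print(find_bootloader(unreadable))

# Hypothetical case where the ESP is readable: the first candidate wins.
readable = {"hd0,gpt1": {"/efi/boot/bootx64.efi", "/efi/ubuntu/shimx64.efi"}}
print(find_bootloader(readable))
```

The first call models the failing machine: the config itself is fine, but with no readable filesystem the loop has nothing to match against and falls through to `exit`.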
In the GRUB shell, the following variables are set:
grub> set
grub_platform=efi
cmd_path=(tftp,10.216.240.1)
net_default_interface=efinet3
net_default_ip=10.216.240.106
net_default_mac=d4:f5:ef:02:28:94
net_default_server=10.216.240.1
net_efinet3_boot_file=bootx64.efi
net_efinet3_domain=mgt.tlc.cloud
net_efinet3_ip=10.216.240.106
net_efinet3_mac=d4:f5:ef:02:28:94
net_efinet3_next_server=10.216.240.1
package_version=2.02-2ubuntu8.18
prefix=(tftp,10.216.240.1)/grub
pxe_default_server=10.216.240.1
root=tftp,10.216.240.1
grub> ls
(memdisk) (hd0) (hd0,gpt1)
grub> ls (hd0)
(hd0): Filesystem is unknown.
grub> ls (hd0,gpt1)
(hd0,gpt1): Filesystem is unknown.
grub> ls (memdisk)
(memdisk): Filesystem is fat.
grub> ls (memdisk)/
grub.cfg
grub> cat (memdisk)/grub.cfg
if [ -e $prefix/x86_64-efi/grub.cfg ]; then
source $prefix/x86_64-efi/grub.cfg
else
source $prefix/grub.cfg
fi
Running the search command from the MAAS-provided config by hand fails:
grub> search --set=root --file /efi/boot/bootx64.efi
error: no such device: /efi/boot/bootx64.efi
GRUB does not see the logical volumes (sdb, sdc) hosted on the hardware RAID controller when the Virtual Install Disk (VID) is enabled.
After disabling the VID (Intelligent Provisioning -> BIOS/Platform Configuration (RBSU) -> USB Options -> Virtual Install Disk -> Disable), GRUB lists all the partitions:
grub> ls
(hd0) (hd0,gpt2) (hd0,gpt1) (hd1)
grub> search --set=root --file /efi/boot/bootx64.efi
hd0,gpt1
It looks like the deployment itself works, and what fails is booting into the deployed system. There appear to be two bugs here:
1. When a deployment occurs, Curtin should configure the system to boot locally after trying to boot over the network. This doesn't appear to be happening.
2. GRUB isn't able to see any of the local disks.
When GRUB fails to find a local bootloader, it falls back to booting the next configured device. This should be the local disk, but because Curtin never configures local boot, the system firmware (System Utilities) is started instead.