So, this is what I see in the logs:
1. In rackd.log on .32, I see the machine PXE boot to start the deployment process:
2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel requested by 10.244.41.7
2019-05-01 10:32:36 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-initrd requested by 10.244.41.7
2019-05-01 10:32:58 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/squashfs requested by 10.244.41.7
2. In rackd.log on .30, I see it PXE boot again after deployment (and it's told to localboot):
2019-05-01 10:38:13 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
2019-05-01 10:38:13 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
2019-05-01 10:38:14 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7
3. I see that curtin ran the deployment process and didn't report any errors. Log: https://pastebin.ubuntu.com/p/zMgTttxdSj/ | curtin config: https://pastebin.ubuntu.com/p/Y2ZMX6Rstd/
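The two rackd.log excerpts differ in one telling way: during the deployment boot the machine goes on to fetch the ephemeral kernel, initrd and squashfs over HTTP, while in the post-deployment boot it stops after its per-MAC grub.cfg, which is consistent with being handed a localboot config. A minimal Python sketch of that check, assuming each rackd.log line has the form `<timestamp> provisioningserver.rackdservices.<tftp|http>: [info] <path> requested by <ip>` (one request per line). The function names and the classification heuristic are hypothetical illustrations, not part of MAAS, and the log alone still cannot show what grub did after reading its config.

```python
import re
from datetime import datetime

# Assumed rackd.log line shape (one request per line), e.g.:
# 2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.244.41.7
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<logger>\S+): \[info\] "
    r"(?P<path>\S+) requested by (?P<ip>\S+)$"
)

# Files fetched only when the machine boots the ephemeral image
# (deployment/commissioning), not when it is handed off to the local disk.
EPHEMERAL_FILES = ("boot-kernel", "boot-initrd", "squashfs")


def parse_requests(lines, ip):
    """Return (timestamp, protocol, path) tuples for requests made by `ip`."""
    requests = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("ip") != ip:
            continue  # not a request line, or a different machine
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        protocol = m.group("logger").rsplit(".", 1)[-1]  # "tftp" or "http"
        requests.append((ts, protocol, m.group("path")))
    return requests


def classify_boot(requests):
    """Hypothetical heuristic guessing what kind of PXE boot the requests represent."""
    paths = [path for _, _, path in requests]
    if any(path.endswith(EPHEMERAL_FILES) for path in paths):
        return "ephemeral boot: kernel/initrd/squashfs fetched"
    if any(path.startswith("/grub/grub.cfg-") for path in paths):
        # Per-MAC grub.cfg served but no kernel fetched: consistent with a
        # localboot config, though this says nothing about what grub did next.
        return "grub.cfg only: consistent with localboot"
    return "unknown"


if __name__ == "__main__":
    post_deploy = [
        "2019-05-01 10:38:14 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.244.41.7",
        "2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.244.41.7",
        "2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7",
    ]
    print(classify_boot(parse_requests(post_deploy, "10.244.41.7")))
```

Run on the post-deployment lines, this prints "grub.cfg only: consistent with localboot". The ephemeral-file check runs first on purpose: the deployment boot also requests the per-MAC grub.cfg, so only the absence of the kernel/initrd/squashfs requests distinguishes the second boot.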
So, from all the information above, I don't think we have enough to determine what the issue is. It could be any of the following:
A. The machine was never instructed to localboot.
B. The machine was instructed to localboot, but grub failed.
C. The machine booted from the disk, but either didn't get network or failed to contact the metadata server.
D. There is a firmware issue preventing the machine from accessing the deployed environment.
Looking at the logs, it seems that:
A -> The machine did reboot, fetched its grub config and was instructed to localboot, so A is ruled out.
B -> We don't know whether grub failed, because we have no console logs.
C -> The network may not have been configured properly on reboot, either due to cloud-init or a bug in netplan. To confirm this, we need console logs.
D -> We need console logs.
So, from all the info here, I'm marking this bug as incomplete, as we really need console logs to determine what the issue is. That said, this could also be a curtin issue: even though the installation succeeded, curtin may have misconfigured something that prevented the machine from ever booting into the installed environment. So, I'm adding curtin to see if they can help us.
@Jason, quick question: are all the machines that failed to boot using grub? It seems that way to me, but I just want to double-check.
Lastly, we really need console logs. @Jason, you can set up conserver to automatically collect the console logs and share them with MAAS.