Comment 3 for bug 1827238

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1827238] Re: 2.6beta2: many nodes failed deployment with time out

On Wed, May 1, 2019 at 11:40 AM Andres Rodriguez <email address hidden>
wrote:

> So this is what I see on the logs:
>
> 1. On rackd.log on .32, I see the machine PXE boot to start the
> deployment process:
>
> 2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
> 2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
> 2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.244.41.7
> 2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.244.41.7
> 2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.244.41.7
> 2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.244.41.7
> 2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.244.41.7
> 2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.244.41.7
> 2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7
> 2019-05-01 10:32:34 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel requested by 10.244.41.7
> 2019-05-01 10:32:36 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-initrd requested by 10.244.41.7
> 2019-05-01 10:32:58 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/squashfs requested by 10.244.41.7
>
> 2. On rackd.log on .30, I see it PXE boot post-deployment (and it's told
> to localboot):
>
> 2019-05-01 10:38:13 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
> 2019-05-01 10:38:13 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
> 2019-05-01 10:38:14 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.244.41.7
> 2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.244.41.7
> 2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.244.41.7
> 2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.244.41.7
> 2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.244.41.7
> 2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.244.41.7
> 2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7
>
>
> 3. I see that curtin has run the deployment process and hasn't reported
> any errors - log: https://pastebin.ubuntu.com/p/zMgTttxdSj/ | curtin
> config: https://pastebin.ubuntu.com/p/Y2ZMX6Rstd/
>
> So, from the information above, I don't think we have enough to know
> what the issue is. The possibilities are:
>
> A. The machine was never instructed to localboot.
> B. The machine was instructed to localboot, but grub failed.
> C. The machine booted onto the disk, but either didn't get network or
> failed to contact metadata.
> D. There is a firmware issue preventing the machine from accessing the
> deployed environment.
>
> Looking at the logs, it seems that:
>
> A -> the machine did indeed reboot, accessed the grub config and was
> instructed to localboot.
> B -> We don't know if grub failed, because we have no console logs.
> C -> It could be that the network was not configured properly on reboot,
> due either to cloud-init or to a bug in netplan. For this we need console logs.
> D -> We need console logs.
>
> So from all the info here, I'm marking this bug as incomplete, as we
> would really need console logs to determine what the issue is. That
> said, this could also be a curtin issue: although curtin succeeded, it
> could have misconfigured something so that the machine never actually
> booted into the installed environment. So, I'm adding curtin
> to see if they can help us.
>
> @Jason, quick q: are the machines that failed to boot all grub? It seems
> to me that's the case, but I just want to double check.
>

I'm not sure what you mean by this - they are all UEFI machines. They
should be using EFI, not using grub from an MBR.
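Step 1 and step 2 of the quoted triage read the rackd.log boot sequence by hand. The sketch below does the same check in Python. It assumes the mail-wrapped log lines have first been rejoined into one entry per line.

The script groups requests per client IP into PXE sessions, starting a new session at each fresh `bootx64.efi` request. It then flags whether each session went on to fetch `boot-kernel`.

Caveats:
- The `BootSession` and `split_sessions` names and the session-splitting rule are hypothetical.
- A session that stops at `grub.cfg-<MAC>` is only *consistent with* a localboot instruction. The served grub.cfg content is not in the log.
- `SAMPLE` is a subset of the lines quoted above.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

# One rackd.log entry after undoing the email line-wrapping, e.g.
# "2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.(?P<proto>tftp|http): \[info\] "
    r"(?P<path>\S+) requested by (?P<ip>[\d.]+)$"
)


@dataclass
class BootSession:  # hypothetical helper type, not part of MAAS
    ip: str
    start: datetime
    paths: list = field(default_factory=list)

    @property
    def fetched_kernel(self) -> bool:
        # Heuristic (assumption): pulling boot-kernel means a network boot into
        # the ephemeral environment; stopping after the grub config is only
        # consistent with a localboot instruction.
        return any(p.endswith("/boot-kernel") for p in self.paths)


def split_sessions(lines):
    """Group requests per client IP into PXE sessions, in log order."""
    sessions, current = [], {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a TFTP/HTTP request line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        ip, path = m["ip"], m["path"]
        sess = current.get(ip)
        # The firmware requests bootx64.efi (twice in these logs) at the start
        # of each PXE boot; a repeated request joins the same session.
        if path == "bootx64.efi" and (sess is None or sess.paths[-1] != "bootx64.efi"):
            sess = BootSession(ip=ip, start=ts)
            current[ip] = sess
            sessions.append(sess)
        elif sess is None:
            continue  # request seen before any bootx64.efi; cannot attribute it
        sess.paths.append(path)
    return sessions


# Subset of the lines quoted above (.32 first, then .30), in time order.
SAMPLE = [
    "2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7",
    "2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7",
    "2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7",
    "2019-05-01 10:32:34 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel requested by 10.244.41.7",
    "2019-05-01 10:38:13 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7",
    "2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7",
]

if __name__ == "__main__":
    for s in split_sessions(SAMPLE):
        kind = "netboot (kernel fetched)" if s.fetched_kernel else "grub config only"
        print(f"{s.ip} {s.start:%H:%M:%S} {kind}; last request: {s.paths[-1]}")
```

Run against rejoined logs from both rack controllers, sorted by time, the script should report one netboot session followed by one grub-config-only session. That is the pattern described in steps 1 and 2. It narrows things down to hypotheses B to D but cannot distinguish among them, which is why console logs are still needed.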