Machines fail to deploy because cloud-init needs to accept both netplan spellings for gratuitous ARP
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Undecided | Andres Rodriguez |
cloud-init | Fix Released | Medium | Ryan Harper |
curtin | Invalid | Undecided | Unassigned |
Bug Description
Many nodes failed to boot after installation.
Here is one example, beartic.
beartic.
finishes install: 2019-05-
DHCP requests after reboot:
10.244.
10.244.
10.244.
10.244.
GRUB and grub.cfg requests:
10.244.
10.244.
10.244.
10.244.
10.244.
10.244.
10.244.
10.244.
10.244.
But we never received any rsyslog messages or API calls after that.
Related branches
- Server Team CI bot: Needs Fixing (continuous-integration)
- cloud-init Commiters: Pending requested
Diff: 767 lines (+204/-315), 18 files modified:
cloudinit/config/cc_growpart.py (+2/-1)
cloudinit/config/cc_resizefs.py (+3/-3)
cloudinit/config/cc_ubuntu_advantage.py (+1/-1)
cloudinit/net/network_state.py (+8/-0)
cloudinit/sources/DataSourceNoCloud.py (+23/-17)
cloudinit/util.py (+13/-9)
config/cloud.cfg.tmpl (+2/-2)
debian/changelog (+7/-0)
debian/patches/ubuntu-advantage-revert-tip.patch (+5/-255)
tests/unittests/test_datasource/test_azure.py (+0/-24)
tests/unittests/test_datasource/test_nocloud.py (+18/-0)
tests/unittests/test_distros/test_freebsd.py (+45/-0)
tests/unittests/test_ds_identify.py (+20/-0)
tests/unittests/test_handler/test_handler_resizefs.py (+1/-1)
tests/unittests/test_net.py (+46/-0)
tools/ds-identify (+8/-0)
tools/render-cloudcfg (+1/-1)
tools/run-container (+1/-1)
- Server Team CI bot: Approve (continuous-integration)
- Chad Smith: Approve
- Dan Watkins: Approve
Diff: 82 lines (+54/-0), 2 files modified:
cloudinit/net/network_state.py (+8/-0)
tests/unittests/test_net.py (+46/-0)
- Andres Rodriguez (community): Approve
- Jason Hobbs (community): Approve
Diff: 48 lines (+9/-5), 2 files modified:
src/maasserver/tests/test_preseed_network.py (+4/-2)
src/provisioningserver/utils/netplan.py (+5/-3)
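The cloud-init branch listed above changes cloudinit/net/network_state.py and adds tests in tests/unittests/test_net.py. The idea from the bug title, accepting either netplan spelling of the bond gratuitous-ARP key, can be illustrated with the minimal Python sketch below. It assumes the two spellings are `gratuitous-arp` and the older misspelling `gratuitious-arp` inside a bond's `parameters` mapping. The function name `normalize_bond_params` and the choice of canonical key are hypothetical; this is not the code that landed in cloud-init.

```python
"""Hypothetical sketch (not cloud-init's actual code): normalize the two
netplan spellings of the bond gratuitous-ARP key to a single key.

Assumption: a netplan v2 bond "parameters" mapping may use either
"gratuitous-arp" or the historical misspelling "gratuitious-arp".
"""
from typing import Any, Dict

# Both spellings assumed to be accepted by netplan for the same setting.
GRAT_ARP_SPELLINGS = ("gratuitous-arp", "gratuitious-arp")
# Hypothetical choice: store the value under the correctly spelled key.
CANONICAL_GRAT_ARP = "gratuitous-arp"


def normalize_bond_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` with either spelling mapped to one key.

    The input dict is not modified. Raises ValueError if both spellings
    are present with different values, since neither can safely win.
    """
    # Copy every other parameter through unchanged.
    result = {k: v for k, v in params.items() if k not in GRAT_ARP_SPELLINGS}
    # Collect values supplied under either spelling.
    found = [params[k] for k in GRAT_ARP_SPELLINGS if k in params]
    if any(v != found[0] for v in found[1:]):
        raise ValueError(f"conflicting gratuitous-ARP values: {found}")
    if found:
        result[CANONICAL_GRAT_ARP] = found[0]
    return result


if __name__ == "__main__":
    # A config using the older spelling is read the same as the new one.
    print(normalize_bond_params({"mode": "active-backup", "gratuitious-arp": 5}))
    # {'mode': 'active-backup', 'gratuitous-arp': 5}
```

Raising on conflicting values, rather than silently preferring one spelling, is a choice made for this sketch; a real parser might instead log a warning or document which spelling takes precedence.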
Changed in | Field | Change
---|---|---
maas | status | New → Incomplete
maas | tags | added: cdo-release-blocker
maas | status | Incomplete → New
maas | status | In Progress → Fix Committed
maas | milestone | 2.6.0rc1 → 2.6.0beta3
cloud-init | assignee | nobody → Ryan Harper (raharper)
maas | status | Fix Committed → Fix Released
So, this is what I see in the logs:
1. In rackd.log on .32, I see the machine PXE boot to start the deployment process:
2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
2019-05-01 10:32:33 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7
2019-05-01 10:32:34 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-kernel requested by 10.244.41.7
2019-05-01 10:32:36 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-initrd requested by 10.244.41.7
2019-05-01 10:32:58 provisioningserver.rackdservices.http: [info] /images/ubuntu/amd64/ga-18.04/bionic/daily/squashfs requested by 10.244.41.7
2. In rackd.log on .30, I see it PXE boot post-deployment (and it's told to localboot):
2019-05-01 10:38:13 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
2019-05-01 10:38:13 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.244.41.7
2019-05-01 10:38:14 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.244.41.7
2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7
3. I see that curtin ran the deployment process without reporting any errors. Log: https://pastebin.ubuntu.com/p/zMgTttxdSj/ | curtin config: https://pastebin.ubuntu.com/p/Y2ZMX6Rstd/
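The rackd.log excerpts in items 1 and 2 can be compared mechanically. The sketch below is a hypothetical helper, not part of MAAS, and it assumes the exact line format quoted in those excerpts. It groups TFTP/HTTP requests by client IP and checks whether a boot kernel was fetched over HTTP. In the .30 excerpt no boot-kernel request appears, which is consistent with (though not proof of) the machine being sent to local boot instead of a network kernel.

```python
"""Hypothetical helper (not a MAAS tool): summarise which boot files a
machine fetched from a rack controller, assuming rackd.log lines look like
"<date> <time> provisioningserver.rackdservices.<tftp|http>: [info] <path> requested by <ip>".
"""
import re
from typing import Dict, Iterable, List, Tuple

LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.(?P<proto>tftp|http): "
    r"\[info\] (?P<path>\S+) requested by (?P<ip>[\d.]+)$"
)

Request = Tuple[str, str, str]  # (timestamp, protocol, path)


def boot_requests(lines: Iterable[str]) -> Dict[str, List[Request]]:
    """Group (timestamp, protocol, path) tuples by requesting IP.

    Lines that do not match the assumed format are skipped.
    """
    by_ip: Dict[str, List[Request]] = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            by_ip.setdefault(m["ip"], []).append((m["ts"], m["proto"], m["path"]))
    return by_ip


def fetched_kernel(requests: List[Request]) -> bool:
    """True if any HTTP request in the list was for a boot-kernel image."""
    return any(proto == "http" and path.endswith("/boot-kernel")
               for _, proto, path in requests)


if __name__ == "__main__":
    post_deploy = [
        "2019-05-01 10:38:15 provisioningserver.rackdservices.tftp: [info] "
        "/grub/grub.cfg-14:02:ec:41:c7:dc requested by 10.244.41.7",
    ]
    reqs = boot_requests(post_deploy)
    print(fetched_kernel(reqs["10.244.41.7"]))  # False: only grub.cfg was served
```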
So, from the information above, I don't think we have enough to determine what the issue is. It could be any of the following:
A. The machine was never instructed to localboot.
B. The machine was instructed to localboot, but grub failed.
C. The machine booted onto the disk, but either didn't get network or failed to contact metadata.
D. There is a firmw...