Deployment fails when node doesn't have an interface on the first subnet in the fabric
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
MAAS | Triaged | Medium | Unassigned | |
Bug Description
I have a MAAS host on two subnets within a single fabric. One subnet is the "internal" subnet; the other is a "public" VLAN. Most nodes in this MAAS have interfaces on both the internal and public subnets. However, using a node with interfaces on both subnets, I stood up a KVM host and then added a handful of KVMs on that node, each with only one interface, on the public subnet. These KVMs (and any other node without an interface on the internal subnet) fail to deploy because they fail to report to MAAS that the deployment has completed. This happens because the report script called by cloud-init chooses to report to MAAS over the internal subnet rather than the public one.
The deployment of the node does finish, and since it is a KVM I was able to connect to the virtual console, import my SSH key to the node, and extract the logs. From the cloud-init-
2023-05-23 14:45:48,836 - handlers.
or
2023-05-23 15:01:08,388 - handlers.
where cloud-init is trying to post the data back to MAAS on the internal subnet (10.1.10.0) instead of the public subnet, which is the only one the node has access to.
I think the report jobs should always ensure that they report back on a subnet that both the node and the rack controller have access to. MAAS could also block a deployment if the node has no way of reporting back that it succeeded.
I did some digging into why this is happening and found that, when building the report task[1] for cloud-init, MAAS elects to use the first[2] network on the rack controller, even if the node doesn't have an interface on that network.
1) https:/
2) https:/
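To illustrate the selection suggested above, here is a minimal Python sketch. It is not MAAS code: the function name `pick_report_address`, its parameters, and the example addresses are all hypothetical, and it assumes the rack controller's addresses and the node's subnets are available as plain strings. The /24 mask on the internal subnet and the 192.0.2.0/24 "public" range are assumptions made for the example only.

```python
import ipaddress
from typing import Iterable, Optional


def pick_report_address(
    rack_addresses: Iterable[str],
    node_subnets: Iterable[str],
) -> Optional[str]:
    """Return the first rack controller address that lies in a subnet the
    node has an interface on, or None if they share no subnet.

    Hypothetical helper for illustration; not part of MAAS.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in node_subnets]
    for addr in rack_addresses:
        ip = ipaddress.ip_address(addr)
        # Only accept an address the node can actually reach directly.
        if any(ip in net for net in networks):
            return addr
    return None


# Rack controller is on both subnets (addresses are made up for the example).
rack = ["10.1.10.2", "192.0.2.2"]

# A node with only a public interface should report to the public address,
# not to the first (internal) one.
public_only = pick_report_address(rack, ["192.0.2.0/24"])

# A node sharing no subnet with the rack controller gets None, which a
# caller could use to refuse the deployment up front.
unreachable = pick_report_address(rack, ["198.51.100.0/24"])
```

With this approach, a `None` result corresponds to the second suggestion: the deployment could be blocked before it starts, instead of failing when the node cannot report back.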
tags: | added: bug-council |
Changed in maas: | |
importance: | Undecided → Medium |
milestone: | none → 3.5.0 |
status: | New → Triaged |
tags: | removed: bug-council |