Subnet changed to wrong fabric, impacting DHCP
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS | Triaged | Medium | Unassigned |
Bug Description
Context:
2x rackd, 1x regiond
Ubuntu 22.04, packages, 1:3.3.3-
https:/
After DHCP stopped working on a subnet served by our second rackd controller, we discovered that the subnet had somehow been moved into the wrong fabric. As a result, `dhcpd-interfaces` no longer listed the relevant interface (a bridge named admin), and `dhcpd.conf` had no reference to that subnet.
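The two symptoms described above can be checked mechanically. The following is a minimal sketch, not MAAS code: it assumes `dhcpd.conf` uses the usual ISC dhcpd `subnet <address> netmask <mask> {` declarations and that `dhcpd-interfaces` is a whitespace-separated list of interface names. The function names and the example CIDR are hypothetical, and the file locations are deliberately left to the caller because they differ between installs.

```python
import ipaddress
import re

# Matches ISC dhcpd IPv4 subnet declarations such as:
#   subnet 10.0.0.0 netmask 255.255.255.0 {
SUBNET_DECLARATION = re.compile(
    r"^\s*subnet\s+(\S+)\s+netmask\s+(\S+)\s*\{", re.MULTILINE
)


def dhcpd_serves_subnet(dhcpd_conf_text, cidr):
    """Return True if the dhcpd.conf text declares the given IPv4 subnet."""
    wanted = ipaddress.ip_network(cidr)
    for address, netmask in SUBNET_DECLARATION.findall(dhcpd_conf_text):
        try:
            declared = ipaddress.ip_network(f"{address}/{netmask}")
        except ValueError:
            # Skip declarations that do not parse as a network.
            continue
        if declared == wanted:
            return True
    return False


def interface_listed(dhcpd_interfaces_text, interface_name):
    """Return True if the interface name appears in the dhcpd-interfaces text."""
    return interface_name in dhcpd_interfaces_text.split()
```

To use it, read both files from the affected rack controller and pass in their contents, for example `dhcpd_serves_subnet(conf_text, "10.0.0.0/24")` and `interface_listed(interfaces_text, "admin")`. If either returns False for a subnet that should have DHCP, the condition matches what we saw.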
When we moved the subnet back to the correct fabric, it all started working again as expected.
We did not look at dhcpd.conf on the first rackd controller, which houses that other fabric. The subnet in that other fabric continued to work fine during this time.
We do not know how the subnet was reconfigured. Only three people here know how to make that change, and all of them know better than to make the six or so clicks it requires in the web UI. None of us readily knows how to do it via the CLI.
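Since the move went unnoticed until DHCP failed, a periodic check of subnet-to-fabric assignments might catch it earlier. The sketch below is an assumption-laden illustration: it expects the JSON list produced by something like `maas $PROFILE subnets read`, with each entry carrying a `cidr` field and a nested `vlan` object that has a `fabric` name. The function name, the expected-fabric mapping, and the example values are hypothetical.

```python
import json


def find_misplaced_subnets(subnets, expected_fabric_by_cidr):
    """Return (cidr, expected_fabric, actual_fabric) for each mismatch.

    `subnets` is the parsed JSON list of subnet objects.
    `expected_fabric_by_cidr` maps a CIDR string to the fabric name it
    should live in; subnets not in the mapping are ignored.
    """
    misplaced = []
    for subnet in subnets:
        cidr = subnet.get("cidr")
        expected = expected_fabric_by_cidr.get(cidr)
        if expected is None:
            continue
        # Assumed layout: the fabric name sits under the subnet's "vlan" object.
        actual = (subnet.get("vlan") or {}).get("fabric")
        if actual != expected:
            misplaced.append((cidr, expected, actual))
    return misplaced


def load_subnets(path):
    """Load a saved copy of the subnet listing from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, after saving the listing to `subnets.json`, calling `find_misplaced_subnets(load_subnets("subnets.json"), {"10.0.0.0/24": "fabric-1"})` returns an empty list when the subnet is where it should be, and one tuple per subnet that has drifted.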
Soon I will attach logs from the times in question. From the virsh errors in the logs we believe that the problem began at or very close to 2023-08-
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 3.5.0
We resolved the issue at around 2023-08-14T09:21:36.043193+08:00, which is when the first virsh power state change was logged: `Power state has changed from error to on.`
This (and the errors that preceded it) is because the KVM Pod nodes hosting these machines lost the IP on their admin interface as part of the DHCP outage, then regained it once it was fixed.