controller subnet and IP address change not reflected in maas

Bug #1936249 reported by Bayani Carbone
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
MAAS      Triaged   Medium       Unassigned

Bug Description

MAAS snap version: 2.9.2 (9165-g.c3e7848d1)

After a change of subnet and IP addresses in a maas deployment, one of the controllers is still showing its old IP address in maas while the interface on the node shows the new IP address.
Subnet 10.80.0.0/24 was changed to 10.81.0.0/24.
regiond and rackd logs are attached.

15: br305: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7a:d2:17:84:20:90 brd ff:ff:ff:ff:ff:ff
    inet 10.81.0.12/24 brd 10.81.0.255 scope global br305
       valid_lft forever preferred_lft forever
    inet6 fe80::ec1f:8bff:fef2:7ab5/64 scope link
       valid_lft forever preferred_lft forever

maas root rack-controller read output:
    "system_id": "rpmpsr",
    "ip_addresses": [
        "10.86.0.12",
        "10.82.0.12",
        "10.83.0.12",
        "10.84.0.12",
        "10.80.0.12"
    ]
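
The mismatch between the two outputs above can be checked mechanically. The following is a minimal sketch, not part of MAAS: the function name `find_stale_addresses` is hypothetical, the address list is copied from the `ip_addresses` output above, and the "expected" address assumes the host part of each address was kept when the subnet was renumbered (as it was for 10.81.0.12 in this report).

```python
import ipaddress


def find_stale_addresses(reported, old_subnet, new_subnet):
    """Return (stale, expected) pairs for reported addresses still in old_subnet.

    Assumption: the host offset within the subnet is unchanged by the
    renumbering, so the expected address is the same offset in new_subnet.
    """
    old_net = ipaddress.ip_network(old_subnet)
    new_net = ipaddress.ip_network(new_subnet)
    stale = []
    for addr in reported:
        ip = ipaddress.ip_address(addr)
        if ip in old_net:
            # Offset of the host within the old subnet, reused in the new one.
            offset = int(ip) - int(old_net.network_address)
            expected = new_net.network_address + offset
            stale.append((str(ip), str(expected)))
    return stale


# Addresses MAAS reported for controller "rpmpsr" in this report.
reported_by_maas = [
    "10.86.0.12",
    "10.82.0.12",
    "10.83.0.12",
    "10.84.0.12",
    "10.80.0.12",
]

print(find_stale_addresses(reported_by_maas, "10.80.0.0/24", "10.81.0.0/24"))
```

With the values from this report, the only address flagged is 10.80.0.12, whose counterpart in the new subnet is the 10.81.0.12 seen on br305.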

Chain of events:
 - maas running on 3 infra nodes as part of an edge deployment
 - a LXD cluster was manually created on those same 3 nodes
 - the LXD cluster was added to maas as a vm-host (a vip is used to have a single vm-host representing the LXD cluster in maas)
 - VMs were created on that vm-host without issues and a ceph pool was then added to the LXD cluster which was visible also in maas in the vm-host details
 - a subnet range change was then required for vlan 305, i.e. a change from 10.80.0.0/24 to 10.81.0.0/24. This subnet was part of fabric 0 and was already assigned to a space. The infra nodes each had an interface on the subnet, as did the already created VMs.
 - the update on the nodes was made via netplan on each infra node and the subnet was adapted in the maas gui
 - the already deployed VMs were deleted via the gui as they had to be re-deployed to pick up the new subnet
 - when checking the IP addresses of the controllers in maas, infra2 still has the old IP address
 - 3 VMs remained in error state, trying to delete them reported an error trying to contact the vm-host (however the VMs were correctly deleted from the LXD cluster)
 - trying to perform a refresh of the vm-host did not work, it just tried for a long time and nothing happened
 - trying to delete the vm-host resulted in the same behavior (both via GUI and CLI)
 - trying to restart the maas snap did not help
 - after a day, tried again to delete the vm-host and this time it worked. However in the logs we could still see connection attempts to the LXD cluster VIP (and a ceph pool error linked to that). But this also removed the VMs in error state.
 - trying to add back the LXD cluster did not succeed, it just tries for a long time and nothing happens
 - the LXD cluster looks functional, LXD CLI and LXD API access looks to be working fine.

Bayani Carbone (bcarbone) wrote :

added maas cli output for rack-controllers.

Bayani Carbone (bcarbone) wrote :

infra2 `ip a` full output

Christian Grabowski (cgrabowski) wrote :

To be clear, the changes were made outside of MAAS and the issue is MAAS did not discover the new IP on a controller or a host managed by MAAS? If it is a controller, what is the state of the controller as reported by:

maas <profile> rack-controller read <system-id of problematic controller>

Changed in maas:
status: New → Incomplete
Bayani Carbone (bcarbone) wrote :

The IP change was on a controller. Unfortunately, we redeployed maas in the meantime so I cannot execute the requested command.

Björn Tillenius (bjornt) wrote :

I see this in the logs:

2021-07-13 18:41:20 provisioningserver.rpc.common: [critical] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the b'UpdateInterfaces' command. [infra1:pid=448381:cmd=UpdateInterfaces:ask=4b0]

...
        django.core.exceptions.ValidationError: {'__all__': ['Interface with this Node and Name already exists.']}

Since we have the information about the existing interfaces and the new interface definition, we should be able to reproduce this in a unit test.

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → High
Jerzy Husakowski (jhusakowski) wrote :

Let's try to reproduce this scenario in a unit test.

Changed in maas:
importance: High → Medium
milestone: none → 3.5.0