MAAS

controller subnet and IP address change not reflected in maas

Bug #1936249 reported by Bayani Carbone on 2021-07-14

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
MAAS	Triaged	Medium	Unassigned	MAAS 3.5.0

Bug Description

MAAS snap version: 2.9.2 (9165-g.c3e7848d1)

After the subnet and IP addresses were changed in a maas deployment, one of the controllers still shows its old IP address in maas, while the interface on the node shows the new IP address.
Subnet 10.80.0.0/24 was changed to 10.81.0.0/24.
regiond and rackd logs are attached.

15: br305: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 7a:d2:17:84:20:90 brd ff:ff:ff:ff:ff:ff
    inet 10.81.0.12/24 brd 10.81.0.255 scope global br305
       valid_lft forever preferred_lft forever
    inet6 fe80::ec1f:8bff:fef2:7ab5/64 scope link
       valid_lft forever preferred_lft forever

Excerpt of `maas root rack-controller read` output:
    "system_id": "rpmpsr",
    "ip_addresses": [
        "10.86.0.12",
        "10.82.0.12",
        "10.83.0.12",
        "10.84.0.12",
        "10.80.0.12"
    ]
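The mismatch above can be checked mechanically. The following is a minimal Python sketch, not part of MAAS, that compares the `ip_addresses` list from `maas <profile> rack-controller read <system-id>` with the IPv4 addresses the host itself reports. It assumes the host side is captured with iproute2's JSON mode (`ip -j addr show`), and the helper names `host_ipv4_addresses` and `check_subnet_move` are hypothetical.

```python
import ipaddress
import json


def host_ipv4_addresses(ip_json: str) -> set[str]:
    """Collect IPv4 addresses from `ip -j addr show` output (iproute2 JSON)."""
    addrs = set()
    for iface in json.loads(ip_json):
        for info in iface.get("addr_info", []):
            if info.get("family") == "inet":
                addrs.add(info["local"])
    return addrs


def check_subnet_move(read_json: str, ip_json: str, old_cidr: str, new_cidr: str) -> dict:
    """Hypothetical helper: flag addresses MAAS still holds in the old subnet
    and host addresses in the new subnet that MAAS does not know about.

    read_json is the output of `maas <profile> rack-controller read <system-id>`;
    only its "ip_addresses" list is used.
    """
    old_net = ipaddress.ip_network(old_cidr)
    new_net = ipaddress.ip_network(new_cidr)
    maas_addrs = set(json.loads(read_json).get("ip_addresses", []))
    host_addrs = host_ipv4_addresses(ip_json)
    return {
        # Addresses MAAS reports that fall inside the subnet that was replaced.
        "stale_in_maas": sorted(
            a for a in maas_addrs if ipaddress.ip_address(a) in old_net
        ),
        # New-subnet addresses configured on the host but absent from MAAS.
        "missing_in_maas": sorted(
            a for a in host_addrs - maas_addrs if ipaddress.ip_address(a) in new_net
        ),
    }
```

With the values from this report (MAAS listing `10.80.0.12`, br305 carrying `10.81.0.12/24`), the sketch flags `10.80.0.12` as stale and `10.81.0.12` as missing. Restricting both checks to the old and new subnets avoids false alarms from addresses on other bridges that the `ip` excerpt does not show.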

Chain of events:
- maas running on 3 infra nodes as part of an edge deployment
- a LXD cluster was manually created on those same 3 nodes
- the LXD cluster was added to maas as a vm-host (a vip is used to have a single vm-host representing the LXD cluster in maas)
- VMs were created on that vm-host without issues, and a ceph pool was then added to the LXD cluster; the pool was also visible in maas in the vm-host details
- a subnet range change was then required for vlan 305, i.e. a change from 10.80.0.0/24 to 10.81.0.0/24. This subnet was part of fabric 0 and was already assigned to a space. Each infra node had an interface on the subnet, as did the already created VMs.
- the change was applied via netplan on each infra node, and the subnet was updated in the maas gui
- the already deployed VMs were deleted via the gui, as they had to be re-deployed to pick up the new subnet
- when checking the controllers' IP addresses in maas, infra2 still showed the old IP address
- 3 VMs remained in an error state; trying to delete them reported an error contacting the vm-host (although the VMs were correctly deleted from the LXD cluster)
- refreshing the vm-host did not work; it kept trying for a long time and nothing happened
- trying to delete the vm-host resulted in the same behavior (both via GUI and CLI)
- restarting the maas snap did not help
- after a day, we tried again to delete the vm-host and this time it worked, which also removed the VMs in error state. However, the logs still showed connection attempts to the LXD cluster VIP (and a related ceph pool error).
- adding the LXD cluster back did not succeed; it keeps trying for a long time and nothing happens
- the LXD cluster appears functional: LXD CLI and LXD API access both work fine.

Revision history for this message

Bayani Carbone (bcarbone) wrote on 2021-07-14:

rackd_regiond_logs.zip (653.4 KiB, application/zip)

Bayani Carbone (bcarbone) wrote on 2021-07-14:

rack-controllers_read_output.json (127.4 KiB, application/json)

added maas cli output for rack-controllers.

Bayani Carbone (bcarbone) wrote on 2021-07-15:

infra2_ip_config.txt (5.2 KiB, text/plain)

infra2 `ip a` full output

Christian Grabowski (cgrabowski) wrote on 2021-07-19:

To be clear, the changes were made outside of MAAS, and the issue is that MAAS did not discover the new IP on a controller or on a host managed by MAAS? If it is a controller, what is the state of the controller as reported by:

maas <profile> rack-controller read <system-id of problematic controller>

Changed in maas:
status:	New → Incomplete

Bayani Carbone (bcarbone) wrote on 2021-07-19:

The IP change was on a controller. Unfortunately, we redeployed maas in the meantime so I cannot execute the requested command.

Björn Tillenius (bjornt) wrote on 2021-08-17:

I see this in the logs:

2021-07-13 18:41:20 provisioningserver.rpc.common: [critical] Unhandled failure dispatching AMP command. This is probably a bug. Please ensure that this error is handled within application code or declared in the signature of the b'UpdateInterfaces' command. [infra1:pid=448381:cmd=UpdateInterfaces:ask=4b0]

...
django.core.exceptions.ValidationError: {'__all__': ['Interface with this Node and Name already exists.']}

Since we have the information about the existing interfaces and the new interface definition, we should be able to reproduce this in a unit test.

Changed in maas:
status:	Incomplete → Triaged
importance:	Undecided → High

Jerzy Husakowski (jhusakowski) wrote on 2023-09-21:

Let's try to reproduce this scenario in a unit test.

Changed in maas:
importance:	High → Medium
milestone:	none → 3.5.0
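The unit test proposed in the comments could take roughly the following shape. This is a purely hypothetical toy model in Python, not MAAS code: `InterfaceTable`, `update_interfaces` and the local `ValidationError` are illustrative stand-ins, and the rule that existing interfaces are matched by MAC address only is an assumption. It is chosen only to show one way a (node, name) uniqueness check can reject an interface update and leave a stale address behind; it does not claim to be the actual cause of the failure in the logs.

```python
import unittest


class ValidationError(Exception):
    """Toy stand-in for django.core.exceptions.ValidationError."""


class InterfaceTable:
    """Hypothetical in-memory interface table with a unique (node, name) constraint."""

    def __init__(self):
        self.rows = {}  # (node, name) -> {"mac": str, "ips": list[str]}

    def create(self, node, name, mac, ips):
        if (node, name) in self.rows:
            raise ValidationError("Interface with this Node and Name already exists.")
        self.rows[(node, name)] = {"mac": mac, "ips": list(ips)}


def update_interfaces(table, node, reported):
    """Hypothetical update routine that matches existing rows by MAC only.

    A reported interface whose MAC is unknown is created as a new row, so reusing
    an existing name with a different MAC trips the (node, name) constraint and
    the stored addresses are never updated.
    """
    for name, info in reported.items():
        match = next(
            (key for key, row in table.rows.items()
             if key[0] == node and row["mac"] == info["mac"]),
            None,
        )
        if match is None:
            table.create(node, name, info["mac"], info["ips"])
        else:
            table.rows[match]["ips"] = list(info["ips"])


class UpdateInterfacesToyTest(unittest.TestCase):
    def setUp(self):
        self.table = InterfaceTable()
        # Placeholder MAC; the old address mirrors the one MAAS kept for infra2.
        self.table.create("infra2", "br305", "02:00:00:00:00:01", ["10.80.0.12"])

    def test_same_mac_updates_address(self):
        update_interfaces(self.table, "infra2",
                          {"br305": {"mac": "02:00:00:00:00:01", "ips": ["10.81.0.12"]}})
        self.assertEqual(self.table.rows[("infra2", "br305")]["ips"], ["10.81.0.12"])

    def test_same_name_new_mac_collides_and_keeps_stale_address(self):
        with self.assertRaises(ValidationError):
            update_interfaces(self.table, "infra2",
                              {"br305": {"mac": "02:00:00:00:00:02", "ips": ["10.81.0.12"]}})
        self.assertEqual(self.table.rows[("infra2", "br305")]["ips"], ["10.80.0.12"])
```

Saved under any file name (for example `test_toy_update_interfaces.py`), it runs with `python -m unittest test_toy_update_interfaces`. A real reproduction would instead feed the interface data from the attached logs into MAAS's own `UpdateInterfaces` handling.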