MAAS

[2.2] DHCP ntp-server setting can be misconfigured with an IP of a different fabric/vlan

Bug #1695083 reported by Andres Rodriguez on 2017-06-01

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
MAAS	Fix Released	Critical	Mike Pontillo	MAAS 2.3.0alpha1
2.2	Fix Released	Critical	Mike Pontillo	MAAS 2.2.1

Bug Description

The MAAS rack discovered a new bridge interface (lxdbr0). After discovering this bridge, MAAS created a new Fabric/VLAN/Subnet (e.g. fabric-5/untagged/10.10.10.0/24).

Meanwhile, I was providing DHCP from fabric-3/untagged, which has two subnets: 10.90.90.0/24 and 192.168.100.0/24.

After restarting the rack controller, the "ntp-servers" configuration in DHCP was updated to point to the IP address of lxdbr0, as shown in the generated configuration:

shared-network vlan-5002 {
    subnet 10.90.90.0 netmask 255.255.255.0 {
        ignore-client-uids true;
        next-server 10.90.90.1;
        option subnet-mask 255.255.255.0;
        option broadcast-address 10.90.90.255;
        option domain-name-servers 10.90.90.1;
        option domain-name "maas";
        option routers 10.90.90.254;
        option ntp-servers 10.10.10.1;
    }
    subnet 192.168.100.0 netmask 255.255.255.0 {
        ignore-client-uids true;
        next-server 192.168.100.5;
        option subnet-mask 255.255.255.0;
        option broadcast-address 192.168.100.255;
        option domain-name-servers 10.90.90.1;
        option domain-name "maas";
        option ntp-servers 10.10.10.1;
    }
}
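
The mismatch in the configuration above can be checked mechanically. The following is a minimal sketch and not part of MAAS: the helper name `ntp_servers_outside_subnets` and the hand-written input mapping are assumptions made for illustration. It flags any `ntp-servers` address that does not fall inside any subnet of the shared network.

```python
import ipaddress


def ntp_servers_outside_subnets(subnets):
    """Return (cidr, ntp_ip) pairs whose NTP server lies outside every listed subnet.

    `subnets` maps a CIDR string to the `ntp-servers` address configured for it.
    This is a hypothetical helper, not a MAAS API.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in subnets]
    suspicious = []
    for cidr, ntp in subnets.items():
        ntp_ip = ipaddress.ip_address(ntp)
        # Flag the entry if the NTP address is in none of the shared network's subnets.
        if not any(ntp_ip in net for net in networks):
            suspicious.append((cidr, ntp))
    return suspicious


# Values copied from the dhcpd configuration excerpt in this report.
shared_network = {
    "10.90.90.0/24": "10.10.10.1",
    "192.168.100.0/24": "10.10.10.1",
}
print(ntp_servers_outside_subnets(shared_network))
```

This is only a heuristic: an NTP server outside the served subnets may still be reachable through a router, so a flagged entry is a prompt to look closer rather than proof of a misconfiguration.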

To revert it back, two things were tried:

Removing 'lxdbr0'

1. Removed 'lxdbr0'.
2. Restarted maas-rackd.
3. The ntp-servers option was updated accordingly.

Changing newly created VLAN to a different L2 space

1. Added a new L2 space (test).
2. Moved fabric-5/untagged to 'test' space.
3. Restarted maas-rackd.

This updated the configuration of dhcpd to have the correct NTP server.

That said, this is what seems to be happening:

1. MAAS is using the IP of a different subnet (in a different fabric/VLAN) for NTP because it is in the same space.
2. A trigger should have updated the config automatically in the event of a space change, but it did not.
3. Removing an interface should trigger an update as well.

Related branches

lp://staging/~mpontillo/maas/ntp-issues--bug-1695083

Merged into lp://staging/~maas-committers/maas/trunk at revision 6073

Blake Rouse (community): Approve on 2017-06-05

lp://staging/~mpontillo/maas/ntp-issues--bug-1695083--2.2

Merged into lp://staging/maas/2.2 at revision 6063

Mike Pontillo (community): Approve on 2017-06-05

Andres Rodriguez (andreserl) on 2017-06-01

description:

updated

Andres Rodriguez (andreserl) on 2017-06-01

Changed in maas:
milestone:	none → 2.2.1
importance:	Undecided → Critical
status:	New → Triaged
milestone:	2.2.1 → 2.3.0

BenLake (me-benlake) wrote on 2017-06-01:

I think this happened to me as well, but I'm not as clear on the timing of the new fabric coming up and when the NTP server IPs on deployment changed. Also, the NTP server IP chosen was not that of the new fabric, so my issue may not be quite the same as the OP's. However, the IP chosen, whenever it was chosen, was non-functional. I have no way to specify which fabric/subnet/address should be used when cascading to deployed nodes.

In any case, I don't really care for most of the behind-the-scenes automagical discovery. Anything that is changing a config should be confirmed (even if it is a yet-to-be config). So having processes discover things is handy, but nothing should take effect until confirmed (my $0.02).

Andres Rodriguez (andreserl) wrote on 2017-06-01:

Yes, this is the same issue you had this morning. The bug reflects how we reproduced it on our side, but that doesn't necessarily mean it is the only way. Effectively, it can be triggered in other ways as well.

Andres Rodriguez (andreserl) on 2017-06-01

summary:

- [2.2] NTP misconfigured after the Rack discovered a new 'lxdbr0'
- interface
+ [2.2] DHCP ntp-server setting can be misconfigured with an IP of a
+ different fabric/vlan

Mike Pontillo (mpontillo) wrote on 2017-06-03:

The fix for this is actually trickier than it looks, because in order to do what I consider to be the "best" fix, we would need to change the RPC schema. That is because NTP servers are specified on a per-rack-controller basis, not a per-shared-subnet basis.

So I think the best compromise for now is, when selecting the NTP server to provide for a rack controller, prefer subnets on VLANs with DHCP enabled. I've proposed a branch that does that[1].
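
The compromise described above can be sketched as follows. This is an illustration only and not the code from the proposed branch: the `Candidate` record, its fields, and `pick_ntp_server` are hypothetical names, and the sample addresses assume the rack controller holds 10.90.90.1 on the DHCP-enabled VLAN and 10.10.10.1 on lxdbr0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    ip: str        # an IP address configured on the rack controller
    vlan: str      # the VLAN that the address's subnet belongs to
    dhcp_on: bool  # whether DHCP is enabled on that VLAN


def pick_ntp_server(candidates: List[Candidate]) -> Optional[str]:
    """Pick one address for the rack controller, preferring DHCP-enabled VLANs."""
    if not candidates:
        return None
    # False sorts before True, so `not dhcp_on` ranks DHCP-enabled VLANs first.
    # min() returns the first minimal element, so input order breaks ties.
    return min(candidates, key=lambda c: not c.dhcp_on).ip


candidates = [
    Candidate("10.10.10.1", "fabric-5/untagged", dhcp_on=False),  # lxdbr0
    Candidate("10.90.90.1", "fabric-3/untagged", dhcp_on=True),
]
print(pick_ntp_server(candidates))  # 10.90.90.1
```

Because a single address is still chosen per rack controller, this ordering only reduces the chance of a bad pick; it does not guarantee that every DHCP-served subnet can reach the selected address.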

The other bug here is that if spaces are changed, the "best routable NTP server" calculation changes, but the database triggers don't notice the changed spaces and don't recalculate the DHCP configuration. I feel that with the "choose DHCP-enabled VLANs first" change, this becomes a less likely edge case and should be handled as a separate bug.

[1]: https://code.launchpad.net/~mpontillo/maas/ntp-issues--bug-1695083/+merge/325037

MAAS Lander (maas-lander) on 2017-06-05

Changed in maas:
status:	Triaged → Fix Committed

Mike Pontillo (mpontillo) wrote on 2017-06-05:

By the way, the other thing I determined was that NTP configuration in DHCP has nothing to do with spaces. It simply selects an IP address on the rack controller and uses that for DHCP.

Spaces are used by the internal NTP configuration in order to figure out the appropriate peer NTP servers, not by DHCP. So the trigger on spaces doesn't apply to this bug.

Mike Pontillo (mpontillo) wrote on 2017-06-05:

Finally, I've filed bug #1695937 to capture a related issue that can occur if the rack manages multiple VLANs and/or subnets, and those subnets are not mutually reachable.

Mike Pontillo (mpontillo) on 2017-06-05

Changed in maas:
assignee:	nobody → Mike Pontillo (mpontillo)

Andres Rodriguez (andreserl) on 2017-07-28

Changed in maas:
milestone:	2.3.0 → 2.3.0alpha1

Andres Rodriguez (andreserl) on 2017-08-02

Changed in maas:
status:	Fix Committed → Fix Released

Duplicates of this bug

Bug #1702096