[2.1b1] Unable to commission nodes in MAAS 2.1, no external route provided

Bug #1630794 reported by Jeff Lane 
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
MAAS      Invalid   Undecided    Unassigned

Bug Description

I upgraded my MAAS server from 2.0 to 2.1 and am currently on Beta 1.

I selected a newly added node and clicked Commission. The node powers on and boots the ephemeral image, but it cannot commission because it cannot reach the outside world.

The MAAS server has an external interface and can connect. However, when looking at the failing node during commissioning, I noticed that it has no default route out of the MAAS network space:

ubuntu@xwing:/var/log$ route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.0.0 * 255.0.0.0 U 0 0 0 enp2s0

Note this ONLY provides a route to the 10.0.0.0 address space. This makes it impossible to connect to things like archive.ubuntu.com.
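
As a rough illustration of why that single route is not enough, the addresses involved can be checked with Python's standard ipaddress module. This is only a sketch: the helper name covered_by is made up for this example, and 10.0.0.0/8 is the CIDR form of the 10.0.0.0 / 255.0.0.0 entry in the table above.

```python
import ipaddress


def covered_by(route_cidr, address):
    """Return True if `address` falls inside the routed network `route_cidr`."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(route_cidr)


# The only route on the node: 10.0.0.0 with genmask 255.0.0.0, i.e. 10.0.0.0/8.
only_route = "10.0.0.0/8"

print(covered_by(only_route, "10.0.0.1"))       # address on the MAAS network
print(covered_by(only_route, "91.189.88.149"))  # one archive.ubuntu.com address
```

The first check prints True and the second prints False: with no default route, the kernel has nowhere to send packets for the archive address.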

ubuntu@xwing:/var/log$ host archive.ubuntu.com
archive.ubuntu.com has address 91.189.88.149
archive.ubuntu.com has address 91.189.88.161
archive.ubuntu.com has address 91.189.88.162
archive.ubuntu.com has address 91.189.88.152
archive.ubuntu.com has IPv6 address 2001:67c:1360:8001::21
archive.ubuntu.com has IPv6 address 2001:67c:1560:8001::14
archive.ubuntu.com has IPv6 address 2001:67c:1360:8001::17
archive.ubuntu.com has IPv6 address 2001:67c:1560:8001::11
ubuntu@xwing:/var/log$ ping archive.ubuntu.com
connect: Network is unreachable
ubuntu@xwing:/var/log$ sudo route add default gw 10.0.0.1
sudo: unable to resolve host xwing
ubuntu@xwing:/var/log$ ping archive.ubuntu.com
PING archive.ubuntu.com (91.189.88.162) 56(84) bytes of data.
64 bytes from yukinko.canonical.com (91.189.88.162): icmp_seq=1 ttl=56 time=103 ms
64 bytes from yukinko.canonical.com (91.189.88.162): icmp_seq=2 ttl=56 time=103 ms
^C
--- archive.ubuntu.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 103.280/103.299/103.319/0.321 ms

After I added the default gateway manually, I could reach archive.ubuntu.com.
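
The same check can be done mechanically on captured `route` output. The following is a minimal sketch, not part of MAAS; the function name has_default_route is hypothetical, and the AFTER table is an assumed example of what `route` would print once the gateway is added (that output was not captured in this report).

```python
def has_default_route(route_output):
    """Return True if `route` / `route -n` output contains a default route."""
    for line in route_output.splitlines():
        fields = line.split()
        # Data rows have 8 columns; skip the title line, the header and blanks.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        destination, genmask = fields[0], fields[2]
        # `route` prints "default"; `route -n` prints 0.0.0.0 with mask 0.0.0.0.
        if destination == "default" or (
            destination == "0.0.0.0" and genmask == "0.0.0.0"
        ):
            return True
    return False


# Routing table as captured on the failing node.
BEFORE = """\
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.0.0 * 255.0.0.0 U 0 0 0 enp2s0
"""

# Assumed table after `route add default gw 10.0.0.1` (not from the report).
AFTER = BEFORE + "default 10.0.0.1 0.0.0.0 UG 0 0 0 enp2s0\n"

print(has_default_route(BEFORE))
print(has_default_route(AFTER))
```

On the captured table this reports no default route; on the assumed table it reports one.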

Andres Rodriguez (andreserl) wrote :

Can you confirm the following?

1. DHCP is up and running.
2. dhcpd.conf (/var/lib/maas/dhcpd.conf) has gateway definitions.
3. A manual DHCP request against MAAS returns the correct network information.

Changed in maas:
status: New → Incomplete
milestone: none → 2.1.0
Jeff Lane  (bladernr) wrote : Re: [Bug 1630794] Re: Unable to commission nodes in MAAS 2.1, no external route provided

So the issue seems to be that by default, MAAS is not setting a
default gateway. It picked up two fabrics:

Fabric-1 was the internal 10.0.0.0/8 network and has a gateway of 10.0.0.1.
Fabric-2 was the external 192.168.0.0/24 network and MAAS did not set
a gateway for that, as it was an unmanaged network.

With 2.1, while it detected and set up the 10.0.0.0 network, it didn't
set a gateway of 10.0.0.1 for that network, hence a lot of issues in
enlistment and commissioning.

After I manually added a gateway via the UI, I am now able to
commission nodes because they are being set with the gateway route.

Here are the interesting bits of dhcpd.conf before and after:

subnet 10.0.0.0 netmask 255.0.0.0 {
           ignore-client-uids true;
           option subnet-mask 255.0.0.0;
           option broadcast-address 10.255.255.255;
           option domain-name-servers 10.0.0.1;
           option domain-name "maas";
           option ntp-servers 10.0.0.1;

           default-lease-time 600;
           max-lease-time 600;
           #
           # Subnet DHCP snippets
           #
           pool {
              range 10.0.2.0 10.0.3.254;
           }
    }

And here's after setting the gateway via the UI:
subnet 10.0.0.0 netmask 255.0.0.0 {
           ignore-client-uids true;
           option subnet-mask 255.0.0.0;
           option broadcast-address 10.255.255.255;
           option domain-name-servers 10.0.0.1;
           option domain-name "maas";
           option routers 10.0.0.1;
           option ntp-servers 10.0.0.1;

           default-lease-time 600;
           max-lease-time 600;
           #
           # Subnet DHCP snippets
           #
           pool {
              range 10.0.2.0 10.0.3.254;
           }
    }
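
The only difference between the two versions is the `option routers` line. A small script can flag subnet declarations that lack it. This is a simplified sketch, not a full dhcpd.conf parser: it assumes one statement per line, treats everything after `#` as a comment, and the name subnets_without_routers is hypothetical. The sample text is a trimmed copy of the declarations above.

```python
import re

SUBNET_RE = re.compile(r"^\s*subnet\s+(\S+)\s+netmask\s+(\S+)\s*\{")
ROUTERS_RE = re.compile(r"^\s*option\s+routers\s+([^;]+);")


def subnets_without_routers(conf_text):
    """Return the subnets in dhcpd.conf text that declare no `option routers`."""
    missing = []
    current = None       # subnet address currently being scanned
    depth = 0            # brace nesting depth inside the current subnet
    has_routers = False
    for line in conf_text.splitlines():
        stripped = line.split("#", 1)[0]  # drop comments (simplification)
        if current is None:
            match = SUBNET_RE.match(stripped)
            if match:
                current, depth, has_routers = match.group(1), 1, False
            continue
        if ROUTERS_RE.match(stripped):
            has_routers = True
        depth += stripped.count("{") - stripped.count("}")
        if depth == 0:  # closing brace of the subnet declaration
            if not has_routers:
                missing.append(current)
            current = None
    return missing


BEFORE = """\
subnet 10.0.0.0 netmask 255.0.0.0 {
    option subnet-mask 255.0.0.0;
    option domain-name-servers 10.0.0.1;
    pool {
        range 10.0.2.0 10.0.3.254;
    }
}
"""

AFTER = BEFORE.replace(
    "option domain-name-servers 10.0.0.1;",
    "option domain-name-servers 10.0.0.1;\n    option routers 10.0.0.1;",
)

print(subnets_without_routers(BEFORE))
print(subnets_without_routers(AFTER))
```

On a MAAS server the same function could be pointed at the contents of /var/lib/maas/dhcpd.conf; any subnet it lists would hand out leases without a default gateway.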

So the issue seems to be simply that MAAS is not automatically telling
nodes to use the MAAS server as the default gateway; I have to enter
the MAAS server's IP manually. I'm guessing this has something to do
with changes to how racks are handled, so maybe it should default to
the IP of the rack controller (which in my case is the MAAS server).

summary: - Unable to commission nodes in MAAS 2.1, no external route provided
+ [2.1b1] Unable to commission nodes in MAAS 2.1, no external route
+ provided
Andres Rodriguez (andreserl) wrote :

Also, please check whether you set a gateway for the subnet where the dynamic range is.

Blake Rouse (blake-rouse) wrote :

I am unable to reproduce this. On a fresh installation of MAAS 2.1, my networks were discovered correctly (including gateway_ip). Enabling DHCP set the routers option in dhcpd.conf for the subnet, and enlisting machines are able to reach the archive through my router.

If you could provide exact steps on how to reproduce this that would be great.

Changed in maas:
status: Incomplete → Opinion
status: Opinion → Incomplete
Jeff Lane  (bladernr) wrote :

I'm not able to reproduce it now either, and I've tried three times (on VMs).

The original steps I took were:

Start with a working and in-use 2.0 server
Upgrade to 2.1.
Delete nodes
Re-enlist nodes
** New nodes failed to set power type because they couldn't talk to archive.ubuntu.com
Manually set power type
Commission a node
** Commission failed because node couldn't talk to archive.ubuntu.com

That's when I SSH'd to the node, ran route -n, and discovered that no gateway was set. That led me to check the network config in the UI, where I noticed that no gateway was set for that subnet.

I'm starting to think that maybe I just had a bum upgrade and some things didn't happen that should have, because I've not been able to reproduce any of the issues I saw when I upgraded my server, and I've tried recreating these issues several times at this point.

Changed in maas:
status: Incomplete → Invalid