Maas DHCP not working any more since 2.9 snap refresh from 2021-02-17

Bug #1916472 reported by Jochen Wezel
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Fix Released
Medium
Alberto Donato
snapd
Expired
Undecided
Unassigned

Bug Description

## Problem description

Maas Server incl. DHCP was working fine till latest snap refresh from 2021-02-17.

Since that refresh, DHCP requests are not answered by maas DHCP (and not logged any more) and the server (Hyper-V VM) with 4 CPUs has permanently a very high CPU usage, mostly between 70 to 100%.

Restarting maas or the VM didn't resolve the issue.

The DHCP service of the additional MAAS rack server doesn't work any more, too. So, all DHCP devices of my network are OFFLINE since that date (okay, as a workaround, I started another DHCP server meanwhile, so I'm not completely offline; but this is not a real resolution).

### Exemplary htop results

  1 [||||||||||||||||||||||||||||||||||||||||||||||||||||| 76.3%] Tasks: 113, 115 thr; 4 running
  2 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 81.1%] Load average: 6.44 6.86 7.08
  3 [|||||||||||||||||||||||||||||||||||||||||||||||||||||| 78.4%] Uptime: 1 day, 12:50:57
  4 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 83.5%]
  Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||1.77G/3.12G]
  Swp[| 780K/2.00G]

    PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
   1035 root 20 0 732M 120M 25352 S 100. 3.8 21h13:54 python3 /snap/maas/11851/sbin/regiond
   1032 root 20 0 809M 136M 25372 S 98.7 4.3 21h23:38 python3 /snap/maas/11851/sbin/regiond
   1213 root 20 0 809M 136M 25372 R 98.7 4.3 2h59:04 python3 /snap/maas/11851/sbin/regiond
   1036 root 20 0 877M 122M 25448 S 94.3 3.8 21h12:41 python3 /snap/maas/11851/sbin/regiond
   1034 root 20 0 661M 122M 25992 S 93.4 3.8 21h14:48 python3 /snap/maas/11851/sbin/regiond
   1159 root 20 0 877M 122M 25448 R 91.7 3.8 3h51:04 python3 /snap/maas/11851/sbin/regiond
   1106 root 20 0 661M 122M 25992 R 90.0 3.8 3h30:43 python3 /snap/maas/11851/sbin/regiond
   1232 root 20 0 732M 120M 25352 S 89.1 3.8 3h54:01 python3 /snap/maas/11851/sbin/regiond
  85066 postgres 20 0 213M 25068 20600 S 21.8 0.8 0:00.44 postgres: 12/main: maas maasdb 192.168.1.22(58746) idle
  85019 postgres 20 0 213M 29240 24616 S 19.2 0.9 0:01.68 postgres: 12/main: maas maasdb 192.168.1.22(58724) idle
  84943 postgres 20 0 213M 29400 24740 S 16.6 0.9 0:03.34 postgres: 12/main: maas maasdb 192.168.1.22(58698) idle
  84746 postgres 20 0 214M 29456 23844 S 14.8 0.9 0:10.23 postgres: 12/main: maas maasdb 192.168.1.22(58630) idle
  85085 root 20 0 5784 4124 2904 R 10.5 0.1 0:00.74 htop
   1098 root 20 0 732M 120M 25352 R 7.0 3.8 3h47:18 python3 /snap/maas/11851/sbin/regiond
    942 root 20 0 676M 90328 10296 S 3.5 2.8 23:35.87 /snap/maas/11851/usr/sbin/named -c /var/snap/maas/11851/bind/named.conf -g
   1143 root 20 0 732M 120M 25352 S 2.6 3.8 3h45:38 python3 /snap/maas/11851/sbin/regiond
  84924 postgres 20 0 213M 25068 20600 S 2.6 0.8 0:04.56 postgres: 12/main: maas maasdb 192.168.1.22(58694) idle
  46450 root 20 0 877M 122M 25448 S 1.7 3.8 1h26:59 python3 /snap/maas/11851/sbin/regiond
    975 root 20 0 676M 90328 10296 S 1.7 2.8 5:37.90 /snap/maas/11851/usr/sbin/named -c /var/snap/maas/11851/bind/named.conf -g
    946 root 20 0 408M 85868 16900 S 0.9 2.6 1h25:48 python3 /snap/maas/11851/sbin/rackd
    948 root 20 0 940M 110M 24892 S 0.9 3.5 2h19:45 python3 /snap/maas/11851/sbin/regiond
    972 root 20 0 676M 90328 10296 S 0.9 2.8 5:33.69 /snap/maas/11851/usr/sbin/named -c /var/snap/maas/11851/bind/named.conf -g
    688 root 20 0 29384 24608 9124 S 0.9 0.8 16:32.13 python3 /snap/maas/11851/bin/supervisord -d /var/snap/maas/11851/supervisord -c /var/snap/ma
    973 root 20 0 676M 90328 10296 S 0.9 2.8 5:46.30 /snap/maas/11851/usr/sbin/named -c /var/snap/maas/11851/bind/named.conf -g
   3352 root 20 0 661M 122M 25992 S 0.9 3.8 3h28:31 python3 /snap/maas/11851/sbin/regiond
   1171 root 20 0 59272 54244 11860 S 0.9 1.7 3:10.70 python3 /snap/maas/11851/bin/maas-rack observe-arp eth0
  84346 administr 20 0 13932 6240 4768 S 0.9 0.2 0:00.43 sshd: administrator@pts/0
  84594 postgres 20 0 221M 33868 25184 S 0.9 1.0 0:17.07 postgres: 12/main: maas maasdb 192.168.1.22(58580) idle
  84839 postgres 20 0 220M 33428 24808 S 0.9 1.0 0:09.09 postgres: 12/main: maas maasdb 192.168.1.22(58662) idle
   1155 root 20 0 661M 122M 25992 S 0.0 3.8 3h32:34 python3 /snap/maas/11851/sbin/regiond
   1119 root 20 0 877M 122M 25448 S 0.0 3.8 3h44:57 python3 /snap/maas/11851/sbin/regiond
  84976 postgres 20 0 213M 28352 23780 S 0.0 0.9 0:03.42 postgres: 12/main: maas maasdb 192.168.1.22(58712) idle
  84815 postgres 20 0 212M 24072 19772 S 0.0 0.7 0:10.49 postgres: 12/main: maas maasdb 192.168.1.22(58650) idle
  85086 postgres 20 0 211M 20816 17344 S 0.0 0.6 0:00.10 postgres: 12/main: maas maasdb 192.168.1.22(58750) idle in transaction
  74516 root 20 0 877M 122M 25448 S 0.0 3.8 19:43.06 python3 /snap/maas/11851/sbin/regiond
    974 root 20 0 676M 90328 10296 S 0.0 2.8 5:41.78 /snap/maas/11851/usr/sbin/named -c /var/snap/maas/11851/bind/named.conf -g
    846 postgres 20 0 209M 27212 25152 S 0.0 0.8 0:46.77 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c config_file=/etc/postg
  83723 postgres 20 0 214M 31780 25772 S 0.0 1.0 0:30.23 postgres: 12/main: maas maasdb 192.168.1.22(58388) idle
   1208 root 20 0 809M 136M 25372 S 0.0 4.3 3h00:31 python3 /snap/maas/11851/sbin/regiond
  84590 postgres 20 0 214M 31304 25520 S 0.0 1.0 0:12.30 postgres: 12/main: maas maasdb 192.168.1.22(58576) idle in transaction
   1100 root 20 0 732M 120M 25352 S 0.0 3.8 3h49:23 python3 /snap/maas/11851/sbin/regiond
   1115 root 20 0 809M 136M 25372 S 0.0 4.3 3h00:45 python3 /snap/maas/11851/sbin/regiond
   1144 root 20 0 732M 120M 25352 S 0.0 3.8 3h45:33 python3 /snap/maas/11851/sbin/regiond
   6212 zabbix 20 0 22068 3480 2252 S 0.0 0.1 1:41.63 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
   1099 root 20 0 661M 122M 25992 S 0.0 3.8 3h30:22 python3 /snap/maas/11851/sbin/regiond
  84604 postgres 20 0 214M 26496 20920 S 0.0 0.8 0:14.58 postgres: 12/main: maas maasdb 192.168.1.22(58584) idle in transaction
  85006 postgres 20 0 213M 29180 24300 S 0.0 0.9 0:01.97 postgres: 12/main: maas maasdb 192.168.1.22(58720) idle in transaction
   3350 root 20 0 661M 122M 25992 S 0.0 3.8 3h22:10 python3 /snap/maas/11851/sbin/regiond
   2620 root 20 0 877M 122M 25448 S 0.0 3.8 3h40:40 python3 /snap/maas/11851/sbin/regiond
    683 root 20 0 189M 18920 15448 S 0.0 0.6 0:22.89 php-fpm: master process (/etc/php/7.4/fpm/php-fpm.conf)
F1Help F2Setup F3SearchF4FilterF5Tree F6SortByF7Nice -F8Nice +F9Kill F10Quit

## Maas snap version

Current maas snap version:
 2.9/stable: 2.9.2-9164-g.ac176b5c4 2021-02-17 (11851) 150MB

Last week, `maas changes` showed an interesting detail, that the refreshed, running maas container had a slightly lower version number than the previous "old" version which had got a higher build no. (This week, maas changes doesn't show any changes any more.)

## snap logs maas

2021-02-17T21:05:28Z dhcpd[396438]: DHCPDISCOVER from 00:15:5d:03:34:43 via eth0
2021-02-17T21:05:28Z dhcpd[396438]: DHCPOFFER on 192.168.1.27 to 00:15:5d:03:34:43 via eth0
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[0] = /snap/maas/11778/usr/sbin/maas-dhcp-helper
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[1] = notify
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[2] = --action
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[3] = commit
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[4] = --mac
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[5] = 0:4:13:41:bc:73
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[6] = --ip-family
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[7] = ipv4
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[8] = --ip
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[9] = 192.168.6.211
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[10] = --lease-time
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[11] = 3600
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[12] = --hostname
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[13] = snom870-41BC73
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[14] = --socket
2021-02-17T21:06:52Z dhcpd[396438]: execute_statement argv[15] = /var/snap/maas/common/maas/dhcpd.sock
2021-02-17T21:06:52Z dhcpd[396438]: DHCPREQUEST for 192.168.6.211 from 00:04:13:41:bc:73 (snom870-41BC73) via eth0
2021-02-17T21:06:52Z dhcpd[396438]: DHCPACK on 192.168.6.211 to 00:04:13:41:bc:73 (snom870-41BC73) via eth0
2021-02-17T21:07:53Z systemd[1]: Stopping Service for snap application maas.supervisor...
2021-02-17T21:07:53Z rsyslogd[396289]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="396289" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-17T21:07:59Z systemd[1]: snap.maas.supervisor.service: Succeeded.
2021-02-17T21:07:59Z systemd[1]: Stopped Service for snap application maas.supervisor.
2021-02-17T21:08:31Z systemd[1]: Started Service for snap application maas.supervisor.
2021-02-17T21:08:43Z rsyslogd[470229]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="470229" x-info="https://www.rsyslog.com"] start
2021-02-18T09:54:39Z systemd[1]: Stopping Service for snap application maas.supervisor...
2021-02-18T09:54:40Z rsyslogd[470229]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="470229" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-18T09:54:42Z systemd[1]: snap.maas.supervisor.service: Succeeded.
2021-02-18T09:54:42Z systemd[1]: Stopped Service for snap application maas.supervisor.
2021-02-18T09:55:00Z systemd[1]: Started Service for snap application maas.supervisor.
2021-02-18T09:55:28Z rsyslogd[1048]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1048" x-info="https://www.rsyslog.com"] start
2021-02-18T11:03:59Z systemd[1]: Stopping Service for snap application maas.supervisor...
2021-02-18T11:03:59Z rsyslogd[1048]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1048" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-18T11:04:00Z systemd[1]: snap.maas.supervisor.service: Succeeded.
2021-02-18T11:04:00Z systemd[1]: Stopped Service for snap application maas.supervisor.
2021-02-18T11:22:23Z systemd[1]: Started Service for snap application maas.supervisor.
2021-02-18T11:22:46Z rsyslogd[1081]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1081" x-info="https://www.rsyslog.com"] start
2021-02-18T14:22:41Z systemd[1]: Stopping Service for snap application maas.supervisor...
2021-02-18T14:22:41Z rsyslogd[1081]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1081" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-18T14:22:44Z systemd[1]: snap.maas.supervisor.service: Succeeded.
2021-02-18T14:22:44Z systemd[1]: Stopped Service for snap application maas.supervisor.
2021-02-18T14:22:44Z systemd[1]: Started Service for snap application maas.supervisor.
2021-02-18T14:23:00Z rsyslogd[10132]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="10132" x-info="https://www.rsyslog.com"] start
2021-02-18T14:23:10Z rsyslogd[10132]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="10132" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-18T14:23:13Z rsyslogd[10280]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="10280" x-info="https://www.rsyslog.com"] start
2021-02-18T14:23:29Z rsyslogd[10280]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="10280" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-18T14:23:29Z rsyslogd[10439]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="10439" x-info="https://www.rsyslog.com"] start
2021-02-18T16:14:35Z systemd[1]: Stopping Service for snap application maas.supervisor...
2021-02-18T16:14:35Z rsyslogd[10439]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="10439" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-18T16:14:49Z systemd[1]: snap.maas.supervisor.service: Succeeded.
2021-02-18T16:14:49Z systemd[1]: Stopped Service for snap application maas.supervisor.
2021-02-18T16:14:49Z systemd[1]: Started Service for snap application maas.supervisor.
2021-02-18T16:15:10Z rsyslogd[14945]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="14945" x-info="https://www.rsyslog.com"] start
2021-02-18T16:25:26Z systemd[1]: Stopping Service for snap application maas.supervisor...
2021-02-18T16:25:26Z rsyslogd[14945]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="14945" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-18T16:25:39Z systemd[1]: snap.maas.supervisor.service: Succeeded.
2021-02-18T16:25:39Z systemd[1]: Stopped Service for snap application maas.supervisor.
2021-02-18T16:25:48Z systemd[1]: Started Service for snap application maas.supervisor.
2021-02-18T16:25:56Z maas.supervisor[15887]: /snap/maas/11778/usr/lib/python3/dist-packages/netaddr/strategy/__init__.py:189: SyntaxWarning: "is not" with a literal. Did you mean "!="?
2021-02-18T16:25:56Z maas.supervisor[15887]: if word_sep is not '':
2021-02-18T16:26:13Z rsyslogd[16009]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="16009" x-info="https://www.rsyslog.com"] start
2021-02-18T16:29:39Z systemd[1]: Stopping Service for snap application maas.supervisor...
2021-02-18T16:29:39Z rsyslogd[16009]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="16009" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-18T16:29:57Z systemd[1]: snap.maas.supervisor.service: Succeeded.
2021-02-18T16:29:57Z systemd[1]: Stopped Service for snap application maas.supervisor.
2021-02-18T16:30:38Z systemd[1]: Started Service for snap application maas.supervisor.
2021-02-18T16:30:53Z rsyslogd[16836]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="16836" x-info="https://www.rsyslog.com"] start
2021-02-20T21:44:05Z systemd[1]: Stopping Service for snap application maas.supervisor...
2021-02-20T21:44:09Z rsyslogd[16836]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="16836" x-info="https://www.rsyslog.com"] exiting on signal 15.
2021-02-20T21:44:17Z systemd[1]: snap.maas.supervisor.service: Succeeded.
2021-02-20T21:44:17Z systemd[1]: Stopped Service for snap application maas.supervisor.
2021-02-20T21:45:07Z systemd[1]: Started Service for snap application maas.supervisor.
2021-02-20T21:46:29Z rsyslogd[1049]: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="1049" x-info="https://www.rsyslog.com"] start

Related branches

Revision history for this message
Jochen Wezel (jwezel) wrote :

* Controller runs on Ubuntu 20.04
* Controller overview page indicates green check marks for all controllers
* Single controller page indicates green check marks for all controller services except DHCP

Revision history for this message
Ian Johnson (anonymouse67) wrote :

This smells much more like a MAAS bug than a snapd bug to me, is there anything that indicates snapd is what broke MAAS here?

Changed in snapd:
status: New → Invalid
status: Invalid → Incomplete
Revision history for this message
Jochen Wezel (jwezel) wrote :

I've done my level best but I don't find any more hints. I'd like to find them, but I don't know how. Please advise!

Revision history for this message
Bill Wear (billwear) wrote :

not able to triage: aforementioned MAAS snap refresh does not cause this problem on my machine, DHCP works fine when creating new VMs in LXD.

Revision history for this message
Björn Tillenius (bjornt) wrote :

I also can't reproduce this on our systems.

The screenshot indicates that DHCP isn't running. Can you please attach a tarball of /var/snap/maas/common/logs, so that we can see what's going on?

Changed in maas:
status: New → Incomplete
Revision history for this message
Jochen Wezel (jwezel) wrote :
Download full text (4.6 KiB)

After the missing hint to the logs directory of maas by @bjornt, I've found following error in my regiond.log:
---
2021-02-24 12:49:40 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:111'.
        Traceback (most recent call last):
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1475, in gotResult
            _inlineCallbacks(r, g, status)
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1464, in _inlineCallbacks
            status.deferred.errback()
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in errback
            self._startRunCallbacks(fail)
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/snap/maas/11851/lib/python3.8/site-packages/maasserver/rack_controller.py", line 277, in <lambda>
            d.addErrback(lambda f: f.trap(NoConnectionsAvailable))
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/python/failure.py", line 439, in trap
            self.raiseException()
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/python/failure.py", line 467, in raiseException
            raise self.value.with_traceback(self.tb)
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/snap/maas/11851/lib/python3.8/site-packages/maasserver/dhcp.py", line 864, in configure_dhcp
            config = yield deferToDatabase(get_dhcp_configuration, rack_controller)
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
            result = inContext.theWork()
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/snap/maas/11851/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
            return func(*args,**kw)
          File "/snap/maas/11851/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 860, in callInContext
            return func(*args, **kwargs)
          File "/snap/maas/11851/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 192, in wrapper
            result =...

Read more...

Revision history for this message
Jochen Wezel (jwezel) wrote :

The current content of table "public"."maasserver_controllerinfo":

created |updated |node_id|version |interfaces|interface_update_hints
-------------------+-------------------+-------+----------------------+----------+----------------------
25.12.2020 22:59:27|17.02.2021 20:45:21|111 |2.9.2-9164-g.ac176b5c4|"" |""
25.12.2020 21:11:37|18.02.2021 17:30:50|110 |2.9.2-9164-g.ac176b5c4|"" |""

Revision history for this message
Jochen Wezel (jwezel) wrote :

I don't know if it might be related, this is the content of SQL
select * from "public"."maasserver_interface" where node_id = 111

id |created |updated |name|type |mac_address |ipv4_params|ipv6_params|params|tags |enabled|node_id|vlan_id|acquired|mdns_discovery_state|neighbour_discovery_state|firmware_version|product|vendor|interface_speed|link_connected|link_speed|numa_node_id|sriov_max_vf
----+-------------------+-------------------+----+--------+-------------+-----------+-----------+------+---------------+-------+-------+-------+--------+--------------------+-------------------------+----------------+-------+------+---------------+--------------+----------+------------+------------
1908|25.12.2020 21:11:44|20.02.2021 22:47:07|eth0|physical|**sensitive**|"" |"" |"" |System.String[]|True |110 |5001 |False |True |True |N/A | | |1000 |True |1000 | |0

Revision history for this message
Jochen Wezel (jwezel) wrote :

## Attached the network configuration of the maas server (slightly anonymized)

## /etc/netplan/00-installer-config.yaml

# This is the network config written by 'subiquity'
network:
  ethernets:
    eth0:
      addresses:
      - 192.168.1.22/19
      - 192.168.1.26/19
      gateway4: 192.168.1.28
      nameservers:
        addresses:
        - 192.168.1.28
        search:
        - maas
        - mgmt-domain.com
  version: 2

## ifconfig

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.22 netmask 255.255.224.0 broadcast 192.168.31.255
        inet6 fe80::**sensitive** prefixlen 64 scopeid 0x20<link>
        ether 00:15:**sensitive** txqueuelen 1000 (Ethernet)
        RX packets 10248505 bytes 1668492449 (1.6 GB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8887175 bytes 2199476563 (2.1 GB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 131365937 bytes 147543060231 (147.5 GB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 131365937 bytes 147543060231 (147.5 GB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Revision history for this message
Jochen Wezel (jwezel) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for snapd because there has been no activity for 60 days.]

Changed in snapd:
status: Incomplete → Expired
Changed in maas:
status: Expired → New
Revision history for this message
Alberto Donato (ack) wrote :

Hi Jochen,

if you're still experiencing the issue, could you please paste the output of this query on the maas db?

select ci.node_id, i.id, i.name, ip.ip from maasserver_staticipaddress ip, maasserver_interface i, maasserver_interface_ip_addresses i_ip, maasserver_controllerinfo ci where i_ip.interface_id = i.id and i_ip.staticipaddress_id = ip.id and i.node_id = ci.node_id;

Changed in maas:
status: New → Incomplete
Revision history for this message
Jochen Wezel (jwezel) wrote :

This is the result to your query:

node_id|id |name|ip
-------+----+----+------------
111 |1911|eth0|192.168.1.27
110 |1908|eth0|192.168.1.22
110 |1908|eth0|192.168.1.26

Revision history for this message
Jochen Wezel (jwezel) wrote :

and yes, I'm still experiencing the issue...

Revision history for this message
Alberto Donato (ack) wrote :

Thanks!

Could you also attach the output from the following commissioning scripts on the controller with id 110: 40-maas-01-network-interfaces and 50-maas-01-commissioning

Revision history for this message
Jochen Wezel (jwezel) wrote :
Download full text (9.5 KiB)

## 40-maas-01-network-interfaces

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:03:35:11 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.22/19 brd 192.168.31.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.26/19 brd 192.168.31.255 scope global secondary eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe03:3511/64 scope link
       valid_lft forever preferred_lft forever

## 50-maas-01-commissioning

{
    "api_extensions": [
        "resources",
        "resources_cpu_socket",
        "resources_gpu",
        "resources_numa",
        "resources_v2",
        "resources_disk_sata",
        "resources_usb_pci",
        "resources_cpu_threads_numa",
        "resources_cpu_core_die",
        "api_os",
        "resources_system"
    ],
    "api_version": "1.0",
    "environment": {
        "kernel": "Linux",
        "kernel_architecture": "x86_64",
        "kernel_version": "5.4.0-73-generic",
        "os_name": "ubuntu",
        "os_version": "20.04",
        "server": "maas-machine-resources",
        "server_name": "srvmaasdb01",
        "server_version": "4.4"
    },
    "resources": {
        "cpu": {
            "architecture": "x86_64",
            "sockets": [
                {
                    "name": "Intel(R) Xeon(R) CPU E5645 @ 2.40GHz",
                    "vendor": "GenuineIntel",
                    "socket": 0,
                    "cache": [
                        {
                            "level": 1,
                            "type": "Data",
                            "size": 32768
                        },
                        {
                            "level": 1,
                            "type": "Instruction",
                            "size": 32768
                        },
                        {
                            "level": 2,
                            "type": "Unified",
                            "size": 262144
                        },
                        {
                            "level": 3,
                            "type": "Unified",
                            "size": 12582912
                        }
                    ],
                    "cores": [
                        {
                            "core": 0,
                            "die": 0,
                            "threads": [
                                {
                                    "id": 0,
                                    "numa_node": 0,
                                    "thread": 0,
                                    "online": true,
                                    "isolated": false
                                }
                            ]
                        },
                        {
                            "core...

Read more...

Changed in maas:
status: Incomplete → New
Revision history for this message
Björn Tillenius (bjornt) wrote :

I've looked at the code in maasserver.dhcp.get_default_dns_servers()

In the logs you provided, it says "No DNS servers found, DHCP defaulting to region IP.".

So that tells me, that dns_servers get set to [default_region_ip]. And looking at the traceback, it means that default_region_ip is None.

So there's two questions that needs to be answered there. Why are no DNS servers found in the first place, and why is default_region_ip None.

To start, can you provide output from the following API commands:

  maas $maas_profile maas get-config name=use_rack_proxy
  maas $maas_profile subnets read
  maas $maas_profile rack-controllers read
  maas $maas_profile region-controllers read

Changed in maas:
status: New → Incomplete
Revision history for this message
Jochen Wezel (jwezel) wrote :
Download full text (30.4 KiB)

Here is the output of the requested commands plus one more status output:

root@srvmaasdb01:/home/administrator# maas status
bind9 RUNNING pid 975, uptime 4 days, 10:26:43
dhcpd STOPPED Not started
dhcpd6 STOPPED Not started
http RUNNING pid 1252, uptime 4 days, 10:25:42
ntp RUNNING pid 1088, uptime 4 days, 10:26:17
proxy RUNNING pid 1148, uptime 4 days, 10:26:00
rackd RUNNING pid 978, uptime 4 days, 10:26:43
regiond RUNNING pid 981, uptime 4 days, 10:26:42
syslog RUNNING pid 1090, uptime 4 days, 10:26:17
root@srvmaasdb01:/home/administrator# maas admin maas get-config name=use_rack_proxy
Success.
Machine-readable output follows:
true
root@srvmaasdb01:/home/administrator# maas admin subnets read
Success.
Machine-readable output follows:
[
    {
        "name": "192.168.0.0/19",
        "description": "rmmh LAN",
        "vlan": {
            "vid": 0,
            "mtu": 1500,
            "dhcp_on": true,
            "external_dhcp": null,
            "relay_vlan": null,
            "id": 5001,
            "secondary_rack": "cekrss",
            "space": "space-0",
            "primary_rack": "thebsb",
            "fabric": "fabric-rmmh-LAN",
            "name": "untagged",
            "fabric_id": 0,
            "resource_uri": "/MAAS/api/2.0/vlans/5001/"
        },
        "cidr": "192.168.0.0/19",
        "rdns_mode": 2,
        "gateway_ip": "192.168.1.28",
        "dns_servers": [
            "192.168.1.28",
            "192.168.1.31",
            "192.168.1.32"
        ],
        "allow_dns": true,
        "allow_proxy": true,
        "active_discovery": true,
        "managed": true,
        "id": 1,
        "space": "space-0",
        "resource_uri": "/MAAS/api/2.0/subnets/1/"
    },
    {
        "name": "subnet-2003:d7:a714:a500::/64",
        "description": "IPv6",
        "vlan": {
            "vid": 0,
            "mtu": 1500,
            "dhcp_on": true,
            "external_dhcp": null,
            "relay_vlan": null,
            "id": 5001,
            "secondary_rack": "cekrss",
            "space": "space-0",
            "primary_rack": "thebsb",
            "fabric": "fabric-rmmh-LAN",
            "name": "untagged",
            "fabric_id": 0,
            "resource_uri": "/MAAS/api/2.0/vlans/5001/"
        },
        "cidr": "2003:d7:a714:a500::/64",
        "rdns_mode": 2,
        "gateway_ip": null,
        "dns_servers": [],
        "allow_dns": true,
        "allow_proxy": true,
        "active_discovery": false,
        "managed": true,
        "id": 5,
        "space": "space-0",
        "resource_uri": "/MAAS/api/2.0/subnets/5/"
    },
    {
        "name": "DMZ",
        "description": "rmmh DMZ",
        "vlan": {
            "vid": 0,
            "mtu": 1500,
            "dhcp_on": false,
            "external_dhcp": null,
            "relay_vlan": null,
            "id": 5003,
            "secondary_rack": null,
            "space": "space-dmz",
           ...

Revision history for this message
Jochen Wezel (jwezel) wrote :

@bjornt Do the logs help? Do you need further information?

Revision history for this message
Jochen Wezel (jwezel) wrote :
Download full text (4.6 KiB)

After update to channel=3.0/stable, the error in logs is reported as follows:
---

2021-06-23 12:59:09 maasserver.dhcp: [warn] No DNS servers found, DHCP defaulting to region IP.
2021-06-23 12:59:09 maasserver.rack_controller: [critical] Failed configuring DHCP on rack controller 'id:110'.
        Traceback (most recent call last):
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1475, in gotResult
            _inlineCallbacks(r, g, status)
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1464, in _inlineCallbacks
            status.deferred.errback()
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 501, in errback
            self._startRunCallbacks(fail)
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 568, in _startRunCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/snap/maas/15003/lib/python3.8/site-packages/maasserver/rack_controller.py", line 281, in <lambda>
            d.addErrback(lambda f: f.trap(NoConnectionsAvailable))
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/python/failure.py", line 439, in trap
            self.raiseException()
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/python/failure.py", line 467, in raiseException
            raise self.value.with_traceback(self.tb)
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/snap/maas/15003/lib/python3.8/site-packages/maasserver/dhcp.py", line 880, in configure_dhcp
            config = yield deferToDatabase(get_dhcp_configuration, rack_controller)
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
            result = inContext.theWork()
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/snap/maas/15003/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
            return func(*args,**kw)
          File "/snap/maas/15003/lib/python3.8/site-packages/provisioningserver/utils/twisted.py", line 870, in callInContext
            return func(*args, **kwargs)
          File "/snap/maas/15003/lib/python3.8/site-packages/provisioningser...

Read more...

Revision history for this message
Alberto Donato (ack) wrote :

Could you please attach the following logs from your MAAS server: /var/snap/maas/common/log/{regiond,rackd,supervisor-run}.log ?

Revision history for this message
Jochen Wezel (jwezel) wrote :
Revision history for this message
Jochen Wezel (jwezel) wrote :
Revision history for this message
Jochen Wezel (jwezel) wrote :
Revision history for this message
Alberto Donato (ack) wrote :

Could you please try running the following steps on one of the maas nodes:

1) sudo snap run --shell maas
2) from the new shell: maas-region shell
3) in the python REPL that gets spawned, paste the following:

from itertools import chain

from massserver.models import RackController
from maasserver.dhcp import *

r = RackController.objects.get(id=110)

v = {vlan: list(chain(*split_managed_ipv4_ipv6_subnets(vlan.subnet_set.all()))) for vlan in gen_managed_vlans_for(r)}

for vlan, subnets in v.items():
    print("---", vlan)
    for subnet in subnets:
        print(">>>>", subnet, get_default_dns_servers(r, subnet))

and paste the output.

Revision history for this message
Jochen Wezel (jwezel) wrote :

administrator@srvmaasdb01:~$ sudo snap run --shell maas
root@srvmaasdb01:/home/administrator# maas-region shell
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from itertools import chain
>>> from massserver.models import RackController
Traceback (most recent call last):
  File "<console>", line 1, in <module>
ModuleNotFoundError: No module named 'massserver'
>>> from maasserver.models import RackController
>>> from maasserver.dhcp import *
>>> r = RackController.objects.get(id=110)
>>> v = {vlan: list(chain(*split_managed_ipv4_ipv6_subnets(vlan.subnet_set.all())))
... for vlan in gen_managed_vlans_for(r)}
>>> for vlan, subnets in v.items():
... print("---", vlan)
... for subnet in subnets:
... print(">>>>", subnet, get_default_dns_servers(r, subnet))
...
--- fabric-rmmh-LAN.untagged
>>>> 192.168.0.0/19:192.168.0.0/19(vid=0) [IPAddress('192.168.1.22'), IPAddress('192.168.1.26'), IPAddress('192.168.1.27')]
>>>> subnet-2003:d7:a714:a500::/64:2003:d7:a714:a500::/64(vid=0) [None]
>>>

Revision history for this message
Alberto Donato (ack) wrote :

It seems default_region_ip = get_source_address(subnet.get_ipnetwork()) is None for the IPv6 subnet, which evenutally causes [None] (instead of an empty list) to be returned as dns servers.

I'm still investigating on why get_source_address() actually returns None in this case

Changed in maas:
status: Incomplete → Triaged
importance: Undecided → High
importance: High → Medium
assignee: nobody → Alberto Donato (ack)
milestone: none → next
Revision history for this message
Jochen Wezel (jwezel) wrote :

Maybe this information is helpful, too:

This is the current interface configuration:
------
root@srvmaasdb01:/home/administrator# ifconfig -a
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 192.168.1.22 netmask 255.255.224.0 broadcast 192.168.31.255
        inet6 fe80::215:5dff:fe03:3511 prefixlen 64 scopeid 0x20<link>
        ether 00:15:5d:03:35:11 txqueuelen 1000 (Ethernet)
        RX packets 16990024 bytes 3740088614 (3.7 GB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 13549103 bytes 3708545156 (3.7 GB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 160797907 bytes 189517653046 (189.5 GB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 160797907 bytes 189517653046 (189.5 GB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

------

This is the netplan configuration:
------
root@srvmaasdb01:/etc/netplan# cat 00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    eth0:
      addresses:
      - 192.168.1.22/19
      - 192.168.1.26/19
      gateway4: 192.168.1.28
      nameservers:
        addresses:
        - 192.168.1.28
        search:
        - maas
        - mgmt.compumaster.de
  version: 2
------

IPv6 is not configured in my network, so e.g. ping to google with IPv6 doesn't work:
------
root@srvmaasdb01:/etc/netplan# ping www.google.de -6
ping: connect: Network is unreachable
root@srvmaasdb01:/etc/netplan# ping www.google.de -4
PING www.google.de (142.250.186.131) 56(84) bytes of data.
64 bytes from 142.250.186.131: icmp_seq=1 ttl=117 time=9.73 ms
64 bytes from 142.250.186.131: icmp_seq=2 ttl=117 time=12.9 ms
64 bytes from 142.250.186.131: icmp_seq=3 ttl=117 time=7.97 ms
^C
--- www.google.de ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2007ms
rtt min/avg/max/mdev = 7.965/10.191/12.882/2.033 ms
------

Revision history for this message
Jochen Wezel (jwezel) wrote :
Revision history for this message
Jochen Wezel (jwezel) wrote :
Revision history for this message
Alberto Donato (ack) wrote :

Oh I see, so you have an IPv6 subnet defined (2003:d7:a714:a500::/64), but maas has no address on that subnet.

Then it makes sense for get_source_address() to return None. I'll fix the [] vs [None] issue.

As a workaround for now, can you please try to delete that vlan/subnet as it's unused?

Changed in maas:
status: Triaged → In Progress
Revision history for this message
Jochen Wezel (jwezel) wrote :
Download full text (6.3 KiB)

Immediately after removing the subnet from my LAN-fabric, the error in the logs disappeared. Also the CPU utilization is reduced from 80-100% down to 0-15% which is pretty fine :-)

New logs are now:
------
==> supervisor-run.log <==
2021-06-30 09:27:36,104 INFO spawned: 'dhcpd' with pid 895455

==> dhcpd.log <==
Internet Systems Consortium DHCP Server 4.4.1
Copyright 2004-2018 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Config file: /var/snap/maas/common/maas/dhcpd.conf
Database file: /var/snap/maas/common/maas/dhcp/dhcpd.leases
PID file: /var/snap/maas/common/maas/dhcp/dhcpd.pid
Wrote 0 class decls to leases file.
Wrote 0 deleted host decls to leases file.
Wrote 0 new dynamic host decls to leases file.
Wrote 150 leases to leases file.
Listening on LPF/eth0/00:15:5d:03:35:11/vlan-5001
Sending on LPF/eth0/00:15:5d:03:35:11/vlan-5001
Sending on Socket/fallback/fallback-net
failover peer failover-vlan-5001: I move from communications-interrupted to startup

==> regiond.log <==
2021-06-30 09:27:42 regiond: [info] 192.168.1.27 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2021-06-30 09:27:46 maasserver.dhcp: [info] Successfully configured DHCPv4 on rack controller 'srvmaasrack02 (cekrss)'.
2021-06-30 09:27:47 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'srvmaasrack02 (cekrss)'.
2021-06-30 09:27:47 maasserver.dhcp: [info] Successfully configured DHCPv4 on rack controller 'srvmaasrack02 (cekrss)'.
2021-06-30 09:27:47 maasserver.dhcp: [info] Successfully configured DHCPv4 on rack controller 'srvmaasdb01 (thebsb)'.

==> supervisor-run.log <==
2021-06-30 09:27:46,461 INFO success: dhcpd entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)

==> regiond.log <==
2021-06-30 09:27:47 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'srvmaasrack02 (cekrss)'.
2021-06-30 09:27:47 maasserver.dhcp: [info] Successfully configured DHCPv6 on rack controller 'srvmaasdb01 (thebsb)'.
2021-06-30 09:27:55 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'srvmaasdb01' (thebsb).
2021-06-30 09:27:55 maasserver: [error] Error while calling ScanNetworks: Unable to get RPC connection for rack controller 'srvmaasrack02' (cekrss).
2021-06-30 09:27:55 maasserver.regiondservices.active_discovery: [info] Active network discovery: Unable to initiate network scanning on any rack controller. Verify that the rack controllers are started and have connected to the region.
2021-06-30 09:27:58 regiond: [info] 192.168.1.22 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2021-06-30 09:28:12 regiond: [info] 192.168.1.27 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2021-06-30 09:28:28 regiond: [info] 192.168.1.22 GET /MAAS/rpc/ HTTP/1.1 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)
2021-06-30 09:28:42 regiond: [info] 192.168.1.27 GET /MAAS/r...

Read more...

Revision history for this message
Alberto Donato (ack) wrote :

Great! The attached MP should fix the issue even in the case a subnet like that is present.
Thanks for the help debugging.

Revision history for this message
Jochen Wezel (jwezel) wrote :

Thank YOU! :-)

Changed in maas:
status: In Progress → Fix Committed
Bill Wear (billwear)
Changed in maas:
milestone: next → 3.1.0-beta1
Bill Wear (billwear)
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.