Comment 2 for bug 1843557

Dmitrii Shcherbakov (dmitriis) wrote :

Note that for centralized routers to be scheduled to a node, the l3 agent on that node needs to be in dvr_snat mode. In one of the failure scenarios this was not the case, so the agent only attempted to create the qrouter and fip namespaces for DVR routers, but even that can cause problems.

Given that routers are created as distributed by default when DVR is enabled, the router that the Octavia `configure-resources` action creates for the management network is also distributed.

As a result, neutron-l3-agent attempts to set up qrouter namespaces, which fails in a LXD container:
https://paste.ubuntu.com/p/MwvX5862fC/

This has a hard-to-track side effect with the IPv6 + SLAAC setup that the `configure-resources` action uses for the management network subnet: the o-hm0 interfaces set up by the octavia charm cannot get an IP address by interacting with radvd:

ubuntu@juju-4348e8-2-lxd-8:~$ sudo networkctl status o-hm0
● 9: o-hm0
       Link File: n/a
    Network File: /etc/systemd/network/99-charm-octavia-o-hm0.network
            Type: ether
           State: degraded (configuring)
      HW Address: fa:16:3e:9c:a8:ed
         Address: fe80::f816:3eff:fe9c:a8ed

The accept_ra sysctl is set to 0 for that interface, but this is misleading because systemd-networkd has its own implementation of router advertisement handling:

https://www.freedesktop.org/software/systemd/man/systemd.network.html#IPv6AcceptRA=
"Note that ***kernel's implementation of the IPv6 RA protocol is always disabled, regardless of this setting***"

sysctl -a --pattern accept_ra | grep o-hm0
net.ipv6.conf.o-hm0.accept_ra = 0
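
Since the sysctl value says nothing about what systemd-networkd is doing, a more direct check is whether the interface has picked up a global-scope IPv6 address. A minimal sketch; the function name has_global_ipv6 is made up for illustration and the default interface name is taken from the output above:

```shell
# Hypothetical helper: succeeds if the given interface has at least one
# global-scope (non-link-local) IPv6 address configured.
has_global_ipv6() {
    dev="${1:-o-hm0}"
    # -o prints one line per address; empty output means no global address.
    [ -n "$(ip -6 -o addr show dev "$dev" scope global 2>/dev/null)" ]
}
```

For example: has_global_ipv6 o-hm0 && echo "SLAAC address configured" || echo "link-local only"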

As soon as I converted the management router to centralized (openstack router set --disable <rid> -> openstack router set --centralized <rid> -> openstack router set --enable <rid>), a non-link-local IPv6 address for the management port was obtained and configured on o-hm0.
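
The conversion steps can be wrapped in a small helper so that the sequence stops at the first failing command. This is only a sketch; the function name centralize_router is made up, and the three openstack commands are the ones listed above:

```shell
# Hypothetical helper: convert an existing router to centralized mode.
# The router is administratively disabled while its mode is changed.
centralize_router() {
    rid="${1:-}"
    if [ -z "$rid" ]; then
        echo "usage: centralize_router <router-id>" >&2
        return 1
    fi
    # Each step runs only if the previous one succeeded.
    openstack router set --disable "$rid" &&
    openstack router set --centralized "$rid" &&
    openstack router set --enable "$rid"
}
```

Usage: centralize_router <rid>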

As far as I can see, this is because distributed router ports are not L2-reachable from a node unless the router's qrouter namespace is present on that same node.

https://paste.ubuntu.com/p/nv79Z2ssTw/

Either way, I do not think the load balancer management network needs DVR routers: they can be created with --centralized and --ha even if the rest of the deployment uses DVR.
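
For illustration, creating such a router by hand could look roughly like the sketch below. The function name and both arguments are placeholders, not the names the charm uses, and this is not what `configure-resources` currently does:

```shell
# Hypothetical helper: create a centralized, HA router and attach the
# management subnet to it. Router and subnet names are placeholders.
create_centralized_mgmt_router() {
    router="${1:-}"
    subnet="${2:-}"
    if [ -z "$router" ] || [ -z "$subnet" ]; then
        echo "usage: create_centralized_mgmt_router <router-name> <subnet>" >&2
        return 1
    fi
    # --centralized overrides the distributed default used when DVR is enabled.
    openstack router create --centralized --ha "$router" &&
    openstack router add subnet "$router" "$subnet"
}
```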

So besides a fix for this issue, we also need to make sure that either the Octavia charm uses centralized routers so that SLAAC works, or that DHCPv6 is used instead; currently the host files for dnsmasq are empty because SLAAC is enabled.

Moreover, if the management network is created with an IPv4 + DHCP setup (not via Octavia's action), I think the problem I am describing will not be reproducible.