optionally remove l3-agent when using octavia

Bug #1843557 reported by Edward Hope-Morley
Affects                               Status        Importance  Assigned to
OpenStack Neutron Open vSwitch Charm  Fix Released  High        Edward Hope-Morley
OpenStack Octavia Charm               Triaged       Undecided   Unassigned

Bug Description

In order to use Octavia we colocate neutron-openvswitch with the octavia units to allow use of tenant layer-2 networks, but we currently also install the l3-agent. This has the side effect that routers can be scheduled onto a host that may not have external network connectivity. We should therefore allow the l3-agent to be optionally removed so that this does not occur.

tags: added: canonical-bootstack
Changed in charm-neutron-openvswitch:
milestone: none → 19.10
assignee: nobody → Edward Hope-Morley (hopem)
importance: Undecided → High
tags: added: sts
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-neutron-openvswitch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/681441

Changed in charm-neutron-openvswitch:
status: New → In Progress
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Note that for centralized routers to be scheduled to a node, the l3 agent there needs to run in dvr_snat mode. In one of the failure scenarios that was not the case, so only qrouter and fip namespaces for DVR routers were attempted - but even that can cause problems.

Given that routers are created as distributed by default when DVR is enabled, if the `configure-resources` Octavia action was used, the router for the management network is created as distributed.

This results in attempts by neutron-l3-agent to set up qrouter namespaces (which fails in a LXD container):
https://paste.ubuntu.com/p/MwvX5862fC/

This has a difficult-to-track side effect with the IPv6 + SLAAC setup used for the management network subnet by the `configure-resources` action: specifically, the o-hm0 interfaces set up by the octavia charm cannot obtain an IP address from radvd:

ubuntu@juju-4348e8-2-lxd-8:~$ sudo networkctl status o-hm0
● 9: o-hm0
       Link File: n/a
    Network File: /etc/systemd/network/99-charm-octavia-o-hm0.network
            Type: ether
           State: degraded (configuring)
      HW Address: fa:16:3e:9c:a8:ed
         Address: fe80::f816:3eff:fe9c:a8ed

The accept_ra sysctl is set to 0 for that interface, but this is misleading because systemd-networkd has its own implementation of RA handling:

https://www.freedesktop.org/software/systemd/man/systemd.network.html#IPv6AcceptRA=
"Note that kernel's implementation of the IPv6 RA protocol is always disabled, regardless of this setting"

sysctl -a --pattern accept_ra | grep o-hm0
net.ipv6.conf.o-hm0.accept_ra = 0
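
To illustrate the point (this is a hypothetical sketch, not the actual content of the charm-generated file), RA acceptance under systemd-networkd is controlled by the `IPv6AcceptRA=` option in the interface's .network unit, so the kernel sysctl reading 0 above tells you nothing about whether RAs are being processed:

```ini
# Hypothetical sketch; the real /etc/systemd/network/99-charm-octavia-o-hm0.network
# written by the octavia charm may differ.
[Match]
Name=o-hm0

[Network]
# systemd-networkd runs its own RA client; the kernel's accept_ra
# implementation stays disabled regardless of the sysctl value.
IPv6AcceptRA=yes
```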

As soon as I converted the management router to centralized (`openstack router set --disable <rid>`, then `openstack router set --centralized <rid>`, then `openstack router set --enable <rid>`), a non-link-local management port IPv6 address was obtained and configured on o-hm0.

As far as I can see, this has to do with the fact that distributed ports in qrouter namespaces are not L2-reachable unless they are present in qrouter namespaces on the same node.

https://paste.ubuntu.com/p/nv79Z2ssTw/

Either way, for the loadbalancer management network I think we do not need DVR routers to be used (they can be created as --centralized and --ha even if the rest of the deployment will use DVR).

So besides a fix for this issue, we also need to make sure that either the Octavia charm uses centralized routers (for SLAAC to work) or that DHCPv6 is used, because currently the host files for dnsmasq are empty (as SLAAC is enabled).

Moreover, if a management network is created with IPv4 + DHCP setup (not via octavia's action), I think what I am describing will not be reproducible.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

@dmitriis this problem is not specific to the lb-management network; rather, it is a result of having a single scheduling space for routers, which will now contain a mix of l3-agents, some with access to the external network (e.g. those running on nova-compute hosts) and some without (those running on octavia hosts). We could possibly explore the idea of a dummy neutron AZ for the octavia units, but I think that is going to cause complications for the required L2 scheduling, so removing the l3-agent from octavia units is probably the way to go. I am testing it now, so we'll see.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

The only reason to have a router is to have something provide the IPv6 Router Advertisements. It is not used for anything else and no traffic will ever traverse it.

It will probably be fine to have the charm create a non-distributed router by default.

I would prefer SLAAC over DHCPv6 as it gives the units a predictable and traceable address calculated from the MAC address of the unit.
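
That predictability can be illustrated with a short sketch (not part of any charm; just the standard modified-EUI-64 derivation that SLAAC uses): flip the universal/local bit in the MAC's first octet and insert ff:fe in the middle to form the interface identifier. Applied to the o-hm0 MAC from the networkctl output above, it reproduces the link-local address shown there:

```python
import ipaddress


def slaac_address(prefix: str, mac: str) -> str:
    """Derive the SLAAC (modified EUI-64) address for a MAC under a /64 prefix."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the middle
    iid = ":".join("%02x%02x" % (eui64[i], eui64[i + 1]) for i in range(0, 8, 2))
    return ipaddress.ip_address(prefix + iid).compressed


# The o-hm0 MAC from the networkctl output above:
print(slaac_address("fe80::", "fa:16:3e:9c:a8:ed"))  # fe80::f816:3eff:fe9c:a8ed
```

The same derivation with the management network's global prefix gives the unit's expected management address, which is what makes SLAAC addresses traceable back to a MAC.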

The gate test for the Octavia charm is currently using a DVR setup with multiple application instances for neutron-openvswitch, so it should be easy to validate any changes.

FWIW, I have never seen the lb-mgmt router being scheduled to one of the Octavia hosts in the gate test, but I do believe you when you say this may happen.

Bugs and patches accepted :)

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Right, it's random where the routers are scheduled. As long as a host runs an l3-agent, it has the potential to have a router scheduled to it (and to have an l3ha router's MASTER live there). Note that in the field we have had a tenant's l3ha router master live on an octavia unit and be hit by this.

Ryan Beisner (1chb1n)
tags: added: uosci
Revision history for this message
James Page (james-page) wrote :

Ed - are you using two instances of the neutron-openvswitch charm - 1 for octavia and 1 for the rest of the deployment?

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Ed, to add to comment #6, here's the field scenario:

  neutron-openvswitch:
    charm: cs:neutron-openvswitch
    options:
      use-dvr-snat: True # No network nodes, we need this as True
      enable-local-dhcp-and-metadata: True # No network nodes, we need this as True
      data-port: *data-port
      firewall-driver: openvswitch

  neutron-openvswitch-octavia:
    charm: cs:neutron-openvswitch
    num_units: 0
    options:
      worker-multiplier: *worker-multiplier
      firewall-driver: openvswitch

If use-dvr-snat is not set to True, snat- namespaces wouldn't be scheduled to l3 agents on neutron-openvswitch-octavia units.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

@james-page in order to leverage this feature I will have to, yes.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Also note that if you are using v6 for the lb-mgmt-net (which the charm defaults to), then your octavia units will need access to the external network and will therefore need the l3-agent enabled (since the v6 address comes from the radvd that neutron runs in the qrouter- namespace).

Revision history for this message
Edward Hope-Morley (hopem) wrote :

@dmitriis yes but qrouter- namespaces will still be scheduled there.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

The radvd can run just fine in the qrouter on one of the compute hosts without the need for any external connectivity on the octavia units.

The openvswitch tunnels will take care of the L2 transport of the IPv6 RA packets.

Even if the L3 packages are installed by the neutron-openvswitch charm, the router always ends up on one of the compute nodes in the test gate. [0][1]

0: https://github.com/openstack/charm-octavia/blob/master/src/tests
1: https://github.com/openstack-charmers/zaza-openstack-tests/tree/master/zaza/openstack/charm_tests/octavia

Revision history for this message
Edward Hope-Morley (hopem) wrote :

@fnordahl that was also my expectation, but in my deployment, if I remove the l3-agent, my octavia o-hm0 interface no longer receives an IPv6 address (I see the solicitations go out but nothing coming back).

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

So without dvr-snat on neutron-openvswitch-octavia (comment #7):

1) l3 agents are installed on octavia units;

2) creation of qrouter and fip namespaces is attempted and fails if octavia is deployed into LXD (nested namespaces);

3) snat namespaces are not created on neutron-openvswitch-octavia units (they are scheduled only to dvr_snat agents);

4) radvd is set up only in qrouter namespaces, not snat namespaces;

5) o-hm0 interfaces are unable to obtain IPv6 addresses, while amphorae can because they reside on computes where qrouter namespaces are successfully created.

In essence:

* we need to use centralized routers (l3ha or not) for the management network only, in the `configure-resources` action in octavia, to make the IPv6 SLAAC setup work;
  * openstack router create --centralized --ha # ...
* l3 agents do not need to be installed on neutron-openvswitch-octavia units.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Added charm-octavia to track the IPv6 + SLAAC part (changing configure-resources to use centralized routers).

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to charm-octavia (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/682603

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-octavia (master)

Reviewed: https://review.opendev.org/682603
Committed: https://git.openstack.org/cgit/openstack/charm-octavia/commit/?id=afd84ccbc013d1c1d7fa884ab1361987a348c2e5
Submitter: Zuul
Branch: master

commit afd84ccbc013d1c1d7fa884ab1361987a348c2e5
Author: Frode Nordahl <email address hidden>
Date: Tue Sep 17 12:11:12 2019 +0200

    Use a centralized network for the management network

    Change-Id: Iafdc810ac403243aaf2ae2380003d79fb6d96a40
    Related-Bug: #1843557

David Ames (thedac)
Changed in charm-neutron-openvswitch:
milestone: 19.10 → 20.01
Changed in charm-octavia:
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-neutron-openvswitch (master)

Change abandoned by Edward Hope-Morley (<email address hidden>) on branch: master
Review: https://review.opendev.org/681441
Reason: Abandoning in favour of https://review.opendev.org/#/c/682602 which will stop agents being installed if deploying inside a container.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-neutron-openvswitch (master)

Reviewed: https://review.opendev.org/682602
Committed: https://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/commit/?id=4b2935d5a69841d8879dc849a4fba00b348b9b1e
Submitter: Zuul
Branch: master

commit 4b2935d5a69841d8879dc849a4fba00b348b9b1e
Author: Frode Nordahl <email address hidden>
Date: Sun Sep 15 20:52:09 2019 +0200

    Don't enable DVR services when deployed in container

    Also set upper constraint for ``python-cinderclient`` in the
    functional test requirements as it relies on the v1 client
    which has been removed. We will not fix this in Amulet, charm
    pending migration to the Zaza framework.

    Change-Id: If4d3b3cd79767b37fe6b74a1d6d399076c122bc8
    Closes-Bug: #1843557

Changed in charm-neutron-openvswitch:
status: In Progress → Fix Committed
Changed in charm-neutron-openvswitch:
status: Fix Committed → Won't Fix
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Fixing bug status since the above patch has landed in neutron-openvswitch

Changed in charm-neutron-openvswitch:
status: Won't Fix → Fix Committed
James Page (james-page)
Changed in charm-neutron-openvswitch:
status: Fix Committed → Fix Released