Bug #1781856 “Default gateway for LXD containers cannot be influ...” : Bugs : Canonical Juju

Trent Lloyd (lathiat) on 2018-07-16

tags:	added: sts
description:	updated

Revision history for this message

Dmitrii Shcherbakov (dmitriis) wrote on 2018-07-16:

#1

Since you linked the bug I created about multi-homing, you might know the workarounds already, but I will summarize them just in case.

One way to work around the problem is to use source-based policy routing (as you mentioned) for receiving TCP traffic. For sending traffic, static routes have to be used, as the destination host has to be known to direct traffic to the right outbound hop.

I suspect that using a charm to set policy rules can be a problem if the "first space" is not the one that needs to be used to contact the Juju controller from a machine/unit agent and the controller is not on the same L2 (in a different subnet) - this case could be quite relevant with L3 leaf-spine deployments with Juju HA enabled.

Regardless of how this is applied (cloud-init or charm), the following could be used:

1) With TCP (even when using an unbound listening socket - 0.0.0.0/INADDR_ANY, see man 2 listen and man 7 ip), you can rely on the fact that a connected socket of your TCP server will use, as its source address, the address that the client specified as the destination. When the client creates its own socket (5-tuple) to establish a TCP connection, it does not expect the source address of a response packet to magically change. Unless there is a broken NAT configuration, the receiving host with the TCP server uses received_packet.destination_addr as connected_socket.source_addr.

This allows you to avoid static routes and handle "unknown sender" scenarios correctly for receiving traffic with the following rules:

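CIDR=192.168.1.0/24 # e.g. if you have eth1 <-> 192.168.1.10
GATEWAY=192.168.1.1 # example value (assumption): the gateway on that subnet
TABLE=101 # example value (assumption): any unused routing table ID
PRIORITY=1001 # example value (assumption): any unused rule priority
ip route add default via "$GATEWAY" table "$TABLE" # add a default route to a different table
# add a policy rule so traffic sourced from this subnet uses the per-interface-subnet routing table, without tripping rp_filter, which asymmetric routing would otherwise do
ip rule add from "$CIDR" table "$TABLE" priority "$PRIORITY"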

The trick is that a request will come to 192.168.1.10 from, say, 1.1.1.1, and the response source address will be selected as 192.168.1.10. The TCP server's kernel will then inspect the source address of the response packet and route it via $GATEWAY in $TABLE. This might be counter-intuitive, as it is the locally generated response, not the incoming request, that is subject to the policy rule.
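
For hosts with more than one secondary interface, the two commands above can be generated per subnet. The following is only a sketch under stated assumptions: the helper names policy_cmds and route_check_cmd, the gateway addresses, table IDs and priorities are all hypothetical, and the script prints the commands rather than running them, so it can be reviewed without root.

```shell
#!/bin/sh
# Sketch only: prints iproute2 commands instead of executing them.

# policy_cmds CIDR GATEWAY TABLE PRIORITY
# Emits the default route and the source-based rule for one interface subnet.
policy_cmds() {
    cidr=$1
    gateway=$2
    table=$3
    priority=$4
    echo "ip route add default via $gateway table $table"
    echo "ip rule add from $cidr table $table priority $priority"
}

# route_check_cmd SRC DST
# Emits a command asking the kernel which route it would pick for a
# locally generated packet from SRC (a local address) to DST.
route_check_cmd() {
    echo "ip route get $2 from $1"
}

# Hypothetical example values: two subnets, one table per subnet.
policy_cmds 192.168.1.0/24 192.168.1.1 101 1001
policy_cmds 192.168.2.0/24 192.168.2.1 102 1002
route_check_cmd 192.168.1.10 1.1.1.1
```

Once the printed commands have been applied as root, the route reported by the last command should go via the gateway of the per-interface table rather than the main default gateway.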

A simple charm that could be used for that lives here (it can be improved to avoid hard-coding the interface):
https://git.launchpad.net/~canonical-bootstack/charm-policy-routing/tree/hooks/config-changed
https://jujucharms.com/u/canonical-bootstack/policy-routing

2) For UDP and unbound sockets (INADDR_ANY) the problem is that you only have one receiving (listening) socket and no connected socket. Your UDP server kernel figures out a source address to use during sendto(2) execution (getsockname would get the result). This is nicely summarized here: http://laforge.gnumonks.org/blog/20171020-local_ip_unbound_udp/

Fortunately, most of our workloads are TCP and we do not hit that problem that often. For OpenStack deployments designate-bind might be problematic in case multiple interfaces are used for its container.

3) For sending traffic, either static routes or VRF + SO_BINDTODEVICE have to be used: you either have to know exactly how to route to a given end host, or bind the socket used for sending to a certain interface associated with a routing table through a VRF.
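
The two options for outbound traffic could look roughly like this with iproute2. This is a sketch, not a tested deployment recipe: the helper names, the destination subnet 10.20.0.0/24, the VRF name vrf-blue and table 10 are hypothetical, and the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints iproute2 commands instead of executing them.

# static_route_cmd DEST_CIDR GATEWAY DEV
# Option A: a static route for a destination that is known in advance.
static_route_cmd() {
    echo "ip route add $1 via $2 dev $3"
}

# vrf_cmds VRF_NAME TABLE DEV
# Option B: create a VRF device tied to a routing table and enslave DEV to it.
vrf_cmds() {
    echo "ip link add $1 type vrf table $2"
    echo "ip link set dev $1 up"
    echo "ip link set dev $3 master $1"
}

# Hypothetical example values.
static_route_cmd 10.20.0.0/24 192.168.1.1 eth1
vrf_cmds vrf-blue 10 eth1
```

With the VRF in place, a sending process can bind its socket to the VRF device with SO_BINDTODEVICE, or be started with "ip vrf exec vrf-blue <command>" so that its sockets use the VRF's routing table.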

John A Meinel (jameinel) wrote on 2018-07-17: Re: [Bug 1781856] Re: Default gateway for LXD containers cannot be influenced/changed

#2

Changing the default gateway seems very much a band-aid solution: if you
have more than one interface, then almost by definition you want some
traffic to go out one interface for a reason that does not apply to other
traffic. Changing the 'default' is going to be wrong for some portion of
your traffic.

On Mon, Jul 16, 2018 at 2:40 AM, Dmitrii Shcherbakov <
1781856@bugs.launchpad.net> wrote:

> Since you linked the bug I created about multi-homing you might know the
> workarounds already but I will summarize them just in case.
>
> One of the ways to workaround the problem is using source-based policy
> routing (as you mentioned) for receiving TCP traffic. For sending
> traffic static routes have to be used as the destination host has to be
> known to direct traffic to the right outbound hop.
>
> I suspect that using a charm to set policy rules can be a problem if the
> "first space" is not the one that needs to be used to contact the Juju
> controller from a machine/unit agent and the controller is not on the
> same L2 (in a different subnet) - this case could be quite relevant with
> L3 leaf-spine deployments with Juju HA enabled.
>
> Regardless of how this is applied (cloud-init or charm), the following
> could be used:
>
> 1) with TCP (even with using an unbound listening socket -
> 0.0.0.0/INADDR_ANY, see man 2 listen and man 7 ip), you can rely on the
> fact that a connected socket of your TCP server will use a source
> address that was specified as a destination address on a client. When
> the client creates its own socket (5-tuple) to establish a TCP
> connection, it does not expect a source address of a response packet to
> magically change. Unless there is a broken NAT configuration, the
> receiving host with the TCP server uses received_packet.destination_addr
> as connected_socket.source_addr.
>
> This allows you to avoid static routes and handle "unknown sender"
> scenarios correctly for receiving traffic with the following rules:
>
> CIDR=192.168.1.0/24 # e.g. if you have eth1 <-> 192.168.1.10
> ip route add default via $GATEWAY table $TABLE # add a default route to a different table
> # add a policy rule to use per-interface-subnet routing tables without hitting rp_filter by using asymmetric routing
> ip rule add from $CIDR table $TABLE priority $PRIORITY
>
> The trick is that a request will come to 192.168.1.10 from, say, 1.1.1.1
> and a response source address will be selected as 192.168.1.10. The TCP
> server's kernel will then inspect the response packet source address and
> forward it using a $GATEWAY in $TABLE. This might be counter-intuitive
> as the locally-generated response is a subject of a policy rule - not a
> request.
>
> A simple charm that could be used for that lives here (it can be improved
> to avoid hard-coding the interface):
> https://git.launchpad.net/~canonical-bootstack/charm-policy-routing/tree/hooks/config-changed
> https://jujucharms.com/u/canonical-bootstack/policy-routing
>
> 2) For UDP and unbound sockets (INADDR_ANY) the problem is that you only
> have one receiving (listening) socket and no connected socket. Your UDP
> server kernel figures out a source address to use during sendto(2)
> execution (getsockname would get the result). This is nicely summarized
> here: http://laforge.gnumonks.org/blog/20171020-local_ip_unbound_udp/
>
> Fortunately, most of our workloads are TCP and we do not hit that
> problem that often. For OpenStack deployments designate-bind might be
> problematic in case multiple interfaces are used for its container.
>
> 3) For sending traffic either static routes or VRF + SO_BINDTODEVICE
> have to be used as you either have to know exactly how to route to a
> given end host or bind a socket used for sending to a certain interface
> associated with a routing table through a VRF.

Anastasia (anastasia-macmood) wrote on 2018-10-09:

#3

Since there is a workaround, described in comment #1, and jameinel does not think that changing the default gateway is a good solution to the problem, I am marking this report as Invalid for juju.

Changed in juju:
status:	New → Invalid

Trent Lloyd (lathiat) wrote on 2018-10-11:

#4

Anastasia: There is no workaround for LXD containers. There is a workaround for MAAS machines, but not for the LXD containers this bug is about. This is very much still an issue for LXD containers, the networking for which is controlled entirely by juju. So I think this Invalid state is incorrect.

While there are perhaps "better" solutions (using VNF etc.), it's still very relevant to want to change the default gateway; depending on the environment, it is not always just a band-aid solution.

Trent Lloyd (lathiat) wrote on 2018-10-11:

#5

As a concrete example, we had a customer who required this. The only fix they had was to manually log in to the machines and change the interfaces files after deployment.

Changed in juju:
status:	Invalid → New

Trent Lloyd (lathiat) wrote on 2018-10-11:

#6

The obvious fix to me would be to use the default space's gateway, if it has one, and otherwise fall back to finding a space that has a gateway.
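
The selection order proposed above can be sketched as follows. This is an illustration of the logic only, not Juju's implementation: the function name pick_gateway, the "<space> <gateway>" input format (with "-" meaning the space has no gateway), the space name internal and the addresses are all hypothetical.

```shell
#!/bin/sh
# pick_gateway DEFAULT_SPACE
# Reads lines of "<space> <gateway-or-dash>" on stdin and prints the gateway
# of DEFAULT_SPACE if it has one, otherwise the first gateway found in any
# other space. Exits non-zero when no space has a gateway.
pick_gateway() {
    awk -v def="$1" '
        $2 != "-" && $2 != "" {
            if ($1 == def) { chosen = $2; found_default = 1 }
            else if (fallback == "") { fallback = $2 }
        }
        END {
            if (found_default) print chosen
            else if (fallback != "") print fallback
            else exit 1
        }'
}

# Hypothetical bindings: the default space "internal" has a gateway, so it wins
# even though another space with a gateway is listed first.
printf '%s\n' 'dns-frontend 10.1.0.1' 'internal 10.0.0.1' | pick_gateway internal
```

If the default space had no gateway (a "-" in the second column), the same call would fall back to the first other space that has one.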

Richard Harding (rharding) on 2018-10-23

Changed in juju:
status:	New → Triaged
importance:	Undecided → High
milestone:	none → 2.5.1

Ian Booth (wallyworld) on 2019-01-28

Changed in juju:
milestone:	2.5.1 → 2.5.2

Canonical Juju QA Bot (juju-qa-bot) on 2019-03-11

Changed in juju:
milestone:	2.5.2 → 2.5.3

Canonical Juju QA Bot (juju-qa-bot) on 2019-03-26

Changed in juju:
milestone:	2.5.3 → 2.5.4

Canonical Juju QA Bot (juju-qa-bot) on 2019-04-02

Changed in juju:
milestone:	2.5.4 → 2.5.5

Anastasia (anastasia-macmood) on 2019-05-14

Changed in juju:
milestone:	2.5.6 → 2.5.8

Canonical Juju QA Bot (juju-qa-bot) on 2019-06-28

Changed in juju:
milestone:	2.5.8 → 2.5.9

Anastasia (anastasia-macmood) wrote on 2019-10-31:

#7

Removing from the milestone as this work will not be done in the 2.5 series.

Changed in juju:
milestone:	2.5.9 → none

Diko Parvanov (dparv) wrote on 2021-01-27:

#8

Trent Lloyd (lathiat) wrote on 2018-10-11:
> Obvious fix to me would be to use the default space as the gateway, if it has one, otherwise fall back to finding a space that has a gateway.

+1 for this. I faced the same issue where an application (designate-bind) has bindings to two MAAS subnets, both defined with default gateways. First, the LXD container ended up with multiple default gateways in ip r l; second, the active one went through the dns-frontend space binding instead of the default space binding.
