Juju reports incorrect ingress-address/private-address

Bug #1933303 reported by Simon Fels
This bug affects 2 people
Affects         Status      Importance  Assigned to      Milestone
Canonical Juju  Incomplete  High        Joseph Phillips

Bug Description

I have an LXD container which hosts a nested LXD deployed by a charm. As such, the nested LXD creates a network bridge called amsbr0 to serve nested containers. The bridge has 192.168.100.1 assigned as its IP address and serves addresses via DHCP to nested containers from the 192.168.100.1/24 subnet. This looks as follows in the LXD running on the host:

$ lxc ls
+---------------+---------+------------------------+------+-----------+-----------+----------+
|     NAME      |  STATE  |          IPV4          | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+---------------+---------+------------------------+------+-----------+-----------+----------+
| juju-5ae9e7-0 | RUNNING | 172.2.0.107 (eth0)     |      | CONTAINER | 0         | lxd0     |
+---------------+---------+------------------------+------+-----------+-----------+----------+
| juju-cef40f-0 | RUNNING | 172.2.0.88 (eth0)      |      | CONTAINER | 0         | lxd0     |
+---------------+---------+------------------------+------+-----------+-----------+----------+
| juju-cef40f-1 | RUNNING | 172.2.0.154 (eth0)     |      | CONTAINER | 0         | lxd0     |
+---------------+---------+------------------------+------+-----------+-----------+----------+
| juju-cef40f-2 | RUNNING | 192.168.100.1 (amsbr0) |      | CONTAINER | 0         | lxd0     |
|               |         | 172.2.0.15 (eth0)      |      |           |           |          |
+---------------+---------+------------------------+------+-----------+-----------+----------+

The LXD container hosting the nested LXD is juju-cef40f-2 and the other containers can be ignored here.

Asking the lxd/0 unit, which is deployed into the juju-cef40f-2 container on the host, gives an incorrect address for the unit's private-address:

$ juju run -u lxd/0 -- unit-get private-address
192.168.100.1

The expected private address is 172.2.0.15, not 192.168.100.1, which is not accessible from outside of the container.

This then breaks our charm in some conditions when the nested bridge is already gone but Juju hasn't yet updated its view of what the private-address should be. Our charm tries to configure the private-address as the core.https_address of the nested LXD instance, which then fails because the amsbr0 bridge netdev is already gone.

The problem seems to lie in the logic Juju uses to determine the private-address. Looking at the relevant code in Juju shows that it only ignores bridge devices from the LXD container status which are named lxdbr0 or lxcbr0 (see https://github.com/juju/juju/blob/develop/container/lxd/container.go#L235), which is what breaks things here. Instead of hardcoding the possible bridge names, I think Juju should look at the network state reported by the container/VM, compare it against the container configuration, and only consider network devices listed there. That should safely tell us which network devices the container uses for incoming/outgoing traffic and which it only uses on the inside.
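To make the proposal above concrete, here is a minimal sketch of the filtering idea. It is not Juju's actual code and not the linked patch: the Address type, the candidateAddresses function, and the configuredNICs set are hypothetical names, and the container's NIC devices are simplified to a set of interface names. It assumes the container state reports addresses per interface with a scope such as "global".

```go
package main

import "sort"

// Address is a hypothetical, simplified view of one IP address reported
// for an interface in the container's network state.
type Address struct {
	Value string // for example "172.2.0.15"
	Scope string // for example "global", "local" or "link"
}

// candidateAddresses returns the global-scope addresses of interfaces that
// are also declared as NIC devices in the container configuration.
// Interfaces that exist only inside the container (such as a nested bridge)
// are skipped, whatever they are called.
func candidateAddresses(state map[string][]Address, configuredNICs map[string]bool) []string {
	var out []string
	for name, addrs := range state {
		if !configuredNICs[name] {
			// Not attached by the host-side config, so not reachable from outside.
			continue
		}
		for _, a := range addrs {
			if a.Scope != "global" {
				continue
			}
			out = append(out, a.Value)
		}
	}
	// Map iteration order is random in Go; sort for a stable result.
	sort.Strings(out)
	return out
}
```

With the state from juju-cef40f-2 (eth0 with 172.2.0.15, amsbr0 with 192.168.100.1) and a configuration that only declares eth0, this returns just 172.2.0.15, without needing to know the bridge name.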

This is all on Juju 2.8.11. I haven't checked 2.9.x yet but from the code the problem should remain there too.

Simon Fels (morphis) wrote :

I am not that familiar with the Juju code base, and there might be other parts which need adjustments, but https://paste.ubuntu.com/p/wGNKVcS5nH/ shows what I think should solve the problem.

description: updated
Joseph Phillips (manadart) wrote :

The 2.9 series will allow you to get the correct behaviour from your model by:
- Defining a space that includes the 172.2.0.0/24 subnet.
- Binding the deployed unit to this space.

I'll mark this Incomplete for now; happy to have it reopened if needed.

Changed in juju:
importance: Undecided → High
status: New → Incomplete
assignee: nobody → Joseph Phillips (manadart)
Simon Fels (morphis) wrote :

I agree that using spaces + bindings can help here, but it still doesn't solve the underlying bug. It doesn't really make sense to me that Juju considers a network device which is never reachable from the outside for the private-address while special-casing lxdbr0/lxcbr0. From a UX perspective, that's something a user won't really understand.

Leon (sed-i) wrote :