Comment 6 for bug 1991552

Trent Lloyd (lathiat) wrote:

(I'm not an LXD expert; I'm from Sustaining Engineering. But reading the thread made me a little curious about how things work, so here are some details from a small adventure I went on trying to look into it.)

Working from https://oil-jenkins.canonical.com/artifacts/6d96629c-3db5-442f-a603-b7f0e08b1d1b/generated/generated/openstack/juju-crashdump-openstack-2022-09-30-22.52.15.tar.gz. In this case the broken openstack-dashboard/0 unit was on machine 3/lxd/9.

The weird MAC address is key. 00:16:3e is the range LXD auto-generates its MACs from. Other MACs, such as the 3e:33:82:f5:f7:8a you mentioned and the 9e:d6:cb:1d:3e:7a we see in this dump, are from the 'locally administered' range, which has the second-least-significant bit of the first octet set (and the lowest, multicast, bit clear): any address matching x[26AE]:xx:xx:xx:xx:xx. This range is used by most other things that auto-generate an address, including the kernel and systemd. For example, when you first create a veth pair (which is how an LXD container is connected to a bridge), the kernel assigns it a MAC from this range.
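To make the bit test concrete, here is a small Python sketch (my own illustration, not LXD or kernel code; the helper names are hypothetical) that classifies the MACs from this bug by looking at the first three octets and at the low two bits of the first octet:

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def is_locally_administered(mac: str) -> bool:
    # Bit 1 (0x02) of the first octet is the U/L bit: 1 = locally administered.
    return bool(parse_mac(mac)[0] & 0x02)


def is_multicast(mac: str) -> bool:
    # Bit 0 (0x01) of the first octet is the I/G bit: 1 = group/multicast.
    return bool(parse_mac(mac)[0] & 0x01)


def is_lxd_generated(mac: str) -> bool:
    # LXD draws its auto-generated MACs from the 00:16:3e prefix.
    return parse_mac(mac)[:3] == bytes([0x00, 0x16, 0x3E])


def classify(mac: str) -> str:
    if is_lxd_generated(mac):
        return "lxd"
    # Unicast + locally administered is exactly the x[26AE]:... pattern.
    if is_locally_administered(mac) and not is_multicast(mac):
        return "locally-administered"
    return "other"


for mac in ["00:16:3e:dd:d2:fd", "9e:d6:cb:1d:3e:7a", "3e:33:82:f5:f7:8a"]:
    print(mac, classify(mac))
```

The bit test is equivalent to the hex-digit pattern: a second hex digit of 2, 6, A or E is exactly "bit 1 set, bit 0 clear".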

So we end up with a container whose eth0 has the wrong MAC. We can see the "ip link" output in 3/lxd/9/var/log/syslog.

We get:
```
43: eth0@if44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 9e:d6:cb:1d:3e:7a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::9cd6:cbff:fe1d:3e7a/64 scope link
45: eth1@if46: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:d9:7d:9a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.246.173.5/22 brd 10.246.175.255 scope global eth1
47: eth2@if48: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:8e:ea:b8 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.246.166.209/22 brd 10.246.167.255 scope global eth2
```

What we expected can be found in 3/lxd/9/etc/netplan/99-juju.yaml; in summary:

```
eth0 00:16:3e:dd:d2:fd 10.246.169.7/22
eth1 00:16:3e:d9:7d:9a 10.246.173.5/22
eth2 00:16:3e:8e:ea:b8 10.246.166.209/22
```

This incorrect MAC address explains why the IP is not set: the juju netplan configuration uses a "match: macaddress" section. Although the interface has the correct name, if the MAC address doesn't match, netplan/systemd-networkd won't add the IP address.
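A toy Python model of that matching shows why only eth0 ends up without its address, using the observed and expected values above. This is a hypothetical helper, not netplan code. Real netplan/systemd-networkd matching has more rules than this, and the model deliberately ignores the interface name, matching purely on MAC:

```python
# Observed state, copied from the "ip link" output in 3/lxd/9/var/log/syslog.
observed = {
    "eth0": "9e:d6:cb:1d:3e:7a",
    "eth1": "00:16:3e:d9:7d:9a",
    "eth2": "00:16:3e:8e:ea:b8",
}

# Expected state, summarised from 3/lxd/9/etc/netplan/99-juju.yaml.
expected = {
    "eth0": ("00:16:3e:dd:d2:fd", "10.246.169.7/22"),
    "eth1": ("00:16:3e:d9:7d:9a", "10.246.173.5/22"),
    "eth2": ("00:16:3e:8e:ea:b8", "10.246.166.209/22"),
}


def addresses_applied(observed, expected):
    """Return {interface: address or None} under a MAC-only match rule."""
    by_mac = {mac.lower(): name for name, mac in observed.items()}
    result = {name: None for name in observed}
    for mac, addr in expected.values():
        actual = by_mac.get(mac.lower())
        if actual is not None:
            # A config section applies to whichever interface has its MAC.
            result[actual] = addr
    return result


for name, addr in sorted(addresses_applied(observed, expected).items()):
    print(name, addr or "(no match, address not configured)")
```

This prints no address for eth0 and the expected /22 addresses for eth1 and eth2, which lines up with eth0 only having a link-local IPv6 address in the syslog output.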

So the question is how we arrive at an eth0 with the wrong MAC address.

Looking at the LXD bridged network setup code here: https://github.com/lxc/lxd/blob/09a226043e705369973596440405aa94203a00cf/lxd/device/device_utils_network.go#L229

It creates a veth pair and then, after creation, sets the MAC address. So it's possible we got stuck with the default kernel-generated MAC and that LXD failed to apply its generated MAC for some reason.

While I haven’t done an exhaustive search, the code paths here generally seem to fail and tear down the interface if setting the MAC fails. My very rough guess at which veth interface it was (there is no way to know for sure from the juju-crashdump which interface is which, but I made a reasonable guess based on timing and other clues) is that the interface did come up and get added to the bridge, which all happens after the MAC is set. So it seems most likely that LXD at least attempted to set the MAC address without getting an error. I don’t see any other error about setting a MAC in the kernel logs either.

The other main possibility that comes to mind is something else (e.g. systemd-networkd or systemd-udevd) changing the MAC address after LXD did, as a race condition. systemd v249 does seem to “queue” the request to set the MAC into an internal queue, where it may get delayed (for example, if processing of the events was delayed).
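To illustrate how a delayed request could win against LXD's write, here is a toy Python model. It is my own illustration, not systemd or LXD code: the "other" writer stands in for something like systemd-udevd, OTHER_MAC is a made-up value, and the delay is modelled as a simple queue processed after LXD's write. The "only set if it doesn't match" check happens when the request is decided, not when it is applied:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

INITIAL_MAC = "6a:ca:6c:46:c6:85"  # kernel-assigned MAC from the ip monitor example
LXD_MAC = "00:16:3e:21:aa:b6"      # MAC LXD applied in the same example
OTHER_MAC = "ce:00:00:00:00:01"    # hypothetical value wanted by another writer


@dataclass
class Device:
    mac: str
    log: List[Tuple[str, str, str]] = field(default_factory=list)

    def set_mac(self, who: str, new: str) -> None:
        # Record (writer, old, new) so the write sequence can be inspected.
        self.log.append((who, self.mac, new))
        self.mac = new


def request_mac(dev: Device, who: str, target: str,
                queue: List[Callable[[], None]], delayed: bool) -> None:
    """Decide now whether a change is needed; apply it now or later.

    A delayed request is applied later without re-checking, so it can
    overwrite a change another writer made in between.
    """
    if dev.mac == target:
        return
    if delayed:
        queue.append(lambda: dev.set_mac(who, target))
    else:
        dev.set_mac(who, target)


def simulate(other_target: str, other_delayed: bool) -> Device:
    dev = Device(INITIAL_MAC)
    queue: List[Callable[[], None]] = []
    # The other writer reacts as soon as the device appears...
    request_mac(dev, "other", other_target, queue, other_delayed)
    # ...then LXD sets its own MAC (via "ip link" in the real system)...
    request_mac(dev, "lxd", LXD_MAC, queue, delayed=False)
    # ...and finally any queued request is processed.
    for job in queue:
        job()
    return dev


print(simulate(OTHER_MAC, other_delayed=False).mac)   # 00:16:3e:21:aa:b6
print(simulate(OTHER_MAC, other_delayed=True).mac)    # ce:00:00:00:00:01
print(simulate(INITIAL_MAC, other_delayed=True).mac)  # 00:16:3e:21:aa:b6
```

The last case shows that, in this model, a writer whose target is the starting MAC never queues anything, so a stale queued request alone would not explain the MAC going back to its original value.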

This quick bpftrace script does seem to show the MAC of veth devices being set by both systemd-udevd and by “ip link” (launched from LXD) in rapid succession. So there may be scope for these two events to race, with the later one setting the MAC back to what it started as. However, it’s not supposed to set the MAC unless it doesn’t match, so I am not sure my analysis here is correct, but it’s probably a starting point for further work to debug the issue.

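```
# cat setmac.bt
// Log MAC address changes on Ethernet-style devices (veth uses eth_mac_addr
// as its ndo_set_mac_address), together with the process that made them.
// The struct casts rely on kernel BTF (or kernel headers) being available.
kprobe:eth_commit_mac_addr_change,
kprobe:eth_mac_addr
{
  // arg0 is the struct net_device *, arg1 the new address (struct sockaddr *)
  $mac = (struct sockaddr *) arg1;

  // probed function, PID, process name, device name, new MAC as raw bytes,
  // then the user-space and kernel stacks of the caller
  printf("%s %d %s %s %r %s %s\n",
         func,
         pid,
         comm,
         ((struct net_device *)arg0)->name,
         buf($mac->sa_data, 6),
         ustack,
         kstack);
}
```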

Generally, the logs show this is quite a busy time: services are being installed locally, OVN is getting installed, etc., so if a race condition were to happen it wouldn’t be a huge surprise.

I ran out of time to look at it further but thought I’d share my findings as a starting point for someone else.

I think “ip monitor” may be able to prove a race condition re-setting the MAC. If I run “ip monitor” and do an lxc start I see:

```
<peer veth device created with initial MAC>
127: veth4b365727@NONE: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 6a:ca:6c:46:c6:85 brd ff:ff:ff:ff:ff:ff

<MAC change from LXD’s “ip link” call>
127: veth4b365727@veth10e0ce61: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 00:16:3e:21:aa:b6 brd ff:ff:ff:ff:ff:ff

<then it gets moved into the container namespace>
Deleted inet veth4b365727
Deleted inet6 veth4b365727
Deleted 127: veth4b365727@veth10e0ce61: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 00:16:3e:21:aa:b6 brd ff:ff:ff:ff:ff:ff new-netnsid 1 new-ifindex 127
```

So if you can get “ip monitor” running and logging with timestamps during all of the test runs, you may be able to prove (or rule out) this possibility.
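As a sketch of how such a log could be checked automatically, this Python snippet collects the sequence of MACs seen per interface and flags any interface whose MAC leaves the 00:16:3e range after having been in it. It is my own illustration: the regexes only cover the line shapes shown above, tracking by interface name is a simplification, and the "bad" trace is hypothetical. A leading timestamp on each line should not get in the way, since lines are searched rather than matched from the start:

```python
import re
from typing import Dict, List, Optional

# Header line of a link message, e.g.
# "127: veth4b365727@veth10e0ce61: <BROADCAST,MULTICAST> mtu 1500 ..."
# optionally prefixed with "Deleted ".
HEADER = re.compile(r"(?:Deleted )?(\d+): ([^:@\s]+)(?:@[^:\s]+)?: <")
ETHER = re.compile(r"link/ether ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})")
LXD_PREFIX = "00:16:3e:"


def mac_history(lines: List[str]) -> Dict[str, List[str]]:
    """Map interface name -> MACs seen for it, collapsing consecutive repeats."""
    history: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            current = header.group(2)
            continue
        ether = ETHER.search(line)
        if ether and current is not None:
            seen = history.setdefault(current, [])
            if not seen or seen[-1] != ether.group(1):
                seen.append(ether.group(1))
    return history


def left_lxd_range(history: Dict[str, List[str]]) -> List[str]:
    """Interfaces whose MAC moved out of 00:16:3e after having been in it."""
    flagged = []
    for name, macs in history.items():
        was_lxd = False
        for mac in macs:
            if mac.startswith(LXD_PREFIX):
                was_lxd = True
            elif was_lxd:
                flagged.append(name)
                break
    return flagged


# The "ip monitor" output from the lxc start above, annotations removed.
SAMPLE = """\
127: veth4b365727@NONE: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 6a:ca:6c:46:c6:85 brd ff:ff:ff:ff:ff:ff
127: veth4b365727@veth10e0ce61: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 00:16:3e:21:aa:b6 brd ff:ff:ff:ff:ff:ff
Deleted inet veth4b365727
Deleted inet6 veth4b365727
Deleted 127: veth4b365727@veth10e0ce61: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
    link/ether 00:16:3e:21:aa:b6 brd ff:ff:ff:ff:ff:ff new-netnsid 1 new-ifindex 127
"""

# Hypothetical bad trace: the device is reported again with its original MAC.
BAD = SAMPLE.splitlines()[:4] + [
    "127: veth4b365727@veth10e0ce61: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default",
    "    link/ether 6a:ca:6c:46:c6:85 brd ff:ff:ff:ff:ff:ff",
]

print(left_lxd_range(mac_history(SAMPLE.splitlines())))  # []
print(left_lxd_range(mac_history(BAD)))                   # ['veth4b365727']
```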