cloud-init overriding set-name in netplan file
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Expired | Medium | Unassigned |
Bug Description
After creating an Ubuntu 22.04 instance in OpenStack the following netplan file is generated:
```
# cat /etc/netplan/
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/
# network: {config: disabled}
network:
    version: 2
    ethernets:
        ens3:
            dhcp4: true
            dhcp6: true
            match:
            mtu: 1500
```
With the matching links:
```
# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,
ens3 UP fa:16:3e:c7:f9:7e <BROADCAST,
```
I then tried to rename the interface from "ens3" to "eth0" by updating the file like so:
```
# cat /etc/netplan/
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/
# network: {config: disabled}
network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true
            dhcp6: true
            match:
            mtu: 1500
```
Applying the config works: the interface is renamed without dropping my SSH connection:
```
# netplan apply
# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,
eth0 UP fa:16:3e:c7:f9:7e <BROADCAST,
```
So far so good, but when I reboot the machine it does not come back online:
```
# reboot
Connection to XXX.XXX.XXX.XXX closed by remote host.
Connection to XXX.XXX.XXX.XXX closed.
```
Logging in via a locally connected console I can see the following:
```
# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,
ens3 DOWN fa:16:3e:c7:f9:7e <BROADCAST,
```
So for some reason the interface comes up as "ens3" again. It also has no address configuration assigned, which is why I cannot reach it. If I then run "netplan apply" manually, I can get it online again:
```
# netplan apply
# ip -br l
lo UNKNOWN 00:00:00:00:00:00 <LOOPBACK,
eth0 UP fa:16:3e:c7:f9:7e <BROADCAST,
```
Logged in over SSH again, I checked the dmesg log for renames and saw the following:
```
# dmesg | grep rename
[ 2.142770] virtio_net virtio0 ens3: renamed from eth0
[ 6.089816] virtio_net virtio0 eth0: renamed from ens3
[ 7.253661] virtio_net virtio0 ens3: renamed from eth0
[ 278.607558] virtio_net virtio0 eth0: renamed from ens3
```
So the network name has been flapping back and forth between "ens3" and "eth0".
After digging around I think this is what happens:
```
[ 2.142770] virtio_net virtio0 ens3: renamed from eth0 <- systemd-networkd, as part of initramfs
[ 6.089816] virtio_net virtio0 eth0: renamed from ens3 <- systemd-networkd, as part of booted OS, using the files generated by my initial "netplan apply".
[ 7.253661] virtio_net virtio0 ens3: renamed from eth0 <- cloud-init, for some reason
[ 278.607558] virtio_net virtio0 eth0: renamed from ens3 <- my manual "netplan apply" after logging in to the console
```
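The flapping described above can be made easier to read by turning the kernel messages into a list of rename events. The following is only a minimal sketch: `parse_renames` and `RENAME_RE` are hypothetical helper names, and the regular expression assumes the exact `<new>: renamed from <old>` message format shown in the dmesg output in this report.

```python
import re

# Matches kernel lines such as:
#   [ 2.142770] virtio_net virtio0 ens3: renamed from eth0
# (format assumed from the dmesg output quoted in this report)
RENAME_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s.*?\s(?P<new>\S+): renamed from (?P<old>\S+)"
)


def parse_renames(dmesg_text):
    """Return (timestamp, old_name, new_name) tuples in log order."""
    events = []
    for line in dmesg_text.splitlines():
        match = RENAME_RE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("old"), match.group("new"))
            )
    return events


# The four lines reported above, copied verbatim.
SAMPLE = """\
[ 2.142770] virtio_net virtio0 ens3: renamed from eth0
[ 6.089816] virtio_net virtio0 eth0: renamed from ens3
[ 7.253661] virtio_net virtio0 ens3: renamed from eth0
[ 278.607558] virtio_net virtio0 eth0: renamed from ens3
"""

if __name__ == "__main__":
    for ts, old, new in parse_renames(SAMPLE):
        print(f"{ts:>10.3f}s  {old} -> {new}")
```

Feeding it the output of `dmesg | grep rename` shows each direction change with its timestamp, which makes the gap between the third rename (about 7 s, during boot) and the fourth (the manual "netplan apply") easy to spot.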
Looking at /var/log/
```
2023-02-06 07:57:27,270 - __init__.py[DEBUG]: Detected interfaces {'eth0': {'downable': True, 'device_id': '0x0001', 'driver': 'virtio_net', 'mac': 'fa:16:
2023-02-06 07:57:27,270 - __init__.py[DEBUG]: achieving renaming of [['fa:16:
2023-02-06 07:57:27,270 - subp.py[DEBUG]: Running command ['ip', 'link', 'set', 'eth0', 'name', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
```
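As a rough illustration of what these log lines suggest, a MAC-to-name list can be turned into `ip link set ... name ...` commands as sketched below. This is not cloud-init's implementation: `plan_renames` is a hypothetical helper, and it deliberately ignores link state and name collisions.

```python
def plan_renames(detected, wanted):
    """Build `ip link` argv lists that rename interfaces by MAC address.

    detected: {current_name: mac} for interfaces present on the system.
    wanted:   iterable of (mac, desired_name) pairs.
    Hypothetical sketch only; not cloud-init's actual logic.
    """
    by_mac = {mac.lower(): name for name, mac in detected.items()}
    commands = []
    for mac, desired in wanted:
        current = by_mac.get(mac.lower())
        if current is None or current == desired:
            continue  # interface absent, or already has the desired name
        commands.append(["ip", "link", "set", current, "name", desired])
    return commands


if __name__ == "__main__":
    # Values taken from this report: the NIC is currently "eth0",
    # while the remembered name for its MAC is "ens3".
    print(plan_renames({"eth0": "fa:16:3e:c7:f9:7e"},
                       [("fa:16:3e:c7:f9:7e", "ens3")]))
```

With the values from this report, the sketch yields the same command that appears in the log line above, which is consistent with the rename being driven by a remembered MAC-to-name pair rather than by the edited netplan file.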
At first I had a hard time understanding how cloud-init knew about the previous "ens3" name. I now think it was persisted in obj.pkl during the first boot after install and is picked up again on subsequent boots. From that same log:
```
2023-02-06 07:57:27,211 - util.py[DEBUG]: Reading from /var/lib/
```
Taking a look in the file:
```
# cat p.py
#!/usr/bin/env python3
import pickle
# open a file, where you stored the pickled data
with open('/
    data = pickle.load(file)
print(data.
```
```
# ./p.py
{'version': 1, 'config': [{'mtu': 1500, 'type': 'physical', 'accept-ra': True, 'subnets': [{'type': 'dhcp4'}, {'type': 'dhcp6'}], 'mac_address': 'fa:16:
```
From what I can tell this "name" is picked up in the openstack helper at https:/
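To check which name the cached config would associate with a given MAC address, the dict printed above could be reduced to a simple mapping. This is a sketch under assumptions: `names_by_mac` is a hypothetical helper, and `EXAMPLE` is a hand-written dict shaped like the truncated output above, with the "name" value assumed to be "ens3".

```python
def names_by_mac(network_config):
    """Map MAC address -> configured name for 'physical' entries of a
    version 1 network config dict (hypothetical helper)."""
    result = {}
    for entry in network_config.get("config", []):
        if entry.get("type") != "physical":
            continue
        if "mac_address" in entry and "name" in entry:
            result[entry["mac_address"].lower()] = entry["name"]
    return result


# Hand-written example shaped like the truncated output above.
# The "name" value is an assumption based on the observed rename.
EXAMPLE = {
    "version": 1,
    "config": [
        {
            "mtu": 1500,
            "type": "physical",
            "accept-ra": True,
            "subnets": [{"type": "dhcp4"}, {"type": "dhcp6"}],
            "mac_address": "fa:16:3e:c7:f9:7e",
            "name": "ens3",
        }
    ],
}

if __name__ == "__main__":
    print(names_by_mac(EXAMPLE))
```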
So... the question then is, how should this work? Right now it seems cloud-init is "helping" me by renaming the interface back, even though I have asked netplan to set a different name from the one the machine had at initial install.
One thing that occurred to me is that maybe I am expected to feed cloud-init user-data so it can know initially that I want the interface called "eth0", but reading https:/
For now I guess the simplest workaround is to disable the network management parts as mentioned in the generated netplan file. This works:
```
# echo "network: {config: disabled}" > /etc/cloud/
# reboot
```
Now the machine comes up by itself, and there are fewer renames happening:
```
# dmesg | grep rename
[ 2.165152] virtio_net virtio0 ens3: renamed from eth0
[ 6.108291] virtio_net virtio0 eth0: renamed from ens3
```
It feels strange to have to disable the network management parts... What would be the correct way to deal with this situation?
Thanks for filing this bug and helping make cloud-init better. Let's see if we can get to the root of the problem.
This may involve us requesting your attached logs from running `cloud-init collect-logs` and attaching the corresponding tar file.
Please check the instance-data-sensitive.json in that tar file before attaching it, because it could contain sensitive information if you provided passwords or user credentials in user-data on the affected VM.
Minimally, I think we need to see the output of `journalctl -b 0 -o short-precise` and the full cloud-init.log (both are grabbed by `cloud-init collect-logs` anyway).
Generally, I don't think the OpenStack datasource default behavior should be for cloud-init to be actively rewriting or re-applying network config across reboot. It generally should be inert unless the datasource IMDS (instance metadata) either changes the instance-id in meta-data to a new UUID (telling cloud-init it needs to reconfigure the world) or if OpenStack was configured to re-render network per-boot.
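The expectation described above can be written down as a small decision function. This is a simplified, hypothetical sketch for discussion, not cloud-init's code: `should_reapply_network` and its parameters are invented names, and the real checks involve more state than an instance-id comparison.

```python
def should_reapply_network(cached_instance_id, current_instance_id,
                           render_every_boot=False):
    """Return True if network config is expected to be re-rendered.

    Hypothetical model of the expectation stated above:
    - re-render when the datasource is configured to do so on every boot;
    - otherwise only when the instance-id differs from the cached one.
    """
    if render_every_boot:
        return True
    return cached_instance_id != current_instance_id


if __name__ == "__main__":
    # Plain reboot of the same instance: expected to be inert.
    print(should_reapply_network("uuid-a", "uuid-a"))
    # New instance-id reported by the metadata service: reconfigure.
    print(should_reapply_network("uuid-a", "uuid-b"))
```

Under this model, the rename observed on a normal reboot would only be expected if the instance-id changed or per-boot rendering was enabled, which is what the requested logs should help confirm or rule out.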
So we might have a bug where LinuxNetworking.apply_network_config_names runs more often than it should across normal system reboots, even when DataSourceOpenStack hasn't told cloud-init to re-render and re-apply new networking config due to a BOOT_NEW_INSTANCE event.
I would have expected cloud-init to exit and do nothing with network renames across normal reboots due to these checks: https://github.com/canonical/cloud-init/blob/main/cloudinit/stages.py#L905-L916
I think it will help to see the full cloud-init.log here to work out what really happened with all the PER_BOOT, PER_INSTANCE_REBOOT, datasource cache validation, instance-id and event checks, so we can better determine why cloud-init thinks it should be touching network renames across subsequent boots.
I'll set this to 'incomplete' status above, but please set it back to 'new' status when you get a chance to attach logs.