Comment 24 for bug 1902960

Dan Watkins (oddbloke) wrote :

Thanks for the explanation, Dan! I was off down the wrong path, and I appreciate the correction.

I've just downloaded the Azure image from cloud-images.ubuntu.com, and it includes this in `/etc/netplan/90-hotplug-azure.yaml`:

# This netplan yaml is delivered in Azure cloud images to support
# attaching and detaching nics after the instance first boot.
# Cloud-init otherwise handles initial boot network configuration in
# /etc/netplan/50-cloud-init.yaml
network:
    version: 2
    ethernets:
        ephemeral:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: '!eth0'
            optional: true
        hotpluggedeth0:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: 'eth0'
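
To make the two `match` stanzas above concrete, here is a minimal Python sketch of which stanza would claim a given interface name. It assumes that the `name` value ends up as a systemd-networkd style `Name=` match (a shell-style glob, with a leading `!` inverting the test) and that every interface in question uses the `hv_netvsc` driver, so only the name is checked. `name_matches` and `stanzas_for` are hypothetical helpers for illustration, not netplan or cloud-init code.

```python
from fnmatch import fnmatchcase

# Name patterns copied from 90-hotplug-azure.yaml; the driver match is
# ignored here (assumption: every interface is hv_netvsc).
STANZAS = {
    "ephemeral": "!eth0",
    "hotpluggedeth0": "eth0",
}


def name_matches(pattern: str, ifname: str) -> bool:
    """Return True if ifname satisfies the pattern.

    Assumption: a leading "!" inverts the test and the remainder is a
    shell-style glob, as in a systemd-networkd [Match] Name= entry.
    """
    negate = pattern.startswith("!")
    glob = pattern[1:] if negate else pattern
    matched = fnmatchcase(ifname, glob)
    return not matched if negate else matched


def stanzas_for(ifname: str) -> list[str]:
    """List the stanza ids whose name pattern accepts ifname."""
    return [sid for sid, pat in STANZAS.items() if name_matches(pat, ifname)]


if __name__ == "__main__":
    for ifname in ("eth0", "eth1"):
        print(ifname, stanzas_for(ifname))
```

Under those assumptions the two stanzas are mutually exclusive: `eth0` is only claimed by `hotpluggedeth0`, and any other `hv_netvsc` interface only by `ephemeral`.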

This file is not present in a booted system, because cloud-init removes it during boot:

2020-11-09 18:12:09,306 - handlers.py[DEBUG]: start: azure-ds/maybe_remove_ubuntu_network_config_scripts: maybe_remove_ubuntu_network_config_scripts
2020-11-09 18:12:09,307 - DataSourceAzure.py[INFO]: Removing Ubuntu extended network scripts because cloud-init updates Azure network configuration on the following event: System boot.
2020-11-09 18:12:09,307 - util.py[DEBUG]: Attempting to remove /etc/netplan/90-hotplug-azure.yaml
2020-11-09 18:12:09,307 - handlers.py[DEBUG]: finish: azure-ds/maybe_remove_ubuntu_network_config_scripts: SUCCESS: maybe_remove_ubuntu_network_config_scripts

It does this before the regular cloud-init network configuration is written and before `netplan generate` is called:

2020-11-09 18:12:09,465 - util.py[DEBUG]: Writing to /etc/netplan/50-cloud-init.yaml - wb: [644] 603 bytes
2020-11-09 18:12:09,466 - subp.py[DEBUG]: Running command ['netplan', 'generate'] with allowed return codes [0] (shell=False, capture=True)

cloud-init also runs a couple of udevadm commands right after `netplan generate`:

2020-11-09 18:12:09,813 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/eth0'] with allowed return codes [0] (shell=False, capture=True)
2020-11-09 18:12:09,828 - subp.py[DEBUG]: Running command ['udevadm', 'test-builtin', 'net_setup_link', '/sys/class/net/lo'] with allowed return codes [0] (shell=False, capture=True)

This all happens before systemd-networkd starts:

Nov 09 18:12:09.956027 focal-1604945439 systemd[1]: Starting Network Service...
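
As a sanity check on that ordering, here is a small Python sketch that merges the timestamps quoted above from the cloud-init log and the journal and sorts them. It assumes both logs use the same clock and timezone, and that the journal line (which carries no year) is from 2020; the event labels are mine, only the timestamps come from the logs.

```python
from datetime import datetime

# (label, timestamp) pairs; timestamps copied from the cloud-init log above.
CLOUD_INIT_EVENTS = [
    ("remove 90-hotplug-azure.yaml", "2020-11-09 18:12:09,307"),
    ("write 50-cloud-init.yaml", "2020-11-09 18:12:09,465"),
    ("netplan generate", "2020-11-09 18:12:09,466"),
    ("udevadm net_setup_link eth0", "2020-11-09 18:12:09,813"),
    ("udevadm net_setup_link lo", "2020-11-09 18:12:09,828"),
]

# Timestamp copied from the journal line above.
JOURNAL_EVENTS = [
    ("systemd-networkd starting", "Nov 09 18:12:09.956027"),
]


def parse_cloud_init(ts: str) -> datetime:
    """Parse a cloud-init log timestamp (milliseconds after the comma)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def parse_journal(ts: str, year: int = 2020) -> datetime:
    """Parse a short journal timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S.%f")


def timeline() -> list[str]:
    """Return all event labels sorted by timestamp."""
    events = [(parse_cloud_init(ts), label) for label, ts in CLOUD_INIT_EVENTS]
    events += [(parse_journal(ts), label) for label, ts in JOURNAL_EVENTS]
    return [label for _, label in sorted(events)]


if __name__ == "__main__":
    for label in timeline():
        print(label)
```

With the timestamps above, the networkd start sorts last, roughly 130 ms after the final `udevadm` call.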

So: I'm not really sure what's going on here. I've tried restoring `90-hotplug-azure.yaml` and removing `50-cloud-init.yaml`; that doesn't cause the issue to reproduce on a subsequent boot.

One thing worth noting that could lead to unexpected state: cloud-init performs DHCP on this interface (so that it can fetch the network configuration it is going to apply). It does this in a sandbox (i.e. it doesn't use the system configuration for it), but that could potentially still leave (kernel?) state on the interface which {udev,network}d interpret in a way that leads to this issue?