Comment 8 for bug 1636708

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1636708] Re: ifup -a does not start dependants last, causes deadlocks with vlans/bonding

On Tue, May 9, 2017 at 10:32 AM, Dimitri John Ledkov <<email address hidden>
> wrote:

> Maybe this is easier to read.
>
> root@xnox-iad-nr5:~# journalctl -o short-monotonic -u <email address hidden>
> -- Logs begin at Tue 2017-05-09 10:57:18 UTC, end at Tue 2017-05-09
> 15:22:34 UTC. --
> [ 6.740201] xnox-iad-nr5 systemd[1]: Started ifup for bond0.
> [ 6.750333] xnox-iad-nr5 sh[1184]: Waiting for a slave to join bond0
> (will timeout after 60s)
> [ 6.987241] xnox-iad-nr5 sh[1184]: ifquery: recursion detected for
> interface bond0 in pre-up phase
> [ 6.987341] xnox-iad-nr5 sh[1184]: ifquery: recursion detected for
> parent interface bond0 in pre-up phase
> [ 6.987425] xnox-iad-nr5 sh[1184]: ifquery: recursion detected for
> parent interface bond0 in pre-up phase
> root@xnox-iad-nr5:~# journalctl -o short-monotonic -u
> <email address hidden>
> -- Logs begin at Tue 2017-05-09 10:57:18 UTC, end at Tue 2017-05-09
> 15:22:34 UTC. --
> [ 6.755723] xnox-iad-nr5 systemd[1]: Started ifup for bond0.101.
> [ 6.757056] xnox-iad-nr5 sh[1286]: ifup: waiting for lock on
> /run/network/ifstate.bond0
> [ 7.227572] xnox-iad-nr5 sh[1286]: Set name-type for VLAN subsystem.
> Should be visible in /proc/net/vlan/config
> root@xnox-iad-nr5:~# journalctl -o short-monotonic -u
> <email address hidden>
> -- Logs begin at Tue 2017-05-09 10:57:18 UTC, end at Tue 2017-05-09
> 15:22:34 UTC. --
> [ 6.760568] xnox-iad-nr5 systemd[1]: Started ifup for bond0.401.
> [ 6.761920] xnox-iad-nr5 sh[1290]: ifup: waiting for lock on
> /run/network/ifstate.bond0
> [ 7.197983] xnox-iad-nr5 sh[1290]: Set name-type for VLAN subsystem.
> Should be visible in /proc/net/vlan/config
> root@xnox-iad-nr5:~# journalctl -o short-monotonic -u networking.service
> -- Logs begin at Tue 2017-05-09 10:57:18 UTC, end at Tue 2017-05-09
> 15:22:34 UTC. --
> [ 6.645323] xnox-iad-nr5 systemd[1]: Starting Raise network
> interfaces...
> [ 6.692530] xnox-iad-nr5 ifup[992]: Waiting for bonding kernel module
> to be ready (will timeout after 5s)
> [ 6.693104] xnox-iad-nr5 ifup[992]: Waiting for bond master bond0 to be
> ready
> [ 7.221867] xnox-iad-nr5 ifup[992]: /sbin/ifup: waiting for lock on
> /run/network/ifstate.bond0
> [ 7.263179] xnox-iad-nr5 systemd[1]: Started Raise network interfaces.
>
>
What is the status/timestamp of the physical devices (enspf90 and 91) ?

 Note that networking, <email address hidden>, and <email address hidden> all hit locks
that are held by bond0.

> Note that <email address hidden> beats networking.service to acquire the lock.
>

hrm, I didn't think we got ifup events for non-physical devices; this is
certainly problematic;
the design, IIUC, is for physical devices to get ifup@.service triggers via
udev events; see the the BindsTo=sys-subsystem-net-devices-%i.device

The result in this boot is that networking.service (which starts at 6.64
seconds calls ifup -a (pid 992) which is racing with
the ifup@bond0 (started at 6.74, pid 1184); so we have *two* ifups locking
bond0; I think this is the real deadlock.

I'm not sure why we're getting an ifup@bond0 unit execution, the udev
scripts are only triggered for physical devices;

Looks like when the module is loaded (modprobe bonding) that we get an
event for bond0 ... which may be triggering
the hotplug udev-rule which queues a ifup@bond0 job...

This is strange (to me):

# modprobe -vr bonding
rmmod bonding
root@x1:~# ifconfig -a
ens3 Link encap:Ethernet HWaddr 52:54:00:87:54:74
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:2 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:104 (104.0 B) TX bytes:184 (184.0 B)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:320 errors:0 dropped:0 overruns:0 frame:0
          TX packets:320 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:23680 (23.6 KB) TX bytes:23680 (23.6 KB)

root@x1:~# cat /etc/issue
Ubuntu 16.04.2 LTS \n \l

root@x1:~# lsb_release -rd
Description: Ubuntu 16.04.2 LTS
Release: 16.04
root@x1:~# ls -al /sys/class/net/
total 0
drwxr-xr-x 2 root root 0 May 9 16:36 .
drwxr-xr-x 63 root root 0 May 9 16:35 ..
lrwxrwxrwx 1 root root 0 May 9 16:35 ens3 ->
../../devices/pci0000:00/0000:00:03.0/virtio0/net/ens3
lrwxrwxrwx 1 root root 0 May 9 16:35 lo -> ../../devices/virtual/net/lo

root@x1:~# modprobe -v bonding
insmod /lib/modules/4.4.0-75-generic/kernel/drivers/net/bonding/bonding.ko
root@x1:~# ls -al /sys/class/net/
total 0
drwxr-xr-x 2 root root 0 May 9 16:36 .
drwxr-xr-x 63 root root 0 May 9 16:36 ..
lrwxrwxrwx 1 root root 0 May 9 16:36 bond0 ->
../../devices/virtual/net/bond0
-rw-r--r-- 1 root root 4096 May 9 16:36 bonding_masters
lrwxrwxrwx 1 root root 0 May 9 16:36 ens3 ->
../../devices/pci0000:00/0000:00:03.0/virtio0/net/ens3
lrwxrwxrwx 1 root root 0 May 9 16:36 lo -> ../../devices/virtual/net/lo
root@x1:~# uname -r
4.4.0-75-generic

Why would bond0 get created when loading the bonding module?

@xnox, if you change your interfaces from bond0, to bond1 globally, does
your configuration
work ?

> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1636708
>
> Title:
> ifup -a does not start dependants last, causes deadlocks with
> vlans/bonding
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/
> 1636708/+subscriptions
>