Bundle deploys fail at lxc-start when bridge br-eth1 is created
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | Critical | Andrew McDermott |
Bug Description
As seen in http://
The Juju CI tests for the MAAS 1.9 provider provision machines that MAAS manages over their second NIC.
Native Juju bundle deployment fails to start the LXC template, with the message 'The container failed to start.'
See http://
Manually logging into the machine and capturing lxc-start debug output shows that br-eth0 is not found.
If the machine is reconfigured, so MAAS manages the machine using the first NIC, lxc-start succeeds and the bundle deployment completes.
lxc-start --name juju-trusty-
lxc-start 1456342861.218 INFO lxc_start_ui - lxc_start.
lxc-start 1456342861.220 WARN lxc_log - log.c:lxc_
lxc-start 1456342861.221 WARN lxc_cgmanager - cgmanager.
lxc-start 1456342861.221 INFO lxc_lsm - lsm/lsm.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 WARN lxc_seccomp - seccomp.
lxc-start 1456342861.221 WARN lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 WARN lxc_seccomp - seccomp.
lxc-start 1456342861.221 WARN lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.221 INFO lxc_seccomp - seccomp.
lxc-start 1456342861.222 DEBUG lxc_conf - conf.c:
lxc-start 1456342861.222 DEBUG lxc_conf - conf.c:
lxc-start 1456342861.222 DEBUG lxc_conf - conf.c:
lxc-start 1456342861.222 DEBUG lxc_conf - conf.c:
lxc-start 1456342861.222 INFO lxc_conf - conf.c:
lxc-start 1456342861.222 DEBUG lxc_start - start.c:
lxc-start 1456342861.222 DEBUG lxc_console - console.
lxc-start 1456342861.222 DEBUG lxc_console - console.
lxc-start 1456342861.222 DEBUG lxc_console - console.
lxc-start 1456342861.222 DEBUG lxc_console - console.
lxc-start 1456342861.222 INFO lxc_start - start.c:
lxc-start 1456342861.223 DEBUG lxc_start - start.c:
lxc-start 1456342861.248 ERROR lxc_conf - conf.c:
lxc-start 1456342861.255 ERROR lxc_conf - conf.c:
lxc-start 1456342861.255 ERROR lxc_start - start.c:
lxc-start 1456342861.255 ERROR lxc_start - start.c:
lxc-start 1456342861.255 ERROR lxc_start_ui - lxc_start.
lxc-start 1456342861.255 ERROR lxc_start_ui - lxc_start.
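Because the failure comes down to the container config naming a bridge (br-eth0) that was never created on the host, a pre-flight check could catch it before lxc-start runs. The following is a minimal Python sketch, assuming the standard Linux sysfs layout in which every bridge device has a `bridge/` subdirectory under /sys/class/net/<name>. The function names are hypothetical and are not part of Juju or LXC.

```python
import os


def is_bridge(name, sysfs_net="/sys/class/net"):
    """Return True if `name` is an existing Linux bridge.

    The kernel exposes a `bridge/` subdirectory only for bridge devices,
    e.g. /sys/class/net/lxcbr0/bridge; a missing device or a plain NIC
    such as eth0 has no such directory.
    """
    return os.path.isdir(os.path.join(sysfs_net, name, "bridge"))


def missing_bridges(links, sysfs_net="/sys/class/net"):
    """Return the lxc.network.link values that do not name an existing bridge."""
    return [link for link in links if not is_bridge(link, sysfs_net)]
```

On a host where only br-eth1 was created, `missing_bridges(["br-eth0", "br-eth1"])` should return `["br-eth0"]`, pointing directly at the misconfigured link instead of the generic "The container failed to start." message.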
Just switching the NIC order on the machine is not enough; MAAS 1.9 needs to re-commission the node so that it re-associates the device names. The following /sys/class/net entries were captured after switching the NICs but before re-commissioning the node; note that eth1 is still associated with the device in PCI slot 3. After re-commissioning, eth0 is associated with the device in slot 3 and eth1 with the device in slot 7.
root@maas-
total 0
lrwxrwxrwx 1 root root 0 Feb 24 20:39 br-eth1 -> ../../devices/
lrwxrwxrwx 1 root root 0 Feb 24 20:39 eth0 -> ../../devices/
lrwxrwxrwx 1 root root 0 Feb 24 20:38 eth1 -> ../../devices/
lrwxrwxrwx 1 root root 0 Feb 24 20:38 lo -> ../../devices/
lrwxrwxrwx 1 root root 0 Feb 24 20:44 lxcbr0 -> ../../devices/
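To compare which physical device each interface name points at before and after re-commissioning, the symlinks can be resolved programmatically rather than read from `ls -l`. This is a minimal Python sketch, assuming the usual sysfs layout where a PCI NIC's link target contains its PCI address (for example `0000:00:03.0` for slot 3) and virtual devices such as bridges live under `devices/virtual`. The listing above is truncated, so no paths from this host are reproduced here; the function names are hypothetical.

```python
import os
import re

# Full PCI address: domain:bus:slot.function, e.g. 0000:00:03.0 (slot 3).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def interface_devices(sysfs_net="/sys/class/net"):
    """Map each interface name to its sysfs symlink target.

    Entries in /sys/class/net are normally symlinks into /sys/devices/...;
    anything that is not a symlink (e.g. a bonding_masters file) is skipped.
    """
    mapping = {}
    for name in sorted(os.listdir(sysfs_net)):
        path = os.path.join(sysfs_net, name)
        if os.path.islink(path):
            mapping[name] = os.readlink(path)
    return mapping


def pci_address(target):
    """Return the last PCI address in a link target, or None if there is none.

    The last match is used because a NIC behind a PCI bridge has the bridge's
    address earlier in the path; virtual devices have no match at all.
    """
    matches = PCI_ADDR.findall(target)
    return matches[-1] if matches else None
```

Running `{n: pci_address(t) for n, t in interface_devices().items()}` before and after re-commissioning would show whether eth0 and eth1 have swapped PCI slots, which is exactly the re-association described above.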
Changed in juju-core: | |
importance: | Undecided → High |
tags: | added: ci deploy maas-provider test-failure |
Changed in juju-core: | |
status: | New → In Progress |
assignee: | nobody → Dimiter Naydenov (dimitern) |
milestone: | none → 2.0-beta2 |
Changed in juju-core: | |
importance: | High → Critical |
description: | updated |
Changed in juju-core: | |
assignee: | nobody → Dimiter Naydenov (dimitern) |
Changed in juju-core: | |
status: | Triaged → In Progress |
Changed in juju-core: | |
status: | In Progress → Fix Committed |
Changed in juju-core: | |
status: | Fix Committed → Fix Released |
affects: | juju-core → juju |
Changed in juju: | |
milestone: | 2.0-beta2 → none |
milestone: | none → 2.0-beta2 |
I think I know why this issue happens. Since we introduced the multi-bridge script for the MAAS provider, we shouldn't just use "br-eth0" (instancecfg.DefaultBridgeName, which should be removed) as the bridge device for all container NICs in container.BridgeNetworkConfig(). Instead, we should use instancecfg.DefaultBridgePrefix to build the correct lxc.network.link setting in lxc.conf.
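The proposed fix can be illustrated with a small Python sketch of the naming logic: derive each container NIC's lxc.network.link from a bridge prefix plus the host interface name, instead of hard-coding br-eth0. This is not Juju's Go code in container.BridgeNetworkConfig(); the `br-` prefix is an assumption based on the br-eth1 bridge seen on the host, and the helper names are hypothetical. The `lxc.network.type`, `lxc.network.link` and `lxc.network.flags` keys are the standard LXC 1.x container config keys.

```python
# Assumption: prefix inferred from the br-eth1 bridge created on the host.
DEFAULT_BRIDGE_PREFIX = "br-"


def bridge_for(host_iface, prefix=DEFAULT_BRIDGE_PREFIX):
    """Name of the bridge assumed to exist for host_iface (prefix + interface name)."""
    return prefix + host_iface


def lxc_network_lines(host_ifaces, prefix=DEFAULT_BRIDGE_PREFIX):
    """Render one veth NIC stanza per host interface, each linked to its own bridge."""
    lines = []
    for iface in host_ifaces:
        lines.append("lxc.network.type = veth")
        lines.append("lxc.network.link = " + bridge_for(iface, prefix))
        lines.append("lxc.network.flags = up")
    return lines
```

With MAAS managing the node over eth1, `lxc_network_lines(["eth1"])` links the container NIC to br-eth1, the bridge that actually exists, rather than to a fixed br-eth0 that the multi-bridge script never created.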