juju 2.1 beta5 can reset networking while hooks are running
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Autopilot Log Analyser |
Fix Committed
|
High
|
Francis Ginther | ||
Canonical Juju |
Fix Released
|
Critical
|
John A Meinel | ||
2.1 |
Fix Released
|
Critical
|
John A Meinel | ||
Landscape Server |
Fix Released
|
Undecided
|
Francis Ginther |
Bug Description
This was found while using juju 2.1 beta5 from ppa:juju/devel and testing landscape autopilot in CI. These logs are from https:/
Landscape autopilot deploys some openstack services in containers, while some run on the host machine itself. We are seeing install hook failures on the host machine due to networking being disabled during hook execution:
[landscape-
2017-02-07 10:43:26 INFO install dpkg-query: package 'python-netaddr' is not installed and no information is available
2017-02-07 10:43:26 INFO install Use dpkg --info (= dpkg-deb --info) to examine archive files,
2017-02-07 10:43:26 INFO install and dpkg --contents (= dpkg-deb --contents) to list their contents.
2017-02-07 10:43:27 INFO install Reading package lists...
2017-02-07 10:43:27 INFO install Building dependency tree...
2017-02-07 10:43:27 INFO install Reading state information...
2017-02-07 10:43:27 INFO install The following additional packages will be installed:
2017-02-07 10:43:27 INFO install ieee-data
2017-02-07 10:43:27 INFO install Suggested packages:
2017-02-07 10:43:27 INFO install ipython python-netaddr-docs
2017-02-07 10:43:27 INFO install The following NEW packages will be installed:
2017-02-07 10:43:27 INFO install ieee-data python-netaddr
2017-02-07 10:43:27 INFO install 0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
2017-02-07 10:43:27 INFO install Need to get 1004 kB of archives.
2017-02-07 10:43:27 INFO install After this operation, 5239 kB of additional disk space will be used.
2017-02-07 10:43:27 INFO install Err:1 http://
2017-02-07 10:43:27 INFO install Cannot initiate the connection to 10.96.0.10:8000 (10.96.0.10). - connect (101: Network is unreachable)
2017-02-07 10:43:27 INFO install Err:2 http://
2017-02-07 10:43:27 INFO install Cannot initiate the connection to 10.96.0.10:8000 (10.96.0.10). - connect (101: Network is unreachable)
2017-02-07 10:43:27 INFO install E: Failed to fetch http://
2017-02-07 10:43:27 INFO install
2017-02-07 10:43:27 INFO install E: Failed to fetch http://
2017-02-07 10:43:27 INFO install
2017-02-07 10:43:27 INFO install E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
2017-02-07 10:43:27 INFO install dpkg-query: package 'python-netifaces' is not installed and no information is available
2017-02-07 10:43:27 INFO install Use dpkg --info (= dpkg-deb --info) to examine archive files,
2017-02-07 10:43:27 INFO install and dpkg --contents (= dpkg-deb --contents) to list their contents.
2017-02-07 10:43:27 INFO install Reading package lists...
2017-02-07 10:43:27 INFO install Building dependency tree...
2017-02-07 10:43:27 INFO install Reading state information...
2017-02-07 10:43:27 INFO install The following NEW packages will be installed:
2017-02-07 10:43:27 INFO install python-netifaces
2017-02-07 10:43:27 INFO install 0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
2017-02-07 10:43:27 INFO install Need to get 16.7 kB of archives.
2017-02-07 10:43:27 INFO install After this operation, 52.2 kB of additional disk space will be used.
2017-02-07 10:43:27 INFO install Err:1 http://
2017-02-07 10:43:27 INFO install Cannot initiate the connection to 10.96.0.10:8000 (10.96.0.10). - connect (101: Network is unreachable)
2017-02-07 10:43:27 INFO install E: Failed to fetch http://
2017-02-07 10:43:27 INFO install
2017-02-07 10:43:27 INFO install E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
These timestamps match with messages in syslog indicating that eth0 is being set to promiscuous mode:
[landscape-
Feb 7 10:43:21 elkhart systemd[1]: Started LXD - main daemon.
Feb 7 10:43:22 elkhart ntpd[2576]: Listen normally on 10 lxdbr0 10.0.0.1:123
Feb 7 10:43:22 elkhart ntpd[2576]: Listen normally on 11 lxdbr0 [fe80::
Feb 7 10:43:22 elkhart ntpd[2576]: Deleting interface #8 lxdbr0, fe80::4428:
Feb 7 10:43:22 elkhart ntpd[2576]: Deleting interface #9 lxdbr0, fe80::1%6#123, interface stats: received=0, sent=0, dropped=0, active_time=18 secs
Feb 7 10:43:22 elkhart ntpd[2576]: new interface(s) found: waking up resolver
Feb 7 10:43:24 elkhart systemd-
Feb 7 10:43:25 elkhart kernel: [ 68.806766] IPv6: ADDRCONF(
Feb 7 10:43:25 elkhart systemd[1]: Reloading OpenBSD Secure Shell server.
Feb 7 10:43:25 elkhart systemd[1]: Reloaded OpenBSD Secure Shell server.
Feb 7 10:43:25 elkhart systemd-
Feb 7 10:43:25 elkhart kernel: [ 68.891670] device eth0 entered promiscuous mode
Feb 7 10:43:25 elkhart kernel: [ 68.894098] IPv6: ADDRCONF(
Feb 7 10:43:25 elkhart systemd[1]: Started ifup for br-eth0.
Feb 7 10:43:25 elkhart sh[5069]: ifup: waiting for lock on /run/network/
Feb 7 10:43:25 elkhart systemd[1]: Found device /sys/subsystem/
Feb 7 10:43:27 elkhart ntpd[2576]: Deleting interface #3 eth0, 10.96.60.114#123, interface stats: received=0, sent=5, dropped=0, active_time=42 secs
Feb 7 10:43:27 elkhart ntpd[2576]: Deleting interface #5 eth0, fe80::2e59:
Feb 7 10:43:29 elkhart kernel: [ 72.746818] igb 0000:02:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Feb 7 10:43:29 elkhart kernel: [ 72.746968] br-eth0: port 1(eth0) entered forwarding state
Feb 7 10:43:29 elkhart kernel: [ 72.746988] br-eth0: port 1(eth0) entered forwarding state
Feb 7 10:43:29 elkhart kernel: [ 72.747149] IPv6: ADDRCONF(
Feb 7 10:43:29 elkhart systemd[1]: Reloading OpenBSD Secure Shell server.
Feb 7 10:43:29 elkhart systemd[1]: Reloaded OpenBSD Secure Shell server.
Feb 7 10:43:29 elkhart sh[5069]: ifup: interface br-eth0 already configured
Feb 7 10:43:30 elkhart ntpd[2576]: Listen normally on 12 br-eth0 10.96.60.114:123
In a brief hangout discussion with Aaron Bentley, it sounded like the hook execution and network changes should run serially and not cause the install hook to fail.
tags: | added: uosci |
Changed in landscape: | |
milestone: | none → 17.01 |
status: | New → Confirmed |
tags: | added: cdo-qa-blocker |
Changed in juju: | |
status: | New → Triaged |
assignee: | nobody → John A Meinel (jameinel) |
importance: | Undecided → Critical |
milestone: | none → 2.1-rc1 |
Changed in autopilot-log-analyser: | |
assignee: | nobody → Francis Ginther (fginther) |
Changed in landscape: | |
assignee: | nobody → Francis Ginther (fginther) |
Changed in autopilot-log-analyser: | |
importance: | Undecided → High |
Changed in autopilot-log-analyser: | |
status: | New → In Progress |
Changed in juju: | |
milestone: | 2.1-rc1 → 2.2.0-alpha1 |
Changed in autopilot-log-analyser: | |
status: | In Progress → Fix Committed |
Changed in landscape: | |
milestone: | 17.01 → 17.02 |
Changed in juju: | |
status: | Triaged → Fix Committed |
Changed in juju: | |
status: | Fix Committed → Fix Released |
Changed in landscape: | |
status: | Fix Committed → Fix Released |
https:/ /github. com/juju/ juju/pull/ 6944