Activity log for bug #1738768

Date Who What changed Old value New value Message
2017-12-18 11:28:52 Daniel Alvarez bug added bug
2017-12-18 11:33:00 Daniel Alvarez description
Old value:
I have deployed a 3 controllers - 3 computes HA environment with ML2/OVS and observed dataplane downtime when restarting/stopping neutron-l3 container on controllers. This is what I did:
1. Created a network, subnet, router, a VM and attached a FIP to the VIM
2. Left a ping running on the undercloud to the FIP
3. Stopped l3 container in controller-0.
   Result: Observed some packet loss while the router was failed over to controller-1
4. Stopped l3 container in controller-1
   Result: Observed some packet loss while the router was failed over to controller-2
5. Stopped l3 container in controller-2
   Result: No traffic to/from the FIP at all.

    (overcloud) [stack@undercloud ~]$ ping 10.0.0.131
    PING 10.0.0.131 (10.0.0.131) 56(84) bytes of data.
    64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=1.83 ms
    64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms
    <---- Last l3 container was stopped here (step 5) in the above description ---->
    From 10.0.0.1 icmp_seq=10 Destination Host Unreachable
    From 10.0.0.1 icmp_seq=11 Destination Host Unreachable

When containers are stopped, I guess that the qrouter namespace is not accessible by the kernel:

    [heat-admin@overcloud-controller-2 ~]$ sudo ip netns e qrouter-5244e91c-f533-4128-9289-f37c9656792c ip a
    RTNETLINK answers: Invalid argument
    RTNETLINK answers: Invalid argument
    setting the network namespace "qrouter-5244e91c-f533-4128-9289-f37c9656792c" failed: Invalid argument

This means that not only we're getting controlplane downtime but also dataplane which could be seen as a regression when compared to non-containerized environments. The same would happen with DHCP and I expect instances not being able to fetch IP addresses from dnsmasq when dhcp containers are stopped.
New value:
I have deployed a 3 controllers - 3 computes HA environment with ML2/OVS and observed dataplane downtime when restarting/stopping neutron-l3 container on controllers. This is what I did:
1. Created a network, subnet, router, a VM and attached a FIP to the VM
2. Left a ping running on the undercloud to the FIP
3. Stopped l3 container in controller-0.
   Result: Observed some packet loss while the router was failed over to controller-1
4. Stopped l3 container in controller-1
   Result: Observed some packet loss while the router was failed over to controller-2
5. Stopped l3 container in controller-2
   Result: No traffic to/from the FIP at all.

    (overcloud) [stack@undercloud ~]$ ping 10.0.0.131
    PING 10.0.0.131 (10.0.0.131) 56(84) bytes of data.
    64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=1.83 ms
    64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms
    <---- Last l3 container was stopped here (step 5 above)---->
    From 10.0.0.1 icmp_seq=10 Destination Host Unreachable
    From 10.0.0.1 icmp_seq=11 Destination Host Unreachable

When containers are stopped, I guess that the qrouter namespace is not accessible by the kernel:

    [heat-admin@overcloud-controller-2 ~]$ sudo ip netns e qrouter-5244e91c-f533-4128-9289-f37c9656792c ip a
    RTNETLINK answers: Invalid argument
    RTNETLINK answers: Invalid argument
    setting the network namespace "qrouter-5244e91c-f533-4128-9289-f37c9656792c" failed: Invalid argument

This means that not only we're getting controlplane downtime but also dataplane which could be seen as a regression when compared to non-containerized environments. The same would happen with DHCP and I expect instances not being able to fetch IP addresses from dnsmasq when dhcp containers are stopped.
2017-12-18 18:53:26 Brian Haley bug added subscriber Brian Haley
2017-12-19 01:15:11 Lujin Luo neutron: status New Incomplete
2017-12-20 00:32:58 Lujin Luo tags l3
2017-12-20 09:55:20 Toni Freger bug added subscriber Toni Freger
2017-12-21 15:36:23 Assaf Muller neutron: status Incomplete Confirmed
2017-12-21 15:36:34 Assaf Muller neutron: importance Undecided Critical
2017-12-21 15:37:26 Assaf Muller bug task added tripleo
2017-12-21 15:37:36 Assaf Muller tripleo: status New Confirmed
2017-12-22 00:14:22 Emilien Macchi tripleo: status Confirmed Triaged
2017-12-22 00:14:38 Emilien Macchi tripleo: importance Undecided High
2017-12-22 00:14:43 Emilien Macchi tripleo: milestone queens-3
2018-01-26 00:53:39 Emilien Macchi tripleo: milestone queens-3 queens-rc1
2018-02-09 15:01:07 Brent Eagles tripleo: assignee Brent Eagles (beagles)
2018-02-09 18:03:03 OpenStack Infra tripleo: status Triaged In Progress
2018-03-02 20:24:19 Alex Schultz tripleo: milestone queens-rc1 rocky-1
2018-03-14 16:18:48 OpenStack Infra tripleo: assignee Brent Eagles (beagles) Jiří Stránský (jistr)
2018-03-14 16:20:13 Jiří Stránský tripleo: assignee Jiří Stránský (jistr) Brent Eagles (beagles)
2018-03-16 11:37:59 OpenStack Infra tripleo: assignee Brent Eagles (beagles) Jiří Stránský (jistr)
2018-03-26 15:09:11 OpenStack Infra tags l3 in-stable-queens l3
2018-04-20 17:41:13 Alex Schultz tripleo: milestone rocky-1 rocky-2
2018-06-05 19:05:54 Emilien Macchi tripleo: milestone rocky-2 rocky-3
2018-07-26 13:43:30 Emilien Macchi tripleo: milestone rocky-3 rocky-rc1
2018-08-14 15:03:31 Alex Schultz tripleo: milestone rocky-rc1 stein-1
2018-10-30 16:27:00 Juan Antonio Osorio Robles tripleo: milestone stein-1 stein-2
2019-01-13 22:46:12 Emilien Macchi tripleo: milestone stein-2 stein-3
2019-03-14 02:32:08 Alex Schultz tripleo: milestone stein-3 stein-rc1
2019-04-12 09:14:15 Jiří Stránský tripleo: status In Progress Fix Released
2019-04-12 09:16:51 Bernard Cafarelli neutron: status Confirmed Invalid
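
The reproduction in the description measures dataplane downtime by watching a continuous ping to the FIP while l3 containers are stopped one by one. The following is a minimal Python sketch of one way to quantify that gap. It assumes Linux ping output in the format quoted in the report, and it assumes that replies carry `bytes from ... icmp_seq=N` while failures such as `Destination Host Unreachable` carry only `icmp_seq=N`. Probes that get no reply are inferred from gaps in the sequence numbers. The helper names `lost_sequences` and `estimated_outage_seconds` are hypothetical and not part of any Neutron or TripleO tooling.

```python
import re
from collections.abc import Iterable

# Successful echo reply, e.g. "64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms"
REPLY_RE = re.compile(r"bytes from .*?icmp_seq=(\d+)")
# Any line carrying a sequence number, including errors such as
# "From 10.0.0.1 icmp_seq=10 Destination Host Unreachable"
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def lost_sequences(lines: Iterable[str]) -> list[int]:
    """Return the icmp_seq numbers without an echo reply, between the lowest and highest seen."""
    replied: set[int] = set()
    seen: set[int] = set()
    for line in lines:
        match = SEQ_RE.search(line)
        if not match:
            continue  # e.g. the "PING ..." banner or annotations
        seq = int(match.group(1))
        seen.add(seq)
        if REPLY_RE.search(line):
            replied.add(seq)
    if not seen:
        return []
    # Sequence numbers missing from the output entirely are also counted as lost.
    return [s for s in range(min(seen), max(seen) + 1) if s not in replied]


def estimated_outage_seconds(lines: Iterable[str], interval: float = 1.0) -> float:
    """Rough outage estimate: number of lost probes times the ping interval (assumed 1 s)."""
    return len(lost_sequences(lines)) * interval


if __name__ == "__main__":
    # Only the excerpt quoted in the report, so the result describes the excerpt alone.
    excerpt = [
        "PING 10.0.0.131 (10.0.0.131) 56(84) bytes of data.",
        "64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=1.83 ms",
        "64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=1.56 ms",
        "From 10.0.0.1 icmp_seq=10 Destination Host Unreachable",
        "From 10.0.0.1 icmp_seq=11 Destination Host Unreachable",
    ]
    print(lost_sequences(excerpt))
    print(estimated_outage_seconds(excerpt))
```

On the excerpt above, sequence numbers 3 through 11 have no reply. Because the quoted output is partial, this only illustrates the method and is not a measurement of the actual outage in this bug.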