Further details:
This happens because the containers mount the host's /run on their own /run, so namespaces are left behind after the containers are stopped or restarted, as these bugs show [0][1]. I applied [2]; stopping the container still causes dataplane downtime, and now restarting the containers simply doesn't work either (we may need an additional bug for this).
Namespaces can no longer be seen from outside the containers:
[heat-admin@overcloud-controller-2 ~]$ sudo ip netns | grep qrouter
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
[heat-admin@overcloud-controller-2 ~]$
But from inside the container, they can:
[heat-admin@overcloud-controller-2 ~]$ sudo docker exec --user root -it 9f8a322c4a3c bash
()[root@overcloud-controller-2 /]# ip netns | grep qrouter
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
qrouter-5244e91c-f533-4128-9289-f37c9656792c
However, the L3 agent fails to initialize because it can't access them after the restart:
()[root@overcloud-controller-2 /]# ip netns exec qrouter-5244e91c-f533-4128-9289-f37c9656792c ip a
RTNETLINK answers: Invalid argument
setting the network namespace "qrouter-5244e91c-f533-4128-9289-f37c9656792c" failed: Invalid argument
If I manually delete the namespace from inside the container and then restart the container, it works again:
()[root@overcloud-controller-2 /]# ip netns del qrouter-5244e91c-f533-4128-9289-f37c9656792c
RTNETLINK answers: Invalid argument
()[root@overcloud-controller-2 /]# ip netns del qrouter-5244e91c-f533-4128-9289-f37c9656792c
Cannot remove namespace file "/var/run/netns/qrouter-5244e91c-f533-4128-9289-f37c9656792c": No such file or directory
[heat-admin@overcloud-controller-2 ~]$ sudo docker restart 9f8a322c4a3c
And now pinging the FIP works again:
(overcloud) [stack@undercloud ~]$ sudo ping 10.0.0.131 -i 0.2
PING 10.0.0.131 (10.0.0.131) 56(84) bytes of data.
64 bytes from 10.0.0.131: icmp_seq=1 ttl=63 time=38.5 ms
64 bytes from 10.0.0.131: icmp_seq=2 ttl=63 time=6.58 ms
64 bytes from 10.0.0.131: icmp_seq=3 ttl=63 time=5.28 ms
64 bytes from 10.0.0.131: icmp_seq=4 ttl=63 time=2.71 ms
64 bytes from 10.0.0.131: icmp_seq=5 ttl=63 time=0.980 ms
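For anyone hitting the same thing, the manual workaround above (delete the stale namespace from inside the container, then restart the container) could be scripted roughly as follows. This is only a sketch, not a proposed fix. The function name cleanup_stale_netns and the IP_CMD override are made up for illustration, and it assumes that a namespace counts as stale when "ip netns exec <name> true" fails, which matches the "Invalid argument" failure seen here but has not been checked against other failure modes.

```shell
# Sketch only: remove namespaces that are still listed but can no longer be entered.
# IP_CMD is a hypothetical override so the logic can be tried without touching
# the real "ip" binary; it defaults to "ip".
IP_CMD="${IP_CMD:-ip}"

cleanup_stale_netns() {
    # Hypothetical helper; $1 is the namespace name prefix (default: qrouter-).
    prefix="${1:-qrouter-}"

    # "ip netns list" may print "name (id: N)", so keep only the first field.
    # The "RTNETLINK answers" messages go to stderr and are discarded here.
    "$IP_CMD" netns list 2>/dev/null | while read -r ns rest; do
        case "$ns" in
            "$prefix"*) ;;
            *) continue ;;
        esac

        # Assumption: a usable namespace can run a trivial command, a stale one cannot.
        if ! "$IP_CMD" netns exec "$ns" true </dev/null 2>/dev/null; then
            echo "stale: $ns"
            # In the transcript above the first delete reports an error but still
            # removes the namespace file, so the exit status is ignored.
            "$IP_CMD" netns del "$ns" 2>/dev/null || true
        fi
    done
}
```

It would have to be run as root inside the L3 agent container (for example through the same "docker exec --user root" used above), followed by a "docker restart" of that container from the host.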