[L3] arp issue in router namespace in compute node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | New | Medium | Unassigned |
Bug Description
Hello,
Since I moved to OpenStack Victoria, I keep hitting the following issue. Sometimes a VM cannot be reached on its floating IP (FIP), and it is fixed only after I clear the ARP entry for the VM's private IP from the router network namespace.
I did some troubleshooting and found that the port appears to be down in OVS.
7(qr-4affa6db-67): addr:00:
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
The MAC address of the VM port is this one:
[root@compute-38 ~]# arp -a | grep 87
? (10.10.13.87) at fa:16:3e:ee:d1:57 [ether] PERM on qr-4affa6db-67
And ping to the VM is not working:
[root@compute-38 ~]# ip r
10.10.13.0/24 dev qr-4affa6db-67 proto kernel scope link src 10.10.13.1
169.254.107.94/31 dev rfp-9b2225f1-b proto kernel scope link src 169.254.107.94
169.254.110.46/31 dev rfp-9b2225f1-b proto kernel scope link src 169.254.110.46
[root@compute-38 ~]# ping 10.10.13.1
PING 10.10.13.1 (10.10.13.1) 56(84) bytes of data.
64 bytes from 10.10.13.1: icmp_seq=1 ttl=64 time=0.050 ms
^C
--- 10.10.13.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.050/0.
[root@compute-38 ~]# ping 10.10.13.87
PING 10.10.13.87 (10.10.13.87) 56(84) bytes of data.
^C
--- 10.10.13.87 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 87ms
[root@compute-38 ~]#
The workaround is to delete the ARP entry for 10.10.13.87 in the namespace:
[root@compute-38 ~]# ping 10.10.13.87
PING 10.10.13.87 (10.10.13.87) 56(84) bytes of data.
^C
--- 10.10.13.87 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 87ms
[root@compute-38 ~]# arp -d 10.10.13.87
[root@compute-38 ~]# arp -a | grep 87
? (10.10.13.87) at fa:16:3e:99:08:a5 [ether] on qr-4affa6db-67
[root@compute-38 ~]# ping 10.10.13.87
PING 10.10.13.87 (10.10.13.87) 56(84) bytes of data.
64 bytes from 10.10.13.87: icmp_seq=1 ttl=64 time=0.322 ms
64 bytes from 10.10.13.87: icmp_seq=2 ttl=64 time=0.239 ms
^C
--- 10.10.13.87 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 12ms
rtt min/avg/max/mdev = 0.239/0.
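The two `arp -a` outputs above differ in a way that may be relevant: before the workaround the entry for 10.10.13.87 is `PERM` with MAC fa:16:3e:ee:d1:57, and after `arp -d` it resolves to fa:16:3e:99:08:a5. A minimal sketch for spotting such mismatches is below. It assumes the `arp -a` line format shown in this report, and `parse_arp` and `stale_entries` are hypothetical helper names, not part of Neutron.

```python
import re

# Matches lines in the format seen in this report, for example:
# ? (10.10.13.87) at fa:16:3e:ee:d1:57 [ether] PERM on qr-4affa6db-67
ARP_LINE = re.compile(
    r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9a-fA-F:]{17}) \[\w+\]"
    r"(?P<perm> PERM)? on (?P<dev>\S+)"
)


def parse_arp(output):
    """Turn `arp -a` text into a list of dicts (ip, mac, permanent, dev)."""
    entries = []
    for line in output.splitlines():
        m = ARP_LINE.search(line)
        if m:
            entries.append({
                "ip": m["ip"],
                "mac": m["mac"].lower(),
                "permanent": m["perm"] is not None,
                "dev": m["dev"],
            })
    return entries


def stale_entries(entries, expected):
    """Return entries whose MAC differs from the expected MAC for that IP.

    `expected` maps IP -> MAC as known to Neutron (an assumption: the caller
    has to supply it, for example from the port details).
    """
    return [
        e for e in entries
        if e["ip"] in expected and e["mac"] != expected[e["ip"]].lower()
    ]
```

The input text would come from running `arp -a` inside the router namespace on the compute node; any entry returned by `stale_entries` is a candidate for `arp -d`.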
After that, the FIP started to answer:
~]# ping 10.40.131.220
PING 10.40.131.220 (10.40.131.220) 56(84) bytes of data.
64 bytes from 10.40.131.220: icmp_seq=2500 ttl=61 time=1.60 ms
64 bytes from 10.40.131.220: icmp_seq=2501 ttl=61 time=0.462 ms
64 bytes from 10.40.131.220: icmp_seq=2502 ttl=61 time=0.536 ms
^C
--- 10.40.131.220 ping statistics ---
2545 packets transmitted, 46 received, 98% packet loss, time 2544013ms
rtt min/avg/max/mdev = 0.305/0.
It is odd that, even after it starts working, the port still looks down:
7(qr-4affa6db-67): addr:00:
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
From what I see this looks like a bug, since it is fixed as soon as I apply this workaround. It happens to some, but not all, newly deployed VMs.
Do you have any idea how I can fix this issue? I updated the containers last week to the latest stable release of Victoria.
Thanks!
Andrei
no longer affects: | kolla-ansible |
Here are the logs from neutron-openvswitch-agent.log when the VM was created:
2021-04-07 07:01:20.119 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] Port 548dbbd3-bc08-447e-a144-b5de106a58df updated. Details: {'device': '548dbbd3-bc08-447e-a144-b5de106a58df', 'device_id': '2c77110e-02cf-4b2d-9b4d-295784d6f72f', 'network_id': 'a5486dd5-ad9f-4ae0-8922-6a2e70c17f7c', 'port_id': '548dbbd3-bc08-447e-a144-b5de106a58df', 'mac_address': 'fa:16:3e:99:08:a5', 'admin_state_up': True, 'network_type': 'vxlan', 'segmentation_id': 7, 'physical_network': None, 'fixed_ips': [{'subnet_id': '5224c1a8-5e7d-4e65-bc3c-82a27693b8ea', 'ip_address': '10.10.13.87'}], 'device_owner': 'compute:nova', 'allowed_address_pairs': [], 'port_security_enabled': True, 'qos_policy_id': None, 'network_qos_policy_id': None, 'profile': {}, 'vif_type': 'ovs', 'vnic_type': 'normal', 'security_groups': ['78f32d1a-ee3d-49be-b6cc-f4bc8ce706a0']}
2021-04-07 07:01:20.129 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] process_network_ports - iteration:299131 - treat_devices_added_or_updated completed. Skipped 0 and no activated binding devices 0 of 37 devices currently available. Time elapsed: 0.014
2021-04-07 07:01:20.135 7 INFO neutron.agent.securitygroups_rpc [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] Preparing filters for devices {'548dbbd3-bc08-447e-a144-b5de106a58df'}
2021-04-07 07:01:22.521 7 INFO neutron.agent.securitygroups_rpc [req-3a7d7427-4d54-43f7-acef-dd6081f9b788 742a6700df704ffdbb29c0f73c57adae fa7cc4c1af894b7e92b8d0726f10d548 - - -] Security group member updated {'a808fd6a-dc24-433d-9b4d-e2680be38aba', '78f32d1a-ee3d-49be-b6cc-f4bc8ce706a0'}
2021-04-07 07:01:23.063 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] process_network_ports - iteration:299131 - agent port security group processed in 2.948
2021-04-07 07:01:23.892 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] Configuration for devices up ['548dbbd3-bc08-447e-a144-b5de106a58df'] and devices down [] completed.
2021-04-07 07:01:23.893 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] Agent rpc_loop - iteration:299131 - ports processed. Elapsed:3.787
2021-04-07 07:01:23.893 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] Agent rpc_loop - iteration:299131 completed. Processed ports statistics: {'regular': {'added': 1, 'updated': 1, 'removed': 0}}. Elapsed:3.787
2021-04-07 07:01:23.894 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] Agent rpc_loop - iteration:299132 started
2021-04-07 07:01:23.898 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-9889f4be-3b0d-455c-b20f-30688126b1da - - - - -] Agent rpc_loop - iteration:299132 - starting polling. Elapsed:0.004
20...
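
The "Port ... updated" entry in the agent log carries the MAC and fixed IPs that Neutron assigned to the port (here fa:16:3e:99:08:a5 for 10.10.13.87), which can be compared with what the router namespace has cached. The following is only a sketch for pulling that mapping out of such a log entry. It assumes each entry is on a single line with identifiers unbroken and that the `Details:` payload is a Python dict literal, as it appears above. `port_bindings` is a hypothetical helper name.

```python
import ast
import re

# Captures the port UUID and the dict literal that follows "Details:".
DETAILS = re.compile(
    r"Port (?P<port>[0-9a-f-]{36}) updated\. Details: (?P<details>\{.*\})"
)


def port_bindings(log_line):
    """Return {ip_address: mac_address} for a 'Port ... updated' log line.

    Returns an empty dict when the line is not a port-update entry.
    """
    m = DETAILS.search(log_line)
    if m is None:
        return {}
    # literal_eval safely parses the dict (True/None/strings/lists only).
    details = ast.literal_eval(m["details"])
    return {
        ip["ip_address"]: details["mac_address"]
        for ip in details.get("fixed_ips", [])
    }
```

For the first log entry above this yields a single mapping from 10.10.13.87 to fa:16:3e:99:08:a5, which is the MAC the namespace learned only after the old ARP entry was deleted.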