Centralized SNAT failover does not recover until "systemctl restart neutron-l3-agent" on transferred node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | In Progress | Medium | Ann Taraday |
Bug Description
**Environment**
Queens
OVSGTW DVR Mode: dvr_snat
CMP DVR Mode: dvr
No L3 HA
Use Case: Centralized FIPs (aka Floating IPs against unbound ports)
https:/
**How to reproduce**
1. Create a VM normally
2. Create an allowed-address-pair port against the VM port
openstack port list --server <server_name> # Get port id
openstack port create --security-group <sec_group> --fixed-ip subnet=
openstack port set --allowed-address ip-address=
3. Assign floating ip to the port
openstack floating ip set --port <port_name> <floating_ip>
4. Inside the deployed VM, create an IP alias for the new IP address
ip addr add <ip_address>/24 dev ens3
5. Detect which gtw node is hosting the centralized fip
neutron l3-agent-
6. Perform manual failover
neutron l3-agent-
neutron l3-agent-router-add <new-l3-agent> <router>
(Or) Perform automatic failover
shutdown -h now (on hosting gtw)
7. Confirm that the failover to the new node happened
neutron l3-agent-
**Expected Result**
Connection to floating ip address recovers automatically
**Actual Result**
Connection does not recover. The issue reproduces 100% of the time.
**How to recover**
Restart "neutron-l3-agent" on the node hosting the router after failover. Connectivity recovers within a few seconds.
systemctl restart neutron-l3-agent
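To check whether connectivity actually returns after the failover or after the restart, a small polling loop can help. This is only a sketch: the function name `wait_for_fip`, the `PING` override variable, and the default timeout are assumptions made for illustration and are not part of the original report. It assumes the Linux iputils `ping` options `-c` and `-W`.

```shell
#!/bin/sh
# Poll a floating IP until it answers ping or a timeout expires.
# PING is a hypothetical override so the loop can be tried without a network.
PING=${PING:-ping}

wait_for_fip() {
    fip=$1
    timeout=${2:-60}   # number of attempts, roughly seconds (assumed default)
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -c 1: send one echo request; -W 1: wait at most 1 second for a reply
        if $PING -c 1 -W 1 "$fip" >/dev/null 2>&1; then
            echo "reachable after ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "still unreachable after ${timeout}s"
    return 1
}
```

Run from a host that can normally reach the floating IP, for example `wait_for_fip <floating_ip> 120`. The reported time is approximate, since each failed ping also takes up to a second.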
**Additional information**
After failover, the SNAT namespace does not have the sysctl settings that should be applied when the namespace is created. We have confirmed that setting them manually fixes the issue.
https:/
The following are the sysctl values after failover:
---
root@gtw03:~# ip netns exec snat-8737216a-
net.ipv4.ip_forward = 0
root@gtw03:~# ip netns exec snat-8737216a-
net.ipv4.
root@gtw03:~# ip netns exec snat-8737216a-
net.ipv4.
root@gtw03:~# ip netns exec snat-8737216a-
net.ipv6.
root@gtw03:~#
---
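The manual check and workaround can be sketched as shell helpers. Only `net.ipv4.ip_forward` is handled here, because it is the only key shown in full in the output above; the other keys are truncated in the report and are not guessed. The helper names (`snat_ns`, `show_forwarding`, `enable_forwarding`) and the `RUN` dry-run hook are assumptions for illustration, and the router ID must be supplied by the operator.

```shell
#!/bin/sh
# RUN is a hypothetical hook: leave it empty to execute the commands,
# or set RUN=echo to print them instead (no root required).
RUN=${RUN:-}

# Build the SNAT namespace name from a router ID (snat-<router_id>).
snat_ns() {
    printf 'snat-%s\n' "$1"
}

# Show the current forwarding setting inside the SNAT namespace.
show_forwarding() {
    $RUN ip netns exec "$(snat_ns "$1")" sysctl net.ipv4.ip_forward
}

# Manually enable IPv4 forwarding inside the SNAT namespace.
enable_forwarding() {
    $RUN ip netns exec "$(snat_ns "$1")" sysctl -w net.ipv4.ip_forward=1
}
```

On the gateway node that now hosts the router, `show_forwarding <router_id>` followed by `enable_forwarding <router_id>` (as root) mirrors the manual fix described above. It is a workaround only and does not address why the settings are missing.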
We believe this is caused by the following commits, which perform the initialization only when neutron-l3-agent starts.
https:/
Changed in neutron:
importance: Undecided → Medium
tags: added: l3-dvr-backlog
Fix proposed to branch: master
Review: https://review.opendev.org/734070