Missing ip rule causes FIP removal to fail

Bug #2030804 reported by Adam Oswick
This bug affects 1 person
Affects: neutron
Status: In Progress
Importance: Medium
Assigned to: Brian Haley

Bug Description

Summary
-------
If the ip rule associated with a floating IP (FIP) is lost or deleted for any reason, the Neutron L3 agent raises an error when it later tries to remove that rule, and the entire FIP removal process fails.

High level description
----------------------
Rather than raising an error when an ip rule that should exist is no longer present, the code at https://opendev.org/openstack/neutron/src/commit/c453813d0664259c4da0d132f224be2eebe70072/neutron/agent/l3/dvr_local_router.py#L216-L227 should handle the situation gracefully and log a warning.
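The graceful handling suggested above could look roughly like the sketch below. It is not the fix under review; it assumes the rule deletion raises pyroute2's `NetlinkError` carrying the errno in a `code` attribute, and `delete_rule` is a hypothetical stand-in for the agent's rule-deletion helper. A missing rule (ENOENT, errno 2, matching the `(2, 'No such file or directory')` error in this report) is logged and treated as already removed; any other netlink error is still raised.

```python
import errno
import logging

LOG = logging.getLogger(__name__)


class NetlinkError(Exception):
    """Stand-in for pyroute2.netlink.exceptions.NetlinkError.

    Assumption: like the real class, it exposes the errno as ``code``,
    and str() renders as "(2, 'No such file or directory')".
    """

    def __init__(self, code, msg=None):
        super().__init__(code, msg)
        self.code = code


def delete_fip_rule(delete_rule, fixed_ip, table, priority):
    """Remove the FIP ip rule, tolerating a rule that is already gone.

    ``delete_rule`` is a hypothetical callable standing in for the L3
    agent's helper. Returns True if a rule was deleted and False if it
    was already missing. Any other netlink error is re-raised.
    """
    try:
        delete_rule(fixed_ip, table=table, priority=priority)
    except NetlinkError as e:
        if e.code != errno.ENOENT:
            raise
        # The rule vanished outside Neutron's control: warn and carry on
        # so the rest of the FIP removal can still complete.
        LOG.warning("ip rule from %s lookup %s (priority %s) not found; "
                    "continuing with FIP removal", fixed_ip, table, priority)
        return False
    return True
```

Limiting the tolerance to ENOENT keeps genuine failures (for example a permission error) visible, while the "already deleted" case no longer aborts the whole removal.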

Pre-conditions
--------------
- Neutron DVR mode is enabled
- Subnets are created and attached to a router with an external gateway
- A VM is created on one of these subnets and a FIP is associated with it

Step-by-step reproduction steps
-------------------------------
- Within the qrouter network namespace, delete the FIP's ip rule with 'ip rule del from $FIXED_IP lookup 16'
- Disassociate the FIP from the VM and monitor Neutron L3 agent logs for errors
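When reproducing with the steps above (or when checking a host where the problem occurred naturally), it can help to confirm whether the FIP's rule is actually present before disassociating. A minimal sketch, assuming IPv4 and the `ip rule show` output format of recent iproute2 (lines such as `40000:	from 10.0.0.5 lookup 16`); the function names and the sample priority are illustrative only.

```python
def _value_after(tokens, keyword):
    """Return the token following ``keyword``, or None if absent."""
    try:
        return tokens[tokens.index(keyword) + 1]
    except (ValueError, IndexError):
        return None


def find_fip_rule_priority(ip_rule_output, fixed_ip, table="16"):
    """Return the priority of the 'from <fixed_ip> lookup <table>' rule.

    ``ip_rule_output`` is the text printed by ``ip rule show``. Returns
    None when no matching rule is listed (i.e. the rule is missing).
    IPv4 only: a host address may be printed with or without "/32".
    """
    for line in ip_rule_output.splitlines():
        tokens = line.split()
        # Each rule line starts with "<priority>:".
        if not tokens or not tokens[0].endswith(":"):
            continue
        source = _value_after(tokens, "from")
        if (source in (fixed_ip, fixed_ip + "/32")
                and _value_after(tokens, "lookup") == table):
            return int(tokens[0].rstrip(":"))
    return None
```

The input text would come from running `ip rule show` inside the qrouter namespace, for example through `ip netns exec`.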

Expected output
---------------
The Neutron L3 agent logs a warning that the ip rule did not exist and then completes FIP removal as normal.

Actual output
-------------
The Neutron L3 agent raises a "pyroute2.netlink.exceptions.NetlinkError: (2, 'No such file or directory')" exception and does not complete FIP removal on the host.

Version
-------
- OpenStack Zed

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/890827

Changed in neutron:
status: New → In Progress
Adam Oswick (adamoswick) wrote :
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/891236

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Adam <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/890827
Reason: Abandoned as Brian suggested an alternative on https://review.opendev.org/c/openstack/neutron/+/891236

Changed in neutron:
importance: Undecided → Medium
assignee: nobody → Brian Haley (brian-haley)
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

Why the "reproduction steps" includes a manual operation on the namespace? The question here is how this IP rule was deleted before the FIP is disassociated. Do you know the root reason? When did you find this problem? Can we reproduce this issue without manual steps?

Regards.

Adam Oswick (adamoswick) wrote :

Hi Rodolfo,

| Why the "reproduction steps" includes a manual operation on the namespace?

I've been unable to work out why the ip rule disappears (or isn't created in the first place). Deleting the ip rule therefore allows us to simulate the result even if we don't know the original root cause.

| Do you know the root reason?

Not at the moment.

| When did you find this problem?

We've been seeing this problem occasionally since we first built our clouds (Yoga iirc).

| Can we reproduce this issue without manual steps?

Not at the moment. We just have to wait for it to occur in our environments.

While it would obviously be good to find the root cause of the issue (why the ip rule is deleted or missing in the first place), my thought was that, since ip rules are not directly controlled by Neutron, it is better for Neutron to handle more gracefully the scenarios where they have been changed outside of its control.

At some point, I can have another go at trying to find the root cause of this issue. However, even with that identified and resolved, I still feel like it would be a good idea for Neutron to handle these missing ip rules (and other resources) more gracefully.
