Duplicated port binding registers per port, due to live-migration failures

Bug #1979072 reported by Rodolfo Alonso
This bug affects 5 people
Affects: neutron
Status: Fix Committed
Importance: Medium
Assigned to: Rodolfo Alonso
Milestone: (none)

Bug Description

Related bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2106406

Description:
Our customers have reported this issue frequently. If Neutron does not respond fast enough during a live migration [1] and the activation step fails, the live migration is reverted and the VM keeps running on the source host. However, the revert can leave a leftover in the Neutron database: two port binding records pointing to the same port. The database then looks like [2].

If Nova tries to bind the port again, Neutron will raise an error [3].

Steps to Reproduce:
1. Add a sleep (for example, 120 seconds) to the Neutron activate method [4].
2. Run a live migration.
3. Observe the ml2_port_bindings table for the VM port.
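
Step 3 can be illustrated with a small query. This is a minimal sketch, not the Neutron code: it uses an in-memory SQLite table as a stand-in for ml2_port_bindings and assumes only three columns (port_id, host, status), with invented port and host names. Two bindings for one port are also the normal state while a migration is still in progress, so a hit is only a leftover once no migration is running.

```python
import sqlite3

# Hypothetical, reduced stand-in for the ml2_port_bindings table; only the
# columns needed for this illustration are modelled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml2_port_bindings ("
    " port_id TEXT NOT NULL,"
    " host TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " PRIMARY KEY (port_id, host))"
)
conn.executemany(
    "INSERT INTO ml2_port_bindings VALUES (?, ?, ?)",
    [
        ("port-a", "source-host", "ACTIVE"),
        ("port-a", "dest-host", "INACTIVE"),  # leftover of a reverted migration
        ("port-b", "source-host", "ACTIVE"),
    ],
)


def ports_with_multiple_bindings(connection):
    """Return {port_id: [(host, status), ...]} for ports bound more than once."""
    rows = connection.execute(
        "SELECT port_id, host, status FROM ml2_port_bindings"
        " WHERE port_id IN ("
        "   SELECT port_id FROM ml2_port_bindings"
        "   GROUP BY port_id HAVING COUNT(*) > 1)"
        " ORDER BY port_id, host"
    ).fetchall()
    result = {}
    for port_id, host, status in rows:
        result.setdefault(port_id, []).append((host, status))
    return result


duplicated = ports_with_multiple_bindings(conn)
print(duplicated)
```

In this sample data only "port-a" is reported, because it is the only port with more than one binding row.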

This bug proposes:
- Have a periodic worker (like the OVN mech driver "MaintenanceWorker" or the "DbQuotaNoLockDriver" periodic worker) loaded when "Ml2Plugin" starts.
- Add a task inside this periodic worker to monitor the port binding table, looking for duplicated records. Of course, we should not interfere with any migration process. Those duplicated port binding registers should

[1] https://specs.openstack.org/openstack/neutron-specs/_images/seqdiag-7d93af8770687c1a95685208b8e6ca0553a8fb25.png
[2] https://paste.opendev.org/show/b4Kb8VTlBHlRERjzcBFA/
[3] https://paste.opendev.org/show/bsT0hGiZGKT2ihWzbPfx/
[4] https://github.com/openstack/neutron/blob/150396625aaa870640a6fb9d636aeb90cea4da3e/neutron/plugins/ml2/plugin.py#L2556

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/846422

Changed in neutron:
status: New → In Progress
Miguel Lavalle (minsel)
Changed in neutron:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/846422
Committed: https://opendev.org/openstack/neutron/commit/c5b76a8393a21adb87447c925da2ede4a75dd11a
Submitter: "Zuul (22348)"
Branch: master

commit c5b76a8393a21adb87447c925da2ede4a75dd11a
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Jul 7 06:31:22 2022 +0000

    Script to remove duplicated port bindings

    A new script to remove the duplicated port bindings was added. This
    script will list all ``ml2_port_bindings`` records in the database,
    finding those ones with the same port ID. Then the script removes
    those ones with status=INACTIVE. This script is useful to remove
    those leftovers that remain in the database after a failed live
    migration.

    "dry_run" mode is possible if selected in "[cli_script] dry_run"
    boolean config option. The duplicated port bindings are printed in
    the shell but not deleted.

    Related-Bug: #1979072

    Change-Id: I0de5fbb70eb852f82bd311616557985d1ce89bbf
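
The selection rule described in this commit message (group bindings by port ID, then drop the INACTIVE ones of any port that has more than one binding, optionally as a dry run) can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the `Binding` record, the function names and the sample data are hypothetical, and the real script works on the database rather than on an in-memory list.

```python
from collections import defaultdict
from typing import NamedTuple


class Binding(NamedTuple):
    # Hypothetical minimal record; the real table has more columns.
    port_id: str
    host: str
    status: str


def inactive_duplicates(bindings):
    """Return INACTIVE bindings whose port has more than one binding."""
    by_port = defaultdict(list)
    for binding in bindings:
        by_port[binding.port_id].append(binding)
    return [
        b
        for group in by_port.values()
        if len(group) > 1
        for b in group
        if b.status == "INACTIVE"
    ]


def remove_duplicated_bindings(bindings, dry_run=True):
    """Return the surviving bindings; in dry-run mode only report candidates."""
    doomed = inactive_duplicates(bindings)
    for b in doomed:
        action = "would delete" if dry_run else "deleting"
        print(f"{action}: port={b.port_id} host={b.host}")
    if dry_run:
        return list(bindings)
    doomed_set = set(doomed)
    return [b for b in bindings if b not in doomed_set]


records = [
    Binding("port-a", "source-host", "ACTIVE"),
    Binding("port-a", "dest-host", "INACTIVE"),
    Binding("port-b", "source-host", "INACTIVE"),  # single binding: kept
]
after_dry_run = remove_duplicated_bindings(records, dry_run=True)
after_cleanup = remove_duplicated_bindings(records, dry_run=False)
```

A port with a single INACTIVE binding is left alone in this sketch, since only ports with more than one binding are considered duplicated.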

Christian Rohmann (christian-rohmann) wrote :

While the housekeeping / cleanup script is great for cleaning up artifacts that already exist in the database, it seems to be a band-aid for the actual issue, if I may state so bluntly.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Christian:

This issue should be handled on the Nova side, which is responsible for creating the port bindings during a live migration. If the live migration fails, the same process should clean up the leftovers. Neutron is a client of Nova during this process.

Regards.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/859996

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/859997

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/859998

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/859996
Committed: https://opendev.org/openstack/neutron/commit/3d307ef8f8c31ebc92a621aac75c62b83a7b3787
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 3d307ef8f8c31ebc92a621aac75c62b83a7b3787
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Jul 7 06:31:22 2022 +0000

    Script to remove duplicated port bindings

    A new script to remove the duplicated port bindings was added. This
    script will list all ``ml2_port_bindings`` records in the database,
    finding those ones with the same port ID. Then the script removes
    those ones with status=INACTIVE. This script is useful to remove
    those leftovers that remain in the database after a failed live
    migration.

    "dry_run" mode is possible if selected in "[cli_script] dry_run"
    boolean config option. The duplicated port bindings are printed in
    the shell but not deleted.

    Related-Bug: #1979072

    Conflicts:
        neutron/conf/common.py

    Change-Id: I0de5fbb70eb852f82bd311616557985d1ce89bbf
    (cherry picked from commit c5b76a8393a21adb87447c925da2ede4a75dd11a)

tags: added: in-stable-yoga
tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/859997
Committed: https://opendev.org/openstack/neutron/commit/d033ab6eb6dadeb1770eb259d4d59b469d9e3bc0
Submitter: "Zuul (22348)"
Branch: stable/xena

commit d033ab6eb6dadeb1770eb259d4d59b469d9e3bc0
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Jul 7 06:31:22 2022 +0000

    Script to remove duplicated port bindings

    A new script to remove the duplicated port bindings was added. This
    script will list all ``ml2_port_bindings`` records in the database,
    finding those ones with the same port ID. Then the script removes
    those ones with status=INACTIVE. This script is useful to remove
    those leftovers that remain in the database after a failed live
    migration.

    "dry_run" mode is possible if selected in "[cli_script] dry_run"
    boolean config option. The duplicated port bindings are printed in
    the shell but not deleted.

    Related-Bug: #1979072

    Conflicts:
        neutron/conf/common.py

    Change-Id: I0de5fbb70eb852f82bd311616557985d1ce89bbf
    (cherry picked from commit c5b76a8393a21adb87447c925da2ede4a75dd11a)
    (cherry picked from commit 3d307ef8f8c31ebc92a621aac75c62b83a7b3787)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/859998
Committed: https://opendev.org/openstack/neutron/commit/8b1ec76321e33406c06f8cc49489dbdceeb0ee91
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 8b1ec76321e33406c06f8cc49489dbdceeb0ee91
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu Jul 7 06:31:22 2022 +0000

    Script to remove duplicated port bindings

    A new script to remove the duplicated port bindings was added. This
    script will list all ``ml2_port_bindings`` records in the database,
    finding those ones with the same port ID. Then the script removes
    those ones with status=INACTIVE. This script is useful to remove
    those leftovers that remain in the database after a failed live
    migration.

    "dry_run" mode is possible if selected in "[cli_script] dry_run"
    boolean config option. The duplicated port bindings are printed in
    the shell but not deleted.

    Related-Bug: #1979072

    Conflicts:
        neutron/conf/common.py
        neutron/objects/ports.py
        setup.cfg

    Change-Id: I0de5fbb70eb852f82bd311616557985d1ce89bbf
    (cherry picked from commit c5b76a8393a21adb87447c925da2ede4a75dd11a)
    (cherry picked from commit 3d307ef8f8c31ebc92a621aac75c62b83a7b3787)

description: updated
Christian Rohmann (christian-rohmann) wrote :

Rodolfo - sorry I missed your response until now.

I understand now that this issue originates in Nova and that you can only clean up what is left dangling in Neutron after a failed migration.

For us the issue looks a little different though:

We actually use linuxbridge ML2 instead of OVN, but observe a quite similar issue with live migrations failing due to problems with ports and their bindings. In our case the machine being migrated is actually started on the target host, but the port does not bind, which causes duplicate errors in the log like the ones described in this issue. Strangely, the port does seem to be up and working; it's just that Nova (I believe due to the duplicate errors / binding failure) treats the migration as "failed".

We then end up having to manually update the host a machine runs on, as documented e.g. here: https://access.redhat.com/solutions/2070503

Do you happen to have a bug ID or know of some other conversation that happens within the Nova project to tackle this issue?

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Christian:

What version of OpenStack are you using? Linux Bridge is now considered "experimental"; that means we no longer develop any features for it and the support is limited (or null). The issue could be in how the Linux Bridge mech driver sends the update to Nova (I'm just guessing).

My recommendation (and it is only that) is to migrate to another network backend (ML2/OVS or ML2/OVN).

In any case, the problem I'm presenting in this bug is the consequence of a failed live migration. In this bug I'm not trying to solve the failed migration itself, only its consequence. The issue you have needs further investigation: why Nova is not finishing the migration, whether Neutron is correctly informing Nova about the destination port creation, etc.

Regards.

Changed in neutron:
status: In Progress → Fix Committed
Christian Rohmann (christian-rohmann) wrote :

We are currently on Xena. We are now working through some issues, deprecations and other log noise and will then start our upgrade to Yoga and Zed.

I know that linuxbridge was moved back to experimental state. We have had our fair share of issues with it and are already planning our migration to OVN (after we are on Yoga or Zed). See my question about how to approach this on the ML: https://lists.openstack.org/pipermail/openstack-discuss/2022-August/030070.html . We are also currently waiting for OVN to reach feature parity with what we offer on our cloud (e.g. VPNaaS - https://review.opendev.org/c/openstack/neutron-vpnaas/+/765353).

Since all this might still take a few months, we are also trying to stabilize on linuxbridge, for example by fixing https://bugs.launchpad.net/neutron/+bug/1943449.

We are likewise looking into this very issue of failing live migrations. I have now tried to use your script on Xena, but it seems the backport of that change to Yoga and Xena has a problem:

You call "common_config.register_cli_script_opts()" (https://review.opendev.org/c/openstack/neutron/+/859996/1/neutron/common/config.py or https://review.opendev.org/c/openstack/neutron/+/859997/1/neutron/common/config.py) but that function does not exist prior to https://opendev.org/openstack/neutron/commit/4d3a274765a364f6f7b6bff163d6f7110bdbdbbe (>= Zed).

I believe this needs a fixup, as otherwise the script simply throws an error:

```
neutron-remove-duplicated-port-bindings
Traceback (most recent call last):
  File "/usr/bin/neutron-remove-duplicated-port-bindings", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python3/dist-packages/neutron/cmd/remove_duplicated_port_bindings.py", line 50, in main
    setup_conf(conf)
  File "/usr/lib/python3/dist-packages/neutron/cmd/remove_duplicated_port_bindings.py", line 31, in setup_conf ...
```

Anton Kurbatov (akurbatov) wrote :

I found a new issue with the new neutron-remove-duplicated-port-bindings helper tool; it is reported here: https://bugs.launchpad.net/neutron/+bug/2000078

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/868195

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/868196

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/868197

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/868197
Committed: https://opendev.org/openstack/neutron/commit/4c94eb5e27d14ba5f69b8535cf3ee2f16c9d48de
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 4c94eb5e27d14ba5f69b8535cf3ee2f16c9d48de
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun Dec 11 20:43:29 2022 +0100

    [stable-only] Load config options importing ``common_config``

    In newer versions (Zed+), [1] enforces to explicitly load the
    configuration options by calling ``register_common_config_options``
    method. In older releases, those options are loaded when the
    module ``common_config`` is loaded.

    [1]https://review.opendev.org/c/openstack/neutron/+/837392

    Related-Bug: #1979072
    Change-Id: Iad4ac8cb00a11d2f6646966d37e6f9b160f8ecf4
    (cherry picked from commit 821970b716bf2b2d0c4bea860ba644173b816fc6)
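
The difference between the two module layouts described in this commit message can be illustrated with a small feature-detection helper. This is only a sketch and not the fix that was merged (the stable-only change relies on importing the module): `load_common_options` and both stand-in objects are hypothetical, and only the name `register_common_config_options` is taken from the commit message.

```python
from types import SimpleNamespace


def load_common_options(config_module,
                        registrar="register_common_config_options"):
    """Call the explicit registrar when the module provides one.

    Returns True when explicit registration ran, and False when the module
    is assumed to have registered its options as an import side effect.
    """
    register = getattr(config_module, registrar, None)
    if callable(register):
        register()
        return True
    return False


# Stand-ins for the two module layouts (not the real neutron modules).
registered = []
newer_layout = SimpleNamespace(
    register_common_config_options=lambda: registered.append("explicit"))
older_layout = SimpleNamespace()  # options registered at import time

newer_result = load_common_options(newer_layout)
older_result = load_common_options(older_layout)
print(newer_result, older_result, registered)
```

Probing for the function instead of calling it unconditionally avoids an AttributeError when the registrar does not exist on the older layout.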

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/868196
Committed: https://opendev.org/openstack/neutron/commit/9a2fbd510258ffbe47073a7594261e13eef5e6db
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 9a2fbd510258ffbe47073a7594261e13eef5e6db
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun Dec 11 20:43:29 2022 +0100

    [stable-only] Load config options importing ``common_config``

    In newer versions (Zed+), [1] enforces to explicitly load the
    configuration options by calling ``register_common_config_options``
    method. In older releases, those options are loaded when the
    module ``common_config`` is loaded.

    [1]https://review.opendev.org/c/openstack/neutron/+/837392

    Related-Bug: #1979072
    Change-Id: Iad4ac8cb00a11d2f6646966d37e6f9b160f8ecf4
    (cherry picked from commit 821970b716bf2b2d0c4bea860ba644173b816fc6)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/868195
Committed: https://opendev.org/openstack/neutron/commit/821970b716bf2b2d0c4bea860ba644173b816fc6
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 821970b716bf2b2d0c4bea860ba644173b816fc6
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Sun Dec 11 20:43:29 2022 +0100

    [stable-only] Load config options importing ``common_config``

    In newer versions (Zed+), [1] enforces to explicitly load the
    configuration options by calling ``register_common_config_options``
    method. In older releases, those options are loaded when the
    module ``common_config`` is loaded.

    [1]https://review.opendev.org/c/openstack/neutron/+/837392

    Related-Bug: #1979072
    Change-Id: Iad4ac8cb00a11d2f6646966d37e6f9b160f8ecf4
