ovsdbapp can time out on raft leadership change

Bug #1988457 reported by Terry Wilson
This bug affects 1 person
Affects: ovsdbapp
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

When raft leadership changes, any leader-only connections are disconnected and must reconnect to the new leader. When this happens, the IDL returns a txn status of TRY_AGAIN. The current code does an exponential backoff with sleep() to work around an issue where these retries can be spammed 1000s of times a second. That sleep also prevents a quick reconnect, because idl.run() is not called while the code sleeps, and this can lead to timeouts.
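
To make the failure mode concrete, here is a toy model of it. It does not use the real ovs or ovsdbapp code: FakeIdl, commit_with_sleep_backoff, and the assumption that a reconnect needs a fixed number of run() calls are all hypothetical, and the clock is simulated so nothing actually sleeps.

```python
# Toy model only: FakeIdl is a hypothetical stand-in, not ovs.db.idl.Idl.
TRY_AGAIN = "try again"
SUCCESS = "success"


class FakeIdl:
    """Needs `runs_needed` calls to run() before it counts as reconnected."""

    def __init__(self, runs_needed):
        self.runs_needed = runs_needed

    def run(self):
        # Each call makes a little progress on the reconnect.
        if self.runs_needed > 0:
            self.runs_needed -= 1

    def commit(self):
        return SUCCESS if self.runs_needed == 0 else TRY_AGAIN


def commit_with_sleep_backoff(idl, timeout, base=0.01):
    """Retry with exponential backoff; run() is called once per attempt.

    Returns the simulated time (in seconds) at which the txn succeeded.
    """
    clock = 0.0  # simulated seconds, so the example runs instantly
    delay = base
    while clock < timeout:
        idl.run()
        if idl.commit() == SUCCESS:
            return clock
        clock += delay  # stands in for time.sleep(delay)
        delay *= 2      # each retry waits twice as long as the last
    raise TimeoutError("txn timed out after %.2fs" % clock)


if __name__ == "__main__":
    # A reconnect needing 3 run() calls succeeds quickly.
    print(commit_with_sleep_backoff(FakeIdl(3), timeout=10))
    # One needing 12 run() calls times out, because the gaps between
    # run() calls keep doubling.
    try:
        commit_with_sleep_backoff(FakeIdl(12), timeout=10)
    except TimeoutError as exc:
        print(exc)
```

In this model the reconnect itself is cheap. What exhausts the timeout is the doubling gap between run() calls, which is the behaviour described above.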

Changed in ovsdbapp:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ovsdbapp (master)

Reviewed: https://review.opendev.org/c/openstack/ovsdbapp/+/855531
Committed: https://opendev.org/openstack/ovsdbapp/commit/c3bacb3ba37e2824920ac79766205a3b51ab12d5
Submitter: "Zuul (22348)"
Branch: master

commit c3bacb3ba37e2824920ac79766205a3b51ab12d5
Author: Terry Wilson <email address hidden>
Date: Thu Sep 1 09:48:38 2022 -0500

    Fix TRY_AGAIN handling

    I believe removing wait_for_change back in the day was an error.
    We can't do the exponential backoff ourselves because that will
    also delay reconnecting to the db, because idl.run() needs to
    be called. Also, do_commit() doesn't ensure that idl.run() is
    called if status is TRY_AGAIN. wait_for_change() will ensure that
    we call idl.run() to reconnect quickly and don't try the txn again
    until we have reconnected and the seqno has changed.

    Revert "Don't spam retries 100s of times a second"
    This reverts commit 6596164f51217cc7fabf302ce14ccc9d9beaff1f.

    Revert "Ensure idl.run() called on TRY_AGAIN"
    This reverts commit 1810faecc9ad2345f3e2f9185ac64194c5a0d711.

    Revert "Don't wait on TRY_AGAIN when calling commit_block()"
    This reverts commit 158ae06bce0f56e93677f94c59f81e5e76ee1ccc.

    Closes-Bug: #1988457
    Change-Id: I237136262862d5117d08eb3b513a0b8658a79f05
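
As a rough illustration of the approach the commit message describes (keep calling idl.run() and only retry the txn once the seqno has changed), here is a toy sketch. It is not the ovsdbapp implementation: FakeIdl, wait_for_seqno_change, and commit_with_wait are hypothetical names, and the sketch bounds the wait by a number of run() calls instead of a wall-clock timeout.

```python
# Toy model only: FakeIdl and the helpers below are hypothetical
# stand-ins, not the real ovs / ovsdbapp API.
TRY_AGAIN = "try again"
SUCCESS = "success"


class FakeIdl:
    """Reconnects after `runs_needed` run() calls, then bumps change_seqno."""

    def __init__(self, runs_needed):
        self.runs_needed = runs_needed
        self.change_seqno = 0
        self.run_calls = 0

    def run(self):
        self.run_calls += 1
        if self.runs_needed > 0:
            self.runs_needed -= 1
            if self.runs_needed == 0:
                # Reconnected: the local copy of the db was refreshed.
                self.change_seqno += 1

    def commit(self):
        return SUCCESS if self.runs_needed == 0 else TRY_AGAIN


def wait_for_seqno_change(idl, max_runs, seqno):
    """Keep calling run() until change_seqno differs from `seqno`."""
    for _ in range(max_runs):
        if idl.change_seqno != seqno:
            return
        idl.run()
    if idl.change_seqno == seqno:
        raise TimeoutError("no change after %d run() calls" % max_runs)


def commit_with_wait(idl, max_runs=1000):
    """Retry a txn, but only after the IDL reports a change.

    Returns the number of commit attempts that were needed.
    """
    attempts = 0
    while True:
        seqno = idl.change_seqno  # remember state before this attempt
        attempts += 1
        if idl.commit() == SUCCESS:
            return attempts
        # TRY_AGAIN: drive the IDL until something changed, then retry.
        wait_for_seqno_change(idl, max_runs, seqno)


if __name__ == "__main__":
    idl = FakeIdl(12)
    print(commit_with_wait(idl), idl.run_calls)
```

In this model the txn is attempted only twice, once before and once after the reconnect, while run() is called back to back in between. So there is neither retry spam nor a growing delay before the reconnect completes.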

Changed in ovsdbapp:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovsdbapp (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/ovsdbapp/+/856198

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovsdbapp (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/ovsdbapp/+/856199

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovsdbapp (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/ovsdbapp/+/856200

OpenStack Infra (hudson-openstack) wrote : Fix merged to ovsdbapp (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/ovsdbapp/+/856198
Committed: https://opendev.org/openstack/ovsdbapp/commit/315c8096c9bb3ad6d09e3b2f09bc1c128bc35497
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 315c8096c9bb3ad6d09e3b2f09bc1c128bc35497
Author: Terry Wilson <email address hidden>
Date: Thu Sep 1 09:48:38 2022 -0500

    Fix TRY_AGAIN handling

    I believe removing wait_for_change back in the day was an error.
    We can't do the exponential backoff ourselves because that will
    also delay reconnecting to the db, because idl.run() needs to
    be called. Also, do_commit() doesn't ensure that idl.run() is
    called if status is TRY_AGAIN. wait_for_change() will ensure that
    we call idl.run() to reconnect quickly and don't try the txn again
    until we have reconnected and the seqno has changed.

    Revert "Don't spam retries 100s of times a second"
    This reverts commit 6596164f51217cc7fabf302ce14ccc9d9beaff1f.

    Revert "Ensure idl.run() called on TRY_AGAIN"
    This reverts commit 1810faecc9ad2345f3e2f9185ac64194c5a0d711.

    Revert "Don't wait on TRY_AGAIN when calling commit_block()"
    This reverts commit 158ae06bce0f56e93677f94c59f81e5e76ee1ccc.

    Closes-Bug: #1988457
    Change-Id: I237136262862d5117d08eb3b513a0b8658a79f05
    (cherry picked from commit c3bacb3ba37e2824920ac79766205a3b51ab12d5)

tags: added: in-stable-yoga
tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to ovsdbapp (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/ovsdbapp/+/856199
Committed: https://opendev.org/openstack/ovsdbapp/commit/fc62ae2a3a5ee96de297bdc903933c4e5ba46f15
Submitter: "Zuul (22348)"
Branch: stable/xena

commit fc62ae2a3a5ee96de297bdc903933c4e5ba46f15
Author: Terry Wilson <email address hidden>
Date: Thu Sep 1 09:48:38 2022 -0500

    Fix TRY_AGAIN handling

    I believe removing wait_for_change back in the day was an error.
    We can't do the exponential backoff ourselves because that will
    also delay reconnecting to the db, because idl.run() needs to
    be called. Also, do_commit() doesn't ensure that idl.run() is
    called if status is TRY_AGAIN. wait_for_change() will ensure that
    we call idl.run() to reconnect quickly and don't try the txn again
    until we have reconnected and the seqno has changed.

    Revert "Don't spam retries 100s of times a second"
    This reverts commit 6596164f51217cc7fabf302ce14ccc9d9beaff1f.

    Revert "Ensure idl.run() called on TRY_AGAIN"
    This reverts commit 1810faecc9ad2345f3e2f9185ac64194c5a0d711.

    Revert "Don't wait on TRY_AGAIN when calling commit_block()"
    This reverts commit 158ae06bce0f56e93677f94c59f81e5e76ee1ccc.

    Closes-Bug: #1988457
    Change-Id: I237136262862d5117d08eb3b513a0b8658a79f05
    (cherry picked from commit c3bacb3ba37e2824920ac79766205a3b51ab12d5)

OpenStack Infra (hudson-openstack) wrote : Fix merged to ovsdbapp (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/ovsdbapp/+/856200
Committed: https://opendev.org/openstack/ovsdbapp/commit/dd7e3321915fbbc397781452973418449a6465f6
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit dd7e3321915fbbc397781452973418449a6465f6
Author: Terry Wilson <email address hidden>
Date: Thu Sep 1 09:48:38 2022 -0500

    Fix TRY_AGAIN handling

    I believe removing wait_for_change back in the day was an error.
    We can't do the exponential backoff ourselves because that will
    also delay reconnecting to the db, because idl.run() needs to
    be called. Also, do_commit() doesn't ensure that idl.run() is
    called if status is TRY_AGAIN. wait_for_change() will ensure that
    we call idl.run() to reconnect quickly and don't try the txn again
    until we have reconnected and the seqno has changed.

    Revert "Don't spam retries 100s of times a second"
    This reverts commit 6596164f51217cc7fabf302ce14ccc9d9beaff1f.

    Revert "Ensure idl.run() called on TRY_AGAIN"
    This reverts commit 1810faecc9ad2345f3e2f9185ac64194c5a0d711.

    Revert "Don't wait on TRY_AGAIN when calling commit_block()"
    This reverts commit 158ae06bce0f56e93677f94c59f81e5e76ee1ccc.

    Closes-Bug: #1988457
    Change-Id: I237136262862d5117d08eb3b513a0b8658a79f05
    (cherry picked from commit c3bacb3ba37e2824920ac79766205a3b51ab12d5)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 1.9.4

This issue was fixed in the openstack/ovsdbapp 1.9.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 2.2.0

This issue was fixed in the openstack/ovsdbapp 2.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 1.12.3

This issue was fixed in the openstack/ovsdbapp 1.12.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovsdbapp (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/ovsdbapp/+/871563

OpenStack Infra (hudson-openstack) wrote : Fix merged to ovsdbapp (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/ovsdbapp/+/871563
Committed: https://opendev.org/openstack/ovsdbapp/commit/97e738dc2b81590120d76a9ec6ac521067a536b7
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 97e738dc2b81590120d76a9ec6ac521067a536b7
Author: Terry Wilson <email address hidden>
Date: Thu Sep 1 09:48:38 2022 -0500

    Fix TRY_AGAIN handling

    I believe removing wait_for_change back in the day was an error.
    We can't do the exponential backoff ourselves because that will
    also delay reconnecting to the db, because idl.run() needs to
    be called. Also, do_commit() doesn't ensure that idl.run() is
    called if status is TRY_AGAIN. wait_for_change() will ensure that
    we call idl.run() to reconnect quickly and don't try the txn again
    until we have reconnected and the seqno has changed.

    Revert "Don't spam retries 100s of times a second"
    This reverts commit 6596164f51217cc7fabf302ce14ccc9d9beaff1f.

    Revert "Ensure idl.run() called on TRY_AGAIN"
    This reverts commit 1810faecc9ad2345f3e2f9185ac64194c5a0d711.

    Revert "Don't wait on TRY_AGAIN when calling commit_block()"
    This reverts commit 158ae06bce0f56e93677f94c59f81e5e76ee1ccc.

    Closes-Bug: #1988457
    Change-Id: I237136262862d5117d08eb3b513a0b8658a79f05
    (cherry picked from commit c3bacb3ba37e2824920ac79766205a3b51ab12d5)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 1.15.3

This issue was fixed in the openstack/ovsdbapp 1.15.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovsdbapp 2.1.1

This issue was fixed in the openstack/ovsdbapp 2.1.1 release.
