regression: ovsdb-server drops connections when using DNS name to configure passive listener

Bug #1998781 reported by James Page
This bug affects 1 person
Affects                 Status        Importance  Assigned to  Milestone
Sunbeam Snap            Fix Released  Undecided   Unassigned
Ubuntu Cloud Archive    New           Undecided   Unassigned
  Ussuri                New           Undecided   Unassigned
  Wallaby               New           Undecided   Unassigned
  Xena                  New           Undecided   Unassigned
  Yoga                  New           Undecided   Unassigned
  Zed                   New           Undecided   Unassigned
openvswitch (Ubuntu)    Fix Released  High        Unassigned
  Focal                 New           Undecided   Unassigned
  Jammy                 New           Undecided   Unassigned
  Kinetic               Won't Fix     Undecided   Unassigned
  Lunar                 Fix Released  High        Unassigned

Bug Description

For a Yoga-based deployment (OVN: 22.03, OVS: 2.17.2), the neutron-ovn-metadata-agent continually reconnects to the ovn-central services via ovn-relay. The inactivity probe timeout is set to 60 seconds in OVN central and on the client side, so I think the issue is that the ovn-relay is probing with a 5 second timeout and then resetting connections.

ovn-relay log:

2022-12-05T11:19:54.403Z|21688|reconnect|DBG|ssl:ovn-central-0.ovn-central-endpoints.openstack.svc.cluster.local:6642: idle 5003 ms, sending inactivity probe
2022-12-05T11:19:54.403Z|21689|reconnect|DBG|ssl:ovn-central-0.ovn-central-endpoints.openstack.svc.cluster.local:6642: entering IDLE
2022-12-05T11:19:54.403Z|21690|jsonrpc|DBG|ssl:ovn-central-0.ovn-central-endpoints.openstack.svc.cluster.local:6642: send request, method="echo", params=[], id="echo"

Bypassing the ovn-relay and communicating directly with the SB DB server worked around this issue.
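
To check which probe interval the relay is actually using, the idle time can be pulled out of the reconnect debug lines. The following is a minimal sketch, assuming the log line format shown in the ovn-relay excerpt above; the helper name `probe_idle_times` is hypothetical and not part of OVS or OVN.

```python
import re

# Matches the reconnect DBG line format shown in the ovn-relay log above, e.g.
# ...|reconnect|DBG|ssl:host:6642: idle 5003 ms, sending inactivity probe
IDLE_RE = re.compile(
    r"\|reconnect\|DBG\|(?P<remote>\S+): idle (?P<ms>\d+) ms, "
    r"sending inactivity probe"
)


def probe_idle_times(lines):
    """Return a (remote, idle_ms) pair for each 'sending inactivity probe' line."""
    results = []
    for line in lines:
        match = IDLE_RE.search(line)
        if match:
            results.append((match.group("remote"), int(match.group("ms"))))
    return results
```

Idle values clustering around 5000 ms would be consistent with the 5 second probing suspected in the description, as opposed to the 60 seconds configured on OVN central and the client.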

James Page (james-page) wrote :

ovn-controller log:

2022-12-05T11:05:09.848Z|00745|main|INFO|OVNSB commit failed, force recompute next time.
2022-12-05T11:05:17.865Z|00746|reconnect|INFO|ssl:10.246.115.21:6642: connected
2022-12-05T11:05:18.374Z|00747|reconnect|INFO|ssl:10.246.115.21:6642: connection closed by peer
2022-12-05T11:05:18.375Z|00748|main|INFO|OVNSB commit failed, force recompute next time.
2022-12-05T11:05:26.390Z|00749|reconnect|INFO|ssl:10.246.115.21:6642: connected
2022-12-05T11:05:26.519Z|00750|reconnect|INFO|ssl:10.246.115.21:6642: connection closed by peer
2022-12-05T11:05:26.519Z|00751|main|INFO|OVNSB commit failed, force recompute next time.
2022-12-05T11:05:34.534Z|00752|reconnect|INFO|ssl:10.246.115.21:6642: connected
2022-12-05T11:05:41.921Z|00753|reconnect|INFO|ssl:10.246.115.21:6642: connection closed by peer
2022-12-05T11:05:41.922Z|00754|main|INFO|OVNSB commit failed, force recompute next time.
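
How long each connection survives can be measured from the log above by pairing every `connected` event with the following `connection closed by peer` event for the same remote. This is a minimal sketch, assuming the vlog line format shown in the ovn-controller excerpt; the helper name `connection_lifetimes` is hypothetical.

```python
import re
from datetime import datetime

# Matches the vlog line format in the ovn-controller excerpt above:
# <timestamp>|<seq>|reconnect|INFO|<remote>: <event>
EVENT_RE = re.compile(
    r"^(?P<ts>\S+?)\|\d+\|reconnect\|INFO\|(?P<remote>\S+): "
    r"(?P<event>connected|connection closed by peer)$"
)


def connection_lifetimes(lines):
    """Return seconds between each 'connected' and the next 'closed by peer'."""
    opened = {}  # remote -> timestamp of the most recent 'connected' event
    lifetimes = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # e.g. the 'OVNSB commit failed' lines
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
        remote = match.group("remote")
        if match.group("event") == "connected":
            opened[remote] = ts
        elif remote in opened:
            lifetimes.append((ts - opened.pop(remote)).total_seconds())
    return lifetimes
```

For the excerpt above this gives lifetimes of about 0.5, 0.1 and 7.4 seconds, all far shorter than the 60 second inactivity probe timeout.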

description: updated
James Page (james-page) wrote :

neutron-ovn-metadata-agent log:

Dec 05 11:04:53 node-gadomski neutron-ovn-metadata-agent[330217]: 2022-12-05 11:04:53.904 330217 INFO neutron.agent.ovn.metadata.agent [-] Connection to OVSDB established, doing a full sync
Dec 05 11:04:54 node-gadomski neutron-ovn-metadata-agent[330217]: 2022-12-05 11:04:54.315 330217 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connection closed by peer
Dec 05 11:04:54 node-gadomski neutron-ovn-metadata-agent[331187]: 2022-12-05 11:04:54.445 331187 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connected
Dec 05 11:05:01 node-gadomski neutron-ovn-metadata-agent[331187]: 2022-12-05 11:05:01.823 331187 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connection closed by peer
Dec 05 11:05:01 node-gadomski neutron-ovn-metadata-agent[331188]: 2022-12-05 11:05:01.831 331188 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connected
Dec 05 11:05:02 node-gadomski neutron-ovn-metadata-agent[331188]: 2022-12-05 11:05:02.338 331188 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connection closed by peer
Dec 05 11:05:02 node-gadomski neutron-ovn-metadata-agent[330217]: 2022-12-05 11:05:02.349 330217 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connected
Dec 05 11:05:02 node-gadomski neutron-ovn-metadata-agent[330217]: 2022-12-05 11:05:02.454 330217 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connection closed by peer
Dec 05 11:05:02 node-gadomski neutron-ovn-metadata-agent[330217]: 2022-12-05 11:05:02.456 330217 INFO neutron.agent.ovn.metadata.agent [-] Connection to OVSDB established, doing a full sync
Dec 05 11:05:09 node-gadomski neutron-ovn-metadata-agent[331187]: 2022-12-05 11:05:09.858 331187 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connected
Dec 05 11:05:10 node-gadomski neutron-ovn-metadata-agent[331187]: 2022-12-05 11:05:10.354 331187 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connection closed by peer
Dec 05 11:05:10 node-gadomski neutron-ovn-metadata-agent[331188]: 2022-12-05 11:05:10.359 331188 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connected
Dec 05 11:05:10 node-gadomski neutron-ovn-metadata-agent[331188]: 2022-12-05 11:05:10.479 331188 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connection closed by peer
Dec 05 11:05:10 node-gadomski neutron-ovn-metadata-agent[330217]: 2022-12-05 11:05:10.488 330217 INFO ovsdbapp.backend.ovs_idl.vlog [-] ssl:10.246.115.21:6642: connected

description: updated
Frode Nordahl (fnordahl) wrote :

While it does not appear to be the root of the issue reported, the bug report has revealed possible room for improvement in the relationship between the real server and the relay server with regard to the inactivity_probe configuration.

The inactivity_probe is normally adjusted on the backend server to avoid situations where the backend server is too busy to service the inactivity probe in a timely manner and as a consequence erroneously drops the connection.

Having the inactivity_probe configuration out of sync between the real server and the relay could possibly lead to more of these situations, and it would increase complexity of configuration if we leave it to the human operator to keep these things in sync. Perhaps we should teach the backend server and relay to communicate this setting between each other?

Frode Nordahl (fnordahl) wrote :

While looking into this, we found out that this behavior is only visible when using a DNS name for the remote, so the root of the issue may be related to that. Example:

$ sudo ovsdb-server --private-key=/etc/ovn/key_host --certificate=/etc/ovn/cert_host --ca-cert=/etc/ovn/ovn-chassis.crt --remote=pssl:6642:enp7s0.secure-pigeon.maas relay:OVN_Southbound:ssl:10.247.39.150:6642,ssl:10.247.39.91:6642,ssl:10.247.39.138:6642

2022-12-06T11:04:54Z|00021|reconnect|INFO|pssl:6642:enp7s0.secure-pigeon.maas: listening...
2022-12-06T11:04:54Z|00022|reconnect|INFO|pssl:6642:enp7s0.secure-pigeon.maas: listening...
2022-12-06T11:04:54Z|00023|reconnect|INFO|pssl:6642:enp7s0.secure-pigeon.maas: connected
2022-12-06T11:04:59Z|00024|jsonrpc|INFO|Dropped 9 log messages in last 6 seconds (most recently, 2 seconds ago) due to excessive rate
2022-12-06T11:04:59Z|00025|jsonrpc|INFO|pssl:6642:enp7s0.secure-pigeon.maas: new connection replacing active connection
2022-12-06T11:05:11Z|00026|jsonrpc|INFO|Dropped 3 log messages in last 10 seconds (most recently, 6 seconds ago) due to excessive rate
2022-12-06T11:05:11Z|00027|jsonrpc|INFO|pssl:6642:enp7s0.secure-pigeon.maas: new connection replacing active connection
2022-12-06T11:05:23Z|00028|jsonrpc|INFO|Dropped 4 log messages in last 11 seconds (most recently, 3 seconds ago) due to excessive rate
2022-12-06T11:05:23Z|00029|jsonrpc|INFO|pssl:6642:enp7s0.secure-pigeon.maas: new connection replacing active connection

If the `--remote` parameter is changed to contain an IP address, the re-connections do not occur.
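
Given that observation, one way to sidestep the problem is to make sure only a literal IP address ever reaches the passive `--remote` argument. The sketch below is an illustration under that assumption and is not taken from any charm or from OVS itself; `format_passive_remote` and `resolve_listen_address` are hypothetical helpers. It relies on the OVSDB connection string convention that IPv6 addresses are wrapped in square brackets.

```python
import ipaddress
import socket


def format_passive_remote(port, address, scheme="pssl"):
    """Build a passive OVSDB remote string from a literal IP address.

    Raises ValueError if `address` is not an IP literal (for example a DNS name).
    """
    ip = ipaddress.ip_address(address)
    # OVSDB connection strings wrap IPv6 addresses in square brackets.
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"{scheme}:{port}:{host}"


def resolve_listen_address(hostname):
    """Resolve `hostname` once and return the first address found."""
    infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return infos[0][4][0]
```

A caller could pass `format_passive_remote(6642, resolve_listen_address(name))` as the value of `--remote`. Note that this resolves the name only once, at start-up, so a later DNS change would not be picked up.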

Changed in openvswitch (Ubuntu):
status: New → Confirmed
importance: Undecided → High
James Page (james-page) wrote :

Resolved by dropping the use of a hostname in the pssl socket binding in the ovn-relay-k8s charm:

https://opendev.org/x/charm-ovn-relay-k8s/commit/cc6315fa89f59616b886970be8be1e47251b719b

Changed in snap-sunbeam:
status: New → Fix Released
Frode Nordahl (fnordahl) wrote :

Using a DNS name for a passive connection worked before this commit: https://github.com/openvswitch/ovs/commit/08e9e5337383afd16a225334cb2549a027280537

Which was backported to the 2.16 branch as https://github.com/openvswitch/ovs/commit/1570924c3f83851f39f56e3363050b70ba1aafb0 and has been there since v2.16.3

The commit also appears to have been backported as far back as v2.13.7

Changed in openvswitch (Ubuntu):
status: Confirmed → Triaged
summary: - yoga: neutron-ovn-metadata-agent unable to communicate via ovn-relay
+ regression: ovsdb-server drops connection when using DNS name to
+ configure passive listener
Frode Nordahl (fnordahl)
summary: - regression: ovsdb-server drops connection when using DNS name to
+ regression: ovsdb-server drops connections when using DNS name to
configure passive listener
Frode Nordahl (fnordahl) wrote :
Changed in openvswitch (Ubuntu Lunar):
status: Triaged → Fix Released
Utkarsh Gupta (utkarsh) wrote :

Ubuntu 22.10 (Kinetic Kudu) has reached end of life, so this bug will not be fixed for that specific release.

Changed in openvswitch (Ubuntu Kinetic):
status: New → Won't Fix