A lot of re-connections between ODL and OVS

Bug #1734047 reported by Zhijiang Hu
This bug affects 1 person
Affects        Status        Importance  Assigned to       Milestone
kolla-ansible  Fix Released  Medium      Zhijiang Hu
Pike           Fix Released  Medium      Eduardo Gonzalez
Queens         Fix Released  Medium      Zhijiang Hu

Bug Description

When integrating ODL Carbon SR2 with OpenStack Pike for OPNFV, VMs cannot be created successfully after deployment; the nova log reports that neutron timed out. I checked the openvswitch log and found that the following repeated reconnections appear to be the source of the issue:

2017-11-20T17:08:47.539Z|00058|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connected

2017-11-20T17:08:47.541Z|00059|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connection closed by peer

2017-11-20T17:08:55.538Z|00060|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connected

2017-11-20T17:08:55.540Z|00061|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connection closed by peer

2017-11-20T17:09:03.537Z|00062|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connected

2017-11-20T17:09:03.539Z|00063|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connection closed by peer

2017-11-20T17:09:11.538Z|00064|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connected

2017-11-20T17:09:11.540Z|00065|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connection closed by peer
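
A pattern like the one above (each session closed by the peer within milliseconds of connecting) can be picked out of a longer ovs-vswitchd log with a small script. The following is only a sketch: the regular expression is written against the line format quoted in this report, and the function name and the one-second threshold are assumptions made for illustration.

```python
import re
from datetime import datetime

# Matches rconn lines in the form quoted in this report, e.g.
# 2017-11-20T17:08:47.539Z|00058|rconn|INFO|br-int<->tcp:HOST:6653:connected
RCONN = re.compile(
    r"^(?P<ts>\S+)\|\d+\|rconn\|INFO\|(?P<bridge>[^<]+)<->(?P<target>\S+?):\s*"
    r"(?P<event>connected|connection closed by peer)$"
)


def short_lived_sessions(lines, max_seconds=1.0):
    """Return (target, duration) for sessions the peer closed within max_seconds.

    The threshold is an arbitrary value chosen for this sketch.
    """
    opened = {}  # (bridge, target) -> time of the last "connected" event
    flaps = []
    for line in lines:
        m = RCONN.match(line.strip())
        if not m:
            continue  # not an rconn connect/close line
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        key = (m["bridge"], m["target"])
        if m["event"] == "connected":
            opened[key] = ts
        elif key in opened:
            duration = (ts - opened.pop(key)).total_seconds()
            if duration <= max_seconds:
                flaps.append((m["target"], duration))
    return flaps


# Two lines copied from the log excerpt in this report.
sample = [
    "2017-11-20T17:08:47.539Z|00058|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connected",
    "2017-11-20T17:08:47.541Z|00059|rconn|INFO|br-int<->tcp:192.168.1..105:6653:connection closed by peer",
]
flaps = short_lived_sessions(sample)
```

Running it over the full excerpt would report one short-lived session roughly every eight seconds, which matches the retry interval visible in the timestamps.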

At the same time, karaf.log shows the following warnings:

2017-11-21 01:30:55,535 | INFO | ntLoopGroup-7-33 | ContextChainHolderImpl | 330 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:206853503483370 connected.

2017-11-21 01:30:55,535 | WARN | ntLoopGroup-7-33 | ContextChainHolderImpl | 330 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:206853503483370 is already trying to connect, wait until succeeded or disconnected.

2017-11-21 01:31:03,534 | INFO | ntLoopGroup-7-34 | ConnectionAdapterImpl | 319 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.9.2.Carbon | Hello received

2017-11-21 01:31:03,535 | INFO | ntLoopGroup-7-34 | ContextChainHolderImpl | 330 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:206853503483370 connected.

2017-11-21 01:31:03,535 | WARN | ntLoopGroup-7-34 | ContextChainHolderImpl | 330 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:206853503483370 is already trying to connect, wait until succeeded or disconnected.

2017-11-21 01:31:11,534 | INFO | ntLoopGroup-7-35 | ConnectionAdapterImpl | 319 - org.opendaylight.openflowjava.openflow-protocol-impl - 0.9.2.Carbon | Hello received

2017-11-21 01:31:11,535 | INFO | ntLoopGroup-7-35 | ContextChainHolderImpl | 330 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:206853503483370 connected.

2017-11-21 01:31:11,535 | WARN | ntLoopGroup-7-35 | ContextChainHolderImpl | 330 - org.opendaylight.openflowplugin.impl - 0.4.2.Carbon | Device openflow:206853503483370 is already trying to connect, wait until succeeded or disconnected.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.openstack.org/522466

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.openstack.org/522466
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=28b50c22ceb2d9e1b797fe748c61363ec550b9a3
Submitter: Zuul
Branch: master

commit 28b50c22ceb2d9e1b797fe748c61363ec550b9a3
Author: Zhijiang Hu <email address hidden>
Date: Thu Nov 23 03:52:42 2017 -0500

    Let OVS to connect to the individual IPs of each ODL node

    Close-Bug: 1734047

    For ODL clustering, one should explicitly point switches to each
    of the ODL instances. The openflowplugin logic will figure out
    which controller should be the master, and which should be the
    slave.

    Kolla currently sets the manager to one specific ODL node over
    ptcp and to another one through the VIP. The VIP is probably
    forwarding the traffic to that same ODL node, so from ODL's
    perspective it is getting two duplicate connection requests from
    the same OVS, which causes the reconnection problem.

    This PS does the following:
    1) Lets OVS connect to the individual IPs of each ODL node in
    an ODL cluster instead of only connecting to the representative
    over the VIP. Devstack is doing the same thing[1]. Furthermore,
    there is no need for HAProxy to be a frontend for ODL southbound.

    2) Deletes the unused ptcp connection option.

    [1] https://review.openstack.org/#/c/249484/

    Change-Id: Ib57e6fbb5ce64a48be0506904d3c8397ed6f70d9
    Signed-off-by: Zhijiang Hu <email address hidden>
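
The first change described in the commit message above amounts to handing OVS one manager target per ODL node rather than a single VIP target. A minimal sketch of that idea follows; it is not the kolla-ansible implementation. The node addresses are hypothetical, the helper names are invented for illustration, and port 6640 is assumed to be the OVSDB port the ODL nodes listen on.

```python
def odl_manager_targets(node_ips, ovsdb_port=6640):
    """Build one active-connection target per ODL node (no VIP, no ptcp)."""
    return [f"tcp:{ip}:{ovsdb_port}" for ip in node_ips]


def set_manager_command(node_ips, ovsdb_port=6640):
    """Return the ovs-vsctl argument list that points OVS at every ODL node."""
    return ["ovs-vsctl", "set-manager", *odl_manager_targets(node_ips, ovsdb_port)]


# Hypothetical addresses of a three-node ODL cluster.
odl_nodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
command = set_manager_command(odl_nodes)
print(" ".join(command))
```

With every node listed explicitly, each ODL instance sees exactly one connection from a given OVS, and the cluster decides which instance acts as master.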

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/530044

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/pike)

Reviewed: https://review.openstack.org/530044
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=197900869c0811cc57c34d2a89b051a6c400300b
Submitter: Zuul
Branch: stable/pike

commit 197900869c0811cc57c34d2a89b051a6c400300b
Author: Zhijiang Hu <email address hidden>
Date: Thu Nov 23 03:52:42 2017 -0500

    Let OVS to connect to the individual IPs of each ODL node

    For ODL clustering, one should explicitly point switches to each
    of the ODL instances. The openflowplugin logic will figure out
    which controller should be the master, and which should be the
    slave.

    Kolla currently sets the manager to one specific ODL node over
    ptcp and to another one through the VIP. The VIP is probably
    forwarding the traffic to that same ODL node, so from ODL's
    perspective it is getting two duplicate connection requests from
    the same OVS, which causes the reconnection problem.

    This PS does the following:
    1) Lets OVS connect to the individual IPs of each ODL node in
    an ODL cluster instead of only connecting to the representative
    over the VIP. Devstack is doing the same thing[1]. Furthermore,
    there is no need for HAProxy to be a frontend for ODL southbound.

    2) Deletes the unused ptcp connection option.

    [1] https://review.openstack.org/#/c/249484/

    Closes-Bug: #1734047
    Change-Id: Ib57e6fbb5ce64a48be0506904d3c8397ed6f70d9
    Signed-off-by: Zhijiang Hu <email address hidden>
    (cherry picked from commit 28b50c22ceb2d9e1b797fe748c61363ec550b9a3)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 5.0.2

This issue was fixed in the openstack/kolla-ansible 5.0.2 release.
