tor-agent crash in EvpnAgentRouteTable::DeleteOvsPeerMulticastRouteInternal

Bug #1571598 reported by vageesan
This bug affects 1 person
Affects              Status         Importance  Assigned to   Milestone
Juniper Openstack    (status tracked in Trunk)
  R2.20              Fix Committed  Critical    Manish Singh
  R2.21.x            Fix Committed  Critical    Manish Singh
  R2.22.x            Fix Committed  Critical    Manish Singh
  R3.0               Fix Committed  Critical    Manish Singh
  Trunk              Fix Committed  Critical    Manish Singh

Bug Description

contrail-tor-agent crashed with the following backtrace during a solution test run.

3.0.2.0-26~kilo

The core is in 10.84.5.112:/cs-shared/bugs/<bug-id>/

[New LWP 3128]
[New LWP 3126]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/contrail-tor-agent --config_file /etc/contrail/contrail-tor-agent-2.co'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000008e37d0 in AgentRouteTable::vrf_entry() const ()
#0 0x00000000008e37d0 in AgentRouteTable::vrf_entry() const ()
#1 0x00000000008fe392 in EvpnAgentRouteTable::DeleteOvsPeerMulticastRouteInternal(Peer const*, unsigned int, boost::asio::ip::address_v4 const&, bool) ()
#2 0x0000000000a6817b in OVSDB::MulticastMacLocalEntry::OnVrfDelete() ()
#3 0x0000000000a68448 in OVSDB::MulticastMacLocalOvsdb::VrfReEval(IntrusivePtrRef<VrfEntry>) ()
#4 0x0000000000a6a5f9 in boost::detail::function::function_obj_invoker1<boost::_bi::bind_t<bool, boost::_mfi::mf1<bool, OVSDB::MulticastMacLocalOvsdb, IntrusivePtrRef<VrfEntry> >, boost::_bi::list2<boost::_bi::value<OVSDB::MulticastMacLocalOvsdb*>, boost::arg<1> > >, bool, IntrusivePtrRef<VrfEntry> >::invoke(boost::detail::function::function_buffer&, IntrusivePtrRef<VrfEntry>) ()
#5 0x0000000000a13683 in QueueTaskRunner<IntrusivePtrRef<VrfEntry>, WorkQueue<IntrusivePtrRef<VrfEntry> > >::RunQueue() ()
#6 0x0000000000f446ec in TaskImpl::execute() ()
#7 0x00007f572c100b3a in ?? () from /usr/lib/libtbb.so.2
#8 0x00007f572c0fc816 in ?? () from /usr/lib/libtbb.so.2
#9 0x00007f572c0fbf4b in ?? () from /usr/lib/libtbb.so.2
#10 0x00007f572c0f80ff in ?? () from /usr/lib/libtbb.so.2
#11 0x00007f572c0f82f9 in ?? () from /usr/lib/libtbb.so.2
#12 0x00007f572c31c182 in start_thread (arg=0x7f571fbfe700) at pthread_create.c:312
#13 0x00007f572b5f547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

vageesan (vageesant)
Changed in juniperopenstack:
milestone: r3.0-fcs → r3.1.0.0-fcs
amit surana (asurana-t)
tags: added: blocker
Jeba Paulaiyan (jebap)
information type: Proprietary → Public
Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Prabhjot Singh Sethi (prabhjot)
Revision history for this message
Prabhjot Singh Sethi (prabhjot) wrote :

This is potentially a side effect of the fix for Bug-1562961, where the OVS-Peer path is deleted before a delete is triggered from ovsdb-client.

In the case where the crash was observed, it was the last route to go away from the EVPN table, so it ended up freeing the EVPN table. That caused a NULL pointer access when ovsdb-client later tried to delete its path.

Issue:
---------
In the API AgentRoute::SquashStalePaths, even if no stale path was found, it goes ahead and deletes the last path in the list, which may happen to be the OVS path.
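
The failure mode described above can be sketched in a few lines of C++. This is only an illustration of the pattern, not the agent's code: Path, Route, HasPeer, SquashStalePathBuggy, SquashStalePathFixed and Demo are hypothetical names, and the real AgentRoute::SquashStalePaths works on the agent's own path and peer types.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical, simplified stand-ins; not the agent's real classes.
struct Path {
    std::string peer;  // e.g. "bgp" or "ovs"
    bool stale;
};

struct Route {
    std::list<Path> paths;
};

// True if the route still holds a path from the given peer.
inline bool HasPeer(const Route &rt, const std::string &peer) {
    for (const Path &p : rt.paths) {
        if (p.peer == peer) return true;
    }
    return false;
}

// Buggy pattern: 'victim' tracks the last path visited, so when no stale
// path exists the final path in the list is erased anyway.
inline void SquashStalePathBuggy(Route &rt) {
    auto victim = rt.paths.end();
    for (auto it = rt.paths.begin(); it != rt.paths.end(); ++it) {
        victim = it;
        if (it->stale) break;
    }
    if (victim != rt.paths.end()) rt.paths.erase(victim);
}

// Fixed pattern: erase only when a stale path was actually found.
inline bool SquashStalePathFixed(Route &rt) {
    for (auto it = rt.paths.begin(); it != rt.paths.end(); ++it) {
        if (it->stale) {
            rt.paths.erase(it);
            return true;
        }
    }
    return false;
}

// Self-check on a route with one live BGP path and one live OVS path.
inline void Demo() {
    Route buggy;
    buggy.paths.push_back({"bgp", false});
    buggy.paths.push_back({"ovs", false});
    SquashStalePathBuggy(buggy);
    assert(!HasPeer(buggy, "ovs"));  // the unrelated OVS path was removed

    Route fixed;
    fixed.paths.push_back({"bgp", false});
    fixed.paths.push_back({"ovs", false});
    assert(!SquashStalePathFixed(fixed));  // nothing stale, nothing erased
    assert(HasPeer(fixed, "ovs"));
}
```

In the buggy variant the OVS path disappears without ovsdb-client ever asking for it, so a later delete from ovsdb-client finds nothing to act on. In this sketch that is harmless; in the agent, per the analysis above, the route and the EVPN table had already been freed by that point.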

OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/19446
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22.x

Review in progress for https://review.opencontrail.org/19447
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/19448
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.21.x

Review in progress for https://review.opencontrail.org/19449
Submitter: Manish Singh (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19446
Committed: http://github.org/Juniper/contrail-controller/commit/4567b9685726123fc41790c68cf4b3e8eea62d34
Submitter: Zuul
Branch: R3.0

commit 4567b9685726123fc41790c68cf4b3e8eea62d34
Author: Manish <email address hidden>
Date: Tue Apr 19 17:29:06 2016 +0530

Double path delete request in succession.

Problem:
In stale path cleanup, if no stale path was found, the function used to clean
the last path seen. This resulted in an unrelated path getting deleted. In the
case of an OVS delete, the path to be deleted was already gone because of this
bug and the table had been deleted, resulting in a crash.

Solution:
Delete only if a relevant path is found.
Closes-bug: #1571598

Change-Id: I25bd7cec4c0774d0a041286c15af99bc5a2d1ada

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/19447
Committed: http://github.org/Juniper/contrail-controller/commit/217fa019ffa418d986a994d32167e7dd62b17d49
Submitter: Zuul
Branch: R2.22.x

commit 217fa019ffa418d986a994d32167e7dd62b17d49
Author: Manish <email address hidden>
Date: Tue Apr 19 17:29:06 2016 +0530

Double path delete request in succession.

Problem:
In stale path cleanup, if no stale path was found, the function used to clean
the last path seen. This resulted in an unrelated path getting deleted. In the
case of an OVS delete, the path to be deleted was already gone because of this
bug and the table had been deleted, resulting in a crash.

Solution:
Delete only if a relevant path is found.
Closes-bug: #1571598

Change-Id: I25bd7cec4c0774d0a041286c15af99bc5a2d1ada

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/19448
Committed: http://github.org/Juniper/contrail-controller/commit/bf49b3c902b24072d8674d9a7fd0147b96638e87
Submitter: Zuul
Branch: R2.20

commit bf49b3c902b24072d8674d9a7fd0147b96638e87
Author: Manish Singh <email address hidden>
Date: Wed Mar 30 07:18:50 2016 +0530

Route present with no paths.

Problem:
When only stale paths were present in headless mode and the CN went down, a walk
was issued to squash all of them. However, after squashing it did not check
whether the route had no paths left and delete it. So the route stayed present
with no paths.

Solution:
Reorganize the squashing code in the walk to go via proper removal of the path
and check for zero paths to delete the route.

Closes-bug: #1562961

Conflicts:
 src/vnsw/agent/test/test_l2route.cc

Double path delete request in succession.

Problem:
In stale path cleanup, if no stale path was found, the function used to clean
the last path seen. This resulted in an unrelated path getting deleted. In the
case of an OVS delete, the path to be deleted was already gone because of this
bug and the table had been deleted, resulting in a crash.

Solution:
Delete only if a relevant path is found.
Closes-bug: #1571598

Change-Id: I25bd7cec4c0774d0a041286c15af99bc5a2d1ada
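
The "route present with no paths" half of this commit can be pictured with a small sketch. It assumes a hypothetical route table keyed by a string, and the names Path, Route, RouteTable and SquashStaleAndCleanup are illustrative only; the agent's actual walk and path removal code is not reproduced here.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins; not the agent's real classes.
struct Path {
    std::string peer;
    bool stale;
};

struct Route {
    std::list<Path> paths;
};

// Assumed layout: one Route per key, keyed by an arbitrary string.
using RouteTable = std::map<std::string, Route>;

// Squash every stale path of one route, then delete the route itself if
// no paths remain. Returns true when the route was deleted.
inline bool SquashStaleAndCleanup(RouteTable &table, const std::string &key) {
    auto it = table.find(key);
    if (it == table.end()) return false;  // unknown route, nothing to do

    it->second.paths.remove_if([](const Path &p) { return p.stale; });

    // The check that was missing according to the commit message:
    // a route left with zero paths must go away as well.
    if (it->second.paths.empty()) {
        table.erase(it);
        return true;
    }
    return false;
}
```

A route holding only stale paths is removed from the table, while a route that still has a live path (for example one from the OVS peer) is kept with only that path.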

OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/19449
Committed: http://github.org/Juniper/contrail-controller/commit/38237b107e4972ae1cadcd7e1b84a2ca00111568
Submitter: Zuul
Branch: R2.21.x

commit 38237b107e4972ae1cadcd7e1b84a2ca00111568
Author: Manish Singh <email address hidden>
Date: Wed Mar 30 07:18:50 2016 +0530

Route present with no paths.

Problem:
When only stale paths were present in headless mode and the CN went down, a walk
was issued to squash all of them. However, after squashing it did not check
whether the route had no paths left and delete it. So the route stayed present
with no paths.

Solution:
Reorganize the squashing code in the walk to go via proper removal of the path
and check for zero paths to delete the route.

Closes-bug: #1562961

Conflicts:
 src/vnsw/agent/test/test_l2route.cc

Double path delete request in succession.

Problem:
In stale path cleanup, if no stale path was found, the function used to clean
the last path seen. This resulted in an unrelated path getting deleted. In the
case of an OVS delete, the path to be deleted was already gone because of this
bug and the table had been deleted, resulting in a crash.

Solution:
Delete only if a relevant path is found.
Closes-bug: #1571598

Change-Id: I25bd7cec4c0774d0a041286c15af99bc5a2d1ada

OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/20090
Submitter: Prabhjot Singh Sethi (<email address hidden>)

OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/20090
Committed: http://github.org/Juniper/contrail-controller/commit/cbbcd626591790f85b4977287d64a24a7481c313
Submitter: Zuul
Branch: master

commit cbbcd626591790f85b4977287d64a24a7481c313
Author: Manish <email address hidden>
Date: Tue Apr 19 17:29:06 2016 +0530

Double path delete request in succession.

Problem:
In stale path cleanup, if no stale path was found, the function used to clean
the last path seen. This resulted in an unrelated path getting deleted. In the
case of an OVS delete, the path to be deleted was already gone because of this
bug and the table had been deleted, resulting in a crash.

Solution:
Delete only if a relevant path is found.

Closes-bug: 1571598
Change-Id: I25bd7cec4c0774d0a041286c15af99bc5a2d1ada
(cherry picked from commit 4567b9685726123fc41790c68cf4b3e8eea62d34)
