Failure in bgp_config_test when many instances are run in parallel

Bug #1334063 reported by Nischal Sheth
This bug affects 1 person
Affects: Juniper Openstack
Status: Fix Released
Importance: Medium
Assigned to: Nischal Sheth

Bug Description

When several instances of bgp_config_test are run in parallel, some of
them fail during TearDown with the following backtrace.

(gdb) bt
#0 0x00000000011ab789 in testing::UnitTest::AddTestPartResult (this=0x18b4620, result_type=testing::TestPartResult::kNonFatalFailure, file_name=0x131df40 "controller/src/bgp/test/bgp_config_test.cc", line_number=60, message=..., os_stack_trace=...)
    at third_party/gtest-1.6.0/src/gtest.cc:3795
#1 0x00000000011a290d in testing::internal::AssertHelper::operator= (this=0x7fff6a25b3f0, message=...) at third_party/gtest-1.6.0/src/gtest.cc:359
#2 0x0000000000ca4d29 in BgpConfigTest::TearDown (this=0x3014e20) at controller/src/bgp/test/bgp_config_test.cc:60
#3 0x00000000011bea9d in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x3014e20, method=&virtual testing::Test::TearDown(), location=0x13cebb9 "TearDown()") at third_party/gtest-1.6.0/src/gtest.cc:2090
#4 0x00000000011b9df0 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x3014e20, method=&virtual testing::Test::TearDown(), location=0x13cebb9 "TearDown()") at third_party/gtest-1.6.0/src/gtest.cc:2126
#5 0x00000000011a7201 in testing::Test::Run (this=0x3014e20) at third_party/gtest-1.6.0/src/gtest.cc:2170
#6 0x00000000011a7914 in testing::TestInfo::Run (this=0x2fcb5b0) at third_party/gtest-1.6.0/src/gtest.cc:2338
#7 0x00000000011a7ebb in testing::TestCase::Run (this=0x2fcaba0) at third_party/gtest-1.6.0/src/gtest.cc:2445
#8 0x00000000011acb9a in testing::internal::UnitTestImpl::RunAllTests (this=0x2fca8a0) at third_party/gtest-1.6.0/src/gtest.cc:4237
#9 0x00000000011bfb63 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2fca8a0, method=
    (bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x11ac928 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x13cf6e8 "auxiliary test code (environments or event listeners)") at third_party/gtest-1.6.0/src/gtest.cc:2090
#10 0x00000000011bab4e in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x2fca8a0, method=
    (bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x11ac928 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x13cf6e8 "auxiliary test code (environments or event listeners)") at third_party/gtest-1.6.0/src/gtest.cc:2126
#11 0x00000000011ab94a in testing::UnitTest::Run (this=0x18b4620) at third_party/gtest-1.6.0/src/gtest.cc:3874
#12 0x0000000000ca2e64 in main (argc=1, argv=0x7fff6a25b978) at controller/src/bgp/test/bgp_config_test.cc:1078

The routing instance manager still has instances:

(gdb) f 2
#2 0x0000000000ca4d29 in BgpConfigTest::TearDown (this=0x3014e20) at controller/src/bgp/test/bgp_config_test.cc:60
60 TASK_UTIL_ASSERT_EQ(0, server_.routing_instance_mgr()->count());
(gdb)

Specifically, the remaining instance is the default instance:

(gdb) p server_.inst_mgr_
$45 = (boost::scoped_ptr<RoutingInstanceMgr>) 0x30165a0
(gdb) p (RoutingInstanceMgr *) 0x30165a0
$46 = (RoutingInstanceMgr *) 0x30165a0
(gdb) p $46->instances_
$47 = {
  bits_ = {
    <BitSet> = {
      static npos = 18446744073709551615,
      blocks_ = std::vector of length 1, capacity 1 = {1}
    }, <No data fields>},
  values_ = std::vector of length 1, capacity 2 = {0x302c170},
  map_ = std::map with 1 elements = {
    ["default-domain:default-project:ip-fabric:__default__"] = 0x302c170
  }
}

The default instance still has 3 tables:

(gdb) p (RoutingInstance *) 0x302c170
$48 = (RoutingInstance *) 0x302c170
(gdb) p $48->vrf_table_
$49 = std::map with 3 elements = {
  ["bgp.ermvpn.0"] = 0x302d790,
  ["bgp.evpn.0"] = 0x302eaf0,
  ["bgp.l3vpn.0"] = 0x302c3a0
}

Since these are all VPN tables, the issue is likely in the RoutePathReplicator.

(gdb) p server_

[snip]

  inetvpn_replicator_ = (boost::scoped_ptr<RoutePathReplicator>) 0x3017eb0,
  ermvpn_replicator_ = (boost::scoped_ptr<RoutePathReplicator>) 0x3019120,
  evpn_replicator_ = (boost::scoped_ptr<RoutePathReplicator>) 0x30192e0,

[snip]

(gdb) p (RoutePathReplicator *) 0x3017eb0
$51 = (RoutePathReplicator *) 0x3017eb0
(gdb) p *$51
$52 = (RoutePathReplicator) {
  _vptr.RoutePathReplicator = 0x13d7470,
  mutex_ = {
    static is_rw_mutex = false,
    static is_recursive_mutex = false,
    static is_fair_mutex = false,
    impl = {
      __data = {
        __lock = 0,
        __count = 0,
        __owner = 0,
        __nusers = 0,
        __kind = 0,
        __spins = 0,
        __list = {
          __prev = 0x0,
          __next = 0x0
        }
      },
      __size = '\000' <repeats 39 times>,
      __align = 0
    }
  },
  table_state_ = std::map with 1 elements = {
    [0x302c3a0] = 0x7f7c1c005600
  },
  bulk_sync_ = std::map with 0 elements,
  unreg_table_list_ = std::set with 0 elements,
  server_ = 0x3014e48,
  family_ = Address::INETVPN,
  walk_trigger_ = (boost::scoped_ptr<TaskTrigger>) 0x3017fb0,
  unreg_trigger_ = (boost::scoped_ptr<TaskTrigger>) 0x3017ff0,
  trace_buf_ = (boost::shared_ptr<TraceBuffer<SandeshTrace> >) (count 3, weak count 2) 0x3018070
}

We still have TableState for bgp.l3vpn.0 in the inetvpn replicator:

(gdb) p (BgpTable *) 0x302c3a0
$55 = (InetVpnTable *) 0x302c3a0
(gdb) p $55->name_
$56 = "bgp.l3vpn.0"
(gdb)

Nischal Sheth (nsheth)
description: updated
Changed in juniperopenstack:
status: New → Fix Committed
status: Fix Committed → Fix Released
information type: Proprietary → Public