[ubuntu-havana-R1.10-#34] control-node cored at RoutePathReplicator::DeleteSecondaryPath
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R1.1 | Fix Released | High | Prakash Bailkeri |
Trunk | Fix Released | High | Prakash Bailkeri |
Bug Description
Upgraded an Ubuntu Havana system from R1.05 to R1.10. The upgrade went through fine.
After a few restarts of supervisor-config and of the control-node process, we are seeing multiple control-node cores.
Can someone look into this?
Copied the logs and cores at http://
Setup:
host1 = 'root@10.
host2 = 'root@10.
host3 = 'root@10.
host4 = 'root@10.
host5 = 'root@10.
host6 = 'root@10.
host7 = 'root@10.
host8 = 'root@10.
host9 = 'root@10.
host10 = 'root@10.
env.roledefs = {
'all': [host1, host2, host3, host4, host5, host6, host7, host8, host9, host10],
'cfgm': [host1, host2, host3],
'openstack': [host2],
'webui': [host3],
'control': [host1, host3],
'compute': [host4, host5, host6, host7, host8, host9, host10],
'collector': [host1, host3],
'database': [host1, host2, host3],
'build': [host_build],
}
env.hostnames = {
'all': ['nodei6', 'nodei7', 'nodei8', 'nodei9', 'nodei10', 'nodei16', 'nodei17', 'nodei18', 'nodei19', 'nodei20']
}
Crash-Decode:
Core was generated by `/usr/bin/
Program terminated with signal 6, Aborted.
#0 0x00007ff1cd5da425 in raise () from /lib/x86_
(gdb) bt
#0 0x00007ff1cd5da425 in raise () from /lib/x86_
#1 0x00007ff1cd5ddb8b in abort () from /lib/x86_
#2 0x00007ff1cd5d30ee in ?? () from /lib/x86_
#3 0x00007ff1cd5d3192 in __assert_fail () from /lib/x86_
#4 0x000000000074c593 in RoutePathReplic
at controller/
#5 0x000000000074f905 in RoutePathReplic
at controller/
#6 0x000000000074ffc5 in RoutePathReplic
at controller/
#7 0x00000000009bbe72 in operator() (a1=0x7ff15801f3a0, a0=0x7ff1a010cd20, this=0x7ff197ff
#8 DBTableBase:
#9 0x00000000009bd01a in DBTablePartBase
#10 0x00000000009ba2fb in DBPartition:
#11 0x00000000009fccc0 in TaskImpl::execute (this=0x7ff1640
#12 0x00007ff1ce83cece in ?? () from /usr/lib/
#13 0x00007ff1ce833e0b in ?? () from /usr/lib/
#14 0x00007ff1ce8326f2 in ?? () from /usr/lib/
#15 0x00007ff1ce82d3ce in ?? () from /usr/lib/
#16 0x00007ff1ce82d270 in ?? () from /usr/lib/
#17 0x00007ff1ce384e9a in start_thread () from /lib/x86_
#18 0x00007ff1cd697ccd in clone () from /lib/x86_
#19 0x0000000000000000 in ?? ()
(gdb)
tags: added: blocker
information type: Proprietary → Public
Changed in juniperopenstack:
assignee: nobody → Prakash Bailkeri (prakashmb)
status: New → Fix Committed
Changed in juniperopenstack:
status: Fix Committed → Fix Released
Reviewed: https://review.opencontrail.org/3667
Committed: http://github.org/Juniper/contrail-controller/commit/ecbf2b8a1f0a67903715d716668e62de326fe8bb
Submitter: Zuul
Branch: master
commit ecbf2b8a1f0a67903715d716668e62de326fe8bb
Author: Prakash Bailkeri <email address hidden>
Date: Sun Oct 12 23:41:07 2014 -0700
Fix concurrency issue in updating DBEntry flags
Fixes bug: #1375226
Cause:
The DBEntry flags are updated from two tasks that are not mutually exclusive. This corrupts the flags and leads to an unexpected free/delete of the entry.
The problem occurs when setting the "OnRemoveQ" flag on a DBEntry, since that can be done from any task context.
Fix:
Move OnRemoveQ out of the DBEntry flags and make it an atomic bool variable.
Add an assert to catch the case where a route gets deleted with active paths.
Add an assert to catch the case where a path is inserted into a deleted route.
Add a new test for repeated route updates (from an agent with a different nexthop), which discovered this bug.
Change-Id: I5df04f89f00959799a921e9db37388c0eb56334e
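
For readers unfamiliar with this class of bug, the cause and fix described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the contrail-controller code: the `Entry` class, the `kDeleted` bit, the method names, and `RunDemo` are all hypothetical, and `std::atomic` is assumed here only for the sake of a self-contained example. The idea is that a `flags |= bit` update is a non-atomic read-modify-write, so two tasks updating different bits in the same word can overwrite each other's changes. Keeping the "on remove queue" state in its own atomic variable removes the shared word.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a DB entry; NOT the actual DBEntry class.
class Entry {
public:
    enum Flag : uint8_t { kDeleted = 1 << 0 };  // illustrative flag bit

    Entry() : flags_(0), on_remove_q_(false) {}

    // Plain flags: assumed to be touched only by the task owning the entry.
    void MarkDelete() { flags_ |= kDeleted; }
    void ClearDelete() { flags_ &= static_cast<uint8_t>(~kDeleted); }
    bool IsDeleted() const { return (flags_ & kDeleted) != 0; }

    // May be called from any thread. Returns true only for the caller
    // that actually changed the state from "not queued" to "queued".
    bool SetOnRemoveQ() { return !on_remove_q_.exchange(true); }
    void ClearOnRemoveQ() { on_remove_q_.store(false); }
    bool IsOnRemoveQ() const { return on_remove_q_.load(); }

private:
    uint8_t flags_;                  // no longer shared between tasks
    std::atomic<bool> on_remove_q_;  // separate object, updated atomically
};

// Runs two writers concurrently: one toggles the plain flag, the other
// toggles the atomic. They touch distinct memory locations, so neither
// update can clobber the other.
bool RunDemo() {
    Entry e;
    std::thread owner([&e] {
        for (int i = 0; i < 100000; ++i) { e.MarkDelete(); e.ClearDelete(); }
        e.MarkDelete();
    });
    std::thread other([&e] {
        for (int i = 0; i < 100000; ++i) { e.SetOnRemoveQ(); e.ClearOnRemoveQ(); }
        e.SetOnRemoveQ();
    });
    owner.join();
    other.join();
    return e.IsDeleted() && e.IsOnRemoveQ();
}
```

If both states lived as bits in `flags_` instead, the two loops in `RunDemo` would race on the same byte and either final bit could be lost, which is the kind of corruption the commit message describes. Calling `RunDemo()` from a test (built with thread support enabled) should return true with the layout shown.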