Control node assertion in RibOutUpdates::PeerDequeue in scaled setup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R1.1 | Fix Committed | High | Nischal Sheth |
R2.0 | Fix Released | High | Nischal Sheth |
Trunk | Fix Released | High | Nischal Sheth |
Bug Description
Release 1.10 build 44.
Happened in Harshad's scale setup which has 1000 vRouters with 3 CNs.
Problem seems to happen multiple times when the setup is initializing.
Backtrace:
(gdb) bt
#0 0x00007f92609a0425 in raise () from /lib/x86_
#1 0x00007f92609a3b8b in abort () from /lib/x86_
#2 0x00007f92609990ee in ?? () from /lib/x86_
#3 0x00007f9260999192 in __assert_fail () from /lib/x86_
#4 0x0000000000616723 in RibOutUpdates:
#5 0x00000000006717ef in SchedulingGroup
#6 0x0000000000671ab3 in SchedulingGroup
#7 0x00000000006761fd in SchedulingGroup
#8 0x00000000009fccc0 in TaskImpl::execute (this=0x7f92500
#9 0x00007f9261c02ece in ?? () from /usr/lib/
#10 0x00007f9261bf9e0b in ?? () from /usr/lib/
#11 0x00007f9261bf86f2 in ?? () from /usr/lib/
#12 0x00007f9261bf33ce in ?? () from /usr/lib/
#13 0x00007f9261bf3270 in ?? () from /usr/lib/
#14 0x00007f926174ae9a in start_thread () from /lib/x86_
#15 0x00007f9260a5dccd in clone () from /lib/x86_
#16 0x0000000000000000 in ?? ()
information type: Proprietary → Public
Changed in juniperopenstack:
status: New → In Progress
Reviewed: https://review.opencontrail.org/5576
Committed: http://github.org/Juniper/contrail-controller/commit/e12740b76962930457bc55a948fc9af5de994a1a
Submitter: Zuul
Branch: R1.10
commit e12740b76962930457bc55a948fc9af5de994a1a
Author: Nischal Sheth <email address hidden>
Date: Thu Dec 11 13:58:44 2014 -0800
Fix corner case in SchedulingGroup::UpdatePeerQueue logic
An assertion fails if a peer gets blocked when dequeueing updates from
multiple RibOuts via SchedulingGroup::UpdatePeer.
Problem happens in the following situation:
- Peer was previously blocked and now has updates to send for 2 RibOuts.
- Updates for both RibOuts are for the same queue i.e. QBULK or QUPDATE.
- Peer shares a marker for the first RibOut with another peer or
  peer's marker gets merged with marker for another peer when sending
  updates for first RibOut (via RibOutUpdates::PeerDequeue).
- There are still more updates to be sent for the first RibOut i.e. the
  processing in RibOutUpdates::PeerDequeue keeps going.
- Original peer gets send blocked, but we manage to dequeue all updates
  for the first RibOut to the other peer with which the original peer's
  marker got merged.
- RibOutUpdates::PeerDequeue returns true because of the previous point.
At this point, we continue and try to dequeue updates for the 2nd RibOut
because RibOutUpdates::PeerDequeue returned success. We hit an assertion
in RibOutUpdates::PeerDequeue when called for the 2nd RibOut because the
original peer is not in the send ready set anymore.
Fix is to stop processing RibOuts for the peer if it's send blocked when
RibOutUpdates::PeerDequeue returns. This ensures that we don't hit the
assertion since we don't try to process the 2nd RibOut. Updates for the
2nd RibOut will be sent to the other peer when its WorkPeer item gets
processed.
Change-Id: Ib1ef218ad9eecb1ca489b3045bdc3419e75caa21
Closes-Bug: 1386460
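The control flow described in the commit message can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the contrail-controller code: `Peer`, `RibOutModel`, `UpdatePeerModel`, `CountViolations` and the `stop_if_blocked` flag are hypothetical names, markers and queues are not modelled, and the send-ready precondition is counted instead of asserted so that both the old and the new behaviour can be run side by side.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a BGP peer; only the send-ready state is modelled.
struct Peer {
    bool send_ready = true;
};

// Hypothetical stand-in for the per-RibOut update state.
struct RibOutModel {
    // Simulates the corner case: the peer becomes send blocked during the
    // dequeue, yet all updates still drain via a merged marker.
    bool blocks_peer_while_draining = false;
    int precondition_violations = 0;

    // Returns true when all updates for this RibOut were dequeued.
    bool PeerDequeue(Peer &peer) {
        if (!peer.send_ready) {
            // The bug report shows an assertion failing at this point;
            // the sketch only counts the violation.
            ++precondition_violations;
            return false;
        }
        if (blocks_peer_while_draining) {
            peer.send_ready = false;
        }
        return true;
    }
};

// Walks the RibOuts for one peer. With stop_if_blocked set, the loop ends as
// soon as the peer is send blocked, which is the behaviour the fix describes.
// Returns the number of RibOuts for which PeerDequeue was called.
std::size_t UpdatePeerModel(Peer &peer, std::vector<RibOutModel> &ribouts,
                            bool stop_if_blocked) {
    std::size_t visited = 0;
    for (RibOutModel &ribout : ribouts) {
        ++visited;
        if (!ribout.PeerDequeue(peer)) {
            break;
        }
        if (stop_if_blocked && !peer.send_ready) {
            break;
        }
    }
    return visited;
}

// Runs the two-RibOut scenario and reports how often the second RibOut was
// entered with a peer that was no longer send ready.
int CountViolations(bool stop_if_blocked) {
    Peer peer;
    std::vector<RibOutModel> ribouts(2);
    ribouts[0].blocks_peer_while_draining = true;
    UpdatePeerModel(peer, ribouts, stop_if_blocked);
    return ribouts[1].precondition_violations;
}
```

In this model, calling `CountViolations(false)` reaches the second RibOut with a blocked peer, while `CountViolations(true)` stops after the first RibOut and leaves the second one for later processing.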