BUM tree collapsed with tor-agent stop
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R3.1 | New | High | kalagesan |
R3.2 | New | High | kalagesan |
R4.0 | Invalid | High | kalagesan |
Trunk | Invalid | High | kalagesan |
Bug Description
* version
contrail v3.1.1.0-45
QFX4 v14.1X53-D33
QFX11 v14.1X53-D33.2
* tor-agent and QFX
QFX4(172.23.11.45): tor-agent-4
QFX11(172.
* topology diagram
[IXIA]=
The customer configured 500 virtual networks (VNs) on Contrail.
The customer sent traffic to each virtual network and tested the failure of a TSN node.
When the customer stopped the tor-agent process on one TSN node(172.
communication was lost on 76 of the 500 VNs.
The customer picked one of the VNs that had lost communication and checked its BUM Tree.
The BUM Tree of that VN was broken.
* BUM Tree status
[BUM Tree status before stopping tor-agent]
via TSN(172.23.10.196)
QFX4---
QFX4<--
*500VN
[Stop tor-agent on TSN(172.23.10.196)]
-----
root@openc-14:~# date
Thu Dec 15 16:42:03 JST 2016
root@openc-14:~# service contrail-
date
contrail-
root@openc-14:~# date
Thu Dec 15 16:42:04 JST 2016
-----
[Expected behavior]
Change BUM Tree
via TSN(172.
QFX4---
QFX4<--
*500VN
[Pickup VN1]
The BUM Tree is broken.
Check TSN(172.23.10.197)
root@openc-15:~# vxlan --get 524
VXLAN Table
VNID NextHop
----------------
524 640
root@openc-15:~# nh --get 640
Id:640 Type:Vrf_Translate Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Vxlan, Unicast Flood,
Vrf:1978
root@openc-15:~# rt --family bridge --dump 1978
Flags: L=Label Valid, Df=DHCP flood
vRouter bridge table 0/1978
Index DestMac Flags Label/VNID Nexthop
51356 90:1b:e:52:bb:ea Df - 3
296020 30:20:0:8:0:3 - 1
367936 ff:ff:ff:ff:ff:ff LDf 524 13327
909752 30:20:0:8:0:4 - 1
974164 30:48:4:0:0:8 LDf 524 16
root@openc-15:~# nh --get 13327
Id:13327 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:3 Vrf:1978
Flags:Valid, Multicast, L2,
Sub NH(label): 13318(0) 13040(0)
Id:13318 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Tor,
Sub NH(label): 16(524)
Id:16 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:41360 Vrf:0
Flags:Valid, Vxlan,
Oif:0 Len:14 Flags Valid, Vxlan, Data:5c 5e ab 03 57 f0 90 1b 0e 52 bb ea 08 00
Vrf:0 Sip:172.23.10.197 Dip:172.23.11.45 <<<<<<<
Id:13040 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Evpn,
Sub NH(label):
There is a path to QFX4.
However, there is no path to the other TSN node(172.
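To make the difference between the two picked VNs easier to see, here is a minimal Python sketch that lists which composite kinds appear in pasted `nh --get` text. The function name `composite_kinds` and the sample constants are hypothetical, the samples are trimmed from the dumps in this report, and the sketch assumes the flags line always directly follows the `Type:Composite` line, as it does in these dumps.

```python
def composite_kinds(nh_output):
    """Collect the flags (other than Valid) of every Composite nexthop in the text."""
    kinds = set()
    lines = [line.strip() for line in nh_output.splitlines()]
    # Pair each line with the line that follows it.
    for header, flags in zip(lines, lines[1:]):
        if "Type:Composite" in header and flags.startswith("Flags:"):
            for flag in flags[len("Flags:"):].split(","):
                flag = flag.strip()
                if flag and flag != "Valid":
                    kinds.add(flag)
    return kinds


# Trimmed from the VN1 dump (VNID 524) on openc-15.
BROKEN_VN1 = """\
Id:13327 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:3 Vrf:1978
Flags:Valid, Multicast, L2,
Id:13318 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Tor,
Id:16 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:41360 Vrf:0
Flags:Valid, Vxlan,
Id:13040 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978
Flags:Valid, Evpn,
"""

# Trimmed from the VN2 dump (VNID 518) on openc-15.
HEALTHY_VN2 = """\
Id:12521 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:4 Vrf:2448
Flags:Valid, Multicast, L2,
Id:10434 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Tor,
Id:12254 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Evpn,
Id:4507 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Fabric,
"""

if __name__ == "__main__":
    print("VN1:", sorted(composite_kinds(BROKEN_VN1)))
    print("VN2:", sorted(composite_kinds(HEALTHY_VN2)))
```

In these samples the only difference is the Fabric composite, which is the one that carries the MPLSoGRE tunnel to the other TSN in the VN2 dump.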
[Pickup VN2]
Its status is normal.
Check TSN(172.23.10.197)
root@openc-15:~# vxlan --get 518
VXLAN Table
VNID NextHop
----------------
518 1302
root@openc-15:~# nh --get 1302
Id:1302 Type:Vrf_Translate Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Vxlan, Unicast Flood,
Vrf:2448
root@openc-15:~# rt --family bridge --dump 2448
Flags: L=Label Valid, Df=DHCP flood
vRouter bridge table 0/2448
Index DestMac Flags Label/VNID Nexthop
59857 30:20:0:2:0:4 - 1
249460 30:48:4:0:0:2 LDf 518 16
344168 30:48:8:0:0:2 LDf 518 17
446440 30:20:0:2:0:3 - 1
522112 ff:ff:ff:ff:ff:ff LDf 518 12521
722868 90:1b:e:52:bb:ea Df - 3
root@openc-15:~# nh --get 12521
Id:12521 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:4 Vrf:2448
Flags:Valid, Multicast, L2,
Sub NH(label): 10434(0) 12254(0) 4507(0)
Id:10434 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Tor,
Sub NH(label): 16(518)
Id:16 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:41360 Vrf:0
Flags:Valid, Vxlan,
Oif:0 Len:14 Flags Valid, Vxlan, Data:5c 5e ab 03 57 f0 90 1b 0e 52 bb ea 08 00
Vrf:0 Sip:172.23.10.197 Dip:172.23.11.45 <<<<<<<
Id:12254 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Evpn,
Sub NH(label):
Id:4507 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448
Flags:Valid, Fabric,
Sub NH(label): 1584(191914)
Id:1584 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:2047 Vrf:0
Flags:Valid, MPLSoGRE,
Oif:0 Len:14 Flags Valid, MPLSoGRE, Data:90 1b 0e 44 2c b5 90 1b 0e 52 bb ea 08 00
Vrf:0 Sip:172.23.10.197 Dip:172.23.10.196 <<<<<<<
Check TSN(172.23.10.196)
root@openc-14:~# vxlan --get 518
VXLAN Table
VNID NextHop
----------------
518 12002
root@openc-14:~# nh --get 12002
Id:12002 Type:Vrf_Translate Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:483
Flags:Valid, Vxlan, Unicast Flood,
Vrf:483
root@openc-14:~# rt --family bridge --dump 483
Flags: L=Label Valid, Df=DHCP flood
vRouter bridge table 0/483
Index DestMac Flags Label/VNID Nexthop
25892 90:1b:e:44:2c:b5 Df - 3
66856 30:48:4:0:0:2 LDf 518 15
153520 ff:ff:ff:ff:ff:ff LDf 518 17808
289320 30:20:0:2:0:4 - 1
315000 30:20:0:2:0:3 - 1
752656 0:0:5e:0:1:0 Df - 3
785920 30:48:8:0:0:2 LDf 518 16
root@openc-14:~# nh --get 17808
Id:17808 Type:Composite Fmly:AF_BRIDGE Rid:0 Ref_cnt:4 Vrf:483
Flags:Valid, Multicast, L2,
Sub NH(label): 17803(0) 13643(0) 8469(0)
Id:17803 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:483
Flags:Valid, Tor,
Sub NH(label): 16(518)
Id:16 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:1506 Vrf:0
Flags:Valid, Vxlan,
Oif:0 Len:14 Flags Valid, Vxlan, Data:5c 5e ab 03 57 f0 90 1b 0e 44 2c b5 08 00
Vrf:0 Sip:172.23.10.196 Dip:172.23.11.37 <<<<<<<
Id:13643 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:483
Flags:Valid, Evpn,
Sub NH(label):
Id:8469 Type:Composite Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:483
Flags:Valid, Fabric,
Sub NH(label): 1555(190816)
Id:1555 Type:Tunnel Fmly: AF_INET Rid:0 Ref_cnt:2047 Vrf:0
Flags:Valid, MPLSoGRE,
Oif:0 Len:14 Flags Valid, MPLSoGRE, Data:90 1b 0e 52 bb ea 90 1b 0e 44 2c b5 08 00
Vrf:0 Sip:172.23.10.196 Dip:172.23.10.197 <<<<<<<
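The lines marked with `<<<<<<<` above are the tunnel endpoints. As a rough sketch, the following Python pulls the `Sip`/`Dip` pairs out of pasted `nh --get` text and checks whether a given peer is reachable from the tree. The helper names `tunnel_endpoints` and `reaches` are hypothetical, and the sample strings contain only the tunnel lines copied from the dumps in this report.

```python
import re

# Matches the "Sip:<ip> Dip:<ip>" part of a tunnel nexthop line.
TUNNEL_RE = re.compile(
    r"Sip:(\d{1,3}(?:\.\d{1,3}){3})\s+Dip:(\d{1,3}(?:\.\d{1,3}){3})"
)


def tunnel_endpoints(nh_output):
    """Return a list of (source_ip, destination_ip) tuples found in the text."""
    return TUNNEL_RE.findall(nh_output)


def reaches(nh_output, peer_ip):
    """True if any tunnel in the text has peer_ip as its destination."""
    return any(dip == peer_ip for _sip, dip in tunnel_endpoints(nh_output))


# Tunnel lines from the VN2 (VNID 518) dumps on both TSNs.
OPENC15_VN2 = """\
Vrf:0 Sip:172.23.10.197 Dip:172.23.11.45 <<<<<<<
Vrf:0 Sip:172.23.10.197 Dip:172.23.10.196 <<<<<<<
"""
OPENC14_VN2 = """\
Vrf:0 Sip:172.23.10.196 Dip:172.23.11.37 <<<<<<<
Vrf:0 Sip:172.23.10.196 Dip:172.23.10.197 <<<<<<<
"""
# Tunnel line from the VN1 (VNID 524) dump on openc-15.
OPENC15_VN1 = "Vrf:0 Sip:172.23.10.197 Dip:172.23.11.45 <<<<<<<\n"

if __name__ == "__main__":
    print(tunnel_endpoints(OPENC15_VN2))
    print("VN2 openc-15 -> .196:", reaches(OPENC15_VN2, "172.23.10.196"))
    print("VN2 openc-14 -> .197:", reaches(OPENC14_VN2, "172.23.10.197"))
    print("VN1 openc-15 -> .196:", reaches(OPENC15_VN1, "172.23.10.196"))
```

For VN2 each TSN has a tunnel towards the other one, while the VN1 tree on openc-15 only has the tunnel to 172.23.11.45.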
The customer tried to recover from this issue.
The recovery steps are below.
1. Stop and start TSN(172.
-> Not recovered. The BUM Tree is still broken.
2. Restart TSN(172.
-> The issue recovered. The BUM Tree is normal.
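Since only some of the 500 VNs were affected, checking them one by one after each recovery attempt is slow. Below is a hedged sketch of a sweep that follows the same lookup chain used in this report (`vxlan --get`, `nh --get`, `rt --family bridge --dump`, `nh --get`) for a list of VNIDs. It assumes the utilities are on PATH on the TSN and that a healthy tree carries a Fabric composite, as in the VN2 dump. The function names, the `run` parameter and the `FAKE_OUTPUT` table are hypothetical; the fake outputs are trimmed from the dumps in this report so the logic can be tried without a TSN.

```python
import re
import subprocess

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def run_cli(args):
    """Run a vRouter utility and return its stdout (assumes it is on PATH)."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def flood_nexthop(vnid, run=run_cli):
    """Follow VNID -> Vrf_Translate nexthop -> bridge table -> broadcast nexthop id."""
    vxlan_out = run(["vxlan", "--get", str(vnid)])
    row = re.search(rf"^\s*{int(vnid)}\s+(\d+)\s*$", vxlan_out, re.MULTILINE)
    if row is None:
        return None
    vrf = re.search(r"Vrf:(\d+)", run(["nh", "--get", row.group(1)]))
    if vrf is None:
        return None
    bridge = run(["rt", "--family", "bridge", "--dump", vrf.group(1)])
    for line in bridge.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == BROADCAST_MAC:
            return fields[-1]  # last column is the nexthop id
    return None


def vnids_missing_fabric(vnids, run=run_cli):
    """Return VNIDs whose flood tree has no Fabric flag (or cannot be resolved)."""
    missing = []
    for vnid in vnids:
        nh_id = flood_nexthop(vnid, run)
        tree = run(["nh", "--get", nh_id]) if nh_id else ""
        if not re.search(r"Flags:[^\n]*\bFabric\b", tree):
            missing.append(vnid)
    return missing


# Hypothetical stand-in for the real commands, trimmed from the dumps above.
FAKE_OUTPUT = {
    "vxlan --get 524": "VXLAN Table\nVNID NextHop\n----------------\n524 640\n",
    "nh --get 640": "Id:640 Type:Vrf_Translate Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:1978\n",
    "rt --family bridge --dump 1978": "367936 ff:ff:ff:ff:ff:ff LDf 524 13327\n",
    "nh --get 13327": "Flags:Valid, Multicast, L2,\nFlags:Valid, Tor,\nFlags:Valid, Evpn,\n",
    "vxlan --get 518": "VXLAN Table\nVNID NextHop\n----------------\n518 1302\n",
    "nh --get 1302": "Id:1302 Type:Vrf_Translate Fmly: AF_INET Rid:0 Ref_cnt:2 Vrf:2448\n",
    "rt --family bridge --dump 2448": "522112 ff:ff:ff:ff:ff:ff LDf 518 12521\n",
    "nh --get 12521": "Flags:Valid, Multicast, L2,\nFlags:Valid, Tor,\n"
                      "Flags:Valid, Evpn,\nFlags:Valid, Fabric,\n",
}


def fake_run(args):
    """Look up canned output instead of executing anything."""
    return FAKE_OUTPUT[" ".join(args)]


if __name__ == "__main__":
    print(vnids_missing_fabric([524, 518], run=fake_run))
```

On a real TSN the call would be `vnids_missing_fabric(list_of_vnids)` with the default `run_cli`; a missing Fabric flag is only a hint under the assumption stated above, not proof that the tree is broken.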
The customer would like to understand the root cause of this issue from the logs provided.
The files below were collected from the customer and are available on my local log server:
root@10.219.48.123, pwd:Jtaclab123
fileupload directory path:/home/
Contrail:
testbed.py
logs under /var/log/contrail.
gcore file of tor-agent process.
QFX:
RSI
archive of /var/log
VN information for the VNs that lost communication
Hi Manish,
1. Topology file before and after issue is attached as requested.
2. Also, what was the multicast replicator shown on QFX4 and QFX11 for the working and non-working VNs mentioned in the bug? It is shown under the ovsdb mac-table.
gcv@QFX4> show ovsdb mac logical-switch Contrail-e91fbe02-9a5d-457c-91f2-70693558f810
Logical Switch Name: Contrail-e91fbe02-9a5d-457c-91f2-70693558f810
Mac IP Encapsulation Vtep
Address Address Address
ff:ff:ff:ff:ff:ff 0.0.0.0 Vxlan over Ipv4 172.23.11.45
30:48:04:00:00:01 0.0.0.0 Vxlan over Ipv4 172.23.11.45
30:48:08:00:00:01 0.0.0.0 Vxlan over Ipv4 172.23.11.37
ff:ff:ff:ff:ff:ff 0.0.0.0 Vxlan over Ipv4 172.23.10.197
gcv@QFX4> show ethernet-switching table vlan-name Contrail-e91fbe02-9a5d-457c-91f2-70693558f810
MAC flags (S - static MAC, D - dynamic MAC, L - locally learned, P - Persistent static
SE - statistics enabled, NM - non configured MAC, R - remote PE MAC, O - ovsdb MAC)
Ethernet switching table : 2 entries, 2 learned
Routing instance : default-switch
Vlan MAC MAC Age Logical
name address flags interface
Contrail-e91fbe02-9a5d-457c-91f2-70693558f810 30:48:04:00:00:01 D - ae5.3001
Contrail-e91fbe02-9a5d-457c-91f2-70693558f810 30:48:08:00:00:01 DO - vtep.32771
monitor traffic interface result
xe-1/0/44 Up 51550636 (0) 51483032 (0)
xe-1/0/45 Up 51483032 (0) 51550636 (0)
xe-1/0/46 Up 526334340 (0) 525342139 (0)
xe-1/0/47 Up 626666 (2) 717271 (3)
et-1/0/48 Up 1464749420 (1) 1946185315 (0)
et-1/0/49 Up 1478339961 (0) 1942244611 (0)
xe-1/0/50:0 Up 299845 (1) 126177869 (0)
xe-1/0/50:1 Up 0 (0) 0 (0)
xe-1/0/50:2 Up 439525 (0) 133091177 (1000) <<<<<<< to control plane (to TSN)
It seems that MAC addresses are learned correctly and that the QFX forwards traffic to the TSN.
However, when this issue occurred, the non-working VNs (broken BUM Tree) could not pass multicast traffic.
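For comparing the multicast replicator across VNs, a small Python sketch like the one below can pull the VTEP addresses listed for the broadcast MAC out of pasted `show ovsdb mac logical-switch` output. The function name `broadcast_vteps` is hypothetical, the sample rows are the ones from the QFX4 output in this comment, and the sketch assumes the VTEP address is always the last column. Whether a row belongs to the local or remote section is not visible in the pasted text, so the sketch does not try to tell them apart.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def broadcast_vteps(ovsdb_output):
    """Return the VTEP address (last column) of every broadcast-MAC row."""
    vteps = []
    for line in ovsdb_output.splitlines():
        fields = line.split()
        if fields and fields[0].lower() == BROADCAST_MAC:
            vteps.append(fields[-1])
    return vteps


# Rows copied from the QFX4 ovsdb mac output in this comment.
QFX4_SAMPLE = """\
ff:ff:ff:ff:ff:ff 0.0.0.0 Vxlan over Ipv4 172.23.11.45
30:48:04:00:00:01 0.0.0.0 Vxlan over Ipv4 172.23.11.45
30:48:08:00:00:01 0.0.0.0 Vxlan over Ipv4 172.23.11.37
ff:ff:ff:ff:ff:ff 0.0.0.0 Vxlan over Ipv4 172.23.10.197
"""

if __name__ == "__main__":
    print(broadcast_vteps(QFX4_SAMPLE))
```

Running the same helper on the output for a working VN and a non-working VN would show quickly whether the QFX points at a different TSN in the two cases.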
Regards,
Kannan