Comment 4 for bug 1719236

Sundaresan Rajangam (srajanga) wrote : Re: [Bug 1719236] Contrail analytics response time varies based on the number of VN/VMI when one of the control node fails

Hi Vijay,

Can you please confirm the Ubuntu version?
UVE aggregation does not use Kafka if the Ubuntu version is 12.x,
so I need to know the Ubuntu version to look at the right code path.

Thanks,
Sundar
> On Oct 4, 2017, at 8:24 PM, vijaya kumar shankaran <email address hidden> wrote:
>
> Hi,
>
> Any Update?
>
> Best Regards,
> Vijay Kumar
>
> Title:
> Contrail analytics response time varies based on the number of VN/VMI
> when one of the control node fails
>
> Status in Juniper Openstack:
> New
>
> Bug description:
> Customer is testing analytics response time when one of the control
> nodes fails. Response time varies with the number of VNs and VMIs:
> the greater the number of VNs and VMIs, the longer the response takes.
>
> Customer setup is as below:
> 3 Control, config
> 3 collector
> 3 DB
> 1 openstack
> 6 compute nodes
> 2 TSN nodes
>
>
> /etc/contrail/contrail-vrouter-agent.conf is modified to point to collector nodes on each compute node.
> The customer has provided scripts to create VNs and VMIs and to query analytics. They shut down one of the control nodes and note the time. They see a large difference in the time taken to get a correct response to the analytics queries, depending on the number of VNs and VMIs.
> VN      VMI     Response time
> 303     600     5 sec
> 1500    3000    50 sec
> 3000    6000    approx. 2 min
> (with one control node shut down)
>
> The above time delta doubles when two control nodes are shut down.
> Is this intended behavior?
> Why is this difference seen in a clustered scenario when collector nodes stop responding (to replicate this, the nodes are shut down)? The DB nodes are all up and running during this test.
> Can the response time be reduced and made consistent irrespective of the number of interfaces?
>
> I could replicate the issue in the lab with up to 1500 VNs and 3000 VMIs.
> Due to resource constraints, I could not scale this higher.
>
> When querying for a VMI we were getting HTTP 200 OK as the response, but
> nothing pertaining to the interface or network (output of script):
>
> Valid response
> 10.204.74.242:8081 default-domain:mock:vmi_ntt-comp5_0100_01 200 {"UveVMInterfaceAgent": {"ip6_active": f
>
> Invalid response
> 10.204.74.242:8081 default-domain:mock:vmi_ntt-comp6_0100_01 200 {}
>
>
> From logs
> From contrail-alarm-gen.log sv-25_log-large_sv-24_down
>
> 09/05/2017 11:46:02 AM [contrail-alarm-gen]: -uve-3 An exception of
> type LeaderNotAvailableError occured. Arguments:
>
> LeaderNotAvailableError: LeaderNotAvailableError - 5 - This error is
> thrown if we are in the middle of a leadership election and there is
> currently no leader for this partition and hence it is unavailable for
> writes.
>
> 09/05/2017 11:46:07 AM [contrail-alarm-gen]: -uve-23 An exception of type LeaderNotAvailableError occured. Arguments:
> LeaderNotAvailableError: LeaderNotAvailableError - 5 - This error is thrown if we are in the middle of a leadership election and there is currently no leader for this partition and hence it is unavailable for writes.
>
> 09/05/2017 11:46:07 AM [contrail-alarm-gen]: Error: Consumer Failure LeaderNotAvailableError occured. Arguments:
> LeaderNotAvailableError: LeaderNotAvailableError - 5 - This error is thrown if we are in the middle of a leadership election and there is currently no leader for this partition and hence it is unavailable for writ
>
> 09/05/2017 11:51:00 AM [contrail-alarm-gen]: redis-uve failed Error connecting to 192.168.0.124:6379. timed out. for key ObjectVRouter:sv-39: (u'192.168.0.124', 6379, 1149) tb Traceback (most recent call last):
> ConnectionError: Error connecting to 192.168.0.124:6379. timed out.
>
>
> 09/05/2017 11:51:00 AM [contrail-alarm-gen]: redis-uve failed Error connecting to 192.168.0.124:6379. timed out. for key ObjectGeneratorInfo:sv-21:Control:contrail-dns:0: (u'192.168.0.124', 6379, 1149) tb Traceback (most recent call last):
>
> ConnectionError: Error connecting to 192.168.0.124:6379. timed out.
>
> 09/05/2017 11:51:00 AM [contrail-alarm-gen]: Exception KeyError in notif worker. Arguments:
> ((u'192.168.0.124', 6379, 1149),) : traceback Traceback (most recent call last):
>
>
> 09/05/2017 11:51:04 AM [contrail-alarm-gen]: -uve-12 An exception of type KeyError occured. Arguments:
>
>
> 09/05/2017 11:51:40 AM [contrail-alarm-gen]: Starting part 2
> collectors [u'192.168.0.126:6379', u'192.168.0.125:6379']
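
The valid and invalid responses quoted in the bug description differ only in the body: both return HTTP 200, but the invalid one carries an empty JSON object. As a rough sketch of how the script output could be screened for such cases, the snippet below assumes each output line has four whitespace-separated fields (endpoint, UVE name, HTTP status, body), as the two samples in the report suggest. The names `UveResult`, `parse_line` and `is_empty_uve` are hypothetical and are not part of Contrail or of the customer's scripts.

```python
import json
from typing import NamedTuple


class UveResult(NamedTuple):
    endpoint: str  # analytics API host:port, e.g. 10.204.74.242:8081
    uve_key: str   # fully qualified UVE name
    status: int    # HTTP status code
    body: str      # raw response body as printed by the script


def parse_line(line: str) -> UveResult:
    """Split one output line into its four fields (assumed layout)."""
    endpoint, uve_key, status, body = line.split(None, 3)
    return UveResult(endpoint, uve_key, int(status), body)


def is_empty_uve(result: UveResult) -> bool:
    """True when the server answered 200 but returned an empty JSON object."""
    if result.status != 200:
        return False
    try:
        return json.loads(result.body) == {}
    except ValueError:
        # Body is present but not parseable (for example, truncated output),
        # so it is not treated as an empty UVE.
        return False
```

Counting the lines for which `is_empty_uve` is true, per run, would give a simple measure of how many queries returned 200 without any interface or network data while a control node was down.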