DCS causes ZooKeeper to break when more MXOSRVRs are started
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Trafodion | Invalid | Medium | Tharak Capirala |
Bug Description
ZooKeeper log (actual IP addresses have been replaced with <ip_address>):
-----
1:46:30.809 AM INFO org.apache.
Accepted socket connection from /<ip_address>:42030
1:46:30.809 AM INFO org.apache.
Client attempting to establish new session at /<ip_address>:42030
1:46:30.826 AM INFO org.apache.
Established session 0x1486b5eb8b4056f with negotiated timeout 30000 for client /<ip_address>:42030
1:46:30.837 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
1:46:32.290 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
1:46:33.766 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
1:46:35.732 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
1:46:37.612 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
1:46:39.601 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
1:46:41.198 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
1:46:41.309 AM INFO org.apache.
Processed session termination for sessionid: 0x1486b5eb8b4056f
1:46:41.317 AM INFO org.apache.
Closed socket connection for client /<ip_address>:42030 which had sessionid 0x1486b5eb8b4056f
-----
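A quick way to confirm that the failures come from ZooKeeper's per-client connection limit is to count the "Too many connections" warnings per client address in a log like the one above. The following is a minimal sketch that assumes only the warning text shown in this excerpt; the function name `count_rejections` is hypothetical.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# Matches the rejection warning seen in the ZooKeeper log above, e.g.
# "Too many connections from /<ip_address> - max is 150".
REJECT_RE = re.compile(r"Too many connections from /(\S+) - max is (\d+)")


def count_rejections(log_text: str) -> Tuple[Counter, Optional[int]]:
    """Count rejected connection attempts per client address.

    Returns (rejections per address, last limit reported); the limit is
    None if no rejection line was found.
    """
    rejections: Counter = Counter()
    limit: Optional[int] = None
    for line in log_text.splitlines():
        match = REJECT_RE.search(line)
        if match:
            rejections[match.group(1)] += 1
            limit = int(match.group(2))
    return rejections, limit


if __name__ == "__main__":
    sample = """1:46:30.837 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
1:46:32.290 AM WARN org.apache.
Too many connections from /<ip_address> - max is 150
"""
    counts, limit = count_rejections(sample)
    print(dict(counts), limit)  # {'<ip_address>': 2} 150
```

Run against the full excerpt, every rejection is attributed to the single `<ip_address>`, which is what you would expect when all clients share one host.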
Starting fewer MXOSRVRs does not break ZooKeeper:
Step 1:
more /opt/trafodion/
n007 65
Step 2:
/opt/trafodion/
stopping master.
n007: no server to stop because kill -0 of pid 18464 failed with status 1
Step 3:
vi /opt/trafodion/
<update with fewer MXOSRVRs>
more /opt/trafodion/
n007 32
Step 4:
/opt/trafodion/
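The `more` output in Steps 1 and 3 shows the servers file holding a `<host> <count>` line (`n007 65` before the change, `n007 32` after). The following minimal sketch totals MXOSRVRs per host from such a file, assuming every non-blank line has exactly that two-field layout; the function name `mxosrvrs_per_host` is hypothetical and the full file path is not assumed.

```python
from collections import Counter


def mxosrvrs_per_host(servers_text: str) -> Counter:
    """Sum MXOSRVR counts per host from the contents of a DCS servers file.

    Assumes every non-blank line is "<host> <count>", like the "n007 65" and
    "n007 32" lines shown above; any other layout raises ValueError.
    """
    totals: Counter = Counter()
    for raw in servers_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        host, count = line.split()  # exactly two whitespace-separated fields
        totals[host] += int(count)
    return totals


if __name__ == "__main__":
    before = mxosrvrs_per_host("n007 65\n")  # Step 1 configuration
    after = mxosrvrs_per_host("n007 32\n")   # Step 3 configuration
    print(before["n007"], after["n007"])     # 65 32
```

Keeping the totals per host matters because, on a single-node cluster, all of them land on one IP address.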
*** Information about Trafodion/DCS:
-------
Trafodion Build : trafodion-
DCS Build : dcs-0.9.0
select major_version, minor_version from trafodion.
MAJOR_VERSION MINOR_VERSION
-------
Contents from /opt/trafodion/
export DCS_OPTS=
export DCS_MANAGES_
export DCS_USER_
*** Additional Details:
-------
Hadoop Distro : Cloudera CDH 4.5.0
ZooKeeper Version : 3.4.5-cdh4.5.0--1
The maximum number of client connections in the ZooKeeper configuration (maxClientCnxns) is set to 150.
>> ulimit -u
100000
>> /usr/sbin/sshd -T | grep -i max
maxauthtries 6
maxsessions 100
clientalivecountmax 3
maxstartups 200:30:200
Activity log:
Field | Change
---|---
tags | added: connectivity-dcs
description | updated (4 times)
assignee | nobody → Xu Jian (jian-xu5)
status | New → In Progress
importance | High → Medium
assignee | Xu Jian (jian-xu5) → Tharak Capirala (capirala-tharaknath)
milestone | none → r1.1
Additional details:
-------
DCS starts all MXOSRVRs based on the number defined in /opt/trafodion/trafodion/dcs-0.9.0/conf/servers, but as soon as the third-party application starts making connections, not all of them are established and a few remain in the “Connecting” state.
Increasing the value of maxClientCnxns from Cloudera Manager (Cloudera Manager > ZooKeeper > Configuration > Server Default Group > maxClientCnxns) did not help.
This issue occurs on a single-node cluster but NOT on a multi-node cluster.
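ZooKeeper's `maxClientCnxns` limits the number of concurrent connections that a single client, identified by IP address, may make to a single ZooKeeper server. On a single-node cluster every MXOSRVR connects from the same IP address, while a multi-node cluster spreads them across several addresses. This is one explanation consistent with the single-node-only symptom. The sketch below models only that per-IP arithmetic. The number of ZooKeeper connections held by each MXOSRVR (`conns_per_server`) and the extra-client term (`other_conns`) are hypothetical parameters, not values measured in this report, and the function names are illustrative.

```python
import math


def busiest_ip_connections(total_servers: int, nodes: int, conns_per_server: int) -> int:
    """ZooKeeper connections coming from the busiest node's IP address.

    Assumes MXOSRVRs are spread as evenly as possible over `nodes` hosts,
    each host has its own IP address, and each MXOSRVR holds
    `conns_per_server` ZooKeeper connections (a hypothetical figure).
    """
    servers_on_busiest_node = math.ceil(total_servers / nodes)
    return servers_on_busiest_node * conns_per_server


def exceeds_max_client_cnxns(total_servers: int, nodes: int, conns_per_server: int,
                             max_client_cnxns: int = 150, other_conns: int = 0) -> bool:
    """True if the busiest IP would need more than maxClientCnxns connections.

    `other_conns` stands for any other ZooKeeper clients on that same host.
    """
    used = busiest_ip_connections(total_servers, nodes, conns_per_server) + other_conns
    return used > max_client_cnxns


if __name__ == "__main__":
    # conns_per_server=3 is illustrative only; the report does not measure it.
    for servers, nodes in [(65, 1), (32, 1), (65, 4)]:
        print(servers, nodes, exceeds_max_client_cnxns(servers, nodes, conns_per_server=3))
```

With the illustrative value of 3, 65 MXOSRVRs on one node would need 195 connections from a single IP (over 150), 32 would need 96, and 65 spread over 4 nodes would need at most 51 from any one IP. The real per-server figure would have to be measured on the cluster.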