On scaled setups with compaction backed up, Cassandra either fails to come up or goes down, with a backtrace reporting "Too many open files in system". The cassandra user limit is set to 100000 open files, but the system-wide limit (fs.file-max) is only 65535, so the system-wide limit is exhausted before the per-user limit can be reached.
root@nodei28:/var/log/cassandra# sysctl -a | grep file-max
fs.file-max = 65535
On database nodes, the system-wide file descriptor limit (fs.file-max) needs to be raised above 100K, say to 200K, as part of provisioning and SM.
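As a rough illustration of the check that provisioning could run on a database node, the comparison between the system-wide and per-user limits might be sketched in Python as below. This is not part of any existing provisioning code: the function names parse_file_nr and check_file_max are hypothetical, and the 100000 and 200000 values are simply the numbers quoted in this report. The sketch assumes a Linux host, where /proc/sys/fs/file-nr holds the allocated, unused and maximum file handle counts.

```python
from pathlib import Path

PROC_FS = Path("/proc/sys/fs")  # Linux-only location of the kernel file handle counters


def parse_file_nr(text):
    """Parse the contents of /proc/sys/fs/file-nr.

    The file holds three whitespace-separated integers:
    allocated handles, allocated-but-unused handles, and the maximum (fs.file-max).
    """
    allocated, unused, maximum = (int(field) for field in text.split())
    return allocated, unused, maximum


def check_file_max(file_max, user_nofile, target=200000):
    """Return a list of problems; an empty list means the limit looks sane.

    user_nofile is the per-user open-file limit (100000 for the cassandra user
    in this report); target is the proposed system-wide value (an assumption).
    """
    problems = []
    if file_max <= user_nofile:
        problems.append(
            "fs.file-max (%d) does not exceed the per-user limit (%d)"
            % (file_max, user_nofile)
        )
    if file_max < target:
        problems.append(
            "fs.file-max (%d) is below the target (%d)" % (file_max, target)
        )
    return problems


if __name__ == "__main__":
    # Read the live counters and report anything that needs attention.
    allocated, unused, maximum = parse_file_nr((PROC_FS / "file-nr").read_text())
    print("file handles in use: %d of %d" % (allocated - unused, maximum))
    for problem in check_file_max(maximum, user_nofile=100000):
        print("WARNING:", problem)
```

With the values from this node (fs.file-max of 65535 against a per-user limit of 100000), both checks would flag a problem. With fs.file-max raised to 200000, neither would.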
INFO [main] 2016-05-17 12:22:32,392 CassandraDaemon.java:643 - No gossip backlog; proceeding
INFO [main] 2016-05-17 12:22:32,487 Server.java:155 - Netty using native Epoll event loop
INFO [main] 2016-05-17 12:22:32,534 Server.java:193 - Using Netty Version: [netty-buffer=netty-buffer-4.0.23.Final.208198c, netty-codec=netty-codec-4.0.23.Final.208198c, netty-codec-http=netty-codec-http-4.0.23.Final.208198c, netty-codec-socks=netty-codec-socks-4.0.23.Final.208198c, netty-common=netty-common-4.0.23.Final.208198c, netty-handler=netty-handler-4.0.23.Final.208198c, netty-transport=netty-transport-4.0.23.Final.208198c, netty-transport-rxtx=netty-transport-rxtx-4.0.23.Final.208198c, netty-transport-sctp=netty-transport-sctp-4.0.23.Final.208198c, netty-transport-udt=netty-transport-udt-4.0.23.Final.208198c]
INFO [main] 2016-05-17 12:22:32,534 Server.java:194 - Starting listening for CQL clients on /192.168.1.3:9042...
INFO [main] 2016-05-17 12:22:32,610 ThriftServer.java:119 - Binding thrift service to /192.168.1.3:9160
INFO [Thread-12] 2016-05-17 12:22:32,618 ThriftServer.java:136 - Listening for thrift clients...
WARN [Thread-12] 2016-05-17 12:23:43,695 CustomTThreadPoolServer.java:122 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files in system
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:108) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:110) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:137) [apache-cassandra-2.1.13.jar:2.1.13]
Caused by: java.net.SocketException: Too many open files in system
at java.net.PlainSocketImpl.socketAccept(Native Method) ~[na:1.7.0_95]
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) ~[na:1.7.0_95]
at java.net.ServerSocket.implAccept(ServerSocket.java:530) ~[na:1.7.0_95]
at java.net.ServerSocket.accept(ServerSocket.java:498) ~[na:1.7.0_95]
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:102) ~[apache-cassandra-2.1.13.jar:2.1.13]
... 4 common frames omitted
WARN [Thread-12] 2016-05-17 12:23:43,696 CustomTThreadPoolServer.java:122 - Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files in system
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:108) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:36) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:60) ~[libthrift-0.9.2.jar:0.9.2]
at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:110) ~[apache-cassandra-2.1.13.jar:2.1.13]
at org.apache.cassandra.thrift.ThriftServer$ThriftServerThread.run(ThriftServer.java:137) [apache-cassandra-2.1.13.jar:2.1.13]
Caused by: java.net.SocketException: Too many open files in system
at java.net.PlainSocketImpl.socketAccept(Native Method) ~[na:1.7.0_95]
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) ~[na:1.7.0_95]
at java.net.ServerSocket.implAccept(ServerSocket.java:530) ~[na:1.7.0_95]
at java.net.ServerSocket.accept(ServerSocket.java:498) ~[na:1.7.0_95]
at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:102) ~[apache-cassandra-2.1.13.jar:2.1.13]
... 4 common frames omitted