R4.0 Build 6, Ubuntu 16.04.2, Contrail container cluster
This setup has 3 controller containers.
During provisioning, the first controller was provisioned correctly, but the other two came up a few hours later, so the zk cluster was only up once all 3 containers were running.
The contrail-api logs on the first container show that contrail-api tries a number of times to connect to the zk nodes and keeps failing. After about an hour or so, it stops retrying.
Expected: contrail-api should keep retrying the zk connection periodically, forever.
May 17 08:38:12 nodec1 contrail-api[3909]: INFO:api-0:Connecting to 10.204.216.59:2181
May 17 08:38:12 nodec1 contrail-api[3909]: WARNING:api-0:Connection dropped: socket connection error: Connection refused
May 17 08:38:12 nodec1 contrail-api[3909]: INFO:api-0:Connecting to 10.204.216.60:2181
May 17 08:38:12 nodec1 contrail-api[3909]: WARNING:api-0:Connection dropped: socket connection error: Connection refused
May 17 08:38:12 nodec1 contrail-api[3909]: INFO:api-0:Connecting to 10.204.216.58:2181
May 17 08:38:12 nodec1 contrail-api[3909]: DEBUG:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=400000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
May 17 08:38:12 nodec1 contrail-api[3909]: WARNING:api-0:Connection dropped: socket connection broken
---------------
May 17 09:52:44 nodec1 contrail-api[3909]: INFO:api-0:Connecting to 10.204.216.59:2181
May 17 09:52:44 nodec1 contrail-api[3909]: WARNING:api-0:Connection dropped: socket connection error: Connection refused
May 17 09:52:44 nodec1 contrail-api[3909]: INFO:api-0:Connecting to 10.204.216.60:2181
May 17 09:52:44 nodec1 contrail-api[3909]: WARNING:api-0:Connection dropped: socket connection error: Connection refused
May 17 09:52:44 nodec1 contrail-api[3909]: INFO:api-0:Connecting to 10.204.216.58:2181
May 17 09:52:44 nodec1 contrail-api[3909]: DEBUG:api-0:Sending request(xid=None): Connect(protocol_version=0, last_zxid_seen=0, time_out=400000, session_id=0, passwd='\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', read_only=None)
May 17 09:52:44 nodec1 contrail-api[3909]: WARNING:api-0:Connection dropped: socket connection broken
May 17 09:52:47 nodec1 contrail-api[3909]: ERROR:contrail-api:Session Event: TCP Connect Fail
May 17 09:52:47 nodec1 contrail-api[3909]: ERROR:contrail-api:SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec1 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 10.204.216.58:2181, 10.204.216.59:2181, 10.204.216.60:2181, ] status = Down description = >>, << type = Collector name = server_addrs = [ 10.204.216.58:8086, ] status = Initializing description = Idle to Connect on EvIdleHoldTimerExpired >>, ] description = Zookeeper:Zookeeper[], Collector connection down >>, ] >>
May 17 09:52:47 nodec1 contrail-api[3909]: ERROR:contrail-api:SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec1 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 10.204.216.58:2181, 10.204.216.59:2181, 10.204.216.60:2181, ] status = Down description = >>, << type = Collector name = server_addrs = [ 10.204.216.58:8086, ] status = Down description = Connect to Idle on EvTcpConnectFail >>, ] description = Zookeeper:Zookeeper[], Collector connection down >>, ] >>
May 17 09:52:47 nodec1 contrail-api[3909]: ERROR:contrail-api:SANDESH: [DROP: WrongClientSMState] SandeshModuleClientTrace: data = << name = nodec1:Config:contrail-api:0 client_info = << status = Idle successful_connections = 0 pid = 3909 http_port = 8084 start_time = 1495010291999275 collector_name = collector_ip = 10.204.216.58:8086 collector_list = [ 10.204.216.60:8086, 10.204.216.58:8086, 10.204.216.59:8086, ] >> sm_queue_count = 1 max_sm_queue_count = 3 >>
May 17 09:52:49 nodec1 contrail-api[3909]: WARNING:api-0:Failed connecting to Zookeeper within the connection retry policy.
May 17 09:52:49 nodec1 contrail-api[3909]: INFO:api-0:Zookeeper session lost, state: CLOSED
May 17 09:52:51 nodec1 contrail-api[3909]: ERROR:contrail-api:Session Event: TCP Connect Fail
May 17 09:52:51 nodec1 contrail-api[3909]: ERROR:contrail-api:SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec1 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 10.204.216.58:2181, 10.204.216.59:2181, 10.204.216.60:2181, ] status = Down description = >>, << type = Collector name = server_addrs = [ 10.204.216.59:8086, ] status = Initializing description = Idle to Connect on EvIdleHoldTimerExpired >>, ] description = Zookeeper:Zookeeper[], Collector connection down >>, ] >>
May 17 09:52:51 nodec1 contrail-api[3909]: ERROR:contrail-api:SANDESH: [DROP: WrongClientSMState] NodeStatusUVE: data = << name = nodec1 process_status = [ << module_id = contrail-api instance_id = 0 state = Non-Functional connection_infos = [ << type = Zookeeper name = Zookeeper server_addrs = [ 10.204.216.58:2181, 10.204.216.59:2181, 10.204.216.60:2181, ] status = Down description = >>, << type = Collector name = server_addrs = [ 10.204.216.59:8086, ] status = Down description = Connect to Idle on EvTcpConnectFail >>, ] description = Zookeeper:Zookeeper[], Collector connection down >>, ] >>
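For illustration, the expected behavior (retry forever instead of giving up with "Failed connecting to Zookeeper within the connection retry policy") could look roughly like the sketch below. It is not the contrail-api code or the change under review. `connect_forever` and the `connect` callable are hypothetical names, and the delay values are arbitrary assumptions.

```python
import itertools
import time


def connect_forever(connect, base_delay=1.0, max_delay=30.0,
                    retry_on=(OSError,), sleep=time.sleep):
    """Call connect() until it succeeds; never give up.

    `connect` is a hypothetical callable that raises one of `retry_on`
    while no ZooKeeper server is reachable and returns a session object
    once a connection is established.
    """
    delay = base_delay
    for attempt in itertools.count(1):  # unbounded: no maximum number of tries
        try:
            return connect()
        except retry_on as exc:
            print("attempt %d failed (%s); retrying in %.1fs"
                  % (attempt, exc, delay))
            sleep(delay)
            # Exponential backoff, capped so retries stay periodic.
            delay = min(delay * 2, max_delay)
```

The cap on the delay keeps the retry interval bounded, so a controller that comes up hours later is still picked up within `max_delay` seconds of becoming reachable.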
Review in progress for https://review.opencontrail.org/34433
Submitter: Sachin Bansal (<email address hidden>)