Hi Sundar,
I have attached additional logs collected from the customer's CAN nodes. These logs were uploaded to the CSO JIRA bug but are missing from Launchpad.
According to the logs, more space is consumed by active containers than by old images, so we can rule out the issues mentioned in the GitHub forums about stale or old images.
root@anc1-prd1-csp-can-01:/var/lib/docker# cd aufs/
root@anc1-prd1-csp-can-01:/var/lib/docker/aufs# ls
diff layers mnt
root@anc1-prd1-csp-can-01:/var/lib/docker/aufs/diff# du -sh *
12K 00a0206078b9c8a520e4d2ebd82ec1033aa4c91001f87fe9b165a46aeee71f83
20K 00cf9f653764931f31c55cc3f0a041721fe646e1a0f56c9fb1c696309d4b8709
..
..
24K 12b85e14145df7328c7bc706fc3cfb1b506830de770fca2a4d9a7feb09ba104c
20K 12b85e14145df7328c7bc706fc3cfb1b506830de770fca2a4d9a7feb09ba104c-init
...
1.3G 56fde5f11d9656d27c25d3f1f2d15da30267240f4488e95e113bda26d71ec17f
24K 56fde5f11d9656d27c25d3f1f2d15da30267240f4488e95e113bda26d71ec17f-init
..
37G 6e31524d1bedfc2f27bfdecd91c2ca0bbcc0e25eda5d41e3acf75daba8b9257c
24K 6e31524d1bedfc2f27bfdecd91c2ca0bbcc0e25eda5d41e3acf75daba8b9257c-init
..
86G bdeeb2089d2e519d9c4f5fe9392fd4e9c3db83be41b481336e68e0b08e891531
24K bdeeb2089d2e519d9c4f5fe9392fd4e9c3db83be41b481336e68e0b08e891531-init
CAN-2 node:
root@anc1-prd1-csp-can-02:/var/lib/docker/aufs# du -sh *
119G diff
224K layers
119G mnt
root@anc1-prd1-csp-can-02:/var/lib/docker/aufs/mnt# du -sh *
4.0K 0560635be9b776566523735df40f8b1dffe5afc62485f9b526334c6c8e9fca4c
4.0K 074953fd509f9dd6b51825278c37e482e75cff7fd71ce3f4cc1eece020cea018
..
..
33G 25f577baf2f68a1d21a2292057930699c0aef7e318edaea9fbfbe261fe80fe45
4.0K 25f577baf2f68a1d21a2292057930699c0aef7e318edaea9fbfbe261fe80fe45-init
..
..
82G 5fe1c6812ec09036e909831a7960d2cf4f197969d645e6e84de7f8507c56e772
4.0K 5fe1c6812ec09036e909831a7960d2cf4f197969d645e6e84de7f8507c56e772-init
..
..
4.0K ecf1d4413761230bc46c24d4a2bc07e5f6784191cd7a57ad076b4e521271ab61-init
4.0K ede479c43090e998f2b64c3828b4f764424ef0c97b214a87165cf0991fd4657f
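To make the listings above easier to compare across nodes, here is a rough Python sketch that parses `du -sh` output and ranks the largest entries. It is only an illustration: the helper names (`parse_size`, `largest_entries`) are made up for this note, the sample rows are taken from the CAN-2 listing with the layer IDs shortened to their first 15 characters, and it assumes sizes use the K/M/G/T suffixes that `du -h` prints.

```python
import re

# Multipliers for the suffixes printed by `du -h` (powers of 1024).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a `du -h` size such as '4.0K' or '86G' to bytes."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def largest_entries(du_output, top=3):
    """Return the `top` largest (bytes, name) pairs found in `du -sh` output."""
    entries = []
    for line in du_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank lines and ".." elision markers
        size, name = parts
        try:
            entries.append((parse_size(size), name.strip()))
        except ValueError:
            continue  # shell prompt lines or anything else that is not a size
    entries.sort(reverse=True)
    return entries[:top]


# Sample rows from the CAN-2 mnt listing (IDs shortened for readability).
SAMPLE = """\
4.0K 0560635be9b7765
33G 25f577baf2f68a1
82G 5fe1c6812ec0903
4.0K ecf1d4413761230
"""

if __name__ == "__main__":
    for size, name in largest_entries(SAMPLE, top=2):
        print(f"{size / 1024**3:8.1f} GiB  {name}")
```

Since `du -h` rounds its figures, totals computed this way are approximate; running `du -s` without `-h` gives exact block counts if they are needed.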
As discussed in the other email thread, we need answers to the points below from the analytics perspective. Based on the logs above, the analytics containers should account for roughly 120G, so we need more answers from the analytics team. Is there any auto-purge mechanism in place?
From: Himanshu Bahukhandi
Sent: Monday, March 5, 2018 9:06 AM
To: Prashanth Nageshappa <email address hidden>; Soumit Mishra <email address hidden>; Santosh Gupta <email address hidden>; Jeba Paulaiyan <email address hidden>; Kamlesh Parmar <email address hidden>; Ritam Gangopadhyay <email address hidden>; Contrail Systems Analytics Team <email address hidden>
Cc: Rudra Rugge <email address hidden>; Abhay Joshi <email address hidden>; Viswanath KJ <email address hidden>; Srinivasan Dhamotharan <email address hidden>; Ramakantha Guthi <email address hidden>
Subject: Re: CAN production server reached 100% disk space
Hello,
We can rule out 1 and 2 for now, as the customer has moved to 3.2.1 and does not see many ZooKeeper snapshots. Also, there are no analytics core files in the new CSO build that they are running. We still need to focus on the analytics table size, which grows over time. There is a customer call tomorrow where they will be asking about this issue.
1. ZooKeeper snapshot files in the controller container (~38G)
/var/lib/zookeeper/version-2/
2. Contrail Collector process crash files in the analytics container (~80G)
/var/crashes/core.contrail-collec.*
3. Cassandra database table in the analyticsdb container (~80G)
/var/lib/cassandra/data/ContrailAnalyticsCql/statstablebystrtagv3-3028f790891111e78c7b0f519d1da8b1
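For re-checking these three locations over time, a small Python sketch along the following lines could be run inside each container. It is an assumption-laden example, not an existing tool: the function names and the `CANDIDATES` table are invented here, the paths are simply the ones listed above, and it sums apparent file sizes, which can differ somewhat from the disk usage that `du` reports.

```python
import glob
import os


def path_size(path):
    """Bytes used by a single file, or by all files under a directory."""
    if not os.path.isdir(path):
        return os.lstat(path).st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def pattern_size(pattern):
    """Total size of everything matching a glob pattern (0 if no match)."""
    return sum(path_size(p) for p in glob.glob(pattern))


# Paths taken from the list above; each exists only in its own container.
CANDIDATES = {
    "zookeeper snapshots": "/var/lib/zookeeper/version-2/",
    "collector core files": "/var/crashes/core.contrail-collec.*",
    "stats table": "/var/lib/cassandra/data/ContrailAnalyticsCql/"
                   "statstablebystrtagv3-3028f790891111e78c7b0f519d1da8b1",
}

if __name__ == "__main__":
    for label, pattern in CANDIDATES.items():
        print(f"{pattern_size(pattern) / 1024**3:8.2f} GiB  {label}")
```

Recording this output periodically would show whether the stats table keeps growing after the move to 3.2.1.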
Thanks,
- Himanshu B.