Comment 7 for bug 1749900

Ramkantha R Guthi (rkrguthi) wrote :

Hi Sundar,

I have attached the other logs collected from the customer's CAN nodes. These logs were uploaded to the CSO JIRA bug but are missing from Launchpad.

According to the logs, active containers consume more space than old images, so we can rule out the stale or old image issues mentioned in the GitHub forums.

root@anc1-prd1-csp-can-01:/var/lib/docker# cd aufs/
root@anc1-prd1-csp-can-01:/var/lib/docker/aufs# ls
diff layers mnt

root@anc1-prd1-csp-can-01:/var/lib/docker/aufs/diff# du -sh *
12K 00a0206078b9c8a520e4d2ebd82ec1033aa4c91001f87fe9b165a46aeee71f83
20K 00cf9f653764931f31c55cc3f0a041721fe646e1a0f56c9fb1c696309d4b8709

..
..
24K 12b85e14145df7328c7bc706fc3cfb1b506830de770fca2a4d9a7feb09ba104c
20K 12b85e14145df7328c7bc706fc3cfb1b506830de770fca2a4d9a7feb09ba104c-init
...

1.3G 56fde5f11d9656d27c25d3f1f2d15da30267240f4488e95e113bda26d71ec17f
24K 56fde5f11d9656d27c25d3f1f2d15da30267240f4488e95e113bda26d71ec17f-init
..
37G 6e31524d1bedfc2f27bfdecd91c2ca0bbcc0e25eda5d41e3acf75daba8b9257c
24K 6e31524d1bedfc2f27bfdecd91c2ca0bbcc0e25eda5d41e3acf75daba8b9257c-init
..
86G bdeeb2089d2e519d9c4f5fe9392fd4e9c3db83be41b481336e68e0b08e891531
24K bdeeb2089d2e519d9c4f5fe9392fd4e9c3db83be41b481336e68e0b08e891531-init
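
With this many entries, the few large directories are easier to spot when the listing is sorted by size. A minimal sketch, assuming a POSIX shell with standard `du`, `sort`, and `head`; the function name `largest_dirs` is hypothetical and not part of Docker:

```shell
# largest_dirs DIR [COUNT]
# Hypothetical helper: print the COUNT largest entries under DIR,
# biggest first, with sizes in KiB.
largest_dirs() {
    dir="${1:-.}"      # directory to inspect (defaults to current dir)
    count="${2:-5}"    # number of entries to print (defaults to 5)
    # du -sk prints one "size<TAB>path" line per entry, in KiB;
    # sort -rn orders numerically, largest first.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_dirs /var/lib/docker/aufs/diff 5` would list the five largest layer directories on a node.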

CAN-2 node:

root@anc1-prd1-csp-can-02:/var/lib/docker/aufs# du -sh *
119G diff
224K layers
119G mnt

root@anc1-prd1-csp-can-02:/var/lib/docker/aufs/mnt# du -sh *
4.0K 0560635be9b776566523735df40f8b1dffe5afc62485f9b526334c6c8e9fca4c
4.0K 074953fd509f9dd6b51825278c37e482e75cff7fd71ce3f4cc1eece020cea018

..
..
33G 25f577baf2f68a1d21a2292057930699c0aef7e318edaea9fbfbe261fe80fe45
4.0K 25f577baf2f68a1d21a2292057930699c0aef7e318edaea9fbfbe261fe80fe45-init
..
..
82G 5fe1c6812ec09036e909831a7960d2cf4f197969d645e6e84de7f8507c56e772
4.0K 5fe1c6812ec09036e909831a7960d2cf4f197969d645e6e84de7f8507c56e772-init
..
..
4.0K ecf1d4413761230bc46c24d4a2bc07e5f6784191cd7a57ad076b4e521271ab61-init
4.0K ede479c43090e998f2b64c3828b4f764424ef0c97b214a87165cf0991fd4657f
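
To confirm which container owns one of the large directories above, the directory name can be matched against the per-container `mount-id` files. This is only a sketch: it assumes the Docker 1.10+ on-disk layout, where `/var/lib/docker/image/aufs/layerdb/mounts/<container-id>/mount-id` holds the name of that container's aufs directory. The function name `find_container_for_layer` is hypothetical, and the layout should be verified on these nodes before relying on it.

```shell
# find_container_for_layer LAYER_ID [DOCKER_ROOT]
# Hypothetical helper: print the ID of the container whose mount-id
# matches LAYER_ID. Assumes the Docker 1.10+ aufs layout (see above).
find_container_for_layer() {
    layer_id="$1"
    docker_root="${2:-/var/lib/docker}"
    for f in "$docker_root"/image/aufs/layerdb/mounts/*/mount-id; do
        [ -f "$f" ] || continue          # glob matched nothing
        if [ "$(cat "$f")" = "$layer_id" ]; then
            # The parent directory name is the full container ID.
            basename "$(dirname "$f")"
            return 0
        fi
    done
    return 1                              # no container uses this layer
}
```

The printed ID can then be compared with the output of `docker ps --no-trunc` to see which running container it belongs to.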

As discussed in the other email thread, we need answers to the questions below from the analytics perspective. Based on the above logs, the analytics containers should account for about 120G, so we need more answers from the analytics team. Is there any auto-purge mechanism in place?

From: Himanshu Bahukhandi
Sent: Monday, March 5, 2018 9:06 AM
To: Prashanth Nageshappa <email address hidden>; Soumit Mishra <email address hidden>; Santosh Gupta <email address hidden>; Jeba Paulaiyan <email address hidden>; Kamlesh Parmar <email address hidden>; Ritam Gangopadhyay <email address hidden>; Contrail Systems Analytics Team <email address hidden>
Cc: Rudra Rugge <email address hidden>; Abhay Joshi <email address hidden>; Viswanath KJ <email address hidden>; Srinivasan Dhamotharan <email address hidden>; Ramakantha Guthi <email address hidden>
Subject: Re: CAN production server reached 100% disk space

Hello,
We can rule out 1 and 2 for now, as the customer moved to 3.2.1 and doesn't see a lot of ZooKeeper snapshots. Also, there are no analytics cores in the new CSO build that they are running. We still need to focus on the analytics table size, which grows with time. There is a customer call tomorrow where they will be asking about this issue.

1. Zookeeper snapshot files in controller container (~38G)
/var/lib/zookeeper/version-2/

2. Contrail Collector process crash files in the analytics container (~80G)
/var/crashes/core.contrail-collec.*

3. Cassandra database table in the analyticsdb container (~80G)
/var/lib/cassandra/data/ContrailAnalyticsCql/statstablebystrtagv3-3028f790891111e78c7b0f519d1da8b1

Thanks,
- Himanshu B.