Memory leak in contrail-collector after kafka restart
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Juniper Openstack | Status tracked in Trunk | | |
R4.0 | Fix Committed | Medium | Zhiqiang Cui |
R4.1 | Fix Committed | Medium | Zhiqiang Cui |
R5.0 | Fix Committed | Medium | Zhiqiang Cui |
Trunk | Fix Committed | Medium | Zhiqiang Cui |
Bug Description
I have come across a memory leak in contrail-collector, in kafka_processor.cc (https:/
It is observed only after a restart of kafka in analyticsdb. I have attached the report generated by the valgrind memcheck tool, which shows the two leaks below. The steps I used to reproduce the issue, on two setups running CAN version 4.1.1.0-10, are listed at the end of this description.
From valgrind_
First leak:
==10081==
==10081== 142 (72 direct, 70 indirect) bytes in 1 blocks are definitely lost in loss record 8,005 of 9,287
==10081== at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/
==10081== by 0x549589: RdKafka:
==10081== by 0x7EFD34: KafkaProcessor:
==10081== by 0x8003C0: boost::
==10081== by 0x7FFD7E: bool boost::
==10081== by 0x7FF46A: boost::
==10081== by 0x7FE59A: boost::
==10081== by 0x450387: boost::
==10081== by 0x491C84: Timer::
==10081== by 0x46841B: TaskImpl::execute() (task.cc:277)
==10081== by 0x6F95B39: ??? (in /usr/lib/
==10081== by 0x6F91815: ??? (in /usr/lib/
Second leak:
==10081==
==10081== 5,184 bytes in 39 blocks are definitely lost in loss record 9,076 of 9,287
==10081== at 0x4C2AB80: malloc (in /usr/lib/
==10081== by 0x4C2CF1F: realloc (in /usr/lib/
==10081== by 0x53FF61: rd_list_grow (in /usr/bin/
==10081== by 0x53FFE6: rd_list_init (in /usr/bin/
==10081== by 0x545189: rd_kafka_
==10081== by 0x503467: rd_kafka_timers_run (in /usr/bin/
==10081== by 0x4E45A6: rd_kafka_
==10081== by 0x5405E6: _thrd_wrapper_
==10081== by 0x6D5F183: start_thread (pthread_
==10081== by 0x848003C: clone (clone.S:111)
==10081==
I could remove the first leak by deleting the metadata pointer object at the end of the function, as in the diff below. I could not work out the cause of the second leak, although it also involves an rdkafka timer.
diff --git a/src/analytics
index 6b23d28..e760344 100644
--- a/src/analytics
+++ b/src/analytics
@@ -406,10 +406,14 @@ KafkaProcessor:
if (err != RdKafka:
} else {
if (collector_ && redis_up_) {
}
}
+ LOG(DEBUG, "Deleting metadata !!!");
+ delete metadata;
}
}
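As a possible alternative to the explicit `delete`, the same ownership rule can be expressed with a smart pointer so that every exit path releases the object. The following is only a minimal, self-contained sketch and not the committed fix: `FakeMetadata`, `FetchMetadata` and `PollOnce` are hypothetical stand-ins for the rdkafka metadata object, the query that returns it through an out-parameter, and the timer callback. It also assumes a C++11 compiler for `std::unique_ptr`.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the rdkafka metadata object.
// It counts live instances so that a leak would be observable.
struct FakeMetadata {
    static int live;
    std::string broker;
    explicit FakeMetadata(const std::string& b) : broker(b) { ++live; }
    ~FakeMetadata() { --live; }
};
int FakeMetadata::live = 0;

// Hypothetical stand-in for a metadata query that hands ownership
// to the caller through an out-parameter. Returns true on success;
// on failure *out is left null.
bool FetchMetadata(bool broker_up, FakeMetadata** out) {
    *out = nullptr;
    if (!broker_up) {
        return false;
    }
    *out = new FakeMetadata("broker-0");
    return true;
}

// Hypothetical timer callback: one metadata poll per tick.
bool PollOnce(bool broker_up) {
    FakeMetadata* raw = nullptr;
    const bool ok = FetchMetadata(broker_up, &raw);

    // Take ownership immediately, before any early return.
    // Deleting a null pointer is a no-op, so the failure path is safe.
    std::unique_ptr<FakeMetadata> metadata(raw);

    if (!ok) {
        return false;  // nothing was allocated, nothing leaks
    }
    // ... inspect the metadata here ...
    return !metadata->broker.empty();
}  // metadata is released here on every path
```

With this shape, calling `PollOnce` repeatedly (as a periodic timer would) leaves `FakeMetadata::live` at zero after each call, whether or not the broker is reachable. This only concerns the first leak; it does nothing for the second one, which is inside librdkafka itself.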
Steps used to reproduce the issue:
1. root@csp-
2. root@csp-
3. check for contrail-status
== Contrail Analytics ==
contrail-
contrail-
contrail-collector inactive // this remains inactive because I started contrail-collector under valgrind and not as a service.
contrail-
contrail-
contrail-topology active
4. root@csp-
kafka: stopped
root@csp-
kafka: started
5. Wait for some time, then stop the valgrind process; the report will be generated.
The problem is in librdkafka; librdkafka needs to be upgraded.