Memory leak in some neutron agents
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla | Invalid | Undecided | Unassigned |
Rocky | Triaged | High | Unassigned |
neutron | Invalid | High | Unassigned |
Bug Description
We have an OpenStack deployment running the Rocky release. We have seen a memory leak in some neutron agents twice in our environment since it was first deployed this January.
Below are some of the commands we ran to identify the issue and their corresponding output:
This was on one of the compute nodes:
-------
[root@c1s4 ~]# ps aux --sort -rss|head -n1
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
42435 48229 3.5 73.1 98841060 96323252 pts/13 S+ 2018 1881:25 /usr/bin/python2 /usr/bin/
-------
And this was on one of the controller nodes:
-------
[root@r1 neutron]# ps aux --sort -rss|head
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
42435 30940 3.1 48.6 68596320 64144784 pts/37 S+ Jan08 588:26 /usr/bin/python2 /usr/bin/
42435 20902 2.8 26.1 36055484 34408952 pts/35 S+ Jan08 525:12 /usr/bin/python2 /usr/bin/
42434 34199 7.1 6.0 39420516 8033480 pts/11 Sl+ 2018 3620:08 /usr/libexec/mysqld --basedir=/usr --datadir=
42435 8327 2.6 2.2 3546004 3001772 pts/10 S+ Jan17 152:04 /usr/bin/python2 /usr/bin/
42435 40171 2.6 2.1 3893480 2840852 pts/19 S+ Jan16 190:54 /usr/bin/python2 /usr/bin/
root 42430 3.1 0.3 4412216 495492 pts/29 SLl+ Jan16 231:20 /usr/sbin/
-------
When it happened, we saw a lot of 'OSError: [Errno 12] Cannot allocate memory' ERRORs in the various neutron-* logs, because there was no free memory left. However, we do not yet know what triggered the memory leak.
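To see which agent is growing and how fast, a periodic RSS sample can help. The snippet below is only a rough sketch of what we plan to run next time: the log path, the 60-second interval, and the '[n]eutron-' match pattern are assumptions (the ps output above truncates the command line), and /var/log/kolla/neutron/ is the default kolla log directory on the host.
-------
# Append a timestamped RSS snapshot of every neutron agent once a minute,
# so the growth of each PID can be graphed later.
while true; do
    date +%FT%T >> /var/log/neutron-rss.log
    ps -eo pid,rss,cmd --sort -rss | grep '[n]eutron-' >> /var/log/neutron-rss.log
    sleep 60
done

# Count how many ENOMEM errors each agent has logged so far.
grep -c 'Cannot allocate memory' /var/log/kolla/neutron/*.log
-------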
Here is our globals.yml:
-------
[root@r1 kolla]# cat globals.yml |grep -v "^#"|tr -s "\n"
---
openstack_release: "rocky"
kolla_internal_
enable_barbican: "yes"
enable_ceph: "yes"
enable_ceph_mds: "yes"
enable_ceph_rgw: "yes"
enable_cinder: "yes"
enable_
enable_
enable_
enable_
ceph_pool_pg_num: 16
ceph_pool_pgp_num: 16
ceph_osd_
glance_
glance_
glance_
ironic_
tempest_image_id:
tempest_
tempest_
tempest_
-------
I did some searching on Google and found this OVS bug, which looks highly related: https:/
I am not sure whether the fix has been included in the latest Rocky kolla images.
Best regards,
Lei
Changed in kolla:
status: New → Confirmed
Changed in kolla:
status: Confirmed → Invalid
The linked RH bugzilla bug suggests that the OVS fix is included in v2.11.0. The kolla image just installs these packages:
RPM: openvswitch, python-openvswitch
DEB: openvswitch-switch, python-openvswitch
So it really depends on what is included in the distro packages. On the master branch of kolla, in the CentOS image I see
openvswitch-2.11.0-4.el7.x86_64
which comes from the delorean-master-testing yum repo.
On the kolla rocky branch, in the CentOS image I see
openvswitch-2.10.1-3.el7.x86_64
which comes from the centos-openstack-rocky yum repo.
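To confirm what a given deployment is actually running, the package can also be queried inside the openvswitch container on a node. This is only a sketch: the container name openvswitch_vswitchd and the image name below are the kolla-ansible defaults I would expect for a CentOS binary build, not something verified against this environment.
-------
# Version inside the container currently running on a node.
docker exec openvswitch_vswitchd rpm -q openvswitch python-openvswitch

# Version baked into a freshly pulled Rocky image (image name is an assumption,
# adjust to whatever registry/namespace your deployment pulls from).
docker run --rm kolla/centos-binary-openvswitch-vswitchd:rocky rpm -q openvswitch
-------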