Memory leak in some neutron agents
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla | Invalid | Undecided | Unassigned |
Rocky | Triaged | High | Unassigned |
neutron | Invalid | High | Unassigned |
Bug Description
We have an OpenStack deployment running the Rocky release. We have seen a memory leak in some neutron agents twice in our environment since it was first deployed this January.
Below are some of the commands we ran to identify the issue and their corresponding output:
This was on one of the compute nodes:
-------
[root@c1s4 ~]# ps aux --sort -rss|head -n1
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
42435 48229 3.5 73.1 98841060 96323252 pts/13 S+ 2018 1881:25 /usr/bin/python2 /usr/bin/
-------
And this was on one of the controller nodes:
-------
[root@r1 neutron]# ps aux --sort -rss|head
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
42435 30940 3.1 48.6 68596320 64144784 pts/37 S+ Jan08 588:26 /usr/bin/python2 /usr/bin/
42435 20902 2.8 26.1 36055484 34408952 pts/35 S+ Jan08 525:12 /usr/bin/python2 /usr/bin/
42434 34199 7.1 6.0 39420516 8033480 pts/11 Sl+ 2018 3620:08 /usr/libexec/mysqld --basedir=/usr --datadir=
42435 8327 2.6 2.2 3546004 3001772 pts/10 S+ Jan17 152:04 /usr/bin/python2 /usr/bin/
42435 40171 2.6 2.1 3893480 2840852 pts/19 S+ Jan16 190:54 /usr/bin/python2 /usr/bin/
root 42430 3.1 0.3 4412216 495492 pts/29 SLl+ Jan16 231:20 /usr/sbin/
-------
When it happened, we saw a lot of 'OSError: [Errno 12] Cannot allocate memory' ERRORs in the various neutron-* logs, because there was no free memory left. However, we do not yet know what triggered the memory leak.
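To see which agent is growing and how fast, a periodic RSS sample can help. The snippet below is only a rough sketch of what we plan to run next time: the log path, the 60-second interval, and the '[n]eutron-' match pattern are assumptions (the ps output above truncates the command line), and /var/log/kolla/neutron/ is the default kolla log directory on the host.
-------
# Append a timestamped RSS snapshot of every neutron agent once a minute,
# so the growth of each PID can be graphed later.
while true; do
    date +%FT%T >> /var/log/neutron-rss.log
    ps -eo pid,rss,cmd --sort -rss | grep '[n]eutron-' >> /var/log/neutron-rss.log
    sleep 60
done

# Count how many ENOMEM errors each agent has logged so far.
grep -c 'Cannot allocate memory' /var/log/kolla/neutron/*.log
-------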
Here is our globals.yml:
-------
[root@r1 kolla]# cat globals.yml |grep -v "^#"|tr -s "\n"
---
openstack_release: "rocky"
kolla_internal_
enable_barbican: "yes"
enable_ceph: "yes"
enable_ceph_mds: "yes"
enable_ceph_rgw: "yes"
enable_cinder: "yes"
enable_
enable_
enable_
enable_
ceph_pool_pg_num: 16
ceph_pool_pgp_num: 16
ceph_osd_
glance_
glance_
glance_
ironic_
tempest_image_id:
tempest_
tempest_
tempest_
-------
I did some searching on Google and found this OVS bug, which looks highly related: https:/
I am not sure whether the fix has been included in the latest Rocky kolla images.
Best regards,
Lei
Changed in kolla:
status: New → Confirmed
Changed in kolla:
status: Confirmed → Invalid
The linked RH bugzilla bug suggests that the OVS fix is included in v2.11.0. The kolla image just installs these packages:
RPM: openvswitch, python-openvswitch
DEB: openvswitch-switch, python-openvswitch
So it really depends on what is included in the distro packages. On the master branch of kolla, in the CentOS image I see
openvswitch-2.11.0-4.el7.x86_64
which comes from the delorean-master-testing yum repo.
On the kolla rocky branch, in the CentOS image I see
openvswitch-2.10.1-3.el7.x86_64
which comes from the centos-openstack-rocky yum repo.
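To confirm what a given deployment is actually running, the package can also be queried inside the openvswitch container on a node. This is only a sketch: the container name openvswitch_vswitchd and the image name below are the kolla-ansible defaults I would expect for a CentOS binary build, not something verified against this environment.
-------
# Version inside the container currently running on a node.
docker exec openvswitch_vswitchd rpm -q openvswitch python-openvswitch

# Version baked into a freshly pulled Rocky image (image name is an assumption,
# adjust to whatever registry/namespace your deployment pulls from).
docker run --rm kolla/centos-binary-openvswitch-vswitchd:rocky rpm -q openvswitch
-------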