keystonemiddleware connections to memcached from neutron-server grow beyond configured values

Bug #1883659 reported by Justinas Balciunas
This bug affects 6 people
Affects                       Status     Importance  Assigned to  Milestone
OpenStack Security Advisory   Won't Fix  Undecided   Unassigned
keystonemiddleware            Confirmed  Undecided   Unassigned
oslo.cache                    Invalid    Undecided   Unassigned

Bug Description

Using: keystone-17.0.0, Ussuri

I've noticed very odd behaviour in keystone_authtoken with memcached and neutron-server. The connection count to memcached grows over time, ignoring the memcache_pool_maxsize and memcache_pool_unused_timeout settings. The keystone_authtoken middleware configuration is as follows:

[keystone_authtoken]
www_authenticate_uri = http://keystone_vip:5000
auth_url = http://keystone_vip:35357
auth_type = password
project_domain_id = default
user_domain_id = default
project_name = service
username = neutron
password = neutron_password_here
cafile =
memcache_security_strategy = ENCRYPT
memcache_secret_key = secret_key_here
memcached_servers = memcached_server_1:11211,memcached_server_2:11211,memcached_server_3:11211
memcache_pool_maxsize = 100
memcache_pool_unused_timeout = 600
token_cache_time = 3600
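For reference, here is a minimal sketch of the contract the two pool options above are expected to enforce: no more than memcache_pool_maxsize connections in existence at once, and idle connections dropped once they have been unused for memcache_pool_unused_timeout seconds. The BoundedPool class and its methods are hypothetical; this is not the oslo.cache or keystonemiddleware code, only a model of the behaviour the report expects.

```python
import threading
import time
from collections import deque


class BoundedPool:
    """Hypothetical pool modelling memcache_pool_maxsize and
    memcache_pool_unused_timeout; not the oslo.cache implementation."""

    def __init__(self, factory, maxsize, unused_timeout, clock=time.monotonic):
        self._factory = factory            # opens a new connection
        self._maxsize = maxsize            # cap on connections in existence
        self._unused_timeout = unused_timeout
        self._clock = clock
        self._idle = deque()               # (released_at, conn), oldest on the left
        self._open = 0                     # idle + checked-out connections
        self._cond = threading.Condition()

    def _reap(self):
        # Forget idle connections unused for longer than unused_timeout
        # (a real pool would also close their sockets here).
        now = self._clock()
        while self._idle and now - self._idle[0][0] > self._unused_timeout:
            self._idle.popleft()
            self._open -= 1

    def acquire(self, timeout=None):
        with self._cond:
            while True:
                self._reap()
                if self._idle:
                    return self._idle.pop()[1]   # reuse the most recently released
                if self._open < self._maxsize:
                    self._open += 1              # reserve a slot, then connect
                    break
                # At the cap: wait for a release instead of opening more.
                # Each wait restarts the timeout, which is fine for a sketch.
                if not self._cond.wait(timeout):
                    raise TimeoutError("pool exhausted")
        try:
            return self._factory()
        except Exception:
            with self._cond:
                self._open -= 1                  # give the reserved slot back
                self._cond.notify()
            raise

    def release(self, conn):
        with self._cond:
            self._idle.append((self._clock(), conn))
            self._cond.notify()

    @property
    def open_count(self):
        with self._cond:
            return self._open
```

With maxsize=100 and unused_timeout=600 this model never holds more than 100 connections per pool instance, and a quiet period longer than 600 seconds shrinks it again. Each neutron-server worker process would hold its own pool, so the expected total is bounded by workers times maxsize, not by maxsize alone.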

Commenting out the memcached settings under [keystone_authtoken] and restarting neutron-server brings the memcached connection count back to normal levels (hundreds, rather than the thousands seen when neutron-server uses memcached). The Neutron team (slaweq) suggested this is a Keystone issue because, quote: "Neutron is just using keystonemiddleware as one of the middlewares in the pipeline".

Grafana memcached connection graphs: https://ibb.co/p3TCJqC and https://ibb.co/nmmvvH4

The drops in the graphs correspond to neutron-server restarts. I'm not sure whether this is expected behaviour, a problem with my configuration, or a bug.

summary: - keystonemiddleware connections to memcached from neutron-server grows
+ keystonemiddleware connections to memcached from neutron-server grow
beyond configured values
Gage Hugo (gagehugo) wrote :

Added keystonemiddleware

Gage Hugo (gagehugo) wrote :

Added oslo.cache, not 100% sure which is affected yet.

no longer affects: keystone
Justinas Balciunas (justinas-balciunas) wrote :

A few additions:
1) the problem is not noticeable immediately, so automated tests don't trigger it: the whole setup (three memcached nodes, three neutron-servers with keystone_authtoken configured to use memcached) needs to run for a while before the memcached connection count exceeds the defined limits;
2) only two of the three memcached nodes are hit by the uncontrolled growth in the number of connections: one node takes most of the load, the second trails it by 30-40%, and the third serves the usual connection count;
3) the open connection count keeps rising until it reaches the limit set in the memcached configuration (25k per memcached node in my case).
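To reproduce the per-node numbers without Grafana, each memcached server can be asked directly: the memcached text protocol's `stats` command replies with `STAT <name> <value>` lines terminated by `END`, and `curr_connections` is the number of currently open connections. The sketch below uses only the Python standard library; the function names are illustrative, and the query itself opens one extra connection that is included in the reported count.

```python
import socket


def parse_stats(raw: str) -> dict:
    """Parse a memcached text-protocol 'stats' reply into {name: value}."""
    stats = {}
    for line in raw.splitlines():
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def curr_connections(host: str, port: int = 11211, timeout: float = 2.0) -> int:
    """Return the server's open connection count (including this query's own)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return int(parse_stats(buf.decode())["curr_connections"])
```

Calling `curr_connections()` for each host listed in memcached_servers (memcached_server_1 to memcached_server_3 in the configuration above) at intervals should show the same uneven growth between nodes.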

Pierre Riteau (priteau) wrote :

I can confirm that I am seeing this issue with neutron-server, using three memcached servers through keystonemiddleware. This is with the Train release deployed on CentOS 8 with Kolla, which uses the following RDO packages:

openstack-neutron-15.1.0-1.el8.noarch
python3-keystonemiddleware-7.0.1-2.el8.noarch
python3-oslo-cache-1.37.0-2.el8.noarch

Pierre Riteau (priteau) wrote :

I am able to make the problem go away with this extra setting in neutron.conf:

[keystone_authtoken]
memcache_use_advanced_pool = True

This is the documentation for this setting:

# (Optional) Use the advanced (eventlet safe) memcached client pool. The
# advanced pool will only work under python 2.x. (boolean value)

This description dates from 2016. So far I haven't seen any issues enabling this setting with Python 3.

Radosław Piliszek (yoctozepto) wrote :

It affects keystonemiddleware, but I suspect the fix is needed in oslo.cache.

Changed in keystonemiddleware:
status: New → Confirmed
Changed in oslo.cache:
status: New → Confirmed
Herve Beraud (herveberaud) wrote :

Hello,

If I understand this topic correctly, you are saying that the connections grow beyond what the given config allows, right?

A few weeks ago another bug was opened [1]; it was caused by `flush_on_reconnect`, which can cause an exponential rise in connections to memcached servers.

IIRC this option was introduced mostly for keystone's use.

The submitted patch [1] moves flush_on_reconnect out of the code and into the oslo.cache config block so that it becomes configurable.

It could be worth following this track: try turning off flush_on_reconnect manually and then observe the behaviour in your context.

So you can either edit the code to remove this option, or apply this patch [1] and disable it through the config.

Please let me know if it helps.

[1] https://review.opendev.org/#/c/742193/

Herve Beraud (herveberaud) wrote :

s/top/topic/

Radosław Piliszek (yoctozepto) wrote :

There is something linking this issue to https://bugs.launchpad.net/neutron/+bug/1864418 (neutron unable to run behind mod_wsgi). I sense a threading issue. Could be just me. :-)

Jeremy Stanley (fungi) wrote :

It looks like this may be the same as public security bug 1892852 and bug 1888394.

Changed in ossa:
status: New → Incomplete
information type: Public → Public Security
Michal Arbet (michalarbet) wrote :

I think this is caused by the use of obsolete code that keystonemiddleware relied on a long time ago, before oslo.cache support was added to keystonemiddleware as a new library :).

Services that use keystonemiddleware should set memcache_use_advanced_pool = True (the oslo.cache memcached_pool implementation) instead of relying on the obsolete code.

Or, better still, the memcache_use_advanced_pool option should be removed and keystonemiddleware should use the oslo.cache implementation by default.

oslo.cache was introduced and added to the requirements in:
 https://review.opendev.org/c/openstack/keystonemiddleware/+/268664
 https://review.opendev.org/c/openstack/keystonemiddleware/+/527466/
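A simplified model of the failure mode suggested here, under a loudly stated assumption: that the legacy (non-advanced) path keeps a lazily grown list of clients, opening a new client whenever none is idle and never capping or expiring them. In such a model memcache_pool_maxsize and memcache_pool_unused_timeout would have nothing to act on. LazyPool and burst are hypothetical names for illustration, not keystonemiddleware's code or API.

```python
import contextlib


class LazyPool:
    """Hypothetical lazy, uncapped client pool (assumed legacy behaviour)."""

    def __init__(self, factory):
        self._factory = factory
        self._idle = []
        self.created = 0                 # clients ever opened

    def __len__(self):
        return len(self._idle)

    @contextlib.contextmanager
    def reserve(self):
        try:
            client = self._idle.pop()    # reuse an idle client if one exists
        except IndexError:
            client = self._factory()     # pool empty: open another, no limit
            self.created += 1
        try:
            yield client
        finally:
            self._idle.append(client)    # returned to the pool, never closed


def burst(pool, concurrency):
    """Hold `concurrency` reservations at once, like overlapping requests."""
    with contextlib.ExitStack() as stack:
        for _ in range(concurrency):
            stack.enter_context(pool.reserve())


pool = LazyPool(factory=object)
burst(pool, 300)     # one busy moment
burst(pool, 5)       # traffic drops again
print(pool.created, len(pool))   # prints "300 300"
```

In this model the pool size tracks the largest burst of concurrent requests seen so far and never shrinks, which would be consistent with connection counts that only fall when neutron-server restarts. A pool honouring memcache_pool_maxsize would instead make requests beyond the cap wait.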

Jeremy Stanley (fungi) wrote :

At this point there's no clear exploit scenario and the description of this and the other two presumed related reports seems to be of a normal (albeit potentially crippling) bug. As such, the vulnerability management team is going to treat this as a class D report per our taxonomy and not issue an advisory once it's fixed, but if anyone disagrees we can reconsider the position: https://security.openstack.org/vmt-process.html#incident-report-taxonomy

Changed in ossa:
status: Incomplete → Won't Fix
Ben Nemec (bnemec) wrote :

I'm closing this for oslo.cache since according to comment 5 the problem goes away when using the oslo.cache backend.

Changed in oslo.cache:
status: Confirmed → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/keystonemiddleware 9.3.0

This issue was fixed in the openstack/keystonemiddleware 9.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystonemiddleware (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/keystonemiddleware/+/793917
