Delayed S3 query processing due to memcached token locks

Bug #1489797 reported by Dmitriy Novakovskiy
This bug affects 2 people
Affects             Status     Importance  Assigned to       Milestone
Mirantis OpenStack  Invalid    High        Boris Bobrov
6.0.x               Won't Fix  High        Alexey Stupnikov
6.1.x               Invalid    High        Alexey Stupnikov
7.0.x               Invalid    High        Boris Bobrov

Bug Description

The customer has reported an issue where queries from a single user to the S3 interface (exposed by Ceph/RadosGW) experience processing delays. Switching keystone token storage from memcached to MySQL resolves the problem.

Steps to reproduce:
1. Deploy vanilla MOS (for the customer it is 6.0 deployed by Mirantis; whether 6.1 is affected is not yet clear)
2. Specify Ceph to back all storage needs (including the S3 API endpoint)
3. Create a user that the application will use to upload data into Ceph object storage via the S3 interface (keystone ec2-credentials-create)
4. Start uploading objects (in this case, graphical images) using the credentials from step 3, with multiple parallel requests sharing the same credentials (a minimal sketch follows this list)
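A minimal sketch of step 4, assuming boto 2.x and a plain-HTTP RadosGW endpoint; the endpoint, keys, bucket name and thread counts below are placeholders, not values from this report:

    # Parallel S3 upload sketch against RadosGW (boto 2.x assumed).
    # Endpoint, keys, bucket name and counts are placeholders.
    import threading

    import boto
    import boto.s3.connection

    S3_HOST = "rgw.example.local"     # hypothetical RadosGW S3 endpoint
    ACCESS_KEY = "EC2_ACCESS_KEY"     # from `keystone ec2-credentials-create`
    SECRET_KEY = "EC2_SECRET_KEY"
    BUCKET = "upload-test"
    THREADS = 20
    OBJECTS_PER_THREAD = 50

    def connect():
        # boto 2.x connections are not thread-safe, so each thread makes its own
        return boto.connect_s3(
            aws_access_key_id=ACCESS_KEY,
            aws_secret_access_key=SECRET_KEY,
            host=S3_HOST,
            is_secure=False,  # assumes plain HTTP; drop if the endpoint uses TLS
            calling_format=boto.s3.connection.OrdinaryCallingFormat(),
        )

    def uploader(thread_id):
        bucket = connect().get_bucket(BUCKET)
        payload = b"x" * 64 * 1024    # small object, roughly a thumbnail-sized image
        for i in range(OBJECTS_PER_THREAD):
            key = bucket.new_key("img-%d-%d" % (thread_id, i))
            key.set_contents_from_string(payload)

    if __name__ == "__main__":
        connect().create_bucket(BUCKET)   # create the target bucket once up front
        threads = [threading.Thread(target=uploader, args=(n,))
                   for n in range(THREADS)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

While this runs, timing keystone token-get with the same user versus a different user should show the difference described below.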

Actual result:

keystone token-get takes anywhere from 0 to 9 seconds; request processing is consequently slow and requests tend to queue up.

Expected result:

keystone token-get is expected to take less time and to be consistent (not vary from 0 to 9 seconds). For example, if the same parallel upload test is executed with multiple S3 user accounts chosen at random, processing always takes < 1 s.

Suspected root cause:

When keystone validates logins coming through the RADOS gateway, it appears to take a per-user lock while handling the token back-end store. In MOS, memcached is used for storing tokens. Changing driver=keystone.token.backends.memcache.Token to driver=keystone.token.persistence.backends.sql.Token increases performance considerably (validated on the same cloud).
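For illustration, the workaround above corresponds to the following keystone.conf change (a sketch; the option sits in the [token] section in the keystone releases of that era, verify against the deployed version):

    # /etc/keystone/keystone.conf -- token storage switched from memcached to SQL
    [token]
    # driver = keystone.token.backends.memcache.Token       (previous value)
    driver = keystone.token.persistence.backends.sql.Token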

Revision history for this message
Dmitriy Novakovskiy (dnovakovskiy) wrote :

Full story from customer

"When we use the S3 interface we use it from our application which saves a lot of small images. Naturally we set up a user that our application can use to access the S3 interface with one access and corresponding secret key (keystone ec2-credentials-create). The sending of the images (objects) is multi-threaded to overcome the natural latency of the object storage. What I have found out is that when keystone validates the log-ins through the rados gateway it seems to take some lock on user level when handling the back-end store of the tokens. In MOS it seems to be memcached that is used for storing tokens.

All keystone requests using the same user that is storing a lot of images through the S3 interface are slow when the application is sending stuff to the S3 interface. keystone token-get can take between 0 and 9 seconds when we send images to the S3 interface if you do it with the same user. If you try keystone token-get with another user (OS_USERNAME=some other user) it always takes < 1s.

I have been able to verify this in a virtual env. as well. It's very easy to see that the performance of the S3 interface goes down when issuing a lot of parallel requests to the S3 interface with the same user.

I have also tried to change the storage back end for the tokens to mysql in the virtual env. Changing driver=keystone.token.backends.memcache.Token to driver=keystone.token.persistence.backends.sql.Token seems to increase the performance quite a lot. Of course the load on mysql goes up significantly and you have to purge expired tokens from the keystone database, but all in all it works better. Would it be ok to change that in the real env. as well, or do you see any pitfalls with that?

The LDAP queries (keystone is configured with an LDAP backend) don't seem to have anything to do with the bottleneck in the S3 interface we see right now."
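For reference, expired tokens in the SQL backend are usually purged with keystone-manage token_flush; a sketch of a cron entry (the schedule and log path are arbitrary examples, not taken from this report):

    # /etc/cron.d/keystone-token-flush (example schedule: hourly)
    0 * * * * keystone /usr/bin/keystone-manage token_flush >> /var/log/keystone/token-flush.log 2>&1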

tags: added: customer-found
Changed in fuel:
assignee: nobody → MOS Maintenance (mos-maintenance)
milestone: none → 6.0-mu-6
status: New → Confirmed
Changed in fuel:
importance: Undecided → Medium
Boris Bobrov (bbobrov)
affects: fuel → mos
Changed in mos:
milestone: 6.0-mu-6 → none
no longer affects: mos/6.0-updates
Revision history for this message
Boris Bobrov (bbobrov) wrote :

To get this fixed, I need to have [token]revoke_by_id = false. This needs to be done in fuel.
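In keystone.conf terms, the setting named above is:

    # /etc/keystone/keystone.conf
    [token]
    revoke_by_id = false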

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/keystone (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Boris Bobrov <email address hidden>
Review: https://review.fuel-infra.org/10967

Revision history for this message
Boris Bobrov (bbobrov) wrote :

> multiple requests using same credentials going in parallel

How many requests happen per second in your case?

no longer affects: fuel
Revision history for this message
Boris Bobrov (bbobrov) wrote :

The bug also affects fuel because in order to fix it we need to make changes to keystone config. Please see comment #2.

Changed in fuel:
assignee: nobody → Boris Bobrov (bbobrov)
milestone: none → 7.0
Revision history for this message
Boris Bobrov (bbobrov) wrote :

Nope, I need someone from fuel folks to do it for me

Changed in fuel:
assignee: Boris Bobrov (bbobrov) → Fuel Library Team (fuel-library)
Revision history for this message
Boris Bobrov (bbobrov) wrote :

I am marking the bug report as invalid for 7.0 because it is not clear whether the bug can be reproduced there: the locks the reporter describes were heavily changed in 7.0. The original report also mentions only 6.0 and 6.1.

no longer affects: fuel
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Reassigning to the keystone team for 6.0/6.1, as the maintenance team needs a fix to start backporting.

Revision history for this message
Alexander Makarov (amakarov) wrote :

Please follow the suggestion mentioned in comment #2.
I am not sure which team owns this particular puppet manifest, but it should set [token]revoke_by_id = False in /etc/keystone/keystone.conf upon initialization.
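A sketch of how that could look, assuming the manifests use the puppet-keystone module's keystone_config provider (not confirmed in this report):

    # hypothetical fragment for the keystone manifest in fuel-library
    keystone_config {
      'token/revoke_by_id': value => false;
    }
    # the keystone service must be restarted (or notified) for the change to apply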

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Setting the bug's status to "Won't Fix" for MOS 6.0, since at this point we only fix security issues there and this bug is not a security issue.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Also, we don't update puppet manifests in 6.0, and fixing this issue requires updating puppet manifests.

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Retargeted to 6.1-updates as we need more time for investigation.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

I have set this bug's status to Invalid for MOS 6.1 because I have tested the S3 API performance of MOS 6.1 using the code from the attachment. The results (http://pastebin.com/9cnN6KbP) are pretty consistent and don't differ much from a simple file-by-file upload.

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/keystone (openstack-ci/fuel-7.0/2015.1.0)

Change abandoned by Boris Bobrov <email address hidden> on branch: openstack-ci/fuel-7.0/2015.1.0
Review: https://review.fuel-infra.org/10967
