Delayed S3 query processing due to memcached token locks
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Mirantis OpenStack | Invalid | High | Boris Bobrov | |
| 6.0.x | Won't Fix | High | Alexey Stupnikov | |
| 6.1.x | Invalid | High | Alexey Stupnikov | |
| 7.0.x | Invalid | High | Boris Bobrov | |
Bug Description
A customer has reported that queries from a single user to the S3 interface (exposed by Ceph/RadosGW) experience processing delays. Switching keystone token storage from memcached to MySQL resolves the problem.
Steps to reproduce:
1. Deploy vanilla MOS (the customer runs 6.0 deployed by Mirantis; whether 6.1 is affected is not yet clear)
2. Specify Ceph to back all the storage needs (incl. S3 API endpoint)
3. Create a user that the application will use to upload data into Ceph object storage via the S3 interface (keystone ec2-credentials-create)
4. Start uploading objects (in this case, graphical images) using the credentials from step 3, with multiple requests running in parallel under the same credentials (see the reproduction sketch below)
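A minimal reproduction sketch, assuming a boto/Python 2 era client; the endpoint hostname, bucket name, and credential placeholders are illustrative and do not come from the report:

```python
#!/usr/bin/env python
# Hypothetical sketch: upload many small objects in parallel through the
# RadosGW S3 endpoint using a single set of EC2 credentials.
import threading

import boto
import boto.s3.connection

ACCESS_KEY = '<ec2 access key from step 3>'
SECRET_KEY = '<ec2 secret key from step 3>'
RGW_HOST = 'radosgw.example.local'  # assumed endpoint name

def connect():
    # One connection per thread; boto connections are not thread-safe.
    return boto.connect_s3(
        aws_access_key_id=ACCESS_KEY,
        aws_secret_access_key=SECRET_KEY,
        host=RGW_HOST,
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat())

def worker(n):
    bucket = connect().get_bucket('repro-bucket')
    for i in range(100):
        # Small payloads mimic the customer's image uploads.
        key = bucket.new_key('img-%d-%d' % (n, i))
        key.set_contents_from_string('x' * 4096)

connect().create_bucket('repro-bucket')  # ensure the bucket exists

threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Running keystone token-get for the same user while this loop is active should show the latency spread described below.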
Actual result:
keystone token-get takes anywhere from 0 to 9 seconds; request processing is consequently slow and requests tend to queue up.
Expected result:
keystone token-get is expected to take less time and to be consistent (not vary from 0 to 9 seconds). For example, if the same parallel upload test is executed with multiple S3 user accounts applied at random, processing always takes under 1 second.
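A quick way to quantify the spread is to time keystone token-get in a loop while the upload test runs; a minimal sketch, assuming the keystone CLI is installed and the affected user's OS_* environment variables are exported:

```python
#!/usr/bin/env python
# Hypothetical sketch: sample `keystone token-get` latency while the
# parallel upload test is running.
import subprocess
import time

samples = []
for _ in range(20):
    start = time.time()
    subprocess.check_output(['keystone', 'token-get'])
    samples.append(time.time() - start)

print('min %.2fs  max %.2fs  avg %.2fs' % (
    min(samples), max(samples), sum(samples) / len(samples)))
```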
Suspected root cause:
When keystone validates the logins coming through the rados gateway, it appears to take a per-user lock while handling the back-end store of the tokens. In MOS, memcached is used for storing tokens. Changing driver=keystone.token.backends.memcache.Token to driver=keystone.token.persistence.backends.sql.Token resolves the delays (see the full story from the customer below).
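In keystone.conf terms, the reported workaround amounts to the following change in the [token] section (driver paths as quoted by the customer; the section layout is assumed from the stock Juno-era config):

```ini
[token]
# Before: memcached-backed tokens (suspected per-user locking)
#driver = keystone.token.backends.memcache.Token
# After: SQL-backed tokens
driver = keystone.token.persistence.backends.sql.Token
```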
description: updated
tags: added: customer-found
Changed in fuel:
assignee: nobody → MOS Maintenance (mos-maintenance)
milestone: none → 6.0-mu-6
status: New → Confirmed
Changed in fuel:
importance: Undecided → Medium
affects: fuel → mos
Changed in mos:
milestone: 6.0-mu-6 → none
no longer affects: mos/6.0-updates
no longer affects: fuel
Changed in fuel:
assignee: nobody → Boris Bobrov (bbobrov)
milestone: none → 7.0
Full story from customer
"When we use the S3 interface we use it from our application which saves a lot of small images. Naturally we set up a user that our application can use to access the S3 interface with one access and corresponding secret key (keystone ec2-credentials -create) . The sending of the images (objects) is multi-threaded to overcome the natural latency of the object storage. What I have found out is that when keystone validates the log-ins through the rados gateway it seems to take some lock on user level when handling the back-end store of the tokens. In MOS it seems to be memcached that is used for storing tokens.
All keystone requests using the same user that is storing a lot of images through the S3 interface are slow when the application is sending stuff to the S3 interface. keystone token-get can take between 0 and 9 seconds when we send images to the S3 interface if you do it with the same user. If you try keystone token-get with another user (OS_USERNAME=some other user) it always takes < 1s.
I have been able to verify this in a virtual env. as well. It's very easy to see that the performance of the S3 interface goes down when issuing a lot of parallel requests to the S3 interface with the same user.
I have also tried to change the storage back end for the tokens to mysql in the virtual env. Changing driver=keystone.token.backends.memcache.Token to driver=keystone.token.persistence.backends.sql.Token seems to increase the performance quite a lot. Of course the load on mysql goes up significantly and you have to purge expired tokens from the keystone database, but all in all it works better. Would it be ok to change that in the real env. as well, or do you see any pitfalls with that?
The ldap queries (keystone is configured with an ldap backend) don't seem to have anything to do with the bottleneck in the S3 interface we see right now."
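For reference, the expired-token purge the customer mentions is handled by keystone-manage token_flush when tokens live in SQL; a cron entry along these lines (path and schedule are assumptions, not from the report) keeps the token table bounded:

```
# /etc/cron.d/keystone-token-flush -- hypothetical path and schedule
0 * * * * keystone /usr/bin/keystone-manage token_flush
```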