_update_user_list_with_cas causes significant overhead (when using memcached as token store backend)

Bug #1251123 reported by Zhidong Yu
This bug affects 12 people
Affects                        Status         Importance   Assigned to       Milestone
OpenStack Identity (keystone)  Fix Released   Critical     Morgan Fainberg
Havana                         Fix Released   Critical     Morgan Fainberg

Bug Description

[Problem statement]
In Havana, when using memcached as the token store backend, we have been seeing a significant performance drop compared with Grizzly.

[How to reproduce]
We used a Python script to boot VMs at a rate of 1 VM per second. Many VM creations failed and the keystone-all process's CPU utilization was nearly 100%.

[Analysis]
When using memcached as the token backend, keystone stores two types of key-value pairs in memcached.

   token_id ===> token data (associated with a TTL)

   user_id ===> a list of ids for tokens that belong to the user

When creating a new token, Keystone first adds the (token_id, data) pair to memcached, and then updates the (user_id, token_id_list) pair in the function _update_user_list_with_cas.

What _update_user_list_with_cas does is (a rough sketch follows the list):
    1. retrieve the old list
    2. for each token_id in the old list, retrieve the token data to check whether it is expired or not.
    3. discard the expired tokens, add the valid token_ids to a new list
    4. append the newly created token's id to the new list too.
    5. use memcached's Compare-And-Set function to replace the old list with the new list
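
A minimal sketch of this flow, for illustration only (not keystone's actual code), using the python-memcached client; the key prefixes and the helper name are assumptions:

import memcache

client = memcache.Client(['127.0.0.1:11211'], cache_cas=True)

def update_user_token_list(user_id, new_token_id, max_retries=5):
    user_key = 'usertokens-%s' % user_id
    for _ in range(max_retries):
        old_ids = client.gets(user_key) or []            # step 1: fetch the old list
        valid_ids = []
        for token_id in old_ids:                         # step 2: one GET per token...
            if client.get('token-%s' % token_id):        # ...to see if it is still live
                valid_ids.append(token_id)               # step 3: keep unexpired tokens
        valid_ids.append(new_token_id)                   # step 4: add the new token id
        if client.cas(user_key, valid_ids):              # step 5: compare-and-set
            return True
    return False  # CAS kept losing to concurrent writers

With thousands of live tokens per user, step 2 alone issues thousands of GETs on every token creation.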

In practice we have found it is quite common for a user to have thousands of valid tokens at a given moment, so step 2 consumes a lot of time. What's worse, the CAS tends to fail and retry, which makes this function even less efficient.

[Proposed fix]
I'd like to propose a 'lazy cleanup of expired token_ids from the user list' solution.

The idea is to avoid doing the cleanup EVERY TIME a new token is created. We can set a dynamic threshold T for each user, and the cleanup job will be triggered only when the number of token_ids exceeds the threshold T. After every cleanup, we check how many token_ids have been cleaned up; if the percentage is lower than a pre-specified P, then T is increased to T*(1+P) to avoid cleaning up too frequently.

Besides, every call to list_tokens for a given user will always trigger a cleanup, which is necessary to ensure list_tokens returns only valid tokens (see the sketch below).
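
A rough sketch of the proposed lazy-cleanup policy; the store object and its get_user_entry/save_user_entry/token_is_valid helpers are assumptions for illustration, not existing keystone APIs:

def append_token_lazily(store, user_id, new_token_id, p=0.2):
    entry = store.get_user_entry(user_id)     # e.g. {'tokens': [...], 'threshold': 100}
    entry['tokens'].append(new_token_id)
    if len(entry['tokens']) > entry['threshold']:
        before = len(entry['tokens'])
        entry['tokens'] = [t for t in entry['tokens'] if store.token_is_valid(t)]
        cleaned = before - len(entry['tokens'])
        if cleaned < p * before:               # cleanup reclaimed too little:
            entry['threshold'] = int(entry['threshold'] * (1 + p))   # back off: T -> T*(1+P)
    store.save_user_entry(user_id, entry)

def list_tokens(store, user_id):
    # Every listing prunes, so only valid token ids are ever returned.
    entry = store.get_user_entry(user_id)
    entry['tokens'] = [t for t in entry['tokens'] if store.token_is_valid(t)]
    store.save_user_entry(user_id, entry)
    return entry['tokens']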

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

There are a couple of things to address here. The first is that list_tokens does not actually need to return only valid tokens. The only cases where list_tokens should be used are the revoke_tokens_for_{user|project|domain|trust|etc} calls. These calls don't actually care whether the tokens are valid or not (that is checked before they are added to the revocation list).

In the Grizzly timeframe, this list could regularly be filled with bogus/invalid tokens, and when the token list hits the memcache page size, no further tokens for that user can be issued (the token data is not returned, and therefore the token isn't usable). In Grizzly this list was also simply appended to (using the memcache append call) rather than loaded and inspected in any case.

Inherently, this is a flaw in the use of memcache as a token store combined with the need to maintain an active list of tokens for the user.

In the Icehouse development cycle we will be moving to a new token revocation event system that will help eliminate the need for this specific tracking list, and remove the need to do the CAS work in memcache to be able to properly revoke tokens on the events that need to revoke many tokens (e.g. password change).

Would it be possible to re-use tokens (especially in the case of booting 1 VM per second)? This would help to alleviate the extra load due to memcache token management.

Revision history for this message
John Dewey (retr0h) wrote :

I am seeing a similar issue. I was unable to get anything useful in Python land by attaching gdb to the running keystone process. However, strace revealed the following.

20641 recvfrom(9, "VALUE token-d46afd7b98514ac4ab14"..., 4096, 0, NULL, NULL) = 3001
20641 sendto(9, "get token-675c3bfc48db4592a0f5fa"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-675c3bfc48db4592a0f5"..., 4096, 0, NULL, NULL) = 3001
20641 sendto(9, "get token-40d9552e6d814777884c60"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-40d9552e6d814777884c"..., 4096, 0, NULL, NULL) = 3004
20641 sendto(9, "get token-4cab27149c134e4a958e9b"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-4cab27149c134e4a958e"..., 4096, 0, NULL, NULL) = 3004
20641 sendto(9, "get token-3a72f47c23c24e5ea0bc55"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-3a72f47c23c24e5ea0bc"..., 4096, 0, NULL, NULL) = 3004
20641 sendto(9, "get token-3b4695fa2c524d888594bb"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-3b4695fa2c524d888594"..., 4096, 0, NULL, NULL) = 3004
20641 sendto(9, "get token-74e9ab6f8ec54814816273"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-74e9ab6f8ec548148162"..., 4096, 0, NULL, NULL) = 3004
20641 sendto(9, "get token-a363ac508f134aeb8e7c17"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-a363ac508f134aeb8e7c"..., 4096, 0, NULL, NULL) = 3004
20641 sendto(9, "get token-31e9b5469158494d8f40d8"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-31e9b5469158494d8f40"..., 4096, 0, NULL, NULL) = 3004
...
20641 recvfrom(9, 0x3ec2a24, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
20641 epoll_ctl(4, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=9, u64=21512400263970825}}) = 0
20641 epoll_wait(4, {{EPOLLIN, {u32=9, u64=21512400263970825}}}, 1023, 2355) = 1
20641 epoll_ctl(4, EPOLL_CTL_DEL, 9, {EPOLLRDNORM|EPOLLWRBAND|0x4c6820, {u32=0, u64=21512400263970816}}) = 0
...
20641 recvfrom(9, "VALUE token-ba64562da8764d89b7bd"..., 4096, 0, NULL, NULL) = 3004
20641 sendto(9, "get token-7a55f37328d84d3ba64064"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-7a55f37328d84d3ba640"..., 4096, 0, NULL, NULL) = 3004
20641 sendto(9, "get token-bacd73e0f29d4a81af1f13"..., 44, 0, NULL, 0) = 44
20641 recvfrom(9, "VALUE token-bacd73e0f29d4a81af1f"..., 4096, 0, NULL, NULL) = 1400
20641 recvfrom(9, "adminURL\np62\nVhttps://poc.openst"..., 4096, 0, NULL, NULL) = 1604

The EAGAIN occurred on fd 9, which is the memcached socket.

keystone- 20641 keystone 9u IPv4 404728036 0t0 TCP example.com.com:59342->example.com:11211 (ESTABLISHED)

Restarting memcached returned keystone to its normal usage.

I have a monitoring job which runs every 30 seconds to check the API endpoints. My guess is I am getting a crap ton of tokens, and the cleanup mentioned earlier in this bug report has trouble keeping up.

Revision history for this message
John Dewey (retr0h) wrote :

I set keystone's token expiration to 3 hours. Memcached should not return keys which have expired.
Verified keystone is setting the new TTL properly in memcached.

root@ds590:~# telnet 10.214.221.1 11211
Trying 10.214.221.1...
Connected to 10.214.221.1.
Escape character is '^]'.
stats cachedump 17 1
ITEM token-bc9354d63360429fb2574ba8d0264d69 [2802 b; 1384518981 s]
END
telnet> quit
Connection closed.

root@ds590:~# date
Fri Nov 15 09:37:56 UTC 2013

root@ds590:~# date -d @1384518981
Fri Nov 15 12:36:21 UTC 2013

Will see if this workaround is workable ;)

Revision history for this message
John Dewey (retr0h) wrote :

Lowering the expiration solved our problems for now. We no longer see the issues we were seeing at the same rate.
I would consider this a workaround until keystone supports PKI without needing to share token state between servers.

Revision history for this message
Zhidong Yu (zhidong) wrote :

Morgan, reusing a token would require updating its expiry time, right? So the token created by the first session won't expire at its expected time. This sounds like a security issue.

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

The expires time on a token won't change even if you reuse a token.

In fact, if you issue a new token based upon the original token, the expires time of the new token will match the first token.

A token is valid until it is either revoked or it expires and can be reused (depending on action needed, e.g. a non-admin token couldn't be used to perform an admin task).

John Dewey (comment #4) has confirmed that reducing the expiration time on tokens helps (in the [token] section of the config, set the "expiration" option to something like 3600 or 7200 for 2h tokens), if you are not relying on tokens lasting for a whole day.
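
As a rough example of the workaround, keystone.conf would contain something like the following (values taken from the suggestions in this thread, not a recommendation):

[token]
# Shorter-lived tokens keep each user's token index small. This is a
# workaround, not a fix; 3600 = 1 hour, 7200 = 2 hours.
expiration = 7200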

Changed in keystone:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/66149

Thierry Carrez (ttx)
Changed in keystone:
importance: Medium → High
Alan Pevec (apevec)
tags: removed: havana
Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

Bumped to critical based on conversation in IRC with Dolphm.

This issue can cause keystone to become unresponsive with a moderate amount of token issuance.

The clear fix is to ensure fewer calls to memcache are made when updating the user-token-list index (proposed fix for havana). The same/similar change will be proposed to master shortly to ensure consistency of the changesets across both branches.

Changed in keystone:
importance: High → Critical
Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

WRT comment #8, there have been multiple cases reported by various deployers (via the mailing list and direct conversations on IRC) of this specific issue affecting them.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/71683

Changed in keystone:
assignee: nobody → Morgan Fainberg (mdrnstm)
status: Triaged → In Progress
Dolph Mathews (dolph)
Changed in keystone:
milestone: none → icehouse-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (master)

Reviewed: https://review.openstack.org/71683
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=6f41e3b010a0438f483cbbd0f26e00b2feec0f30
Submitter: Jenkins
Branch: master

commit 6f41e3b010a0438f483cbbd0f26e00b2feec0f30
Author: Morgan Fainberg <email address hidden>
Date: Sat Jan 11 16:58:32 2014 -0800

    Limit calls to memcache backend as user token index increases in size

    This eliminates the need to do a get on each token in the user's index
    on token issuance. This change will change the maximum number of tokens
    that can be outstanding for a given user. This change is two-fold, first
    instead of using JSON to store the token IDs, the python list structure
    is being stored in memcached; second the expiry for the token is also
    stored in the list. The net result is that fewer tokens can be stored
    in the user's token index list due to an increase in data being stored
    per token in the index page.

    The new logic will attempt to upgrade the old json-style lists to
    the new format of [(token_id, expiry), ...] stored as a native
    python object in the memcache backend. This conversion will keep
    any outstanding tokens in the list from (<time_of_conversion> +
    <configured expiration of tokens>). This is done to ensure that
    tokens can still be invalidated by operations that invalidate
    tokens based upon user/project/trust/etc changes without causing
    potential lockups in keystone trying to retrieve all of the
    actual token expiration times from memcache.

    Closes-bug: #1251123
    Change-Id: Ida39b4699ed6c568609a5121573fc3be5c4ab2f4
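
A hedged illustration (not the merged code itself) of the new index format described in the commit message: each per-user entry stores (token_id, expiry) tuples, so pruning no longer requires a memcache GET per token.

import datetime

def prune_user_index(entries, now=None):
    # entries: [(token_id, expiry_datetime), ...] as stored in the user's index
    now = now or datetime.datetime.utcnow()
    return [(tid, exp) for (tid, exp) in entries if exp > now]

def add_token(entries, token_id, expiry):
    return prune_user_index(entries) + [(token_id, expiry)]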

Changed in keystone:
status: In Progress → Fix Committed
Revision history for this message
Jacob Cherkas (jcherkas) wrote :

I tested setting the expiration to 1 hour and after restarting keystone and memcache everything works as expected.

About 30 minutes later, when I try to spawn 30 concurrent instances, keystone-all runs at 100% CPU and 90% of the instances fail to spawn.

I switched back to using driver = keystone.token.backends.sql.Token and can spawn 30 instances concurrently multiple times without issues.

keystone-all never uses more than 20% CPU.

We have 2 keystone servers behind an LB and 2 memcache servers running local to each keystone. Each keystone has an entry pointing to each of the memcache servers:

[memcache]
servers = 10.34.124.109:11211,10.34.124.142:11211

Do you expect the workaround to behave as if the patch were applied, or do you expect only the patch to resolve the issue, with the workaround not being a valid test?

Thanks

JC

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to keystone (stable/havana)

Reviewed: https://review.openstack.org/66149
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=c22f2edeb7af8c4c8256bbadbe75b453fa09dc93
Submitter: Jenkins
Branch: stable/havana

commit c22f2edeb7af8c4c8256bbadbe75b453fa09dc93
Author: Morgan Fainberg <email address hidden>
Date: Sat Jan 11 16:58:32 2014 -0800

    Limit calls to memcache backend as user token index increases in size

    This eliminates the need to do a get on each token in the user's index
    on token issuance. This change will change the maximum number of tokens
    that can be outstanding for a given user. This change is two-fold, first
    instead of using JSON to store the token IDs, the python list structure
    is being stored in memcached; second the expiry for the token is also
    stored in the list. The net result is that fewer tokens can be stored
    in the user's token index list due to an increase in data being stored
    per token in the index page.

    The new logic will attempt to upgrade the old json-style lists to
    the new format of [(token_id, expiry), ...] stored as a native
    python object in the memcache backend. This conversion will keep
    any outstanding tokens in the list from (<time_of_conversion> +
    <configured expiration of tokens>). This is done to ensure that
    tokens can still be invalidated by operations that invalidate
    tokens based upon user/project/trust/etc changes without causing
    potential lockups in keystone trying to retrieve all of the
    actual token expiration times from memcache.

    Closes-bug: #1251123
    Change-Id: Ida39b4699ed6c568609a5121573fc3be5c4ab2f4

Thierry Carrez (ttx)
Changed in keystone:
status: Fix Committed → Fix Released
Revision history for this message
Tiantian Gao (gtt116) wrote :

I hit the bug again, like Jacob said. If a user continues to ask for new tokens, the list will grow without bound, and keystone's CPU usage will jump to 100%.

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

@Tiantian Gao,

Are you running this with the fix committed into stable/havana? With this fix you shouldn't need to keep the token TTL low; however, it is absolutely recommended that the token expiration (TTL) be kept as low as possible.

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

@Jacob Cherkas,

The workaround will help, but it will not be as effective (nor "solve" the issue) as the fix committed to the stable tree.

Revision history for this message
Tiantian Gao (gtt116) wrote :

@Morgan Fainberg,

I do run with the patch committed into stable/havana. I set the TTL to 24 hours (86400s).

In my understanding, if a user creates lots of tokens (maybe 10,000), there is a chance of making keystone go crazy, right? Am I missing something?

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

@Tiantian Gao,

That has always been a risk with the memcache backend. Keystone does handle it a bit more gracefully now. I still recommend keeping the TTL as low as possible; 24 hours is a long time. The lower the TTL, the less likely you are to run into issues.

The only real issue you will run into now is exceeding the memcache page size limit, which will prevent new tokens from being issued (for that specific user) until some tokens expire out. This is not easy to fix because it is a hard limit in memcache.

Revision history for this message
Tiantian Gao (gtt116) wrote :

@Morgan Fainberg,

This is indeed hard to resolve. IMHO, the problem here is the Python loop that prunes expired tokens eating lots of CPU. How about keeping an ordered list sorted by token expiry? That way we could just drop the tokens older than a given cutoff rather than walking through the whole token list to check each one (see the sketch below). I think this could help with the problem.
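
A sketch of this ordered-list idea, illustrative only: keep the index sorted by expiry so pruning is a slice instead of a full scan. As the next comment notes, this would not address the memcache page-size limit.

import bisect
import datetime

def add_token_sorted(entries, token_id, expiry):
    # entries: [(expiry, token_id), ...] kept sorted by expiry
    bisect.insort(entries, (expiry, token_id))

def prune_sorted(entries, now=None):
    now = now or datetime.datetime.utcnow()
    cut = bisect.bisect_left(entries, (now,))   # first entry expiring at or after 'now'
    return entries[cut:]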

Revision history for this message
Morgan Fainberg (mdrnstm) wrote :

@Tiantian Gao,

Testing has shown that the new loop isn't actually a big issue (it is relatively lightweight overall). The majority of the CPU issue was actually IO wait, which prior to this fix was burned on communication with memcached. But you should keep the token TTL as low as possible to mitigate any CPU overhead that still exists, and try to reuse tokens where possible.

The current issue is (really) that you could exceed a memcached page size. This is not easily fixable and has been an issue for as long as the memcache token store has been around. Your proposal of using an ordered list does not solve this issue.

The goal at this point is to move away from storing tokens in any form and utilize the new revocation events to eliminate the need to maintain a token list of any form. This new way forward should be available as soon as Juno.

As a side note, the memcached backend is not a very good choice because data that should be long(er) lived is not preserved across restarts of the memcache service (e.g. the revocation list).

Revision history for this message
Tiantian Gao (gtt116) wrote :

@Morgan Fainberg,

Oh, I got it now; sorry for my misunderstanding, and thanks for the detailed explanation.

In fact, we are hesitating between the MySQL and memcache backends, on the premise that we can't change the default TTL because users have already been relying on it for a long time. The problem with MySQL is that updating a user password takes too much time; the problem with memcache is the one reported here.

Thierry Carrez (ttx)
Changed in keystone:
milestone: icehouse-3 → 2014.1