keystone behavior when one memcache backend is down
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Mirantis OpenStack | Fix Committed | Critical | Yuriy Taraday | |
| OpenStack Identity (keystone) | Fix Released | Medium | Yuriy Taraday | |
| keystonemiddleware | Fix Released | Medium | Morgan Fainberg | |
Bug Description
Hi,
Our implementation uses dogpile.
Test the connection using:
for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done
Block one memcache backend (simulating a power outage of the node) using:
iptables -I INPUT -p tcp --dport 11211 -j DROP
Test the speed again using:
for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done
I also straced the keystone process with
strace -tt -s 512 -o /root/log1 -f -p PID
and got
26872 connect(9, {sa_family=AF_INET, sin_port=
even though this IP is down.
I also checked the code
https:/
https:/
https:/
and was not able to find any details on how keystone treats a backend when it is down.
There should be logic that temporarily takes a backend out of rotation when it is not accessible. After a timeout period, the backend should be probed (without blocking get/set operations on the remaining backends) and, if the connection succeeds, it should be put back into operation. Here is a sample of how it could be implemented:
http://
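(The sample link above is truncated; the sketch below is not that sample, just a minimal, hypothetical Python illustration of the "mark dead, probe later" idea. The class, method, and constant names are invented for illustration and are not keystone code.)

```python
import socket
import time

RETRY_AFTER = 30  # seconds to keep a backend out of rotation after a failure


class BackendPool(object):
    """Track which memcache backends are usable; skip dead ones until a
    retry window has passed, then probe them again."""

    def __init__(self, servers):
        # map "host:port" -> timestamp until which the backend is considered dead
        self._dead_until = dict((server, 0) for server in servers)

    def live_servers(self):
        # backends that get/set operations are allowed to use right now
        now = time.time()
        return [s for s, dead_until in self._dead_until.items()
                if dead_until <= now]

    def mark_dead(self, server):
        # take the backend out of rotation for RETRY_AFTER seconds
        self._dead_until[server] = time.time() + RETRY_AFTER

    def probe(self, server, timeout=1.0):
        # cheap TCP probe run when the retry window expires; a failed probe
        # pushes the dead window forward instead of blocking get/set calls
        host, port = server.rsplit(':', 1)
        try:
            socket.create_connection((host, int(port)), timeout=timeout).close()
        except socket.error:
            self.mark_dead(server)
            return False
        return True
```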
tags: added: ha
Changed in mos:
  assignee: MOS Keystone (mos-keystone) → Yuriy Taraday (yorik-sar)
Changed in mos:
  assignee: Yuriy Taraday (yorik-sar) → Alexei Kornienko (alexei-kornienko)
Changed in mos:
  importance: High → Critical
no longer affects: fuel
Changed in mos:
  assignee: Alexei Kornienko (alexei-kornienko) → Yuriy Taraday (yorik-sar)
Changed in mos:
  status: Confirmed → In Progress
Changed in keystone:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
  assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
  milestone: none → juno-rc1
  importance: Undecided → Medium
Changed in keystone:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
  assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
  assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystonemiddleware:
  milestone: none → 1.2.0
  importance: Undecided → Medium
Changed in keystonemiddleware:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
  assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystonemiddleware:
  status: Fix Committed → Fix Released
Changed in keystone:
  status: Fix Committed → Fix Released
Changed in keystone:
  milestone: juno-rc1 → 2014.2
This behavior is how the Python memcache clients themselves work; it isn't specific to dogpile, keystone, or anything else.
The basic behavior is 'try and wait for a timeout'. I'm not sure what the best solution will be in the short term. In the long term, the real solution is non-persistent tokens (nothing to store), which would eliminate the need for memcache in this regard.
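(As a rough illustration of that 'try and wait for a timeout' behaviour, here is a minimal sketch using the python-memcached client. The server addresses are placeholders; dead_retry and socket_timeout are, as far as I know, the relevant client parameters, shown with roughly their default values.)

```python
import memcache

client = memcache.Client(
    ['192.168.0.1:11211', '192.168.0.2:11211'],  # placeholder backends
    socket_timeout=3,  # a call routed to an unreachable backend blocks up to this long
    dead_retry=30,     # seconds the client skips a server after marking it dead
)

# The first operation routed to the dead backend pays the socket timeout; the
# server is then marked dead and skipped until dead_retry expires, when it is
# tried again. Until then, keys hashed to that server simply miss.
client.set('token-abc', 'cached-value')
print(client.get('token-abc'))
```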