2014-09-17 04:09:59 |
Alexander Ignatov |
bug |
|
|
added bug |
2014-09-17 04:14:06 |
Alexander Ignatov |
mos: status |
New |
Confirmed |
|
2014-09-17 04:14:08 |
Alexander Ignatov |
mos: milestone |
|
6.0 |
|
2014-09-17 04:14:32 |
Alexander Ignatov |
tags |
|
keystone |
|
2014-09-17 04:14:50 |
Alexander Ignatov |
mos: assignee |
|
Roman Podoliaka (rpodolyaka) |
|
2014-09-17 04:16:22 |
Alexander Ignatov |
tags |
keystone |
keystone memcached |
|
2014-09-17 17:58:34 |
Roman Podoliaka |
summary |
"keystone tenant-list" hangs sometimes |
Keystone hangs trying to set a lock in Memcache |
|
2014-09-17 18:15:11 |
Roman Podoliaka |
description |
Due to incorrect logic in python-memcache, Keystone tries to write data to a dead memcached backend, skipping the alive ones.
This happens because the backend traversal logic is randomized and can potentially miss alive servers in the pool.
When most servers in the pool are dead, the probability of failure is relatively high.
In practice the issue shows up during deployment, when Keystone is used in an environment where some controllers have not been deployed yet.
The issue is a heisenbug and depends on the randomly generated data. |
Preconditions:
1. Keystone is configured to use Memcache
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of 3 controllers.
3. 2 of 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down)
Result: the Keystone API hangs when a user runs something like "keystone tenant-list".
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcached. The problem is that python-memcache shards keys among the configured memcached instances in such a way that it *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will iterate over unavailable memcache servers. This can easily be reproduced with http://xsnippet.org/360181/ |
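(For reference, a simplified sketch of the server-selection loop described above; the names _SERVER_RETRIES, server_hash and buckets are modelled on the linked python-memcached code, but this is an illustration rather than the library's actual source.)

    # Simplified sketch of python-memcached's _get_server() retry behaviour,
    # as described above. Illustration only, not the actual library source.
    import binascii

    _SERVER_RETRIES = 10  # fixed number of (re)hash attempts

    def server_hash(key):
        return binascii.crc32(key.encode()) & 0xffffffff

    def get_server(buckets, key):
        # buckets: list of server objects exposing connect() -> bool
        serverhash = server_hash(key)
        for i in range(_SERVER_RETRIES):
            server = buckets[serverhash % len(buckets)]
            if server.connect():      # dead servers fail here
                return server
            # Rehash and retry: nothing guarantees a different (or alive) bucket,
            # so with 2 of 3 servers down all 10 attempts can land on dead ones.
            serverhash = server_hash(str(serverhash) + str(i))
        return None                   # no server found -> caller stalls, API "hangs"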
|
2014-09-17 18:16:51 |
Roman Podoliaka |
description |
Preconditions:
1. Keystone is configured to use Memcache
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of 3 controllers.
3. 2 of 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down)
Result: the Keystone API hangs when a user runs something like "keystone tenant-list".
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcached. The problem is that python-memcache shards keys among the configured memcached instances in such a way that it *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will iterate over unavailable memcache servers. This can easily be reproduced with http://xsnippet.org/360181/ |
Preconditions:
1. Keystone is configured to use Memcache
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of 3 controllers.
3. 2 of 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down)
Result: the Keystone API hangs when a user runs something like "keystone tenant-list".
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcached. The problem is that python-memcache shards keys among the configured memcached instances in such a way that it *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will yield *only* unavailable memcache servers. This can easily be reproduced with http://xsnippet.org/360181/ |
|
2014-09-17 18:18:03 |
Roman Podoliaka |
mos: assignee |
Roman Podoliaka (rpodolyaka) |
|
|
2014-09-17 18:25:58 |
Roman Podoliaka |
mos: status |
Confirmed |
Triaged |
|
2014-09-17 21:19:26 |
Alexander Ignatov |
mos: status |
Triaged |
In Progress |
|
2014-09-18 16:58:43 |
Bogdan Dobrelya |
nominated for series |
|
mos/5.1.x |
|
2014-09-18 16:58:43 |
Bogdan Dobrelya |
bug task added |
|
mos/5.1.x |
|
2014-09-18 16:59:03 |
Bogdan Dobrelya |
nominated for series |
|
mos/6.0.x |
|
2014-09-18 16:59:03 |
Bogdan Dobrelya |
bug task added |
|
mos/6.0.x |
|
2014-09-18 16:59:16 |
Bogdan Dobrelya |
mos/5.1.x: milestone |
6.0 |
5.1.1 |
|
2014-09-18 16:59:23 |
Bogdan Dobrelya |
mos/6.0.x: status |
New |
Confirmed |
|
2014-09-18 16:59:30 |
Bogdan Dobrelya |
mos/6.0.x: importance |
Undecided |
High |
|
2014-09-18 17:00:08 |
Bogdan Dobrelya |
mos/5.1.x: status |
In Progress |
Confirmed |
|
2014-09-18 17:00:18 |
Bogdan Dobrelya |
mos/6.0.x: milestone |
|
6.0 |
|
2014-09-18 17:58:49 |
Roman Podoliaka |
description |
Preconditions:
1. Keystone is configured to use Memcache
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of 3 controllers.
3. 2 of 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down)
Result: the Keystone API hangs when a user runs something like "keystone tenant-list".
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcached. The problem is that python-memcache shards keys among the configured memcached instances in such a way that it *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will yield *only* unavailable memcache servers. This can easily be reproduced with http://xsnippet.org/360181/ |
Preconditions:
1. Keystone is configured to use Memcache
backend=keystone.cache.memcache_pool
backend_argument=url:10.108.12.3,10.108.12.5,10.108.12.6
backend_argument=pool_maxsize:100
2. Memcached is deployed on each of 3 controllers.
3. 2 of 3 memcached servers are down (only the one on the primary controller is up: 10.108.12.3 is up, 10.108.12.5 and 10.108.12.6 are down)
Result: the Keystone API hangs when a user runs something like "keystone tenant-list". haproxy drops the connection after a 60s timeout.
strace shows that Keystone tries to connect to the unavailable servers in a loop, ignoring the available one.
Debugging showed that the keystone-all process is stuck while trying to set a lock: http://xsnippet.org/360179/ . The lock itself is implemented by setting a key in memcached. The problem is that python-memcache shards keys among the configured memcached instances in such a way that it *can* fail to find an available server. This is due to how the retry logic is implemented: https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L381-L396 . For a particular key (e.g. _lockusertokens-ee7e5f7374a8488bb2087e106a8834f7) 10 attempts won't be enough, and the loop within _get_server() will yield *only* unavailable memcache servers. This can easily be reproduced with http://xsnippet.org/360181/ |
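(The reproduction snippet linked above is not archived here; a minimal stand-alone reproduction in the same spirit, assuming python-memcached's standard Client API and the server addresses from the preconditions with only 10.108.12.3 alive, could look like the following.)

    # Reproduction sketch (assumption: python-memcached installed; only
    # 10.108.12.3 is reachable, the other two backends are down).
    import memcache

    mc = memcache.Client(['10.108.12.3:11211',
                          '10.108.12.5:11211',
                          '10.108.12.6:11211'])

    failures = 0
    for n in range(1000):
        key = '_lockusertokens-%032x' % n
        # With two of three backends dead, the retry loop can exhaust all its
        # attempts on dead servers: the call stalls on connect timeouts and
        # then fails instead of falling back to the live backend.
        if not mc.set(key, 'x'):
            failures += 1

    print('failed to store %d of 1000 keys' % failures)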
|
2014-09-18 18:30:29 |
Dmitry Mescheryakov |
nominated for series |
|
mos/5.0.x |
|
2014-09-18 18:30:29 |
Dmitry Mescheryakov |
bug task added |
|
mos/5.0.x |
|
2014-09-18 18:30:33 |
Dmitry Mescheryakov |
mos/5.0.x: status |
New |
Incomplete |
|
2014-09-18 18:30:34 |
Dmitry Mescheryakov |
mos/5.0.x: status |
Incomplete |
Confirmed |
|
2014-09-18 18:30:36 |
Dmitry Mescheryakov |
mos/5.0.x: importance |
Undecided |
High |
|
2014-09-18 18:30:39 |
Dmitry Mescheryakov |
mos/5.0.x: milestone |
|
5.0.3 |
|
2014-09-22 23:46:00 |
Bogdan Dobrelya |
mos/5.0.x: assignee |
|
MOS Keystone (mos-keystone) |
|
2014-09-22 23:46:09 |
Bogdan Dobrelya |
mos/5.1.x: assignee |
|
MOS Keystone (mos-keystone) |
|
2014-09-22 23:46:17 |
Bogdan Dobrelya |
mos/6.0.x: assignee |
|
MOS Keystone (mos-keystone) |
|
2014-09-24 10:33:53 |
Alexander Makarov |
mos/5.0.x: assignee |
MOS Keystone (mos-keystone) |
Alexander Makarov (amakarov) |
|
2014-09-24 10:33:56 |
Alexander Makarov |
mos/5.1.x: assignee |
MOS Keystone (mos-keystone) |
Alexander Makarov (amakarov) |
|
2014-09-24 10:33:59 |
Alexander Makarov |
mos/5.0.x: assignee |
Alexander Makarov (amakarov) |
|
|
2014-09-24 10:34:06 |
Alexander Makarov |
mos/5.0.x: assignee |
|
Alexander Makarov (amakarov) |
|
2014-09-24 10:34:09 |
Alexander Makarov |
mos/6.0.x: assignee |
MOS Keystone (mos-keystone) |
Alexander Makarov (amakarov) |
|
2014-09-25 15:04:30 |
Alexander Makarov |
attachment added |
|
get_server_fix.patch https://bugs.launchpad.net/mos/+bug/1370324/+attachment/4214980/+files/get_server_fix.patch |
|
2014-11-13 17:00:00 |
Alexander Makarov |
mos/5.0.x: status |
Confirmed |
Fix Committed |
|
2014-11-13 17:00:02 |
Alexander Makarov |
mos/5.1.x: status |
Confirmed |
Fix Committed |
|
2014-11-13 17:00:10 |
Alexander Makarov |
mos/6.0.x: status |
Confirmed |
Fix Committed |
|