keystone behavior when one memcache backend is down
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Mirantis OpenStack | Fix Committed | Critical | Yuriy Taraday | |
| OpenStack Identity (keystone) | Fix Released | Medium | Yuriy Taraday | |
| keystonemiddleware | Fix Released | Medium | Morgan Fainberg | |
Bug Description
Hi,
Our implementation uses dogpile.
Test the connection using:
for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done
Block one memcache backend (simulating a power outage of the node) using:
iptables -I INPUT -p tcp --dport 11211 -j DROP
Test the speed again using:
for i in {1..20}; do (time keystone token-get >> log2) 2>&1 | grep real | awk '{print $2}'; done
I also straced the keystone process with
strace -tt -s 512 -o /root/log1 -f -p PID
and got
26872 connect(9, {sa_family=AF_INET, sin_port=
even though this IP is down.
I also checked the code
https:/
https:/
https:/
and was not able to find any details on how keystone treats a backend when it is down.
There should be logic that temporarily takes a backend out of rotation when it is not accessible. After a timeout period, the backend should be probed (without blocking get/set operations on the remaining backends) and, if the connection succeeds, it should be put back into operation. Here is a sample of how it could be implemented:
http://
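(The sample link above is truncated; the sketch below is not that sample, just a minimal, hypothetical Python illustration of the "mark dead, probe later" idea. The class, method, and constant names are invented for illustration and are not keystone code.)

```python
import socket
import time

RETRY_AFTER = 30  # seconds to keep a backend out of rotation after a failure


class BackendPool(object):
    """Track which memcache backends are usable; skip dead ones until a
    retry window has passed, then probe them again."""

    def __init__(self, servers):
        # map "host:port" -> timestamp until which the backend is considered dead
        self._dead_until = dict((server, 0) for server in servers)

    def live_servers(self):
        # backends that get/set operations are allowed to use right now
        now = time.time()
        return [s for s, dead_until in self._dead_until.items()
                if dead_until <= now]

    def mark_dead(self, server):
        # take the backend out of rotation for RETRY_AFTER seconds
        self._dead_until[server] = time.time() + RETRY_AFTER

    def probe(self, server, timeout=1.0):
        # cheap TCP probe run when the retry window expires; a failed probe
        # pushes the dead window forward instead of blocking get/set calls
        host, port = server.rsplit(':', 1)
        try:
            socket.create_connection((host, int(port)), timeout=timeout).close()
        except socket.error:
            self.mark_dead(server)
            return False
        return True
```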
tags: added: ha
Changed in mos:
  assignee: MOS Keystone (mos-keystone) → Yuriy Taraday (yorik-sar)
Changed in mos:
  assignee: Yuriy Taraday (yorik-sar) → Alexei Kornienko (alexei-kornienko)
Changed in mos:
  importance: High → Critical
no longer affects: fuel
Changed in mos:
  assignee: Alexei Kornienko (alexei-kornienko) → Yuriy Taraday (yorik-sar)
Changed in mos:
  status: Confirmed → In Progress
Changed in keystone:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
  assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
  milestone: none → juno-rc1
  importance: Undecided → Medium
Changed in keystone:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
  assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
  assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystone:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystonemiddleware:
  milestone: none → 1.2.0
  importance: Undecided → Medium
Changed in keystonemiddleware:
  assignee: Yuriy Taraday (yorik-sar) → Morgan Fainberg (mdrnstm)
Changed in keystone:
  assignee: Morgan Fainberg (mdrnstm) → Yuriy Taraday (yorik-sar)
Changed in keystonemiddleware:
  status: Fix Committed → Fix Released
Changed in keystone:
  status: Fix Committed → Fix Released
Changed in keystone:
  milestone: juno-rc1 → 2014.2
This behavior is how the Python memcache clients themselves work; it isn't specific to dogpile, keystone, or anything else.
The basic behavior is 'try and wait for a timeout'. I'm not sure what the best solution will be in the short term. In the long term, the real solution is non-persistent tokens (nothing to store), which would eliminate the need for memcache in this regard.
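(As a rough illustration of that 'try and wait for a timeout' behaviour, here is a minimal sketch using the python-memcached client. The server addresses are placeholders; dead_retry and socket_timeout are, as far as I know, the relevant client parameters, shown with roughly their default values.)

```python
import memcache

client = memcache.Client(
    ['192.168.0.1:11211', '192.168.0.2:11211'],  # placeholder backends
    socket_timeout=3,  # a call routed to an unreachable backend blocks up to this long
    dead_retry=30,     # seconds the client skips a server after marking it dead
)

# The first operation routed to the dead backend pays the socket timeout; the
# server is then marked dead and skipped until dead_retry expires, when it is
# tried again. Until then, keys hashed to that server simply miss.
client.set('token-abc', 'cached-value')
print(client.get('token-abc'))
```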