proxy-server Error: Second simultaneous read or write detected

Bug #1020127 reported by offline
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Identity (keystone)
Invalid
High
Unassigned
python-keystoneclient
Invalid
Undecided
Unassigned

Bug Description

Hello,

The configuration for caching auth tokens in Swift described at
    https://lists.launchpad.net/openstack/msg08639.html
where
    memcache_servers = cachedev1:11211
is added to the [filter:authtoken] section of /etc/swift/proxy-server.conf was applied to two Swift proxy servers. After this, a Swift reliability test consisting of 50 threads performing repeated PUT and GET requests began to intermittently receive 500 Internal Server Error responses, and the Swift log on the proxy servers reported the following messages, one for each 500 response. The messages continue to occur intermittently, sometimes for read and sometimes for write.
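For reference, the relevant section of proxy-server.conf looked roughly like this (a sketch, not the full file; the filter_factory path is the usual Essex-era one and the host name comes from the example above):

```
[filter:authtoken]
paste.filter_factory = keystone.middleware.auth_token:filter_factory
# Token-caching setting that triggers the errors below:
memcache_servers = cachedev1:11211
```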

Taking this configuration out again and restarting the proxy servers prevents the problem.

The Swift version is 1.4.8 (Essex).

The Swift reliability test is as described in
    https://answers.launchpad.net/swift/+question/201627
but currently rate limited to 1,050 HTTP requests/min.

Jul 2 10:19:38 swiftproxydev2 <131>proxy-server Error: Second simultaneous write on fileno 15 detected. Unless you really know what you're doing, make sure that only one greenthread can write any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_multiple_reader_prevention(False):
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift/common/middleware/catch_errors.py", line 47, in __call__
    return self.app(env, my_start_response)
  File "/usr/lib/python2.6/site-packages/swift/common/middleware/healthcheck.py", line 38, in __call__
    return self.app(env, start_response)
  File "/usr/lib/python2.6/site-packages/swift/common/middleware/memcache.py", line 47, in __call__
    return self.app(env, start_response)
  File "/usr/lib/python2.6/site-packages/keystone/middleware/auth_token.py", line 173, in __call__
    token_info = self._validate_user_token(user_token)
  File "/usr/lib/python2.6/site-packages/keystone/middleware/auth_token.py", line 343, in _validate_user_token
    cached = self._cache_get(user_token)
  File "/usr/lib/python2.6/site-packages/keystone/middleware/auth_token.py", line 477, in _cache_get
    cached = self._cache.get(key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 661, in get
    server.send_cmd("get %s" % key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 873, in send_cmd
    self.socket.sendall(cmd + '\r\n')
  File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 283, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 278, in send
    timeout_exc=socket.timeout("timed out"))
  File "/usr/lib/python2.6/site-packages/eventlet/hubs/__init__.py", line 119, in trampoline
    listener = hub.add(hub.WRITE, fileno, current.switch)
  File "/usr/lib/python2.6/site-packages/eventlet/hubs/poll.py", line 27, in add
    listener = super(

Jul 2 10:24:15 swiftproxydev2 <131>proxy-server Error: Second simultaneous read on fileno 15 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_multiple_reader_prevention(False):
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/swift/common/middleware/catch_errors.py", line 47, in __call__
    return self.app(env, my_start_response)
  File "/usr/lib/python2.6/site-packages/swift/common/middleware/healthcheck.py", line 38, in __call__
    return self.app(env, start_response)
  File "/usr/lib/python2.6/site-packages/swift/common/middleware/memcache.py", line 47, in __call__
    return self.app(env, start_response)
  File "/usr/lib/python2.6/site-packages/keystone/middleware/auth_token.py", line 173, in __call__
    token_info = self._validate_user_token(user_token)
  File "/usr/lib/python2.6/site-packages/keystone/middleware/auth_token.py", line 343, in _validate_user_token
    cached = self._cache_get(user_token)
  File "/usr/lib/python2.6/site-packages/keystone/middleware/auth_token.py", line 477, in _cache_get
    cached = self._cache.get(key)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 662, in get
    rkey, flags, rlen, = self._expectvalue(server)
  File "/usr/lib/python2.6/site-packages/memcache.py", line 747, in _expectvalue
    line = server.readline()
  File "/usr/lib/python2.6/site-packages/memcache.py", line 886, in readline
    data = recv(4096)
  File "/usr/lib/python2.6/site-packages/eventlet/greenio.py", line 238, in recv
    timeout_exc=socket.timeout("timed out"))
  File "/usr/lib/python2.6/site-packages/eventlet/hubs/__init__.py", line 117, in trampoline
    listener = hub.add(hub.READ, fileno, current.switch)
  File "/usr/lib/python2.6/site-packages/eventlet/hubs/poll.py", line 27, in add
    listener = super(Hub, self).a

A problem producing a similar error in Keystone is discussed at
    https://bugs.launchpad.net/keystone/+bug/1012381
but I don't know if this issue is related.

Revision history for this message
Samuel Merritt (torgomatic) wrote :

It looks like keystone.middleware.auth_token is not Eventlet-safe. Two different greenthreads in the Swift proxy server are trying to use the same connection to memcached.

I'd tentatively categorize this as a bug in Keystone. Swift uses Eventlet, so any middleware that's supposed to be installed in Swift must be Eventlet-safe.
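The failure mode can be sketched without eventlet at all, using plain threads and a toy connection object that, like eventlet's hub, refuses concurrent use of one socket (everything here is hypothetical stand-in code, not the real memcache client):

```python
import threading
import time

class GuardedConnection:
    """Toy stand-in for one shared memcached socket. Like eventlet's
    hub, it raises if a second caller uses it while another is mid-call."""
    def __init__(self):
        self._in_use = threading.Lock()

    def send_cmd(self, cmd):
        if not self._in_use.acquire(blocking=False):
            raise RuntimeError("Second simultaneous write detected")
        try:
            time.sleep(0.05)  # pretend to wait on the network
        finally:
            self._in_use.release()

# One connection shared by all workers, as in the unpatched middleware:
conn = GuardedConnection()
errors = []

def worker():
    try:
        conn.send_cmd("get some-token")
    except RuntimeError as exc:
        errors.append(exc)

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With 5 overlapping callers on one connection, collisions are expected.
print(len(errors))
```

The fix is to stop sharing: either monkey-patch so that callers never overlap on a blocking operation, or give each concurrent caller its own connection, which is what a pool does.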

affects: swift → keystone
no longer affects: swift
Revision history for this message
Rafael Durán Castañeda (rafadurancastaneda) wrote :

You can easily test this by applying the same patch as bug 1012381, though in Keystone the monkey patching was moved to the keystone-all script shortly after release, so the auth_token middleware does not monkey-patch thread unless the application using it already does (e.g. nova does; for swift I don't know).

Revision history for this message
Joseph Heck (heckj) wrote :

We've intentionally not monkey-patched auth_token within that code, expecting the process space that is importing it to have monkey-patched thread to make it eventlet safe.

John - when swift does its import and setup, does it monkey-patch the WSGI setup prior to loading code?

How can we resolve this?

Changed in keystone:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Samuel Merritt (torgomatic) wrote :

I'd suggest taking a look at Swift's memcache wrapper (swift.common.memcached.MemcacheRing). It's got a pool-based design that prevents two different greenthreads from trying to use the same connection.

Basically, whenever you call s.c.m.MemcacheRing.get() or .set(), it calls into s.c.m.MemcacheRing._get_conns(), which pulls server connections out of the pool and yields them. The key here is that the pool is an attribute of the middleware, so it's visible to all greenthreads, while the connection that gets pulled out of the pool is only held in a local variable, so it's greenthread-local.

I'm not really sure of a good way to let keystone.middleware.auth_token.AuthProtocol take advantage of that, though. It seems like making k.m.a.AuthProtocol have the same sort of Eventlet-safe pooled thingy is needless complexity in the case where you're not using Eventlet.

Maybe k.m.a.AuthProtocol should look in the WSGI environment at a known value (e.g. env['cache_connection']) for something that has .get() and .set() on it, and fall back to making its own memcache connections if it's absent. That would let it use Swift's Eventlet-safe thing when in a Swift proxy server, but do something simpler in other cases.

Disclaimer: the above may be up to 85% bogus.
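The pool design Samuel describes can be sketched with the standard library (threads standing in for greenthreads; the real MemcacheRing uses eventlet primitives, and the connection class here is a hypothetical placeholder):

```python
import queue
import threading

class ConnectionPool:
    """Minimal check-out/check-in pool in the spirit of
    swift.common.memcached.MemcacheRing: the pool itself is shared
    across all workers, but a checked-out connection lives only in a
    local variable, so no two workers ever touch the same socket."""
    def __init__(self, make_conn, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(make_conn())

    def get(self, key):
        conn = self._pool.get()           # blocks until a connection is free
        try:
            return conn.send_cmd("get %s" % key)
        finally:
            self._pool.put(conn)          # always return it to the pool

# Hypothetical connection factory for illustration:
class FakeConn:
    def send_cmd(self, cmd):
        return "VALUE"

pool = ConnectionPool(FakeConn, size=2)
results = []
threads = [threading.Thread(target=lambda: results.append(pool.get("token")))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A middleware built this way could then check the WSGI environment for a pool-like cache object (Swift's memcache middleware publishes one under env['swift.cache']) and fall back to its own connections when it is absent, along the lines Samuel suggests above.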

Revision history for this message
Alex Yang (alexyang) wrote :

I agree with Samuel Merritt. I also encountered this kind of problem in use, so I used Swift's MemcacheRing to replace the python-memcache client, and I moved the auth_token and swift_auth middleware into the swift project. It works well and is easy to set up.

Changed in keystone:
assignee: nobody → Sina Web Service Dev (sws)
Alex Yang (alexyang)
Changed in keystone:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to keystone (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12356

Changed in keystone:
assignee: Sina Web Service Dev (sws) → Alex Yang (alexyang)
Revision history for this message
Joseph Heck (heckj) wrote :

Totally good with the idea - MemcacheRing seems pretty sensible in code, but I'd prefer to move it to openstack-common and then use it from a library instead of doing the copy-paste thing where we end up with code in two places.

Revision history for this message
Alex Yang (alexyang) wrote :

Hi, Joseph.
Does that mean I should do the following?
https://bugs.launchpad.net/keystone/+bug/1039567
https://bugs.launchpad.net/keystone/+bug/1020127

Should all of these changes go into openstack-common?

Revision history for this message
David Kranz (david-kranz) wrote :

This is hurting pretty badly. Is there a tested workaround?

Revision history for this message
Joseph Heck (heckj) wrote :

AlexYang - any progress on putting the MemcacheRing code into openstack-common?

David or Samuel,

Do you have any suggestions on how to verify this code? The details of eventlet and high performance code are beyond what I'm normally working in, so right now I have no way to verify/reproduce this issue to even see if it's resolved with Samuel's proposed path to a solution.

David - with your deployment, are you using the auth_token code directly, or the swift_auth_middleware? (i.e. if we fix auth_token, now in Keystoneclient, will that even benefit you?)

Revision history for this message
Samuel Merritt (torgomatic) wrote :

It's a pretty standard race condition; you can test the fix by having multiple concurrent requests to an authenticated URL.

In Swift-land, that's easy, as every non-public request goes off to check if the proffered auth token exists in memcache. Something as simple as HEAD /v1/some-account is quick to process and so you can load up the concurrency pretty easily.

If you don't have such a lightweight request handy, you could reorder your middleware pipeline so that the health-check middleware comes after the auth, and then spam the server with GET /health requests.

You can also widen the window by adding latency: put your app on a VM in London and your memcached on a VM in Tokyo.

Revision history for this message
David Kranz (david-kranz) wrote :

Since auth_token was moved to keystoneclient I guess this bug belongs there. I wish there were an approved resolution for this issue.

Revision history for this message
Samuel Merritt (torgomatic) wrote :

I think there is one now. Something about setting cache = swift.cache in your config file, or something like that. It makes the middleware use Swift's cache thing out of the WSGI environment.

Sorry I don't know the details, but I hope my fuzzy half-remembered thoughts at least provide you some Google fodder.

Revision history for this message
David Kranz (david-kranz) wrote :

Thanks, Sam. That was very helpful. The comment in the keystoneclient auth_token code about this option is wrong, I think, in that it suggests that you could set this option. As far as I can see, you *must* set this option if you are using auth_token in a swift proxy server. The swift install documentation needs to be updated (I'll file a bug) and at this point I think this ticket could be closed.
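For anyone landing on this bug later, the resolution David describes is a sketch like the following in proxy-server.conf (option name per the keystoneclient auth_token middleware; the `cache` value names the WSGI environment key that holds Swift's pooled cache):

```
[filter:authtoken]
paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
# Use the Eventlet-safe cache that Swift's memcache middleware publishes
# in the WSGI environment, instead of auth_token's own memcache client:
cache = swift.cache
```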

Adam Young (ayoung)
Changed in python-keystoneclient:
status: New → Fix Released
Xin Xu (xuxin-b)
Changed in keystone:
assignee: Alex Yang (alexyang) → nobody
Revision history for this message
Adam Young (ayoung) wrote :

No fix necessary in Keystone server

Changed in keystone:
status: In Progress → Fix Released
Revision history for this message
Dolph Mathews (dolph) wrote :

"Fix Released" indicates that a trackable change was made to the project and has been included in at least a milestone release.

Changed in keystone:
status: Fix Released → Invalid
Changed in python-keystoneclient:
status: Fix Released → Invalid
This report contains Public information  
Everyone can see this information.
