Large multigets with binary protocol may hang client

Bug #434843 reported by Adam Thomason
This bug affects 3 people
Affects: libmemcached
Status: Fix Released
Importance: Undecided
Assigned to: Trond Norbye
Milestone: (none listed)

Bug Description

(from http://lists.tangent.org/pipermail/libmemcached/2009-August/000918.html)

There's a problem with how memcached_purge attempts to flush the output buffer and read in responses, which leads to hangs on read(2) calls. Fundamentally the issue is that memcached_purge expects to find a response for each request, when in general it has no way of knowing whether a response will be forthcoming for quiet commands. This occurs at least in the situations where MEMCACHED_BEHAVIOR_BUFFER_REQUESTS is set or when doing binary multigets (which buffer getq commands before sending a noop command).

Consider the multiget case: assume a multiget for 1,000 keys from a particular server; that only 100 getq commands fit in the 8K output buffer; and that 10 of those 100 are cache misses. The server sends only 90 responses, so on memcached_purge's 91st call to memcached_read_one_response, the read() call blocks (forever, unless a timeout has been set).
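
The arithmetic in the scenario above can be written out as a small model. This is only an illustrative sketch, not libmemcached code: the struct and function names (flush_tally, tally_quiet_gets, missing_responses) are hypothetical, and it assumes the only reason a quiet get goes unanswered is a cache miss.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one flushed output buffer (not a libmemcached type). */
struct flush_tally {
    size_t requests_sent;   /* quiet get commands written to the socket */
    size_t responses_sent;  /* replies the server actually produces */
};

/* A quiet get is answered only on a hit, so each miss produces no reply. */
struct flush_tally tally_quiet_gets(size_t commands_in_buffer, size_t misses)
{
    struct flush_tally t;
    assert(misses <= commands_in_buffer);
    t.requests_sent = commands_in_buffer;
    t.responses_sent = commands_in_buffer - misses;
    return t;
}

/* Replies a one-response-per-request reader would wait for but never get.
 * Any value above zero means a blocking read() eventually stalls. */
size_t missing_responses(struct flush_tally t)
{
    return t.requests_sent - t.responses_sent;
}
```

With the numbers from the report, tally_quiet_gets(100, 10) gives 90 replies for 100 requests, so a reader that expects 100 stalls on its 91st read.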


Revision history for this message
Trond Norbye (trond-norbye) wrote :

This is caused by the fact that we count the GETKQ commands, but we cannot expect a return value from them. We should only count the NOOP command, and wait for that.
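
A rough sketch of what "wait only for the NOOP" can look like when scanning received bytes. This is not the code from the actual fix. The function names (put_header, drain_until_noop) are made up for illustration, and the only protocol details assumed are the 24-byte binary header, the 0x81 response magic, the opcode in byte 1, the big-endian total body length in bytes 8 to 11, and the opcodes 0x0a (NOOP) and 0x0d (GETKQ).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    BIN_HEADER_LEN = 24,   /* fixed size of a binary protocol header */
    BIN_MAGIC_RES  = 0x81, /* magic byte of a response packet */
    BIN_OP_NOOP    = 0x0a, /* no-op: always answered, ends the batch */
    BIN_OP_GETKQ   = 0x0d  /* quiet get with key: answered only on a hit */
};

/* Illustrative helper: write a minimal response header into dst. */
void put_header(uint8_t *dst, uint8_t opcode, uint32_t body_len)
{
    memset(dst, 0, BIN_HEADER_LEN);
    dst[0] = BIN_MAGIC_RES;
    dst[1] = opcode;
    dst[8]  = (uint8_t)(body_len >> 24);
    dst[9]  = (uint8_t)(body_len >> 16);
    dst[10] = (uint8_t)(body_len >> 8);
    dst[11] = (uint8_t)body_len;
}

/* Scan received bytes without assuming how many replies will arrive.
 * Returns 1 once the NOOP response is seen, 0 if more data is needed,
 * -1 on a bad magic byte. *hits counts the complete responses that
 * precede the NOOP (or that were seen before the data ran out). */
int drain_until_noop(const uint8_t *buf, size_t len, size_t *hits)
{
    size_t pos = 0;
    *hits = 0;
    while (len - pos >= BIN_HEADER_LEN) {
        const uint8_t *h = buf + pos;
        uint32_t body;
        if (h[0] != BIN_MAGIC_RES)
            return -1;
        if (h[1] == BIN_OP_NOOP)
            return 1;               /* terminator reached, stop reading */
        body = ((uint32_t)h[8] << 24) | ((uint32_t)h[9] << 16) |
               ((uint32_t)h[10] << 8) | (uint32_t)h[11];
        if (len - pos - BIN_HEADER_LEN < body)
            return 0;               /* body not fully received yet */
        pos += BIN_HEADER_LEN + body;
        ++*hits;
    }
    return 0;
}
```

The point of the design is that the loop's exit condition is the NOOP reply, which the server always sends, and not a count of the quiet commands that were queued.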

Changed in libmemcached:
status: New → Confirmed
assignee: nobody → Trond Norbye (trond-norbye)
status: Confirmed → Fix Committed
Trond Norbye (trond-norbye) wrote :

Released in revno: 591 [merge]

Changed in libmemcached:
status: Fix Committed → Fix Released
Trond Norbye (trond-norbye) wrote :

The fix didn't work on some of the platforms :(

Changed in libmemcached:
status: Fix Released → In Progress
Trond Norbye (trond-norbye) wrote :

It seems to work on all platforms with an mget of 1024 keys. For really large multigets you should be using memcached_mget_execute instead.

Changed in libmemcached:
status: In Progress → Fix Released
Adam Thomason (athomason) wrote :

Appears to work for me too.
