'delete noreply' may hang the client

Bug #442914 reported by Dmitry Isaykin
This bug affects 1 person
Affects: libmemcached
Status: Fix Released
Importance: Undecided
Assigned to: Trond Norbye
Milestone: (none)

Bug Description

I use libmemcached v0.30 in noreply mode and found this bug: when
deleting a large number of keys from memcached, my production server
blocks for a while in read().
The behaviour reproduces every time in the test environment, but I
can't reproduce it in the libmemcached unit tests.
Versions 0.31 and 0.32 behave the same way.
If I use libmemcached in the usual mode (noreply off), this does not
happen.

(gdb) bt
#0 0x00007f0f9ca8758b in read () from /lib/libpthread.so.0
#1 0x00000000006808e3 in memcached_io_read (ptr=0x4e2b5720,
buffer=0x7f0f9bd73b60, length=1, nread=0x7f0f9bd73a80) at
memcached_io.c:108
#2 0x0000000000681490 in memcached_io_readline (ptr=0x4e2b5720,
buffer_ptr=0x7f0f9bd73b60 "\1", size=1024) at memcached_io.c:482
#3 0x000000000068330d in textual_read_one_response (ptr=0x4e2b5720,
buffer=0x7f0f9bd73b60 "\1", buffer_length=1024, result=0x7f0f9bd73f60)
   at memcached_response.c:208
#4 0x0000000000682ccb in memcached_read_one_response (ptr=0x4e2b5720,
buffer=0x7f0f9bd73b60 "\1", buffer_length=1024, result=0x7f0f9bd73f60)
   at memcached_response.c:31
#5 0x0000000000681841 in memcached_purge (ptr=0x4e2b5720) at
memcached_purge.c:56
#6 0x00000000006810a3 in io_flush (ptr=0x4e2b5720,
error=0x7f0f9bd74188) at memcached_io.c:340
#7 0x0000000000680c87 in memcached_io_write (ptr=0x4e2b5720,
buffer=0x7f0f9bd74270, length=0, with_flush=1 '\1') at
memcached_io.c:241
#8 0x000000000067e040 in memcached_do (ptr=0x4e2b5720,
command=0x7f0f9bd74270, command_length=28, with_flush=1 '\1') at
memcached_do.c:26
#9 0x000000000067dc7e in memcached_delete_by_key (ptr=0x4e2b5490,
master_key=0x7f0f9bd744b0 "1:206384:48", master_key_length=11,
key=0x7f0f9bd744b0 "1:206384:48",
   key_length=11, expiration=0) at memcached_delete.c:71
#10 0x000000000067d970 in memcached_delete (ptr=0x4e2b5490,
key=0x7f0f9bd744b0 "1:206384:48", key_length=11, expiration=0) at
memcached_delete.c:6

(gdb) up 5
#5 0x0000000000681841 in memcached_purge (ptr=0x4e2b5720) at
memcached_purge.c:56
56 result_ptr);
Current language: auto; currently c
(gdb) l memcached_purge.c:35
30 }
31 WATCHPOINT_ASSERT(ptr->fd != -1);
32
33 uint32_t no_msg= memcached_server_response_count(ptr) - 1;
34 if (no_msg > 0)
35 {
36 memcached_result_st result;
37 memcached_result_st *result_ptr;
38 char buffer[SMALL_STRING_LEN];
39
(gdb) l memcached_purge.c:56
51 for (x= 0; x < no_msg; x++)
52 {
53 memcached_result_reset(result_ptr);
54 memcached_return rc= memcached_read_one_response(ptr, buffer,
55 sizeof (buffer),
56 result_ptr);
57 /*
58 * Purge doesn't care for what kind of command results that is received.
59 * The only kind of errors I care about if is I'm out of sync with the
60 * protocol or have problems reading data from the network..
(gdb) p no_msg
$5 = 4294967295
(gdb) p (uint)-1
$6 = 4294967295

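memcached_delete() tries to purge pending responses, but there are none
(i.e. memcached_server_response_count(ptr) == 0). Subtracting 1 from that
unsigned zero wraps no_msg around to 4294967295, so the purge loop blocks
in read() waiting for responses that will never arrive.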

--- libmemcached-0.30/libmemcached/memcached_purge.c 2009-05-20 22:08:20.000000000 +0400
+++ libmemcached-0.30-my/libmemcached/memcached_purge.c 2009-07-10 19:42:08.000000000 +0400
@@ -30,8 +30,8 @@
  }
  WATCHPOINT_ASSERT(ptr->fd != -1);

- uint32_t no_msg= memcached_server_response_count(ptr) - 1;
- if (no_msg > 0)
+ uint32_t no_msg= memcached_server_response_count(ptr);
+ if (no_msg-- > 1)
  {
    memcached_result_st result;
    memcached_result_st *result_ptr;

Revision history for this message
Trond Norbye (trond-norbye) wrote :

Hmm.. this is strange.. I tried to add the following test-case to recreate the problem:

static test_return regression_bug_442914(memcached_st *memc)
{
  memcached_return rc;

  /* Buffer requests and enable noreply, the mode in which the hang was reported. */
  rc= memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_BUFFER_REQUESTS, 1);
  assert(rc == MEMCACHED_SUCCESS);
  rc= memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_NOREPLY, 1);
  assert(rc == MEMCACHED_SUCCESS);

  /* Issue many deletes; in buffered mode a delete may return MEMCACHED_BUFFERED. */
  for (unsigned int x= 0; x < 20000; ++x)
  {
    char k[251];
    size_t len= (size_t)snprintf(k, sizeof(k), "%u", x);
    rc= memcached_delete(memc, k, len, 0);
    assert(rc == MEMCACHED_SUCCESS || rc == MEMCACHED_BUFFERED);
  }

  return TEST_SUCCESS;
}

But the test succeeds with or without buffered mode...

Changed in libmemcached:
assignee: nobody → Trond Norbye (trond-norbye)
Revision history for this message
Dmitry Isaykin (dmitry-isaikin) wrote :

I succeeded in writing a unit test for this case. It hangs on my machine. Please try it on yours.

Revision history for this message
Trond Norbye (trond-norbye) wrote :

Your testcase recreated the situation for me, so I was able to fix the problem. Thanks for the test case!

I debugged your testcase and found that the bug is in memcached_purge: it doesn't handle the case where ptr->io_bytes_sent == ptr->root->io_bytes_watermark.

I reworked my test case to generate that situation instead of applying yours, because your testcase didn't use the memcached_st provided by the test framework and instead hard-coded two servers.

Changed in libmemcached:
status: New → Fix Committed
Revision history for this message
Dmitry Isaykin (dmitry-isaikin) wrote :

Ok. Thanks.

Revision history for this message
Trond Norbye (trond-norbye) wrote :

Released in revno: 595 [merge]

Changed in libmemcached:
status: Fix Committed → Fix Released