Comment 23 for bug 1493303

Samuel Merritt (torgomatic) wrote : Re: Swift proxy memory leak on unfinished read

This is weird. I was able to reproduce the problem; the proxy server accumulates a bunch of open filehandles for dead sockets.

I can see with strace that the sockets in question are used for client <--> proxy communication. We're not leaking connections to the storage backends. Also, it looks like someone is trying to clean them up, but is just bad at it. Check this out:

    # we have a GET request from the client; the socket is fd 220
    accept(4, {sa_family=AF_INET, sin_port=htons(38280), sin_addr=inet_addr("")}, [16]) = 220
    fcntl(220, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl(220, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    fcntl(220, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
    fcntl(220, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    sendto(3, "<139>proxy-server: STDERR: (2834"..., 65, 0, NULL, 0) = 65
    accept(4, 0x7fffa0e778c0, [16]) = -1 EAGAIN (Resource temporarily unavailable)
    recvfrom(220, "GET /v1/AUTH_test/test/largefile"..., 65536, 0, NULL, NULL) = 160

    # ... removed stuff talking to memcached + storage backends

    # then we try to send something and learn that it's shut down, so SIGPIPE
    sendto(220, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 0, NULL, 0) = -1 EPIPE (Broken pipe)
    --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=28347, si_uid=1000} ---
    # stupidly, we try to send it again and get another SIGPIPE
    sendto(220, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 0, NULL, 0) = -1 EPIPE (Broken pipe)
    --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=28347, si_uid=1000} ---

    # attempted cleanup...
    shutdown(220, SHUT_RDWR) = -1 ENOTCONN (Transport endpoint is not connected)

    # and now we completely forget about fd 220 and it's never mentioned again. there's the leak.
    poll([{fd=4, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 13) = 1 ([{fd=4, revents=POLLIN}])
    # ...thousands more lines not mentioning fd 220
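
To make the distinction in that trace concrete: shutdown() does not release the file descriptor, whether it succeeds or fails; only close() does. Here is a minimal sketch using the plain Python 3 standard library on a POSIX system (not eventlet, and not Swift code). The helper names fd_is_open and shutdown_then_check are made up for illustration.

```python
import os
import socket


def fd_is_open(fd):
    """Return True if fd still refers to an open file descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def shutdown_then_check():
    # A connected pair stands in for the client <--> proxy socket.
    a, b = socket.socketpair()
    fd = a.fileno()
    b.close()  # the "client" goes away

    try:
        a.shutdown(socket.SHUT_RDWR)
    except OSError:
        # Depending on the platform this may fail (for example with
        # ENOTCONN); either way the descriptor is not released.
        pass
    open_after_shutdown = fd_is_open(fd)

    a.close()  # this is the call that actually releases the fd
    open_after_close = fd_is_open(fd)
    return open_after_shutdown, open_after_close
```

If the close never reaches the kernel, the descriptor stays in the process's fd table, which matches what the strace output shows for fd 220.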

What's weird here is that eventlet is doing this at least somewhat intentionally. Here's an annotated eventlet.wsgi.HttpProtocol.finish():

    def finish(self):
        try:
            # this tries to flush any buffers; this is probably what
            # causes the second sendto()/SIGPIPE pair, but I have not
            # verified that.
            BaseHTTPServer.BaseHTTPRequestHandler.finish(self)
        except socket.error as e:
            # Broken pipe, connection reset by peer
            if support.get_errno(e) not in BROKEN_SOCK:
                raise
        # This is responsible for the shutdown call. It executes every time;
        # the try/except above doesn't let any exceptions out that would
        # exit this method early.
        greenio.shutdown_safe(self.connection)

        # Here's the fun part: this method call gets executed, but it
        # doesn't make any syscalls. Something is causing this to be a no-op,
        # but I don't know what it is.
        self.connection.close()

Looks like it might be a bug in eventlet, maybe? I'm not convinced this bug is fixable from within Swift.
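
For what it's worth, here is one way a close() call can run without making a close(2) syscall. This is a sketch against the plain Python 3 standard library on a POSIX system, not eventlet's green sockets, and I have not confirmed that this is what happens in the proxy. In CPython 3, a socket with an outstanding makefile() object defers the real close until that file object is closed too. The helper names fd_is_open and close_with_outstanding_makefile are hypothetical.

```python
import os
import socket


def fd_is_open(fd):
    """Return True if fd still refers to an open file descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


def close_with_outstanding_makefile():
    a, b = socket.socketpair()
    fd = a.fileno()

    # A file object built on the socket holds an internal reference to it.
    reader = a.makefile("rb")

    # close() marks the socket closed, but the fd is not released while
    # the file object is still open.
    a.close()
    open_after_socket_close = fd_is_open(fd)

    # Closing the last file object drops the reference and releases the fd.
    reader.close()
    open_after_file_close = fd_is_open(fd)

    b.close()
    return open_after_socket_close, open_after_file_close
```

If something in the request handler keeps a similar reference alive (an rfile/wfile-style object, say), the connection's close() would look like a no-op in strace. That is an assumption to check, not a diagnosis.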