interrupted pg connections can hang appservers in PQgetResult()
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Triaged | High | Unassigned |
Bug Description
We had a cross-DC firewall incident over the weekend. In the aftermath, some appservers had gone off into la-la land (hung).
The backtraces were very similar: two active threads, both looking like this:
Thread 3
#0 0x00002b8e9e7a3f93 in poll () from None
#1 0x00002b8ea4aa184f in ?? () from None
#2 0x00002b8ea4aa18d0 in ?? () from None
#3 0x00002b8ea4aa0a89 in PQgetResult () from None
#4 0x00002b8ea4aa0d28 in ?? () from None
#5 0x00002b8ea4870553 in pq_execute (curs=0xd9b94d8, query=0x14488714 "\
", ' ' <repeats 12 times>, "UPDATE SessionData SET last_accessed = CURRENT_TIMESTAMP\
", ' ' <repeats 12 times>, "WHERE client_id = E'6BPI4Wcg59P77
", ' ' <repeats 16 times>, "AND last_accessed < CURREN"..., async=0) from psycopg/pqpath.c
#6 0x00002b8ea4876c8c in _psyco_curs_execute (self=0xd9b94d8, operation=<value optimised out>, vars=<value optimised out>, async=0) from psycopg/
#7 0x00002b8ea48774bd in psyco_curs_execute (self=0xd9b94d8, args=<value optimised out>, kwargs=<value optimised out>) from psycopg/
#8 0x00000000004a7a07 in ext_do_call () from ../Python/ceval.c
Thread 2
#0 0x00002b8e9e7a3f93 in poll () from None
#1 0x00002b8ea4aa184f in ?? () from None
#2 0x00002b8ea4aa18d0 in ?? () from None
#3 0x00002b8ea4aa0a89 in PQgetResult () from None
#4 0x00002b8ea4aa0d28 in ?? () from None
#5 0x00002b8ea4870553 in pq_execute (curs=0xedff308, query=0x118950d4 "SELECT BugMessage.bug, BugMessage.
#6 0x00002b8ea4876c8c in _psyco_curs_execute (self=0xedff308, operation=<value optimised out>, vars=<value optimised out>, async=0) from psycopg/
#7 0x00002b8ea48774bd in psyco_curs_execute (self=0xedff308, args=<value optimised out>, kwargs=<value optimised out>) from psycopg/
#8 0x00000000004a7a07 in ext_do_call () from ../Python/ceval.c
The backtrace was captured 24 hours after haproxy took the servers out of rotation, so these are not 'active' requests but rather stuck threads.
From James Troup, the keepalive settings for the host:
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
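To put the keepalive values quoted above in perspective, here is a rough sketch of the arithmetic. It assumes the usual Linux behaviour: the first probe is sent after the idle time, and the peer is declared dead once the configured number of probes go unanswered. These sysctls are only defaults, and they apply only to sockets that have SO_KEEPALIVE enabled. The helper names are hypothetical. The `keepalives*` keywords are standard libpq connection parameters, but whether the libpq on the appservers supports them is an assumption.

```python
def keepalive_detection_seconds(idle, probes, interval):
    """Approximate seconds before the kernel gives up on a silent peer.

    idle     -> tcp_keepalive_time   (seconds of idleness before the first probe)
    probes   -> tcp_keepalive_probes (unanswered probes before giving up)
    interval -> tcp_keepalive_intvl  (seconds between probes)
    """
    return idle + probes * interval


def libpq_keepalive_options(idle, probes, interval):
    """Hypothetical helper: render per-connection keepalive settings
    as a libpq connection-string fragment."""
    return (
        "keepalives=1 keepalives_idle=%d keepalives_interval=%d "
        "keepalives_count=%d" % (idle, interval, probes)
    )


if __name__ == "__main__":
    # Host defaults quoted in this report.
    total = keepalive_detection_seconds(idle=7200, probes=9, interval=75)
    print("host defaults: %d s (about %.0f minutes)" % (total, total / 60.0))

    # Illustrative tighter values; these numbers are an assumption, not a recommendation.
    print(libpq_keepalive_options(idle=60, probes=3, interval=10))
```

With the host defaults this comes to 7200 + 9 * 75 = 7875 seconds, a little over two hours, and only for sockets where keepalive is actually switched on.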