Launchpad itself

interrupted pg connections can hang appservers in PQgetResult()

Bug #931161 reported by Robert Collins on 2012-02-12

This bug affects 1 person

Affects           Status   Importance  Assigned to  Milestone
Launchpad itself  Triaged  High        Unassigned

Bug Description

We had a cross-DC firewall incident over the weekend. In the aftermath, some appservers had hung and stopped responding.

The backtraces were very similar: 2 active threads, both looking like:
Thread 3
#0 0x00002b8e9e7a3f93 in poll () from None
#1 0x00002b8ea4aa184f in ?? () from None
#2 0x00002b8ea4aa18d0 in ?? () from None
#3 0x00002b8ea4aa0a89 in PQgetResult () from None
#4 0x00002b8ea4aa0d28 in ?? () from None
#5 0x00002b8ea4870553 in pq_execute (curs=0xd9b94d8, query=0x14488714 "\
", ' ' <repeats 12 times>, "UPDATE SessionData SET last_accessed = CURRENT_TIMESTAMP\
", ' ' <repeats 12 times>, "WHERE client_id = E'6BPI4Wcg59P77Pi1ILzViTD.3WwbM-OpGPhVDtMn2iGzeUXnkH13Lw'\
", ' ' <repeats 16 times>, "AND last_accessed < CURREN"..., async=0) from psycopg/pqpath.c
#6 0x00002b8ea4876c8c in _psyco_curs_execute (self=0xd9b94d8, operation=<value optimised out>, vars=<value optimised out>, async=0) from psycopg/cursor_type.c
#7 0x00002b8ea48774bd in psyco_curs_execute (self=0xd9b94d8, args=<value optimised out>, kwargs=<value optimised out>) from psycopg/cursor_type.c
#8 0x00000000004a7a07 in ext_do_call () from ../Python/ceval.c

Thread 2
#0 0x00002b8e9e7a3f93 in poll () from None
#1 0x00002b8ea4aa184f in ?? () from None
#2 0x00002b8ea4aa18d0 in ?? () from None
#3 0x00002b8ea4aa0a89 in PQgetResult () from None
#4 0x00002b8ea4aa0d28 in ?? () from None
#5 0x00002b8ea4870553 in pq_execute (curs=0xedff308, query=0x118950d4 "SELECT BugMessage.bug, BugMessage.bugwatch, BugMessage.id, BugMessage.index, BugMessage.message, BugMessage.owner, BugMessage.remote_comment_id, Message.datecreated, Message.id, Message.owner, Message"..., async=0) from psycopg/pqpath.c
#6 0x00002b8ea4876c8c in _psyco_curs_execute (self=0xedff308, operation=<value optimised out>, vars=<value optimised out>, async=0) from psycopg/cursor_type.c
#7 0x00002b8ea48774bd in psyco_curs_execute (self=0xedff308, args=<value optimised out>, kwargs=<value optimised out>) from psycopg/cursor_type.c
#8 0x00000000004a7a07 in ext_do_call () from ../Python/ceval.c

The backtrace was captured 24 hours after haproxy took the servers out of rotation, so these are not 'active' requests but rather stuck threads.
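
To illustrate why a thread can sit in poll() indefinitely, here is a minimal sketch of a blocking wait on a socket whose peer has gone silent. It is not libpq's code; wait_readable is a hypothetical helper, and the socketpair stands in for a connection cut by a firewall. With no timeout the call would never return, while a bounded timeout lets the caller notice and give up.

```python
import select
import socket


def wait_readable(sock, timeout_s=None):
    """Return True if sock becomes readable, False if timeout_s elapses first.

    timeout_s=None blocks until data arrives, which is the situation the
    stuck frames above appear to be in. (Hypothetical helper, not libpq.)
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    timeout_ms = None if timeout_s is None else int(timeout_s * 1000)
    events = poller.poll(timeout_ms)
    return bool(events)


# A connected pair where the peer never writes stands in for a link that
# was dropped silently: no data, no FIN, no RST ever reaches the client.
client, silent_peer = socket.socketpair()
got_data = wait_readable(client, timeout_s=0.2)  # bounded wait gives up
client.close()
silent_peer.close()
```

Calling wait_readable(client) without a timeout in the same scenario would block for as long as the peer stays silent.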

Robert Collins (lifeless) wrote on 2012-02-12:

From James Troup, keepalive settings for the host:
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
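
As a rough worked example of what these values imply: on a socket that has SO_KEEPALIVE enabled, an idle dead peer is detected only after the idle time plus every probe has gone unanswered. The sketch below does that arithmetic and, as an assumption about a possible mitigation rather than anything decided in this bug, builds per-connection keepalive parameters using the libpq names keepalives, keepalives_idle, keepalives_interval and keepalives_count (available in PostgreSQL 9.0 and later). The function names and the tuned values are hypothetical.

```python
def keepalive_detection_seconds(idle_s, probes, interval_s):
    """Worst-case time for TCP keepalive to declare an idle peer dead."""
    return idle_s + probes * interval_s


# Host defaults quoted above: 7200 + 9 * 75 seconds.
host_default_detection = keepalive_detection_seconds(
    idle_s=7200, probes=9, interval_s=75
)


def libpq_keepalive_params(idle_s, interval_s, count):
    """Build libpq connection keywords that override the host defaults.

    Hypothetical helper; the keyword names are libpq connection parameters.
    """
    return {
        "keepalives": 1,                    # enable SO_KEEPALIVE on the socket
        "keepalives_idle": idle_s,          # seconds idle before first probe
        "keepalives_interval": interval_s,  # seconds between probes
        "keepalives_count": count,          # unanswered probes before giving up
    }


# Illustrative values only, not a recommendation from this report.
tuned = libpq_keepalive_params(idle_s=60, interval_s=10, count=5)
tuned_detection = keepalive_detection_seconds(
    idle_s=tuned["keepalives_idle"],
    probes=tuned["keepalives_count"],
    interval_s=tuned["keepalives_interval"],
)
```

With the host defaults the detection time works out to 7875 seconds, a little over two hours. A dictionary like tuned could be passed as keyword arguments to psycopg2.connect(), assuming the linked libpq is new enough to accept them.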
