The Google search timeout code isn't working

Bug #376560 reported by Francis J. Lacoste
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Google had a network outage today during which it was very slow to respond. This caused an outage on lpnet, apparently because requests to Google weren't timed out as they should have been.

OOPS-1230A901 GoogleResponseError: The response was incomplete, no xml.

Changed in launchpad-foundations:
importance: Undecided → High
milestone: none → 2.2.6
status: New → Triaged
tags: added: oops
description: updated
Changed in launchpad-foundations:
milestone: 2.2.6 → 2.2.7
Ursula Junque (ursinha) wrote :

Moving to next milestone so 2.2.7 can be closed.

Changed in launchpad-foundations:
milestone: 2.2.7 → 2.2.8
Gary Poster (gary)
Changed in launchpad-foundations:
milestone: 2.2.8 → 3.0
assignee: nobody → Stuart Bishop (stub)
Curtis Hovey (sinzui)
Changed in launchpad-foundations:
milestone: 3.0 → 3.1.11
Stuart Bishop (stub)
Changed in launchpad-foundations:
assignee: Stuart Bishop (stub) → nobody
milestone: 3.1.11 → 3.1.13
Gary Poster (gary)
Changed in launchpad-foundations:
milestone: 10.01 → 10.03
Gary Poster (gary)
Changed in launchpad-foundations:
assignee: nobody → Gary Poster (gary)
Gary Poster (gary)
Changed in launchpad-foundations:
milestone: 10.03 → 10.04
Robert Collins (lifeless) wrote :

We see this occurring as timeouts these days.

tags: added: timeout
removed: oops
Robert Collins (lifeless) wrote :

I've added this to the request timeline; if it happens again we'll at least get more data.

The timeout code has two odd things to it, for me:
 - it's using threads to time out socket operations (a little weird, and risky in our environment [easy to misuse if someone passes db objects across threads, because of zope's thread-based contexts])
 - the thread join, when there is a cleanup, has no timeout, so it can hang around forever.

I suspect it is the thread join causing the issue, and that moving this code to use socket timeouts would avoid the problem.
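
To illustrate the socket-timeout approach suggested above, here is a minimal sketch in plain Python. It is not Launchpad's actual code: `fetch_with_socket_timeout` and `start_silent_server` are hypothetical names, and the silent local server only stands in for a backend that accepts a connection and then never answers. The point is that the timeout is enforced on the socket itself, so no helper thread or unbounded join is involved.

```python
import socket
import threading


def fetch_with_socket_timeout(host, port, request, timeout):
    """Send `request` and read the whole reply.

    Hypothetical helper: every blocking socket operation (connect, send,
    recv) is limited to `timeout` seconds and raises socket.timeout if
    the peer stalls, so the caller never waits indefinitely.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
        return b"".join(chunks)


def start_silent_server():
    """Hypothetical stand-in for a very slow backend.

    Accepts one connection on an ephemeral local port and then sends
    nothing, keeping the connection open.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    held = []  # keeps the accepted connection alive

    def accept_and_stall():
        conn, _ = server.accept()
        held.append(conn)

    threading.Thread(target=accept_and_stall, daemon=True).start()
    return server, held
```

A caller would catch `socket.timeout` around `fetch_with_socket_timeout(...)` and report it as a timeout. If a worker thread is kept anyway, passing a bound to `Thread.join(timeout=...)` at least addresses the second point, although the thread itself may still linger.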

tags: added: pg83
Stuart Bishop (stub)
tags: removed: pg83
Changed in launchpad:
importance: High → Critical
Robert Collins (lifeless) wrote :

Removed stale assignee

Changed in launchpad:
assignee: Gary Poster (gary) → nobody
Robert Collins (lifeless) wrote :

I'm going to close this: we've added instrumentation, have not seen any recurrences, and it's not an active oops in any of our current reports. If it happens again we'll definitely treat it seriously, but this bug seems unactionable as it stands.

Changed in launchpad:
status: Triaged → Fix Released
Robert Collins (lifeless) wrote :

(That is, I'm treating the steps we've taken so far as sufficient).
