Product:+code-index timeouts

Bug #711077 reported by Robert Collins
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Jeroen T. Vermeulen

Bug Description

 23 SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Pe ... r, Person.verbose_bugnotifications, Person.visibility FROM Person WHERE Person.id = %s LIMIT $INT:
      3 https://code.launchpad.net/openobject-addons/+code-index (Product:+code-index)
       OOPS-1857C2215, OOPS-1857E2489, OOPS-1857H1932

Branch: launchpad-rev-12274
Revno: 12274
SQL time: 6374 ms
Non-sql time: 9066 ms
Total time: 15440 ms
Statement Count: 132

SQL time is tolerable; non-sql looks pathological, we may need to get a profile from qastaging.

query 71 -
71. 3238 254ms SQL-launchpad-main-slave
SELECT BugBranch.branch,
       BugBranch.bug,
       BugBranch.datecreated,
       BugBranch.id,
       BugBranch.registrant,
       BugBranch.revision_hint
FROM Bug,
     BugBranch
WHERE BugBranch.branch IN (243984,
                           418515,
                           36577,
                           418560,
                           46323,
                           16565,
                           417353,
                           318483,
                           421154,
                           401991,
                           402228,
                           41457,
                           419982,
                           318481,
                           405903,
                           377110,
                           346917,
                           414322,
                           407919,
                           358540,
                           420958,
                           420904,
                           358279,
                           362980,
                           420563,
                           420525,
                           402409,
                           412279,
                           420490,
                           415296,
                           402807,
                           377069,
                           408917,
                           415319,
                           418180,
                           419733,
                           412393,
                           419490,
                           378848,
                           418530,
                           402477,
                           419023,
                           305323,
                           418829,
                           418795,
                           418589,
                           377840,
                           415983,
                           402198,
                           353299,
                           417992,
                           371246,
                           410052,
                           417398,
                           405359,
                           417356,
                           417156,
                           296064,
                           273195,
                           416711,
                           85779,
                           416228,
                           408929,
                           408922,
                           379146,
                           317771,
                           414984,
                           409071,
                           401994,
                           402977,
                           414263,
                           408537,
                           413688,
                           413650,
                           413636,
                           413489,
                           413292,
                           413032,
                           412975,
                           378840,
                           380249,
                           380248,
                           407309,
                           402800,
                           411177,
                           411176,
                           305324,
                           410365,
                           410362,
                           410361,
                           410360,
                           410200,
                           409957,
                           408187)
  AND BugBranch.bug = Bug.id
  AND (Bug.private = FALSE
       OR Bug.id IN
         (SELECT Bug.id
          FROM Bug,
               BugSubscription,
               TeamParticipation
          WHERE Bug.id = BugSubscription.bug
            AND TeamParticipation.person = 3423052
            AND BugSubscription.person = TeamParticipation.team))
  AND (1=1)

has a 5 second gap before the next query. This is a little hard to explain ;)

Might be GIL, might be simple inefficiency.

Tags: qa-ok timeout

Related branches

Revision history for this message
Robert Collins (lifeless) wrote :

Times out when I hit the url in question trivially, so I suspect genuine python rather than temporary overload GIL issues.

description: updated
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

   At least 206 queries/external actions issued in 11.90 seconds - from a successful render; sql time is 6 seconds, so thats nearly 6 in python trivially.

Changed in launchpad:
assignee: nobody → Jeroen T. Vermeulen (jtv)
status: Triaged → In Progress
milestone: none → 11.03
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

All the example oopses here occur at about the same point in the template, branch-listing.pt:

              <tal:known-person condition="branch/revision_author/person">
                <tal:person-link
                    replace="structure branch/revision_author/person/fmt:link"/>
              </tal:known-person>

Two of the three time out while checking for a valid person in ObjectImageDisplayPersonAPI(person)._get_custom_icon_url. Eliding some of these evaluations would help. But they don't seem to be the biggest time sink. The big one happens earlier on:

Each of the three oopses' query log shows a gap of about 5 seconds (!) between the join that fetches Bugs and BugBranches for the batch, and the query that gets its SpecificationBranches. Something big and costly is going on inbetween.

The two queries seem to be coming from adjacent calls to branch_ids_with_bug_links and branch_ids_with_spec_links respectively in BranchListingItemsMixin._createItem.

Only two things happen between those two queries: marshaling, and:

        set(bug_branch.branch.id for bug_branch in bug_branches)

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

A considerable portion of that 5-second gap is actually taken up by the getBugBranchesForBranches query itself. Furthermore, dereferencing bug_branch.branch may be taking too much time even when the branch object is already in the ORM cache.

Instead, we could specialize this query to return only the (distinct) branch ids. That is all that the view is really trying to find out.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

On qastaging the getBugBranchesForBranches query itself times out a lot. I blame disk seeks from capacity misses. Separating the "public bugs" subqueries from the "private bugs visible to this user" subqueries may help for that.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 711077] Re: Product:+code-index timeouts

I suspect we're hitting GIL issues again, I'm seeing multiple
different OOPSes with simillar symptoms; I think getting a ++profile++
from qastaging is probably a key element in the matter: to do so

 - ask losa to enable profiling on qastaging
 - and to use the feature flag to raise the timeout for
Product:+code-index on qastaging to 20 seconds
 - use ++profile++ to get the profile (after hitting it a few times to
get hot cache)
 - revert the settings

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Revision history for this message
Robert Collins (lifeless) wrote :

The page that timed out still works on qastaging, and is a little faster I think, but its still doing 196 queries and taking 9 seconds; I suspect this is going to need further work.

tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

On qastaging I get 146 queries in 5.7—6.6 seconds for the problem page from the bug description, once the cache is warm.

Revision history for this message
Robert Collins (lifeless) wrote :

This has been deployed now. First hit on production gives: "178 queries/external actions issued in 11.56 seconds". Second hit gives "178 queries/external actions issued in 8.76 seconds" - so we have some substantial cold cache effects here.

William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.