find_event() queries with timerange other than TimeRange.always() are slow

Bug #672965 reported by Markus Korn
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Zeitgeist Framework
Fix Released
Low
Markus Korn
0.7
Fix Released
Low
Markus Korn

Bug Description

When running the attached script you can see that when giving a timerange which does not start at 0 and end at maxint the queries get 300% slower.

python sample_timerange_query.py
DEBUG:zeitgeist.sql:Got query:
QUERY:
SELECT DISTINCT id FROM event_view GROUP BY actor ORDER BY COUNT(actor) DESC, timestamp DESC LIMIT 6 ([])
PLAN:
[0, 0, u'TABLE event WITH INDEX event_actor ORDER BY']

DEBUG:zeitgeist.engine:Found 6 event IDs in 0.080176s
---> MostPopularActor: get len(ids)=6 using .find_eventids() in 0.080375s
DEBUG:zeitgeist.sql:Got query:
QUERY:
SELECT DISTINCT id FROM event_view WHERE (timestamp >= ? AND timestamp <= ?) GROUP BY actor ORDER BY COUNT(actor) DESC, timestamp DESC LIMIT 6 ([u'1', u'50000'])
PLAN:
[0, 0, u'TABLE event WITH INDEX event_timestamp']

DEBUG:zeitgeist.engine:Found 6 event IDs in 0.260648s
---> MostPopularActor: get len(ids)=6 using .find_eventids() in 0.260838s

The reason is simply that the second query uses the 'wrong' index.

Related branches

Revision history for this message
Markus Korn (thekorn) wrote :
Markus Korn (thekorn)
description: updated
Markus Korn (thekorn)
Changed in zeitgeist:
status: New → In Progress
assignee: nobody → Markus Korn (thekorn)
importance: Undecided → Low
milestone: none → 0.7.0
Revision history for this message
Seif Lotfy (seif) wrote :

Markus I ran the your script several times on the current state of trunk

DEBUG:zeitgeist.sql:Got query:
QUERY:
SELECT DISTINCT id FROM event_view WHERE (+timestamp <= ?) GROUP BY actor ORDER BY COUNT(actor) DESC, timestamp DESC LIMIT 6 ([9223372036854775807L])
PLAN:
[0, 0, u'TABLE event WITH INDEX event_actor ORDER BY']

DEBUG:zeitgeist.engine:Found 6 event IDs in 0.091614s
---> MostPopularActor: get len(ids)=6 using .find_eventids() in 0.091833s
DEBUG:zeitgeist.sql:Got query:
QUERY:
SELECT DISTINCT id FROM event_view WHERE (+timestamp >= ? AND +timestamp <= ?) GROUP BY actor ORDER BY COUNT(actor) DESC, timestamp DESC LIMIT 6 ([1, 50000])
PLAN:
[0, 0, u'TABLE event WITH INDEX event_actor ORDER BY']

DEBUG:zeitgeist.engine:Found 6 event IDs in 0.092641s
---> MostPopularActor: get len(ids)=6 using .find_eventids() in 0.092846s

While still a bit slow for my taste but results are very near to each other and I think you solved this issue with your last commit. Although still slower I think this bug is then fixed in a way. What do you think?

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.