find_event() queries with timerange other than TimeRange.always() are slow
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Zeitgeist Framework |
Fix Released
|
Low
|
Markus Korn | ||
0.7 |
Fix Released
|
Low
|
Markus Korn |
Bug Description
When running the attached script you can see that when giving a timerange which does not start at 0 and end at maxint the queries get 300% slower.
python sample_
DEBUG:zeitgeist
QUERY:
SELECT DISTINCT id FROM event_view GROUP BY actor ORDER BY COUNT(actor) DESC, timestamp DESC LIMIT 6 ([])
PLAN:
[0, 0, u'TABLE event WITH INDEX event_actor ORDER BY']
DEBUG:zeitgeist
---> MostPopularActor: get len(ids)=6 using .find_eventids() in 0.080375s
DEBUG:zeitgeist
QUERY:
SELECT DISTINCT id FROM event_view WHERE (timestamp >= ? AND timestamp <= ?) GROUP BY actor ORDER BY COUNT(actor) DESC, timestamp DESC LIMIT 6 ([u'1', u'50000'])
PLAN:
[0, 0, u'TABLE event WITH INDEX event_timestamp']
DEBUG:zeitgeist
---> MostPopularActor: get len(ids)=6 using .find_eventids() in 0.260838s
The reason is simply that the second query uses the 'wrong' index.
Related branches
- Mikkel Kamstrup Erlandsen: Approve
-
Diff: 36 lines (+9/-2)2 files modified_zeitgeist/engine/main.py (+7/-2)
_zeitgeist/engine/sql.py (+2/-0)
description: | updated |
Changed in zeitgeist: | |
status: | New → In Progress |
assignee: | nobody → Markus Korn (thekorn) |
importance: | Undecided → Low |
milestone: | none → 0.7.0 |
Markus I ran the your script several times on the current state of trunk
DEBUG:zeitgeist .sql:Got query: 775807L] )
QUERY:
SELECT DISTINCT id FROM event_view WHERE (+timestamp <= ?) GROUP BY actor ORDER BY COUNT(actor) DESC, timestamp DESC LIMIT 6 ([9223372036854
PLAN:
[0, 0, u'TABLE event WITH INDEX event_actor ORDER BY']
DEBUG:zeitgeist .engine: Found 6 event IDs in 0.091614s .sql:Got query:
---> MostPopularActor: get len(ids)=6 using .find_eventids() in 0.091833s
DEBUG:zeitgeist
QUERY:
SELECT DISTINCT id FROM event_view WHERE (+timestamp >= ? AND +timestamp <= ?) GROUP BY actor ORDER BY COUNT(actor) DESC, timestamp DESC LIMIT 6 ([1, 50000])
PLAN:
[0, 0, u'TABLE event WITH INDEX event_actor ORDER BY']
DEBUG:zeitgeist .engine: Found 6 event IDs in 0.092641s
---> MostPopularActor: get len(ids)=6 using .find_eventids() in 0.092846s
While still a bit slow for my taste but results are very near to each other and I think you solved this issue with your last commit. Although still slower I think this bug is then fixed in a way. What do you think?