Intranet Search research and analysis console script

Bug #1338273 reported by Paul Everitt
This bug affects 1 person
Affects: KARL3
Status: Fix Released
Importance: Medium
Assigned to: Chris Rossi
Milestone: (none listed)

Bug Description

We have an SOW for the second half of the year to provide an "Intranet Search". That is, a search option that only returns the following:

1) Content under /offices
2) Content in communities that have a special marker
3) Profiles

This work is for two goals:

a. Better signal-to-noise ratio when using KARL for "business" content
b. Faster search performance by limiting content to a much smaller subset

The first step of this work is a research project to measure the impact and set a baseline we can use for a before/after comparison. I'd like a console script that can be run outside the app server to make it more predictable. (Any other ideas to remove external influences are welcome.) The console script should do two things:

a. Time a current LiveSearch (prefix multigroup) across the whole database, then record the response time on this ticket.
b. Give a count of the current number of content resources in the entire database, versus a count of resources under /offices and /profiles (plus the communities Nat has in mind.)
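As a rough illustration of the timing half of such a script, here is a minimal sketch in Python. It is not the script attached later in this ticket. `fake_livesearch` is a hypothetical stand-in for the real LiveSearch call, which this sketch does not attempt to reproduce; only the wall-clock timing wrapper is the point.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_livesearch(term):
    # Hypothetical stand-in for the real search: a prefix match over a
    # tiny in-memory word list. A trailing "*" marks a prefix query.
    corpus = ["paul", "staff", "status", "community", "calendar"]
    prefix = term.rstrip("*")
    return [word for word in corpus if word.startswith(prefix)]


if __name__ == "__main__":
    for term in ("paul", "sta*", "c*"):
        hits, elapsed = time_call(fake_livesearch, term)
        print("Searching %s" % term)
        print("Found %d" % len(hits))
        print("Elapsed: %.2f s" % elapsed)
```

Running each query more than once and recording every elapsed time, rather than a single number, makes it easier to separate cache effects from the cost of the search itself.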

This ticket is up for better ideas and thinking, so feel free to give feedback.

Chris Rossi (chris-archimedeanco) wrote :

I've attached a script for getting live search timing information. You might have a better idea of which searches are most interesting to you. Feel free to tweak. The output looks something like:

karlstaging@karlstaging10 ~/staging/current $ bin/karlserve debug osf -S ~crossi/intranet-analysis/time_searches.py
Searching /?val=paul
Found 30
Elapsed: 5.46 s
Searching /?val=sta%2A
Found 30
Elapsed: 36.46 s
Searching /?val=c%2A
Found 30
Elapsed: 89.75 s
karlstaging@karlstaging10 ~/staging/current $ bin/karlserve debug osf -S ~crossi/intranet-analysis/time_searches.py
Searching /?val=paul
Found 30
Elapsed: 4.47 s
Searching /?val=sta%2A
Found 30
Elapsed: 5.50 s
Searching /?val=c%2A
Found 30
Elapsed: 18.39 s

Chris Rossi (chris-archimedeanco) wrote :

One thing I notice is that doing a particular search the first time is much slower than doing the same search immediately afterwards. Since our console script runs as a new process each time, we can't really ascribe this to the ZODB cache. It stands to reason that priming the disk cache on the database server is where we get most of the speedup. Storing the database on SSD and increasing the RAM available on the database server should both improve performance significantly.

Another thing I notice is that, just being logged into karlstaging, the machine itself feels very sluggish compared to what I'm used to. I wonder if we could install Karl and copy production data over to, say, a Digital Ocean droplet and see what that does to performance. A large enough instance for OSF would be kind of expensive, but you only pay for the time you actually have it up, so a short-lived experiment just to gauge performance would be fairly economical, even for a monster instance.

Chris Rossi (chris-archimedeanco) wrote :

Here's output from this morning for three searches run back to back:

karlstaging@karlstaging10 ~/staging/current $ bin/karlserve debug osf -S ~crossi/search-analysis/time_searches.py
/?val=paul 19.04 s
/?val=sta%2A 31.57 s
/?val=c%2A 29.15 s

karlstaging@karlstaging10 ~/staging/current $ bin/karlserve debug osf -S ~crossi/search-analysis/time_searches.py
/?val=paul 4.28 s
/?val=sta%2A 4.70 s
/?val=c%2A 17.00 s
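
To put a number on the first-run versus second-run difference, the ratios can be computed directly from the timings above. This is only a small sketch: the figures are copied from the two runs in this comment, and the helper name `speedups` is made up for illustration.

```python
# Elapsed seconds copied from the two back-to-back runs reported above.
first_run = {"/?val=paul": 19.04, "/?val=sta%2A": 31.57, "/?val=c%2A": 29.15}
second_run = {"/?val=paul": 4.28, "/?val=sta%2A": 4.70, "/?val=c%2A": 17.00}


def speedups(before, after):
    """Map each query to the ratio of its 'before' time to its 'after' time."""
    return {query: before[query] / after[query] for query in before}


if __name__ == "__main__":
    for query, ratio in sorted(speedups(first_run, second_run).items()):
        print("%-14s %.1fx faster on the second run" % (query, ratio))
```

With only two runs per query these ratios are indicative at best; more repetitions would be needed before treating them as a baseline.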

Chris Rossi (chris-archimedeanco) wrote :

And here are some document counts:

karlstaging@karlstaging10 ~/staging/current $ bin/karlserve debug osf -S ~crossi/search-analysis/count_documents.py
Total documents: 336712
In /profiles 10401
In /offices 25607
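
For a sense of scale, the share of the database covered by these two folders can be worked out from the counts above. This sketch assumes /profiles and /offices do not overlap, and it leaves out the specially marked communities, which are not counted here. The helper name `subset_share` is hypothetical.

```python
def subset_share(total, *subset_counts):
    """Return (subset_size, percent_of_total) for the given counts."""
    subset = sum(subset_counts)
    return subset, 100.0 * subset / total


if __name__ == "__main__":
    # Counts copied from the count_documents.py output above.
    total_documents = 336712
    in_profiles = 10401
    in_offices = 25607

    subset, percent = subset_share(total_documents, in_profiles, in_offices)
    print("Subset size: %d" % subset)          # 36008
    print("Share of total: %.1f%%" % percent)  # 10.7%
```

So, before adding any marked communities, the intranet subset would be a little over a tenth of all documents.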

Chris Rossi (chris-archimedeanco) wrote :
Changed in karl3:
status: New → Fix Committed
Paul Everitt (paul-agendaless) wrote :

All the research work is done.

Changed in karl3:
status: Fix Committed → Fix Released