Search could find alternate spellings
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Triaged | Low | Unassigned |
Bug Description
When doing web searches in Launchpad, the system is not tolerant of errors.
A single wrong letter means total failure. For example, searching for "inkscaep" in the Ubuntu package list returns no results. This is not usable.
A simple way to get around typos and such is to calculate the Levenshtein distance between the query and each known name and select the name with the minimum distance.
I have attached sample code to demonstrate. It contains a list of all Ubuntu packages and a simple matcher. On my machines all queries are instant.
Some examples.
"inchcape" matches to "inkscape"
"openorifice.org" matches to "openoffice.org"
"pthon" matches to "python"
All of Launchpad's queries, like packages, users, projects and so on, should have this kind of error correction.
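The matching idea described above can be sketched roughly as follows. This is not the attached sample code; it is a minimal illustration that assumes plain Levenshtein distance (insertions, deletions, substitutions) and a linear scan over all names. The `PACKAGES` list and the function names `levenshtein` and `closest` are hypothetical stand-ins.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # drop a character from a
                current[j - 1] + 1,      # add a character to a
                previous[j - 1] + cost,  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


def closest(query: str, names: Iterable[str]) -> str:
    """Return the name with the smallest edit distance to the query."""
    return min(names, key=lambda name: levenshtein(query, name))


# Hypothetical stand-in for the full package list (assumption).
PACKAGES = ["inkscape", "openoffice.org", "python", "gimp", "firefox"]

if __name__ == "__main__":
    for typo in ("inkscaep", "openorifice.org", "pthon"):
        print(typo, "->", closest(typo, PACKAGES))
```

Note that this scans every name for every query, so its cost grows with the size of the list.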
affects: launchpad → launchpad-registry
affects: launchpad-registry → launchpad-foundations
tags: added: search
summary:
- Improvements to Launchpad web searches
+ Search could find alternate spellings
This is an interesting proposal and part of our overall need to fix
our search story.
A few thoughts:
- 'instant' isn't really all that precise a metric. Here are some stats:
- we have 60K unique product/package names
- bringing them all back in just psql takes about 200ms; it will be
substantially more on an appserver due to networking and serialisation
overheads
- that overhead will be per request, plus the calculation time per term.
So to make an acceptable overhead - say 500ms - we'd need the total
time to select an appropriate term in a 5 term search to be
(ballparking) under 10ms on a 60K corpus. Actual tests with an
appserver would be needed to be confident that this works well
enough.
This doesn't mean we can't do it, but it may mean that rather than a
simple approach we need something indexed.
-Rob
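One possible shape for an indexed approach, purely as a sketch: build an inverted index from character trigrams to names once, then look up only the names that share trigrams with the query instead of scanning all 60K. Nothing here is Launchpad code. `TrigramIndex`, `trigrams`, the padding scheme and the sample names are all assumptions made for illustration, and ranking by shared-trigram count is a swapped-in heuristic, not Levenshtein distance. The short candidate list could then be re-ranked with an edit distance if needed.

```python
from collections import Counter, defaultdict


def trigrams(name: str) -> set:
    """Character trigrams of a name, padded so word edges count too."""
    padded = f"  {name} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Inverted index from trigram to the names containing it (sketch)."""

    def __init__(self, names):
        self.names = list(names)
        self.postings = defaultdict(set)
        for idx, name in enumerate(self.names):
            for gram in trigrams(name):
                self.postings[gram].add(idx)

    def candidates(self, query: str, limit: int = 20) -> list:
        """Names sharing at least one trigram with the query, best first."""
        counts = Counter()
        for gram in trigrams(query):
            for idx in self.postings.get(gram, ()):
                counts[idx] += 1
        return [self.names[idx] for idx, _ in counts.most_common(limit)]


# Hypothetical sample data standing in for the real name corpus.
index = TrigramIndex(["inkscape", "openoffice.org", "python", "gimp", "firefox"])

if __name__ == "__main__":
    print(index.candidates("inkscaep"))
    print(index.candidates("pthon"))
```

Whether this stays inside the per-term budget on a 60K corpus would still have to be measured on an appserver.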