Improve Possible duplicates by ignoring short words
Bug #219452 reported by Auzy
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ideatorrent | In Progress | Undecided | Unassigned |
Bug Description
The possible duplicates function needs fixing urgently, because almost nothing it finds is relevant.
One way to improve relevance is to do what Google does and ignore words that are 3 characters or shorter, for starters: the, a, and, an, at.
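A rough sketch of the short-word filter, just to illustrate the idea. This is not ideatorrent's actual code; the function name `extract_keywords` and the `MIN_KEYWORD_LENGTH` constant are made up for the example, and it assumes idea titles are plain text.

```python
import re

# Assumption: words of 3 characters or fewer are ignored, as suggested above.
MIN_KEYWORD_LENGTH = 4


def extract_keywords(title, min_length=MIN_KEYWORD_LENGTH):
    """Return the lowercase words of `title` that are at least `min_length` long."""
    # Split on anything that is not a letter, digit, or apostrophe.
    words = re.findall(r"[a-z0-9']+", title.lower())
    return [word for word in words if len(word) >= min_length]


print(extract_keywords("Improve the search for duplicates at an idea"))
```

One caveat with a pure length cutoff: short but meaningful words (for example "KDE" or "USB") would be dropped too, so the threshold probably needs tuning against real titles.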
You could also keep a dictionary of words that are too common to be useful as keywords. Code-wise this is easy: scan through the database, get a list of the 10 most frequently used words, and ignore them. They will probably be words like: a, then, at.
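The common-word list could be built along these lines. Again only a sketch under assumptions: the titles are passed in as a plain list of strings rather than read from the real database, and `tokenize`, `build_stop_words`, and `keywords` are hypothetical names, not part of ideatorrent.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_stop_words(titles, top_n=10):
    """Return the set of the `top_n` most frequent words across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return {word for word, _ in counts.most_common(top_n)}


def keywords(title, stop_words):
    """Return the words of `title` that are not in the stop-word set."""
    return [word for word in tokenize(title) if word not in stop_words]


# Hypothetical sample data standing in for the idea titles in the database.
titles = ["the panel is broken", "the menu is slow", "the panel is ugly"]
stop_words = build_stop_words(titles, top_n=2)
print(keywords("the panel is broken", stop_words))
```

On a real database the list would be computed once (or periodically) and cached, since counting every word on each duplicate search would be slow.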
affects: ubuntu-qa-website → ideatorrent
After checking the current website, this seems to be already implemented.
Can someone confirm and update the status accordingly?