Improve Possible duplicates by ignoring short words

Bug #219452 reported by Auzy
Affects      Status       Importance  Assigned to  Milestone
ideatorrent  In Progress  Undecided   Unassigned

Bug Description

The possible duplicates function needs fixing urgently, because almost nothing it finds is relevant.

One way to improve relevance is to do what Google does and, for starters, ignore words of 3 characters or fewer, such as: the, a, and, an, at.
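A minimal sketch of the short-word filter, in Python. This is not ideatorrent's actual code; the function name `extract_keywords` and the `min_length` parameter are hypothetical, and the cutoff of 4 simply encodes "ignore words of 3 characters or fewer".

```python
import re

# Assumption: words of 3 characters or fewer are dropped, so the
# shortest keyword that survives has 4 characters.
MIN_KEYWORD_LENGTH = 4


def extract_keywords(title, min_length=MIN_KEYWORD_LENGTH):
    """Return the lowercase words in `title` that are at least `min_length` long."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return [word for word in words if len(word) >= min_length]


# Only the longer words are kept for duplicate matching.
print(extract_keywords("The sound at a login is broken"))
```

A length cutoff is crude: it also drops short but meaningful terms such as "usb" or "gtk", so it probably works best combined with a list of terms that should always be kept.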

You could also keep a dictionary of words that are too common to be useful as keywords. Code-wise this is easy: scan through the database, get a list of the top 10 most-used words, and ignore them. They will probably be words like: a, then, at.
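The common-word idea could look roughly like the Python sketch below. It assumes the existing titles have already been loaded from the database into a list of strings; `most_common_words` and `keywords_without` are hypothetical names, not part of ideatorrent.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def most_common_words(titles, top_n=10):
    """Return the set of the `top_n` most frequently used words across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return {word for word, _ in counts.most_common(top_n)}


def keywords_without(title, ignored):
    """Return the words of `title` that are not in the `ignored` set."""
    return [word for word in tokenize(title) if word not in ignored]


# Stand-in for titles read from the database (assumption for illustration).
titles = ["the menu is slow", "the panel is broken", "the sound is muted"]
ignored = most_common_words(titles, top_n=2)
print(keywords_without("the panel is slow", ignored))
```

On a real database the ignore list would be recomputed occasionally rather than on every search, since the most common words change slowly.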

Tags: qa-poll
Marc_dm (mdemaillard) wrote :

After checking the current website, this seems to be implemented already.
Can someone confirm and update the status accordingly?

Changed in ubuntu-qa-website:
status: New → In Progress
affects: ubuntu-qa-website → ideatorrent