ideatorrent

Improve Possible duplicates by ignoring short words

Bug #219452 reported by Auzy on 2008-04-19

Affects       Status       Importance  Assigned to  Milestone
ideatorrent   In Progress  Undecided   Unassigned

Bug Description

The "possible duplicates" function needs fixing urgently, because none of the results it finds are relevant.

One way to improve relevance is to do what Google does and ignore words of 3 characters or fewer, such as "the", "a", "and", "an" and "at", for starters.
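
A minimal sketch of the short-word filter suggested above, assuming the text of an idea is lower-cased and split into runs of letters, digits and apostrophes. The tokenizer, the `extract_keywords` name and the `MAX_IGNORED_LENGTH` constant are hypothetical and are not taken from the ideatorrent code.

```python
import re

# Hypothetical cutoff from the suggestion above: words of 3 characters
# or fewer are not used as keywords.
MAX_IGNORED_LENGTH = 3


def extract_keywords(text: str, max_ignored_length: int = MAX_IGNORED_LENGTH) -> list[str]:
    """Lower-case the text, split it into words (assumed tokenizer) and
    drop every word of max_ignored_length characters or fewer."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [w for w in words if len(w) > max_ignored_length]


if __name__ == "__main__":
    print(extract_keywords("Add an option to the panel for a clock at login"))
    # prints ['option', 'panel', 'clock', 'login']
```

A pure length rule is cheap, but it also throws away short words that do carry meaning in idea titles, such as "KDE", "GTK" or "USB", so it may need an allow-list.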

You could also keep a dictionary of words that are too common to be useful as keywords. Code-wise, this is easy: scan through the database, get a list of the 10 most frequently used words, and ignore them. They will probably be words like "a", "then" and "at".
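
A minimal sketch of the "ignore the 10 most common words" idea, assuming the idea titles and descriptions have already been read from the database as plain strings (the query itself is not shown). `build_common_words`, `extract_keywords` and the tokenizer are hypothetical names chosen for illustration; the counting uses Python's standard `collections.Counter`.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed tokenizer: runs of letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words."""
    return WORD_RE.findall(text.lower())


def build_common_words(texts: Iterable[str], top_n: int = 10) -> set[str]:
    """Return the top_n most frequently used words across all texts.

    Hypothetical helper: `texts` stands in for the idea text loaded
    from the database.
    """
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {word for word, _ in counts.most_common(top_n)}


def extract_keywords(text: str, common_words: set[str]) -> list[str]:
    """Keep only the words of `text` that are not in the common-word list."""
    return [w for w in tokenize(text) if w not in common_words]


if __name__ == "__main__":
    ideas = [
        "Add a clock to the login screen",
        "Make the login screen faster",
        "Add a theme to the panel",
    ]
    common = build_common_words(ideas, top_n=1)
    print(common)  # prints {'the'}
    print(extract_keywords("Add the clock", common))  # prints ['add', 'clock']
```

On a corpus this small, `top_n=2` would also ignore "add", because several words tie at a count of 2 and `most_common` orders ties by first occurrence. The top-10 list is only meaningful over a large corpus such as the whole idea database, and it can be combined with the length filter.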

Marc_dm (mdemaillard) wrote on 2009-01-18:

After checking the current website, this seems to be implemented already.
Can someone confirm and update the status accordingly?

Changed in ubuntu-qa-website:
status:	New → In Progress

Stéphane Graber (stgraber) on 2012-03-22

affects:	ubuntu-qa-website → ideatorrent


Duplicates of this bug

Bug #219448
