Comment 4 for bug 5417

Stuart Bishop (stub) wrote : Re: [Bug 5417] Find accented forms when searching (e.g. Carlos Perelló Marín with "perello")

Björn Tillenius wrote:
> Public bug report changed:
> https://launchpad.net/malone/bugs/5417
>
> Comment:
> On Wed, Dec 07, 2005 at 02:03:07AM -0000, Stuart Bishop wrote:
>
>>We already have code to do the deaccentification:
>>canonical.encoding.ascii_smash() handles the European Latin-based character
>>sets. You're still stuffed with character sets that don't have an ASCII
>>equivalent, such as Coptic, Greek or most of the Asian languages.
>
>
> ascii_smash() doesn't do exactly what I would expect, though. For
> example, it transforms my name, 'Björn', into 'Bjoern' instead of
> 'Bjorn'. If people would try to find me, they would most likely search
> for either 'Björn' or 'Bjorn'.

It is supposed to be doing a fairly 'standard' transliteration, although I'm
sure that not all European languages do this the same way. I don't know if
it would be a good idea to tweak the mapping into some sort of hybrid,
where Björn maps to Bjorn and Åiste maps to Aiste, but Ægean maps to AEgean
and Straße maps to Strasse.
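A minimal sketch of such a hybrid, assuming Python's standard `unicodedata` module: accented letters are decomposed with NFKD and their combining marks dropped (so Björn becomes Bjorn), while letters that NFKD does not decompose, such as Æ and ß, are expanded through a small lookup table. `hybrid_smash` and `LIGATURES` are hypothetical names; this is not how `ascii_smash()` works, and the table contents are only an illustrative guess at a 'correct' mapping.

```python
import unicodedata

# Hypothetical table: letters and ligatures that NFKD leaves untouched,
# so they need an explicit ASCII expansion.
LIGATURES = {
    "Æ": "AE", "æ": "ae",
    "Œ": "OE", "œ": "oe",
    "ß": "ss",
    "Ø": "O", "ø": "o",
    "Þ": "TH", "þ": "th",
    "Đ": "D", "đ": "d",
    "Ł": "L", "ł": "l",
}


def hybrid_smash(text: str) -> str:
    """Strip accents but expand ligatures (hypothetical, not ascii_smash)."""
    # Expand ligatures and special letters first (Æ -> AE, ß -> ss).
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    # NFKD splits accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", expanded)
    # Drop the nonspacing combining marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


for name in ["Björn", "Åiste", "Ægean", "Straße", "Carlos Perelló Marín"]:
    print(name, "->", hybrid_smash(name))
```

This only removes diacritics, so it does nothing for scripts without an ASCII equivalent: Greek "Ελλάδα" comes out as "Ελλαδα", which is still non-ASCII.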

I don't know what the 'correct' mapping would be, but we can tweak it easily
enough.

--
Stuart Bishop <email address hidden> http://www.canonical.com/
Canonical Ltd. http://www.ubuntu.com/