Comment 5 for bug 5417

Björn Tillenius (bjornt) wrote : Re: [Bug 5417] Find accented forms when searching (e.g. Carlos Perelló Marín with "perello")

On Wed, Dec 07, 2005 at 08:48:08AM -0000, Stuart Bishop wrote:
> Public bug report changed:
> https://launchpad.net/malone/bugs/5417
>
> Comment:
> Björn Tillenius wrote:
> > Public bug report changed:
> > https://launchpad.net/malone/bugs/5417
> >
> > Comment:
> > On Wed, Dec 07, 2005 at 02:03:07AM -0000, Stuart Bishop wrote:
> >
> >>We already have code to do the deaccentification -
> >>canonical.encoding.ascii_smash() handles the European latin based character
> >>sets. You're still stuffed with character sets that don't have an ASCII
> >>equivalent, such as Coptic, Greek or most of the Asian languages.
> >
> >
> > ascii_smash() doesn't do exactly what I would expect, though. For
> > example, it transforms my name, 'Björn', into 'Bjoern' instead of
> > 'Bjorn'. If people would try to find me, they would most likely search
> > for either 'Björn' or 'Bjorn'.
>
> It is supposed to be doing a fairly 'standard' transliteration, although I'm
> sure that not all European languages do this the same way. I don't know if
> it would be a good idea to tweak the mapping to do some sort of a hybrid,
> where Björn maps to Bjorn and Åiste maps to Aiste, but Ægean maps to AEgean
> and Straße maps to Strasse.

I think we need to tweak the mapping a bit. The current mapping is the
kind used mostly for official purposes, for example by shipping
companies, banks, and in passports (although it doesn't transform 'å'
to 'aa', which is also common in this kind of ASCII smash). As a
comparison, Google seems to map 'oe' only to 'ø', not to 'ö'.

> I don't know what the 'correct' mapping would be, but we can tweak it easily
> enough.

Yeah, there probably isn't a 'correct' mapping that fits all.
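
For illustration, the hybrid mapping discussed above could be sketched
roughly as follows. This is not what canonical.encoding.ascii_smash()
does; search_fold() and SPECIAL_CASES are hypothetical names, and the
sketch assumes Python 3 and its standard unicodedata module. It drops
combining accents (so 'ö' becomes 'o') and expands only a few
hand-picked letters.

```python
import unicodedata

# Hypothetical overrides for letters that have no accent to strip and
# that should expand to two ASCII letters instead.
SPECIAL_CASES = {"Æ": "AE", "æ": "ae", "ß": "ss"}


def search_fold(text):
    """Return text with accents removed, for search matching only."""
    folded = []
    for char in text:
        if char in SPECIAL_CASES:
            folded.append(SPECIAL_CASES[char])
            continue
        # NFKD splits 'ö' into 'o' plus a combining diaeresis; the
        # combining mark is then dropped.
        decomposed = unicodedata.normalize("NFKD", char)
        folded.append(
            "".join(c for c in decomposed if not unicodedata.combining(c))
        )
    return "".join(folded)


if __name__ == "__main__":
    for name in ("Björn", "Åiste", "Ægean", "Straße", "Perelló Marín"):
        print(name, "->", search_fold(name))
```

Folding both the stored names and the search string the same way (and
lowercasing both) would let "perello" match "Perelló". Characters with
no decomposition and no override, such as 'ø', pass through unchanged
here, so which of those to map, and to what, remains an open choice.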