Ubuntu
software-center package

Bug #744914
Activity log for bug #744914

Date	Who	What changed	Old value	New value	Message
2011-03-29 12:24:23	Lucian Adrian Grijincu	bug			added bug
2011-03-29 12:24:53	Lucian Adrian Grijincu	description	Binary package hint: software-center As of now software center uses str.lower() when searching in the xapian db: utils/query.py 22: s = search_term.lower() 33: query = xapian.Query(str_to_prefix[search_prefix]+search_term.lower()) There are two problems with this: * many languages have diacritic marks for characters but for fast typing users usually write the base character: (in Romanian: ăâșțî and ĂÂȘȚÎ are spelled AASTI by some users). * characters in the Unicode set can appear in two forms: composed and decomposed: the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA). To solve both problems both the text entered in the xapian db and the user's text query must be normalized. The search function in Chromium uses ICU rules to achieve this: - http://code.google.com/p/chromium/issues/detail?id=1100 - http://www.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/editing/TextIterator.cpp&q=file:TextIterator.cpp&l=1882 There is a python-icu library that could help achieve this. See for example http://lists.osafoundation.org/pipermail/pyicu-dev/2010-October/000214.html Or one could just remove the diacritical marks from the string altogether: http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string There is also the standard unicodedata.normalize() http://docs.python.org/library/unicodedata.html	Binary package hint: software-center As of now software center uses str.lower() when searching in the xapian db: utils/query.py 22: s = search_term.lower() 33: query = xapian.Query(str_to_prefix[search_prefix]+search_term.lower()) There are two problems with this: * many languages have diacritic marks for characters but for fast typing users usually write the base character: (in Romanian: ăâșțî and ĂÂȘȚÎ are spelled AASTI by some users). * characters in the Unicode set can appear in two forms: composed and decomposed: the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA). To solve both problems both the text entered in the xapian db and the user's text query must be normalized. The search function in Chromium uses ICU rules to achieve this: - http://code.google.com/p/chromium/issues/detail?id=1100 - http://www.google.com/codesearch/p?hl=en#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/editing/TextIterator.cpp&q=file:TextIterator.cpp&l=1882 There is a python-icu library that could help achieve this. See for example http://lists.osafoundation.org/pipermail/pyicu-dev/2010-October/000214.html Or one could just remove the diacritical marks from the string altogether: http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string
2011-09-27 20:01:29	Kiwinote	tags		db
2011-10-07 13:47:37	Michael Vogt	software-center (Ubuntu): status	New	Confirmed
2011-10-07 13:47:40	Michael Vogt	software-center (Ubuntu): importance	Undecided	Medium
2011-10-07 13:47:52	Michael Vogt	nominated for series		Ubuntu Precise
2011-10-07 13:47:52	Michael Vogt	bug task added		software-center (Ubuntu Precise)
2011-10-07 13:49:00	Michael Vogt	software-center (Ubuntu Precise): status	New	Confirmed
2011-10-07 13:49:01	Michael Vogt	software-center (Ubuntu Precise): importance	Undecided	Medium
2011-10-25 17:06:33	David Planella	bug task added		ubuntu-translations
2011-11-14 17:51:33	Pedro Villavicencio	software-center (Ubuntu Precise): status	Confirmed	Triaged
2021-10-14 05:30:53	Steve Langasek	software-center (Ubuntu Precise): status	Triaged	Won't Fix
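
The description in the 2011-03-29 entry proposes normalizing both the indexed text and the user's query, for instance by removing diacritical marks. A minimal sketch of that idea in Python 3, using only the standard unicodedata module, might look like the following. The name fold_for_search is hypothetical and is not part of software-center; the sketch assumes the same function would be applied to terms when they are written to the xapian db and to the search term before the query is built.

```python
import unicodedata


def fold_for_search(text):
    """Return a lowercase, accent-free form of text (hypothetical helper)."""
    # NFKD decomposes composed characters into a base character followed
    # by combining marks, so composed and decomposed input converge.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (characters with a non-zero combining class).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left and lowercase it, as str.lower() did before.
    return unicodedata.normalize("NFC", stripped).lower()


# The composed and decomposed spellings of C WITH CEDILLA fold to the same key.
composed = "\u00c7"
decomposed = "C\u0327"
same_key = fold_for_search(composed) == fold_for_search(decomposed)
```

Stripping marks this way is lossy by design: it treats "ș" and "s" as the same letter, which suits fast-typing users but ignores language-specific collation rules that an ICU-based approach could honor.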
