Hunspell 1.2.8 Groups Thai TIS-620 Chars in Lower/Upper Case Pairs

Bug #910452 reported by Richard Wordingham
This bug affects 2 people
Affects: hunspell (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Ubuntu release:
Description: Ubuntu 10.04.3 LTS
Release: 10.04
Package: 1.2.8-6ubuntu1

Casing information for ISO-8859-1 is applied to dictionaries encoded in TIS-620. This was fixed in Hunspell Release 1.2.14 by adding elements such as {"tis620", tis620_tbl} to the array encds[] in the file csutil.cxx. A minimal change to Release 1.2.8 that corrects the problem would be to add entries such as {"TIS620", iscii_devanagari_tbl} and possibly {"TIS620-2533", iscii_devanagari_tbl}.
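To illustrate why a missing encds[] entry matters, here is a minimal sketch of a name-to-case-table lookup. It is not Hunspell's code: the names CaseInfo, EncodingEntry, init_tables and lookup_case_table are hypothetical, the fallback to the ISO-8859-1 table for unknown names is an assumption consistent with the behaviour reported here, and only the Latin-1 letters in the range 0xC0 to 0xFE are modelled.

```cpp
#include <cstring>

// Simplified stand-ins for the structures in csutil.cxx. The names and
// layout are illustrative assumptions, not copied from Hunspell.
struct CaseInfo {
    unsigned char is_upper;  // 1 if the byte is an upper-case letter
    unsigned char lower;     // lower-case counterpart (or the byte itself)
    unsigned char upper;     // upper-case counterpart (or the byte itself)
};

struct EncodingEntry {
    const char* name;       // encoding name as written after SET in the .aff
    const CaseInfo* table;  // 256-entry case table for that encoding
};

CaseInfo latin1_table[256];    // ISO-8859-1 style casing (partial model)
CaseInfo caseless_table[256];  // every byte is its own upper and lower case

// Fill both tables. ASCII casing is omitted for brevity; only the range
// where TIS-620 places Thai characters is given Latin-1 case pairs.
void init_tables() {
    for (int i = 0; i < 256; ++i) {
        unsigned char b = static_cast<unsigned char>(i);
        caseless_table[i] = {0, b, b};
        latin1_table[i] = {0, b, b};
    }
    for (int i = 0xC0; i <= 0xDE; ++i) {
        if (i == 0xD7) continue;  // multiplication sign has no case
        unsigned char up = static_cast<unsigned char>(i);
        unsigned char lo = static_cast<unsigned char>(i + 0x20);
        latin1_table[up] = {1, lo, up};
        latin1_table[lo] = {0, lo, up};
    }
}

// Return the case table registered under `name`. Unknown names fall back
// to the ISO-8859-1 table (assumed behaviour, matching this report).
const CaseInfo* lookup_case_table(const EncodingEntry* entries, int count,
                                  const char* name) {
    for (int i = 0; i < count; ++i) {
        if (std::strcmp(entries[i].name, name) == 0) return entries[i].table;
    }
    return latin1_table;
}
```

With a list that lacks a "TIS620" entry, looking up "TIS620" yields the Latin-1 table, so the TIS-620 byte 0xC1 (ม) is treated as the upper-case form of 0xE1 (แ). Adding {"TIS620", caseless_table} to the list makes the same lookup return a table in which 0xC1 maps to itself.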

Richard Wordingham (richard-wordingham) wrote :

I forgot to report the effects of the bug.

The effect of the bug can be demonstrated using the spelling dictionary th_TH.dic from myspell-th Version 1:3.2.0-3ubuntu3.1, with th_TH.aff modified by correcting 'SET TIS620-2533' to 'SET TIS620' (http://bugs.launchpad.net/ubuntu/+source/openoffice.org-dictionaries/+bug/910447 refers). With unpatched Hunspell 1.2.8, the corrections of สะกัด to สกัด and หณา to หมา are not offered. Running with the locale set by LANG=en_GB.utf8, the suggestion lines for the input are

& สะกัด 4 0: สะกิด, สะกดทัพ, สะกด, สะบัด
& อไร 4 16: อุไร, อะไร, ขอบไร, ฤร้
& หณา 4 26: อาณา, อุณา, สกุณา, ยฆษณา

when the program is run using (echo '\!'; echo '-'; echo สะกัด อไร หณา)| hunspell -d th_TH

When encds[] is corrected as suggested above, the suggestion lines become

& สะกัด 6 0: สะกด, สกัด, สะกิด, สะบัด, สังกัด, สะดวก
& อไร 4 16: อุไร, อะไร, อมร, อรไท
& หณา 5 26: หา, ห่า, หรา, หมา, หนา

Note that the corrections of สะกัด to สกัด and หณา to หมา are then offered. Additionally, the non-existent words ฤร้ and ยฆษณา are no longer offered.

Richard Wordingham (richard-wordingham) wrote :

FWIW, a suitably modified csutil.cxx can also be found in http://homepage.ntlworld.com/richard.wordingham/thai/hunspell-1.2.8-jrw1.1.zip , along with corrections for the other issues I have had in getting Hunspell to spell check word-separated Thai.

Richard Wordingham (richard-wordingham) wrote :

Formally, this is fixed for Lucid Lynx by upgrading to 1.3.2-2~lucid1. (The Lucid upgrade appears to have arrived today, apparently in connection with the change of OpenOffice from 3.3.2 to 3.4.5.) Unfortunately, the 'n-gram' selection criteria of Hunspell 1.3.2 reduce the suggestions to:

& สะกัด 2 0: สะกด, สกัด
& อไร 1 16: อุไร
& หณา 2 26: หา, อาณา

(No change to th_TH.aff is needed for it to be used by Version 1.3.2-2~lucid1.)

Implementing my suggestions for th_TH.aff (https://bugs.launchpad.net/ubuntu/+source/openoffice.org-dictionaries/+bug/910447) yields the suggestion list:

& สะกัด 4 0: สะกด, สกัด, สะกิด, สะบัด
& อไร 3 16: อะไร, อุไร, อมร
& หณา 6 26: หา, ห่า, หนา, หมา, หรา, อาณา

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in hunspell (Ubuntu):
status: New → Confirmed