en_US dictionary misses n't contractions
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
scowl (Ubuntu) | New | Undecided | Unassigned | |
Bug Description
Although these contractions are present in the `.dic` file, `hunspell` splits some of them at ' (ASCII apostrophe) or ’ (Unicode apostrophe) and rejects the resulting non-words as misspellings.
```ShellSession
luism@lmm-
Description: Ubuntu 18.10
Release: 18.10
luism@lmm-
Listing... Done
hunspell-
hunspell/cosmic,now 1.6.2-1build1 amd64 [installed]
luism@lmm-
SEARCH PATH:
.::/usr/
AVAILABLE DICTIONARIES (path is not mandatory for -d option):
/usr/share/
LOADED DICTIONARY:
/usr/share/
/usr/share/
Hunspell 1.6.2
luism@lmm-
> do sed -ne /${i}n\'t/'{p;q}' /usr/share/
> done
aren't
couldn't
didn't
isn't
mustn't
shouldn't
wasn't
weren't
wouldn't
luism@lmm-
> do hunspell <<EOF
> ${i}n't
> EOF
> done
Hunspell 1.6.2
& aren 12 0: earn, are, arena, Daren, Yaren, Karen, ares, area, amen, wren, Wren, are n
*
Hunspell 1.6.2
& couldn 2 0: could, could n
*
Hunspell 1.6.2
& didn 4 0: did, din, dido, did n
*
Hunspell 1.6.2
& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
*
Hunspell 1.6.2
& mustn 6 0: must, musts, musty, mus tn, mus-tn, must n
*
Hunspell 1.6.2
& shouldn 2 0: should, should n
*
Hunspell 1.6.2
& wasn 10 0: awns, was, wan, swan, wain, warn, wast, wasp, wash, was n
*
Hunspell 1.6.2
& weren 5 0: were, ween, wren, were n, wen
*
Hunspell 1.6.2
& wouldn 3 0: would, woulds, would n
*
luism@lmm-
> do hunspell <<EOF
> ${i}n’t
> EOF
> done
Hunspell 1.6.2
& aren 12 0: earn, are, arena, Daren, Yaren, Karen, ares, area, amen, wren, Wren, are n
*
Hunspell 1.6.2
& couldn 2 0: could, could n
*
Hunspell 1.6.2
& didn 4 0: did, din, dido, did n
*
Hunspell 1.6.2
& isn 9 0: sin, ins, ism, is, in, inn, ion, isl, is n
*
Hunspell 1.6.2
& mustn 6 0: must, musts, musty, mus tn, mus-tn, must n
*
Hunspell 1.6.2
& shouldn 2 0: should, should n
*
Hunspell 1.6.2
& wasn 10 0: awns, was, wan, swan, wain, warn, wast, wasp, wash, was n
*
Hunspell 1.6.2
& weren 5 0: were, ween, wren, were n, wen
*
Hunspell 1.6.2
& wouldn 3 0: would, woulds, would n
*
```
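
The rejected fragments can be pulled out of output like the above mechanically. The following is a minimal sketch, assuming the ispell-style result lines seen in the session (a leading `&` or `#` marks a rejected token, which appears as the second field). `flagged_tokens` is a hypothetical helper name, not part of hunspell.

```sh
# flagged_tokens (hypothetical helper): read hunspell result lines on stdin
# and print only the tokens that were rejected.
#   "& token count offset: suggestions..."  rejected, with suggestions
#   "# token offset"                        rejected, no suggestions
#   "*"                                     accepted (ignored here)
flagged_tokens() {
    awk '$1 == "&" || $1 == "#" { print $2 }'
}

# Example using a result line copied from the session above.
printf '%s\n' '& aren 12 0: earn, are, arena' '*' | flagged_tokens
```

Fed the session output, it lists `aren`, `couldn`, `didn` and so on, which shows that each word was cut before the apostrophe rather than looked up whole.
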
According to the [`hunspell` changelog](https:/
> 2014-05-28 Németh László <nemeth at numbertext dot org>:
> …
> * better apostrophe usage:
> - WORDCHARS only with one of the Unicode or ASCII apostrophe
> results extended word tokenization: both of them will be part of
> the words (if they are inside: eg. word's, but not words').
> - convert Unicode apostrophes to ASCII ones for 8-bit dictionaries
> (eg. English dictionaries), or for UTF-8 dictionaries only
> with ASCII apostrophe supports (eg. French dictionaries).
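
Going by the changelog entry, the extended tokenization depends on the affix file declaring `WORDCHARS` with an apostrophe. A quick way to check a given `.aff` file is sketched below. This is an assumption-laden sketch: `aff_has_apostrophe_wordchars` is a hypothetical helper, and the file path passed to it is whatever affix file your installation actually loads.

```sh
# aff_has_apostrophe_wordchars (hypothetical helper): succeed if the affix
# file given as $1 has a WORDCHARS line containing an ASCII apostrophe (')
# or a UTF-8 encoded Unicode apostrophe (’).
# LC_ALL=C makes grep compare raw bytes, so 8-bit encoded files are handled
# without locale-related decoding problems.
aff_has_apostrophe_wordchars() {
    LC_ALL=C grep -q \
        -e "^WORDCHARS[[:space:]].*'" \
        -e "^WORDCHARS[[:space:]].*’" \
        "$1"
}

# Usage: "$aff" is a placeholder for the path to the affix file to inspect.
# if aff_has_apostrophe_wordchars "$aff"; then
#     echo "WORDCHARS includes an apostrophe"
# else
#     echo "WORDCHARS missing or without apostrophe"
# fi
```

If the check fails for the affix file in use, that would be consistent with the splitting behaviour shown in the session, though it does not by itself prove that `WORDCHARS` is the only cause.
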
I am therefore raising the issue here, since the dictionary's affix rules don't appear to support this hunspell feature.
The en_US dictionary (and others) should allow hunspell to process words containing ' without breaking them.