german text detecion does work very bad
Bug #677608 reported by
benste
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
ocrfeeder (Ubuntu) |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
Binary package hint: ocrfeeder
here are some examples which letters are replaced wrongly while doing OCR for the attached file
should be - detected
h as x
o as 0 or c
ö,ä,ü do not exist
ur as m
ü as ii
ä as é
ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: ocrfeeder 0.6.6-3
ProcVersionSign
Uname: Linux 2.6.35-22-generic x86_64
NonfreeKernelMo
Architecture: amd64
Date: Fri Nov 19 20:08:03 2010
EcryptfsInUse: Yes
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
PackageArchitec
ProcEnviron:
LANGUAGE=
LANG=de_DE.utf8
SHELL=/bin/bash
SourcePackage: ocrfeeder
To post a comment you must log in.
What OCR engine did you used for this? Did you add -l option to the settings (in case of German it should be "-l deu" for Tesseract and "-l grm" for CuneiForm)? I think this was your problem.
I've just made extensive changes to the OCR help page, read more about language handling there:
https:/ /help.ubuntu. com/community/ OCR#OCRFeeder