fonts fail when non-UTF8 filenames exist anywhere

Bug #1802183 reported by hackerb9
This bug affects 1 person
Affects: calibre (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

When editing an ebook or setting the font embedding preferences, calibre cannot find any fonts at all if any filename anywhere under a font directory is not valid UTF-8.

This can happen if the font designer's computer used a different locale from the calibre user's. For example, I had downloaded a font designed by a man in Russia who had used a Cyrillic encoding for the names of some image files in a directory adjacent to the font.

The bug is easily repeatable. Create a file in any of your font directories or sub-directories whose name contains an invalid UTF-8 byte sequence. For example, from the command line, run:

    mkdir -p ~/.fonts/foo
    touch ~/.fonts/foo/$'fred\377juki'

When you run Calibre and try to add a font, after selecting a font file, it will give you a dialog box saying, "ERROR: Unhandled Exception. UnicodeDecodeError:'utf8' codec can't decode byte 0xff in position 4: invalid start byte".

It will also print on stderr a message similar to this:

calibre, version 3.21.0
ERROR: Unhandled exception: <b>UnicodeDecodeError</b>:'utf8' codec can't decode byte 0xff in position 4: invalid start byte

calibre 3.21 embedded-python: False is64bit: True
Linux-4.15.0-38-generic-x86_64-with-Ubuntu-18.04-bionic Linux ('64bit', '')
('Linux', '4.15.0-38-generic', '#41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018')
Python 2.7.15rc1
Linux: ('Ubuntu', '18.04', 'bionic')
Interface language: None
Successfully initialized third party plugins: Gather KFX-ZIP (from KFX Input) (1, 9, 0) && DeDRM (6, 6, 1) && Package KFX (from KFX Input) (1, 9, 0) && KFX metadata reader (from KFX Input) (1, 9, 0) && KFX Input (1, 9, 0)
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/gui2/font_family_chooser.py", line 299, in add_fonts
    self.font_scanner.do_scan()
  File "/usr/lib/calibre/calibre/utils/fonts/scanner.py", line 327, in do_scan
    files = tuple(walk(folder))
  File "/usr/lib/calibre/calibre/__init__.py", line 523, in walk
    for record in os.walk(dir):
  File "/usr/lib/python2.7/os.py", line 286, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 4: invalid start byte
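
The failing decode in the last frame can be reproduced on its own. The following is a minimal sketch, not calibre's code: it assumes Python 3 (the report is against Python 2.7), and the helper name `try_decode` is hypothetical. It decodes the bare filename from the touch command above as UTF-8 and reports where decoding stops.

```python
def try_decode(raw_name):
    """Return (text, None) if raw_name is valid UTF-8, else (None, error)."""
    try:
        return raw_name.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, err


# The filename created by the touch command in this report.
text, err = try_decode(b"fred\xffjuki")
print(err.start, err.reason)  # prints: 4 invalid start byte
```

Byte 0xff can never begin a UTF-8 sequence, which is why the codec reports an invalid start byte rather than a truncated sequence.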

Ideally, calibre would ignore such files and continue on. However, if that is difficult, it'd be good to catch the error and show a message telling people something along the lines of,

    Non-UTF-8 filename found somewhere in your font directories, causing Calibre to barf.
    Please find and rename the file so that Calibre can read your fonts. You can find the
    offending file like so: find (...PRINT CALIBRE FONT DIRECTORIES HERE...) | iconv
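
One way calibre could ignore such files is to walk the font directories as bytes and decode each path explicitly. This is a rough sketch under assumptions, not a patch against calibre's scanner: it is written for Python 3, the function name `split_font_files` is hypothetical, and it treats UTF-8 as the only acceptable filename encoding.

```python
import os


def split_font_files(font_dir):
    """Walk font_dir as bytes; return (decodable paths, undecodable raw paths)."""
    good, bad = [], []
    # Passing a bytes path makes os.walk yield bytes names, so nothing is
    # decoded implicitly while walking.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(font_dir)):
        for name in filenames:
            raw = os.path.join(dirpath, name)
            try:
                good.append(raw.decode("utf-8"))
            except UnicodeDecodeError:
                # Skip the file, but remember it so it can be reported.
                bad.append(raw)
    return good, bad
```

The first list could be handed to the font scanner, while the second could be shown in a warning like the one suggested above, so the user knows which files to rename.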

hackerb9 (hackerb9) wrote :

Script started on 2018-11-07 09:47:45-0800

$ ls Sumkin\ type\!/Sumkin\ Cover
'Sumkin by MRfrukta!.png'
'Sumkin Russian type by MRfrukta!.png'
''$'\346''Ҽ'$'\254\277\241'' '$'\336''ӿ'$'\365\324'' (1).png'
''$'\346''Ҽ'$'\254\277\241'' '$'\336''ӿ'$'\365\324'' (2).png'
''$'\346''Ҽ'$'\254\277\241'' '$'\336''ӿ'$'\365\324'' (3).png'
''$'\346''Ҽ'$'\254\277\241'' '$'\336''ӿ'$'\365\324'' (4).png'
''$'\346''Ҽ'$'\254\277\241'' '$'\336''ӿ'$'\365\324'' (5).png'

$ ls Sumkin\ type\!/Sumkin\ Cover | iconv -f cyrillic
Sumkin by MRfrukta!.png
Sumkin Russian type by MRfrukta!.png
цвМЌПЁ огПѕд (1).png
цвМЌПЁ огПѕд (2).png
цвМЌПЁ огПѕд (3).png
цвМЌПЁ огПѕд (4).png
цвМЌПЁ огПѕд (5).png

$ exit

Script done on 2018-11-07 09:48:14-0800
