UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 58: invalid start byte
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | New | Undecided | Unassigned | | |
Bug Description
When I try to parse a document with lxml.etree, I get the following traceback:
Traceback (most recent call last):
  File "lxml.etree.pyx", line 815, in lxml.etree.
  File "apihelpers.pxi", line 616, in lxml.etree.
  File "apihelpers.pxi", line 1280, in lxml.etree.funicode (src/lxml/
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 58: invalid start byte
However, when I manually extract the problematic text and decode it as UTF-8 myself, I get the Unicode replacement character u'\ufffd'.
So, I guess lxml.etree.
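For reference, the decoding behaviour described above can be reproduced in plain Python, without lxml. This is only a minimal sketch: the sample bytes are hypothetical (the original document is not shown here), and the idea that the byte 0x92 came from Windows-1252 text is an assumption, not something confirmed for this input. It shows that a strict UTF-8 decode rejects 0x92, that decoding with the "replace" error handler yields u'\ufffd', and what the byte would mean if the data were really cp1252.

```python
# Hypothetical sample: the byte 0x92 embedded in otherwise plain ASCII text.
raw = b"It\x92s"

# Strict decoding (the default) fails, because 0x92 is a UTF-8
# continuation byte and cannot start a character.
try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# With the "replace" error handler, the invalid byte is turned into
# U+FFFD (REPLACEMENT CHARACTER) instead of raising.
replaced = raw.decode("utf-8", "replace")

# Assumption: if the source was actually Windows-1252, byte 0x92 is
# U+2019 (RIGHT SINGLE QUOTATION MARK).
as_cp1252 = raw.decode("cp1252")

print(repr(strict_error))
print(repr(replaced))
print(repr(as_cp1252))
```

If the input really is in a legacy encoding, the bytes are not valid UTF-8 in the first place, so seeing u'\ufffd' after a manual decode would suggest that an error handler was applied rather than that the data decoded cleanly.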
Thank you very much.
Python : (2, 6, 6, 'final', 0)
lxml.etree : (2, 2, 6, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 6)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)
I wonder how you get that character parsed anyway. Could you add a code
example that shows how you parse and how you get to the exception?
Stefan