UnicodeDecodeError on Mac OS X with Python 3.10
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | New | Undecided | Unassigned |
Bug Description
Python: 3.10.5
lxml: 4.9.1
OS: Mac OS X
When the parsed text contains an emoji character, a UnicodeDecodeError is raised. This happens only with Python 3.10 on OS X, not on Linux or Windows.
Full traceback:
Python 3.10.4 (main, May 18 2022, 22:24:47) [Clang 13.0.0 (clang-
Type "help", "copyright", "credits" or "license" for more information.
>>> import readtime
>>> result = readtime.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/users/
return utils.read_
File "/Users/
text, images = parse_html(el)
File "/Users/
add_text(tag, no_tail=True)
File "/Users/
if tag.text and not isinstance(tag, lxml.etree.
File "src/lxml/
File "src/lxml/
File "src/lxml/
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 9: unexpected end of data
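For anyone reading the message above: 0xf4 is a lead byte of a four-byte UTF-8 sequence, so "unexpected end of data" usually means the decoder hit the end of the buffer partway through a character. The sketch below only reproduces that error in plain Python. It does not call lxml and does not show the cause of this bug. The byte string and the helper name `describe_decode_failure` are made up for illustration and are not taken from the report.

```python
# Illustrative only: reproduces the same kind of error in plain Python,
# not the lxml bug itself. The input bytes are a made-up example.


def describe_decode_failure(data: bytes) -> dict:
    """Try to decode data as UTF-8; return details of the failure, or {} on success."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return {
            "byte": hex(exc.object[exc.start]),  # the offending byte
            "position": exc.start,               # offset where decoding stopped
            "reason": exc.reason,                # the decoder's explanation
        }
    return {}


# Nine ASCII bytes followed by a lone lead byte of a four-byte sequence.
truncated = b"123456789" + b"\xf4"
info = describe_decode_failure(truncated)
print(info)

# A complete emoji encoding decodes without error.
complete = "\N{GRINNING FACE}".encode("utf-8")
print(describe_decode_failure(complete))
```

If the same text decodes cleanly on Linux and Windows, one possible reading (an assumption, not confirmed here) is that the string reaches the decoder with the wrong length on the affected platform, rather than the input itself being malformed.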
Originally reported at https://github.com/alanhamlett/readtime/issues/7#issuecomment-1179449557