I've encountered similar behaviour on macOS 11.7 (Big Sur) when parsing an example UTF-8 encoded HTML file that contains at least two multibyte characters.
One detail I learned while attempting to narrow down the cause: the problem disappears when the 'lxml' dependency is installed from a binary wheel.
A near-minimal repro case is available at https://github.com/jayaddison/macos-lxml-issue-repro.git/
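
For anyone comparing environments, here is a rough diagnostic sketch; it is not the repro from the repository above. It builds an illustrative UTF-8 HTML input with two multibyte characters and, if lxml is importable, reports which libxml2 version lxml was compiled against and which one it is running with. Those versions may differ between a wheel install and a local build, though whether that is related to this problem is only a guess. The helper names (build_sample_html, count_multibyte, describe_lxml) and the sample markup are hypothetical.

```python
def build_sample_html() -> bytes:
    """Return a small UTF-8 HTML document containing two multibyte characters."""
    # Illustrative markup only: "é" and "ü" each take two bytes in UTF-8.
    text = "<html><body><p>caf\u00e9 m\u00fcnster</p></body></html>"
    return text.encode("utf-8")


def count_multibyte(data: bytes) -> int:
    """Count characters whose UTF-8 encoding is longer than one byte."""
    return sum(1 for ch in data.decode("utf-8") if len(ch.encode("utf-8")) > 1)


def describe_lxml() -> dict:
    """Report lxml and libxml2 versions, or an empty dict if lxml is absent."""
    try:
        from lxml import etree
    except ImportError:
        return {}
    return {
        "lxml": etree.LXML_VERSION,
        "libxml2 (runtime)": etree.LIBXML_VERSION,
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
    }


if __name__ == "__main__":
    sample = build_sample_html()
    print(count_multibyte(sample), "multibyte characters in sample")
    for name, version in describe_lxml().items():
        print(name, version)
```

Running this once in the environment that shows the problem and once in the environment where lxml came from a wheel should make any difference in the underlying libxml2 visible.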