html cleaner assert parent is not None AssertionError
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | Triaged | Low | Unassigned | | |
Bug Description
The HTML cleaner raises an AssertionError when both the root tag and a child tag are in the kill_tags set.
Reproduce:
```
from lxml.html.clean import Cleaner
html_cleaner = Cleaner(
content = '<pre><
print(html_
```
Output:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src/lxml/
File "src/lxml/
File "/Users/
assert parent is not None
AssertionError
```
Expected output:
```
<div></div>
```
I tested with Python 3.7.4, lxml-4.3.3 and lxml-4.4.1.
Python : sys.version_
lxml.etree : (4, 4, 1, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 9)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 33)
The bug seems similar to https:/
Hmm, yes, it's probably worth reconsidering that assertion. In the end, what you're asking for is to discard the entire "document". That's a valid thing to do, and it should not run into an AssertionError.
OTOH, what should it return in such a case?
Maybe it could be enough to remove all content from the tag (children and inner text), and then still return it?
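To make that last suggestion concrete, here is a minimal sketch of what "remove all content from the tag and still return it" could look like. It is not lxml's implementation and not a proposed patch: `empty_element` is a hypothetical helper name, and the demo uses the standard library's `xml.etree.ElementTree` as a stand-in so it runs without lxml installed. It relies only on `remove()`, iteration over children, and the `.text` attribute, which lxml elements also provide.

```python
import xml.etree.ElementTree as ET


def empty_element(element):
    """Hypothetical helper: strip all children and inner text, keep the tag.

    Sketches the behaviour suggested above for the case where the root
    element itself is in kill_tags and has no parent to be removed from.
    """
    # Copy the child list first, since we mutate the element while iterating.
    for child in list(element):
        # Removing a child also drops its tail text.
        element.remove(child)
    # Drop the text that precedes the first child.
    element.text = None
    return element


# Illustrative input only, not the truncated markup from the report.
root = ET.fromstring("<pre>some text<span>child</span> tail text</pre>")
emptied = empty_element(root)
print(ET.tostring(emptied, encoding="unicode", short_empty_elements=False))
# prints: <pre></pre>
```

Whether the emptied root or something else should be returned is still the open question raised above; the sketch only shows that emptying the element needs no parent, which is the condition the failing assertion checks.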