Incorrect parsing of _Element.text with comment tag(s)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | New | Undecided | Unassigned |
Bug Description
When .text is called on an _Element whose text content contains a comment (which is valid XML), only the text before the comment is returned. The full text, minus the comment block, was expected to be returned.
lxml information:
Python : sys.version_
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
Example POC:
from lxml import etree
x = '''<a>a,
<!-- b, -->
c
</a>'''
x2 = '''<a>a,
b,
c,
<!-- d, -->
e
</a>'''
el = etree.XML(x)
expected = 'a,\n c\n'
actual = el.text # 'a,\n '
el2 = etree.XML(x2)
expected2 = 'a,\n b,\n c,\n e\n'
actual2 = el2.text # 'a,\n b,\n c,\n '
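A possible workaround, offered as a sketch rather than a fix: in lxml's data model, text that follows a child node (including a comment) is stored on that node's .tail, not on the parent's .text. The helper below, text_skipping_comments, is a hypothetical name introduced here for illustration. It joins the parent's .text with the .tail of each comment child, and it assumes the element has no element children whose text also needs collecting.

```python
from lxml import etree


def text_skipping_comments(el):
    """Join el.text with the tail text of every direct comment child.

    Hypothetical helper for illustration: it only looks at direct
    children and ignores any non-comment child nodes.
    """
    parts = [el.text or ""]
    for child in el:
        # For comment nodes, lxml sets .tag to the etree.Comment factory.
        if child.tag is etree.Comment:
            parts.append(child.tail or "")
    return "".join(parts)


el = etree.XML("<a>a,\n<!-- b, -->\nc\n</a>")
before_comment = el.text               # only the text before the comment
full = text_skipping_comments(el)      # text before and after the comment
```

With this input, before_comment holds only the text preceding the comment, while full also includes the text that lxml attached to the comment's .tail.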