Incorrect parsing of _Element.text with comment tag(s)

Bug #1939133 reported by Michael Mortensen
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
lxml      New      Undecided    Unassigned

Bug Description

Calling .text on an _Element object returns only part of the element's text when a comment is present inside that text (which is valid XML): everything after the comment is missing. I expected the full text, minus the comment block, to be returned.

lxml information:

Python : sys.version_info(major=3, minor=9, micro=2, releaselevel='final', serial=0)
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)

Example POC:

from lxml import etree

x = '''<a>a,
  <!-- b, -->
  c
</a>'''

x2 = '''<a>a,
  b,
  c,
  <!-- d, -->
  e
</a>'''

el = etree.XML(x)
expected = 'a,\n c\n'
actual = el.text  # 'a,\n  ' (stops at the comment)

el2 = etree.XML(x2)
expected2 = 'a,\n b,\n c,\n e\n'
actual2 = el2.text  # 'a,\n  b,\n  c,\n  ' (stops at the comment)
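
For anyone reproducing this: in lxml's tree model, .text holds only the text before an element's first child node, and a comment counts as a child, so the text after the comment should be available as that comment's .tail. The following is a minimal sketch under that assumption, not a proposed fix. The helper name direct_text is hypothetical and not part of lxml. It also shows the remove_comments parser option as one possible way to get a comment-free tree at parse time.

```python
from lxml import etree


def direct_text(element):
    """Hypothetical helper: element.text plus the tail of each direct child.

    Text inside nested child elements is deliberately not included.
    """
    parts = [element.text or ""]
    for child in element:  # in lxml, iteration also yields comment nodes
        parts.append(child.tail or "")
    return "".join(parts)


doc = "<a>a,\n  <!-- b, -->\n  c\n</a>"

root = etree.XML(doc)
comment = root[0]          # the comment is the first child of <a>
leading = root.text        # text before the comment
trailing = comment.tail    # text after the comment
joined = direct_text(root)

# Alternative: ask the parser to drop comments while parsing,
# so that <a> ends up with no child nodes at all.
parser = etree.XMLParser(remove_comments=True)
stripped_root = etree.XML(doc, parser)
```

Note that joined keeps the whitespace on both sides of the comment, so it contains one more newline and indent than the expected string in the POC above.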
