Wrong comment tag processing by lxml.html.diff.htmldiff
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
lxml | Confirmed | Medium | Unassigned | |
Bug Description
A comment in the HTML is emitted as if it were an ordinary tag: the output contains '<built-in function comment>', then some text from the comment's tail, then '</built-in>'.
Simple example:
===
Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml.html as h
>>> import lxml.html.diff as d
>>> a=h.fromstring(
>>> d.htmldiff(a,a)
u'<b>text<
>>>
===
I debugged diff.py and traced the bug to flatten_el and the start_tag/end_tag functions. Changing the start_tag return value from "el.tag" to "el.tag if not callable(el.tag) else el.tag" fixes the tag escaping, but not the order of the tag and its contents (like ...the comment<!----> other text...).
PS: lxml version - 2.2.2_win32_py2.6
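To illustrate the two symptoms above (the callable tag and the misplaced comment text), here is a minimal sketch. It is not the code in diff.py: `flatten` is a hypothetical helper written for this report, it ignores attributes and does no escaping, and it only special-cases comments. It assumes lxml is importable and otherwise falls back to the standard library's ElementTree, which also stores the `Comment` factory function as the tag of a comment node.

```python
# Sketch only: flatten() is a hypothetical helper, not lxml's flatten_el.
try:
    from lxml import etree as ET  # the library this report is about
except ImportError:
    # Assumption: the stdlib fallback is close enough for illustration,
    # since it uses the same "tag is the Comment function" convention.
    import xml.etree.ElementTree as ET


def flatten(el):
    """Return the markup and text chunks of el in document order."""
    if el.tag is ET.Comment:
        # A comment's tag is a function, so formatting el.tag directly
        # would produce something like '<built-in function Comment>'.
        chunks = ["<!--%s-->" % (el.text or "")]
    else:
        chunks = ["<%s>" % el.tag]
        if el.text:
            chunks.append(el.text)
        for child in el:
            chunks.extend(flatten(child))
        chunks.append("</%s>" % el.tag)
    # The tail belongs after the node's own markup, never inside it.
    if el.tail:
        chunks.append(el.tail)
    return chunks


# Build <b>text<!--the comment--> other text</b> by hand.
root = ET.Element("b")
root.text = "text"
comment = ET.Comment("the comment")
comment.tail = " other text"
root.append(comment)

print(flatten(root))
```

With this ordering the comment text stays inside `<!--` and `-->` and the tail follows it, so the printed list is `['<b>', 'text', '<!--the comment-->', ' other text', '</b>']`.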
Changed in lxml:
importance: Undecided → Medium
status: New → Confirmed