Here is the valgrind output, using libxslt 1.1.32, libxml2 2.9.8 and CPython 3.7.0:
==20079== 1 errors in context 1 of 50:
==20079== Invalid free() / delete / delete[] / realloc()
==20079== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20079== by 0x6F377FE: xmlFreeNodeList (tree.c:3721)
==20079== by 0x6F3788C: xmlFreeNodeList (tree.c:3692)
==20079== by 0x6F3788C: xmlFreeNodeList (tree.c:3692)
==20079== by 0x6F37583: xmlFreeDoc (tree.c:1253)
==20079== by 0x6415957: __pyx_pf_4lxml_5etree_9_Document___dealloc__ (etree.c:51785)
==20079== by 0x64157B0: __pyx_pw_4lxml_5etree_9_Document_1__dealloc__ (etree.c:51765)
==20079== by 0x6744E65: __pyx_tp_dealloc_4lxml_5etree__Document (etree.c:224844)
==20079== by 0x67457E8: __pyx_tp_dealloc_4lxml_5etree__Element (etree.c:225159)
==20079== by 0x674667F: __pyx_tp_dealloc_4lxml_5etree__ElementTree (etree.c:226052)
==20079== by 0x1FE1CA: tupledealloc (tupleobject.c:246)
==20079== by 0x1688F4: call_function (ceval.c:4615)
[...]
==20079== Address 0x887dd00 is 0 bytes inside a block of size 120 free'd
==20079== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20079== by 0x6AB2744: xsltApplyStripSpaces (transform.c:5732)
==20079== by 0x6AB3733: xsltApplyStylesheetInternal (transform.c:6011)
==20079== by 0x66D4E1F: __pyx_f_4lxml_5etree_4XSLT__run_transform (etree.c:200006)
==20079== by 0x66CE938: __pyx_pf_4lxml_5etree_4XSLT_18__call__ (etree.c:198792)
==20079== by 0x66CC3EF: __pyx_pw_4lxml_5etree_4XSLT_19__call__ (etree.c:198352)
==20079== by 0x18999E: _PyObject_FastCallKeywords (call.c:199)
==20079== by 0x16A617: call_function (ceval.c:4605)
[...]
==20079== Block was alloc'd at
==20079== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20079== by 0x70154AE: xmlSAX2TextNode (SAX2.c:1863)
==20079== by 0x70181BC: xmlSAX2Characters (SAX2.c:2557)
==20079== by 0x6F1C1B3: xmlParseCharData (parser.c:4457)
==20079== by 0x6F29AB6: xmlParseContent (parser.c:9862)
==20079== by 0x6F2A492: xmlParseElement (parser.c:10014)
==20079== by 0x6F29B5A: xmlParseContent (parser.c:9846)
==20079== by 0x6F2A492: xmlParseElement (parser.c:10014)
==20079== by 0x6F2AC1A: xmlParseDocument (parser.c:10711)
==20079== by 0x6F323A0: xmlDoRead (parser.c:15191)
==20079== by 0x6F323A0: xmlCtxtReadFile (parser.c:15436)
==20079== by 0x6558BCB: __pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFile (etree.c:122932)
[...]
It shows that libxslt frees text nodes in xsltApplyStripSpaces(), and that xmlFreeDoc() later frees them a second time. In other words, the nodes are still linked into the document tree after they have been freed. libxslt clearly corrupts the tree state here, which then leads to a crash when lxml discards the input document.
These nodes are created by the parser in libxml2, freed by the XSLT processor in libxslt, and then freed again by the document disposal in libxml2. All of this is outside of the control of lxml. Honestly, I cannot see what lxml could do to prevent this. It cannot even safely warn about XSLTs that strip whitespace, because the stripping can also be triggered by transitively imported stylesheets.
It is also not obvious how libxslt can be fixed. That might require a complete rewrite of the strip-space implementation.
Note that it is inherently wrong for libxslt to modify the *input* document in place during an XSLT transformation. If you run two transforms on the same input document, first one that strips whitespace and then one that does not, you get the same result in both cases, even though you asked for something else. Here is another nice example:
----------------
from lxml import etree as et
transform = et.XSLT(et.fromstring('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <foo><xsl:value-of select="a/b/text()" /></foo>
  </xsl:template>
</xsl:stylesheet>'''))
xml = et.fromstring('''\
<a>
<b>huhu</b>
</a>
''')
print("BEFORE", et.tostring(xml, encoding='unicode'))
print("XSLT", transform(xml))
print("AFTER", et.tostring(xml, encoding='unicode'))
----------------
Output:
----------------
BEFORE <a>
<b>huhu</b>
</a>
XSLT <?xml version="1.0"?>
<foo>huhu</foo>
AFTER <a><b>huhu</b></a>
----------------
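For callers who only care about the side effect shown above (the input tree losing its whitespace), one possible stopgap is to hand the transform a deep copy of the input. This is only a sketch: the helper name transform_copy is made up for illustration, and it does not address the invalid free reported by valgrind, because the copy still goes through the same strip-space code in libxslt. It merely keeps the caller's own tree untouched.

```python
import copy
from lxml import etree as et

STYLESHEET = '''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <foo><xsl:value-of select="a/b/text()"/></foo>
  </xsl:template>
</xsl:stylesheet>'''


def transform_copy(transform, doc):
    """Hypothetical helper: run the transform on a deep copy of doc.

    Any in-place changes made by libxslt then land on the throwaway
    copy instead of on the caller's tree.
    """
    return transform(copy.deepcopy(doc))


transform = et.XSLT(et.fromstring(STYLESHEET))
xml = et.fromstring('<a>\n<b>huhu</b>\n</a>')

before = et.tostring(xml, encoding='unicode')
result = transform_copy(transform, xml)
after = et.tostring(xml, encoding='unicode')

print("XSLT", result)
print("input unchanged:", before == after)
```

The cost is one extra copy of the input per transformation, which may matter for large documents.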