So, a little on how I tracked this down.
- Figured out exactly what minidom's parseString does when it creates a parser (when none is provided).
- This seemed to then go into the code @ http://svn.python.org/view/python/trunk/Lib/xml/dom/minidom.py?revision=75305&view=markup#l1917
- Note that it seems to jump into 2 different DOM impls depending on whether a parser is provided or not, so first I tried to see if I could monkey-patch out the parser it was 'creating' when no parser was selected, basically by trying to patch out the function @ http://svn.python.org/view/python/trunk/Lib/xml/dom/expatbuilder.py?revision=50941&view=markup#l932
- This is how I then noticed that http://svn.python.org/view/python/trunk/Lib/xml/dom/expatbuilder.py?revision=50941&view=markup#l155 is what actually creates the underlying parser (so I then tried to adjust settings on that underlying parser to make it work like we expected). This is where I realized that self._parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER) isn't actually doing anything for this case. I didn't dive too far into the C code to figure out exactly why this call doesn't change anything, but from an initial dive I found http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?view=markup#l2215 which seems to be the entity expansion/reference code. Note that this code has logic around 'XML_ERROR_RECURSIVE_ENTITY_REF;', but that doesn't stop the case we are seeing, which isn't actually recursive. This code then eventually calls http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?view=markup#l4665 which starts the whole 'doContent()' function over again.
- So then I looked back at that C code @ http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?view=markup#l2225 and saw that it seems to check 'else if (defaultHandler)' and then stop entity expansion right there if said handler actually exists, which struck me as odd.
So then I started looking into replacing this default handler (which apparently does not exist on said parsers unless set). This is how I started looking at http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?revision=77680&view=markup#l1271 to see if I could just set any handler on this parser to stop it from doing what it was doing. That is how I discovered that setting any default handler through XML_SetDefaultHandler() also sets 'defaultExpandInternalEntities = XML_FALSE;' (XML_SetDefaultHandlerExpand() is the variant that sets it to XML_TRUE), which is how I stumbled upon http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?view=markup#l2257. This resulted in me messing with the default handler to see what I could set (anything, actually) to turn off entity expansion.
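For anyone who wants to poke at this without going through minidom, here is a minimal sketch of the two observations above using xml.parsers.expat directly. The sample document and the helper names (SAMPLE, parse_sample, param_entity_never, install_default_handler) are made up for illustration and are not part of minidom or expatbuilder; the sketch only assumes that SetParamEntityParsing() governs parameter entities rather than internal general entities, and that assigning DefaultHandler inhibits internal entity expansion, as the pyexpat documentation describes.

```python
import xml.parsers.expat as expat

# Hypothetical sample: nested but NOT recursive internal entities,
# i.e. the shape of input that the recursion check does not reject.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY a "aaaa">
  <!ENTITY b "&a;&a;">
]>
<root>&b;</root>"""


def parse_sample(configure=None):
    """Parse SAMPLE and return (character data, data seen by DefaultHandler)."""
    parser = expat.ParserCreate()
    text, unhandled = [], []
    parser.CharacterDataHandler = text.append
    if configure is not None:
        configure(parser, unhandled)
    parser.Parse(SAMPLE, True)
    return "".join(text), "".join(unhandled)


def param_entity_never(parser, unhandled):
    # The setting tried first: it concerns parameter entities, so internal
    # general entities such as &b; are still expanded.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER)


def install_default_handler(parser, unhandled):
    # Any callable works; here it just records what it is handed.
    # Assigning DefaultHandler (not DefaultHandlerExpand) is what
    # inhibits expansion of internal entities.
    parser.DefaultHandler = unhandled.append


if __name__ == "__main__":
    print(repr(parse_sample()[0]))
    print(repr(parse_sample(param_entity_never)[0]))
    print(repr(parse_sample(install_default_handler)))
```

With a stock pyexpat build, the first two calls should report the fully expanded text, while the third should report no character data and show the raw '&b;' reference among the data passed to the default handler.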
End of chapter, josh vs the DTD beast.