The lxml documentation still suggests a parser pool for threaded applications:
http://lxml.de/element_classes.html
To avoid interfering with other modules, however, it is usually a better idea to use a dedicated parser for each module (or a parser pool when using threads) and then register the required lookup scheme only for this parser.
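A parser pool along the lines the lxml documentation suggests might look like the sketch below. `ParserPool` and its `parser()` context manager are hypothetical names, not part of lxml. The sketch assumes Python 3 (the `queue` module is called `Queue` in Python 2). It registers any element class lookup on each pooled parser individually, so nothing is registered globally.

```python
import queue
from contextlib import contextmanager

from lxml import etree


class ParserPool(object):
    """Hypothetical fixed-size pool of lxml parsers (not an lxml API)."""

    def __init__(self, size=4, lookup=None, **parser_cfg):
        self._pool = queue.Queue()
        for _ in range(size):
            parser = etree.XMLParser(**parser_cfg)
            if lookup is not None:
                # Register the lookup scheme on this parser only, not globally.
                parser.set_element_class_lookup(lookup)
            self._pool.put(parser)

    @contextmanager
    def parser(self):
        # Block until a parser is free; the caller gets exclusive use of it.
        parser = self._pool.get()
        try:
            yield parser
        finally:
            # Hand the parser back even if parsing raised an exception.
            self._pool.put(parser)


pool = ParserPool(size=2, resolve_entities=False, remove_comments=True)
with pool.parser() as parser:
    root = etree.fromstring(b"<root><!-- note --><child/></root>", parser)
print([child.tag for child in root])  # ['child']
```

A pool caps the number of live parsers, but threads block on `get()` when every parser is in use. The thread-local approach in our code avoids the queue at the cost of one parser per thread.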
Here is some example code from our codebase at work. We use a custom Element class and thread-local storage for parser instances.
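The thread-local idea can be isolated in a minimal sketch that relies only on `threading.local` semantics. Each thread that touches `tls.parser` lazily builds and caches its own `XMLParser`, so parsing needs no locking. The class is a trimmed, hypothetical stand-in for `ParserTLS` without the custom element lookup, and the worker count of 4 is arbitrary.

```python
import threading

from lxml import etree


class ParserTLS(threading.local):
    """Trimmed stand-in: one lazily created parser per thread."""

    @property
    def parser(self):
        parser = getattr(self, "_parser", None)
        if parser is None:
            # First access from this thread: build a parser just for it.
            parser = etree.XMLParser(resolve_entities=False)
            self._parser = parser
        return parser


tls = ParserTLS()
results = []  # (parser, reused?) tuples, one per worker thread
results_lock = threading.Lock()


def worker():
    first = tls.parser
    # A second access from the same thread returns the cached parser.
    reused = tls.parser is first
    etree.fromstring(b"<doc/>", first)
    with results_lock:
        results.append((first, reused))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Holding the parsers in `results` keeps them alive, so their ids stay unique.
print(len({id(p) for p, _ in results}))  # 4
```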
import threading
from lxml import etree


class RestrictedElement(etree.ElementBase):
    """Element class whose child iteration skips blacklisted node types."""
    __slots__ = ()
    # blacklist = (etree._Entity, etree._ProcessingInstruction, etree._Comment)
    # unresolved entity references (resolve_entities=False keeps them in the tree)
    blacklist = etree._Entity

    def __iter__(self):
        blacklist = self.blacklist
        for child in super(RestrictedElement, self).__iter__():
            if isinstance(child, blacklist):
                continue  # hide blacklisted nodes from callers
            yield child

    def iterchildren(self, tag=None, reversed=False):
        blacklist = self.blacklist
        children = super(RestrictedElement, self).iterchildren(tag=tag,
                                                               reversed=reversed)
        for child in children:
            if isinstance(child, blacklist):
                continue
            yield child

    # you may need to override getchildren, find, findall and more if you use them


class ParserTLS(threading.local):
    """Lazily creates one restricted parser per thread."""
    parser_cfg = {
        'resolve_entities': False,
        'remove_comments': True,
        'remove_pis': True,
    }

    @property
    def parser(self):
        parser = getattr(self, "_parser", None)
        if parser is None:
            # first access from this thread: build and cache a parser
            parser = etree.XMLParser(**self.parser_cfg)
            lookup = etree.ElementDefaultClassLookup(element=RestrictedElement)
            parser.set_element_class_lookup(lookup)
            self._parser = parser
        return parser


if __name__ == "__main__":
    tls = ParserTLS()
    tree = etree.parse("test.xml", parser=tls.parser)
    print(tree.getroot().text)
    print(list(tree.getroot().iterchildren()))
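The following self-contained check shows what `RestrictedElement` hides. It relies on lxml's behaviour with `resolve_entities=False`, where unresolved entity references stay in the tree as `etree._Entity` nodes. The inline document and the entity name `e` are made up for the example, and the class is a trimmed copy that only overrides `__iter__`.

```python
from lxml import etree


class RestrictedElement(etree.ElementBase):
    """Trimmed copy: iteration skips unresolved entity reference nodes."""
    __slots__ = ()
    blacklist = etree._Entity

    def __iter__(self):
        for child in super(RestrictedElement, self).__iter__():
            if not isinstance(child, self.blacklist):
                yield child


XML = b'<!DOCTYPE root [<!ENTITY e "boom">]><root>&e;<a/></root>'

# Plain parser: the unresolved &e; shows up as an _Entity child.
plain = etree.XMLParser(resolve_entities=False)
plain_root = etree.fromstring(XML, plain)
plain_children = list(plain_root)

# Same settings, but this parser's lookup swaps in RestrictedElement.
restricted = etree.XMLParser(resolve_entities=False)
restricted.set_element_class_lookup(
    etree.ElementDefaultClassLookup(element=RestrictedElement))
restricted_root = etree.fromstring(XML, restricted)
restricted_children = list(restricted_root)

print([isinstance(c, etree._Entity) for c in plain_children])  # [True, False]
print([c.tag for c in restricted_children])  # ['a']
```

Only iteration is filtered here. Methods that are not overridden, such as `len()` or indexing, still see the entity node, which is what the comment about `getchildren`, `find` and `findall` warns about.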
@Thierry:
I'll ask on the PSRT list. My patch for expat won't be ready until Wednesday, but we can release the restricted expat parser classes for etree, sax and minidom as hotfixes. I'm waiting for some code review now. I also need to get back to the libxml2 guys ASAP.