Python running out of memory parsing XML using cElementTree.iterparse
A simplified version of my XML parsing function:

    import xml.etree.cElementTree as et

    def analyze(xml):
        it = et.iterparse(file(xml))
        count = 0
        for (ev, el) in it:
            count += 1
        print('count: {0}'.format(count))
This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I'm storing is the count, an integer. Why is it doing this:

See the sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I do in the loop, it gives me more random errors, like an IndexError) and a stack trace instead of a segfault. Why is it crashing?
The documentation tells you that it "parses an XML section into an element tree [my emphasis] incrementally", but doesn't cover how to avoid retaining the uninteresting elements (which may be most of them). That is covered in this article by the effbot.

I strongly recommend that anybody using .iterparse() should read this article by Liza Daly. It covers both lxml and [c]ElementTree.
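The pattern those articles describe is to clear each element once you're done with it, and to clear finished children off the root so the parser doesn't keep the whole tree alive. A minimal sketch of that pattern (this uses Python 3's xml.etree.ElementTree, into which cElementTree was merged; the count_elements function and the demo document are my own illustration, not from the question):

```python
import io
import xml.etree.ElementTree as ET

def count_elements(source, tag):
    """Count occurrences of `tag` without keeping the whole tree in memory."""
    count = 0
    context = ET.iterparse(source, events=('start', 'end'))
    # The first event is the 'start' of the root element; keep a reference
    # to the root so we can clear finished children off it.
    event, root = next(context)
    for event, el in context:
        if event == 'end' and el.tag == tag:
            count += 1
            el.clear()    # drop this element's children, attributes, and text
            root.clear()  # detach finished children from the root, or they accumulate
    return count

# Tiny in-memory demo document; a real use would pass a filename instead.
xml = '<root>' + '<item>x</item>' * 1000 + '</root>'
print(count_elements(io.StringIO(xml), 'item'))  # prints 1000
```

Without the two clear() calls, every parsed element stays attached to the tree rooted at root, which is exactly the unbounded growth the question describes.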
Previous coverage on SO:

Using Python iterparse for large XML files

Can Python xml ElementTree parse a very large XML file?

What is the fastest way to parse large XML docs in Python?