Documentation
From the docs:
Because it’s so flexible, XMLPullParser can be inconvenient to use for simpler use-cases.
If you don’t mind your application blocking on reading XML data
but would still like to have incremental parsing capabilities, take a look at iterparse().
It can be useful when you’re reading a large XML document and don’t want to hold it wholly in memory.
The last sentence is wrong. iterparse() eventually loads the entire file into memory, because it builds the XML tree incrementally and keeps every parsed element attached to it. The memory problem therefore cannot be solved with iterparse() alone, without some extra code. To process a large file with a small memory footprint, you must at least repeatedly remove completed children from the root element (the details depend on the XML structure). I therefore suggest removing the last sentence from the docs.
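To illustrate the claim, here is a minimal sketch that runs iterparse() over a small synthetic document (the `<item>` data is made up for the demonstration) and checks how many children are still attached to the root once iteration finishes:

```python
import io
import xml.etree.ElementTree as ET

# Synthetic document: a root with 1000 small children (illustrative data only).
data = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"

root = None
for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
    if root is None:
        root = elem  # the first 'start' event belongs to the root element

# Nothing was removed during iteration, so every child is still attached.
retained = len(root)
print(retained)  # 1000
```

With a multi-gigabyte input the same thing happens, only with the whole document's worth of elements.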
The code below detaches each child of the root as soon as it is complete and then processes it; after that the child can be freed (provided nothing else references it). This makes it possible to process a 7 GB XML file with memory usage of up to 10 MB (when the root has a large number of children).
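from xml.etree.ElementTree import XMLPullParser

parser = XMLPullParser(['start', 'end'])  # can be replaced with iterparse as well
root = None
with open(file) as f:  # `file` is the path to the XML document
    for line in f:
        parser.feed(line)
        for event, obj in parser.read_events():
            match event:
                case 'start':
                    if root is None:
                        root = obj  # the first 'start' event is the root
                case 'end':
                    # a direct child of the root has just been completed
                    if len(root) > 0 and obj is root[0]:
                        del root[0]  # detach it so it can be freed
                        # process obj
parser.close()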
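Since the comment above notes that iterparse() can be used instead, here is a rough sketch of the same idea written with it. The helper name `iter_root_children` and the sample data are hypothetical, and tracking the nesting depth is just one way to recognise direct children of the root:

```python
import io
import xml.etree.ElementTree as ET


def iter_root_children(source):
    """Yield each completed direct child of the root, then detach it."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the 'start' of the root
    depth = 0  # nesting depth relative to the root
    for event, elem in context:
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:  # a direct child of the root just ended
                yield elem
                root.remove(elem)  # detach it so it can be garbage-collected


# Illustrative input; a real use case would pass a file name or file object.
xml = b"<root><item id='1'><v>a</v></item><item id='2'/><item id='3'/></root>"
ids = [child.get("id") for child in iter_root_children(io.BytesIO(xml))]
print(ids)  # ['1', '2', '3']
```

The element is detached only after the consumer has finished with it, so the caller must not keep references to yielded elements if the memory is to be reclaimed.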