The recent cElementTree release is causing some waves in the Python/XML community. It started when Uche Ogbuji posted "The Python Community has too many deceptive XML benchmarks" to his blog. The effbot was not amused, as could be witnessed by his comment on it, and the blog entries:
http://online.effbot.org/2005_01_01_archive.htm#sigh
http://online.effbot.org/2005_01_01_archive.htm#faking-it
http://online.effbot.org/2005_01_01_archive.htm#faking-it-2
http://online.effbot.org/2005_01_01_archive.htm#faking-it-3

The problem is that Uche unwittingly introduced a benchmark that is rather.. deceptive. He has been testing the time taken by the whole program, including startup and shutdown of the Python interpreter, module importing, and the like, instead of just the part where XML processing takes place. Unless you're writing command line scripts or classic CGI web applications, Python startup time is hardly relevant, and shouldn't be part of the measurement.

A while back, while developing lxml.etree, I was curious what benchmark Fredrik was using. I couldn't find the information on the web, but he told me when I mailed him about it. He was using the simple, obvious strategy which I myself had already been using:

To measure approximate memory usage, he puts in a pause in the program before and after the processing, and checks the process overview on his machine manually. I've replicated his results with cElementTree and ElementTree fairly well, though my machine is a bit different in its performance characteristics due to platform differences. See other blog entries for more info on this.

For fun, I thought I'd try Uche's benchmark against lxml.etree on this machine. I've also tested it against cElementTree (an older version, I can't keep up with Fredrik's releases; hm, no __version__ string I can find, so don't know what 0.9.x version it is.. reminds me to add one to lxml when the time comes for a release..).

Here's Uche's program adjusted for etree. As you can see, only the import statement needs to change:

I've also rewritten it to use xpath instead:

Since this program is printing stuff, and printing overhead can be large, I've tried a number of tests:

Here are the results:

As you can see from the results, the type of terminal you're printing to matters a lot. In the case of the xpath tests, almost half of the time is spent printing to the terminal, and for the other tests the overhead seems to be even more.

Also note that at last I can claim a minor victory over cElementTree on my machine on this particular test! lxml.etree, when using xpath to do the task set, is faster than this version of cElementTree. Of course most of the credit here goes to libxml2's blazingly fast xpath implementation.

All this shows benchmarks are nice as there are so many to choose from.
(3) Mon Jan 24 2005 20:10 benchmarks and lxml:
.. imports ..
start = time.time() # time.clock() on windows
.. do the actual work ..
end = time.time()
print end - start
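The timing skeleton above can be filled in roughly as follows. This is a minimal sketch in modern Python 3, not the code used for the numbers in this post: it assumes the standard library's xml.etree.ElementTree as a stand-in for cElementTree or lxml.etree, a small generated document in place of ot.xml, and time.perf_counter() as the clock (time.clock() no longer exists in current Python versions). The names SAMPLE, find_begat and timed are made up for illustration.

```python
import io
import time
import xml.etree.ElementTree as ET  # assumption: stand-in for cElementTree / lxml.etree

# Hypothetical miniature stand-in for ot.xml, so the sketch is self-contained.
SAMPLE = (
    b"<bible>"
    + b"<v>Abraham begat Isaac</v><v>In the beginning</v>" * 1000
    + b"</bible>"
)


def find_begat(source):
    """Parse the document and collect the text of every <v> containing 'begat'."""
    tree = ET.parse(source)
    return [v.text for v in tree.iter("v") if v.text and "begat" in v.text]


def timed(func, *args):
    """Run func once and return (result, elapsed seconds).

    Only the call itself is measured; interpreter startup and imports
    have already happened by the time the clock starts.
    """
    start = time.perf_counter()
    result = func(*args)
    end = time.perf_counter()
    return result, end - start


matches, elapsed = timed(find_begat, io.BytesIO(SAMPLE))
print(len(matches), "matches in", round(elapsed, 4), "seconds")
```

Nothing is printed inside the timed region, so terminal speed does not leak into the number.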
import lxml.etree as ElementTree
tree = ElementTree.parse("ot.xml")
for v in tree.findall("//v"):
    text = v.text
    if text.find(u'begat') != -1:
        print text
from lxml import etree as ElementTree
tree = ElementTree.parse("ot.xml")
for text in tree.xpath("//v[contains(., 'begat')]/text()"):
    print text
                   A      B      C      D
-------------------------------------------
cElementTree       1.06s  0.32s  0.9s   0.23s
lxml.etree         1.2s   0.43s  1.1s   0.36s
lxml.etree xpath   0.53s  0.25s  0.42s  0.17s
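One way to see how much of a run is printing rather than XML work is to time the two phases separately. The following is only a sketch under assumptions, not the setup behind the table above: it uses Python 3, the standard library's xml.etree.ElementTree, a generated document instead of ot.xml, and an in-memory io.StringIO as the output sink. The names SAMPLE, collect_begat and write_lines are hypothetical.

```python
import io
import sys
import time
import xml.etree.ElementTree as ET  # assumption: stand-in for the toolkits tested above

# Hypothetical miniature stand-in for ot.xml.
SAMPLE = (
    b"<bible>"
    + b"<v>Abraham begat Isaac</v><v>In the beginning</v>" * 1000
    + b"</bible>"
)


def collect_begat(data):
    """Parse the document and return matching verse texts without printing them."""
    root = ET.fromstring(data)
    return [v.text for v in root.iter("v") if v.text and "begat" in v.text]


def write_lines(lines, out):
    """Write one verse per line to the given stream."""
    for line in lines:
        out.write(line + "\n")


# Phase 1: XML processing only.
start = time.perf_counter()
verses = collect_begat(SAMPLE)
processing = time.perf_counter() - start

# Phase 2: output only. Swap sink for sys.stdout to measure a real terminal.
sink = io.StringIO()
start = time.perf_counter()
write_lines(verses, sink)
printing = time.perf_counter() - start

# Report on stderr so the summary does not mix with the measured output.
sys.stderr.write("processing: %.4fs, printing: %.4fs\n" % (processing, printing))
```

With the sink replaced by sys.stdout, the second number should vary with the terminal while the first stays roughly the same.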
- Comments:
Posted by Fredrik at Mon Jan 24 2005 20:27
If all he'd done was blogging it... but it has a prominent placement on onlamp.com's front page, ora.com's front page, and other XML sites have started to propagate the link.

Btw, except for iterparse, the cET performance hasn't changed much throughout the 0.9 series. On the other hand, in cET 0.9.8, the iterparse-based version of this benchmark is 40% faster than Uche's version. That would be, let's see, 0.14 seconds on your machine ;-)
Posted by Philippe at Mon Jan 24 2005 21:12
When benchmarking, also consider using the timeit python module. It's exactly what you're looking for IMO :-)
Posted by Fredrik at Mon Jan 24 2005 21:26
If timeit had worked for this kind of test, we'd have used it. But it doesn't work well on this kind of benchmark; it simply consumes too much memory, and you end up depending on what Python's memory allocator does when it runs out of memory, how efficient the underlying malloc implementation is, Python's GC characteristics, how the operating system allocator works, the system's swap performance, and lots of other things that have very little to do with the actual toolkit performance.

(For example, minidom/cDomlette relies on Python's GC to clean things up; if you use timeit on it, the GC will kick in at irregular intervals, which can easily result in a 20-100% performance penalty.)
