GullFOSS
OpenOffice.org Engineering at Sun
 
 
 
 
More Flickr photos tagged with openoffice

Today's Page Hits: 3197

Locations of visitors to this page
« New: OOo-DEV 3.x... | Main | Prototype of New UI:... »
Monday, 20 Jul 2009
XML Performance, and now for something completely different...
Christian Lippka
While Armin Le Grand did a great job at improving the load/save Performance for presentation documents by tweaking the application core itself, I took a step back and thought about performance improvement by using different technologies. The first step was to look at other techniques to deal with xml documents that would have one or more advantages over the current implementation. Since according to Helmuth von Moltke "No battle plan survives contact with the enemy" I had to test my assumptions and so my theoretical work on this resulted in a  prototype import filter implementation for impress. This is a short  summary of what techniques I looked at, why and how I used them in the prototype and the interesting results I got from it.

Mission Statement

The mission of this prototype was to gather data how the utilization of new technology could enhance the performance of the native OpenDocument Format (ODF) filters for OpenOffice.org (OOo). The focus of this prototype was to first look at the import of impress documents and achieve the following three goals

Goal 1 : improve overall filter performance

Goal 2 : make use of multiple cores/cpu through threading (where threading is not blocked constantly by calls to the OOo core which is currently not supporting multiple threads without blocking)

Goal 3 : support the implementation of load on demand (for example to display the first slides to the user and load the rest of the document in the background)

Current state

Currently ODF is imported with a SAX based filter that is accessed over the UNO API. Therefore the current SAX parser pushes notifications about the xml elements to the current ODF filter implementation in the order they appear in the xml stream. The filter itself has no control on choosing which elements to parse first or to postpone the parsing of the current element for a later time. It makes it also nearly impossible to measure the time spend for the current xml parsing because this tight coupling between the SAX parser and the ODF filter.

XML streams from ODF documents are usually encoded as UTF-8. The UNO API uses strings in UTF-16 encoding. Therefore, the SAX parser converts all strings from the xml stream to UTF-16 which is used by the UNO string implementation. To identify xml element and attribute names, expensive string compares are conducted.

Between the ODF filter and the current OOo application core is another UNO API layer. The filter has to convert the xml events to something  that can be send over this layer to the OOo application core. In most cases the implementation of the API layer must also convert the given data to the format used in the applications core.

the detailed flow of data is currently as follows
  1. SAX parser reads xml data of utf-8 streams inside zip storages
  2. SAX parser UNO implementation handles namespaces and converts element names, attribute names, attribute values and text content to utf-16 strings and feeds the ODF filter implementation
  3. ODF filter implementation transforms utf-16 xml data to UNO representations for the OOo UNO API
  4. The applications UNO API implementation transforms the UNO data to a core data representation

Assumptions

Alternative technology

XML Pull Parser (XPP)

XPP is a streaming pull XML parser. Unlike sax where the sax parser calls the filter, the filter itself makes calls to the XPP parser to parse the next xml element.

+ In contrast to sax, the filter can interrupt sax parsing after any element and continue parsing later.
+ Pull instead of Push leads to a cleaner filter implementation (cleaner code is usually easier to service and improve).

- Performance is equal to a sax parser
- No random access

Document Object Model (DOM)

A DOM is a parsed memory representation of a xml stream. This technology is used by modern browsers and the odftoolkit.org project.

+ Filter has random access to all xml elements
+ Random access leads to a filter implementation (cleaner code is usualy easier to service and improve).

- A DOM has to store a complete copy of the xml stream + management in memory during the filter process.

Fast sax parser

I developed the fast sax parser during the initial implementation of the Office 12 XML filters. It is basically a sax parser but uses integer tokens to represent known namespaces, element names and attribute names. Tools like the gnu gperf can be used to create perfect hash code to transform the xml names to integer tokens. Scripts can parse the dtd or relax ng of an xml format to automatically extract all the xml names that needs to be convertable to integers. With utf-8 xml streams, the tokens can be created without the need to transform the strings to another encoding first. Namespaces can be combined with xml names so each element or attribute name with a namespace can be identified with just one integer compare.

+ Reduces string handling (encoding, comparing, storing)
+ Leads to cleaner filter implementation (f.e. switch statements can be used to identify child elements instead of if .. else .. blocks which use the string compare functions)
+ Usage of perfect hash algorithms which are automatically created during compile time

- No random access

The Prototype

Since random access looks like the key to have a filter that supports painless load on demand, I decided to go with a DOM solutions. To minimize the memory footprint of the DOM tree, I used the fast sax parser to build the DOM tree and use the integer tokens for xml names instead of the strings from the stream. Now parsing the xml  stream itself does not need any interaction with the application core so this could be done in a separate thread. Converting the xml representation of attribute values to an UNO  representation is also something that could be done in almost all cases without the application core, so this should be done in the same thread.

The problem here is that a classic DOM is a generic and typeless representation of the xml data. The solution here is to use a technique we introduced in the odftoolkit.org project. Initially a xslt transformation was used to create dom node implementations for each element of the ODF format with type safe access to its attributes using
only the relax ng. Upkeeping xslt templates proved costly for such complex operations. So this was replaced with a code generator that I implemented in java which uses simple templates and configuration files to transform a relax ng schema to code files.

This code generator also allowed to create DOM tree element implementations for other languages than java which is used in the  odftoolkit.org project. Therefore I used the generator to create c++ source files for all ODF elements and I adapted the configuration files to create types for the attributes that are equal to the UNO types that the filter needs to pass to the application core. (The key difference between the ODFDOM from odftoolkit.org and the DOM tree for the prototype is that the former uses ODF based types for the attributes and the later uses UNO types).

So in conclusion, a DOM tree builder is started in a worker thread and parses the xml stream by using the fast sax parser (The prototype actually starts two worker threads, one to build the tree for the styles.xml stream and one for the content.xml stream). It uses the sax events to create a tree where each element is the instance of a class that was specifically generated for that element from the relax ng schema. All attribute values are parsed and stored into an UNO Any with preferable the same type as used in the UNO API of the OOo application. So for example if the attribute is a length value then something like "12cm" is converted into an UNO Any containing an integer value of 1200 (12cm converted to 1/100th mm).

This worker threads would never block or wait for the OOo thread. But this would not make sense if at the same time the office thread idles and waits for the tree builder to finish. So the tree builder notifies the filter as soon as an imported element has been fully parsed. For example if the office:styles element is completely parsed the filter thread can start and import the styles. If the filter gets notified that a slide has been completely parsed, it can check if the needed styles and master pages are already imported and then import this slide. If the filter is also executed in a separate thread, the office thread can paint the already imported slides to enhance the responsiveness of the application to the user which results in a 'subjective' performance gain

To implement the filter I borrowed code from the existing ODF filter and transformed it to use the DOM tree instead of sax events. This mostly resulted in much less and cleaner code, as expected. For a reliable comparison with the existing filter I had to implement a minimum set of functionality so that for selected real world documents the prototype imports all the functionality available in that document.

I ended up implementing

Results

First I used an average real world document with 47 slides and lots of graphics and some chart ole. It showed that the prototype filter was around 2% faster than the original filter. This was less than expected.

Next I created an artificial document. A presentation with 188 slides and only formated text, no graphics, no ole. This pure xml document would be uncommon for a presentation but a good approximation what the gain could be for a writer or calc document where the xml to graphic/ole ratio is much higher. This lead to a performance gain of 10% which is not bad since the prototype is not yet profiled and optimized itself.

I tested this on a dual core 3Ghz system. Experiments with the original filter showed that we are cpu bound and since we use only one core, the processor usage is only 50%. So I expected better results with the threaded prototype by making use of the 3Ghz from the second core. A quick look at a cpu monitor showed that this didn't happen, processor usage was still capped at 50%.

This made me suspect that the actual parsing, tree building and xml to UNO conversion did only account for an insignificant amount of the time the filter needs to import this document. After removing the threading I measured the time it took to parse the xml streams and to actually import the document. It turns out that building the DOM tree for the styles.xml and content.xml accounted for less than 2% of the overal time. So when opening a document that takes 10 seconds to load, the second core is only used 0,2 seconds. Remember that this does not only include the actual xml parsing but also creating the DOM tree in memory and converting most of the attribute values to UNO types.

The interesting finding here is that the overhead from the xml parsing is much less than the typical 'developer gut feeling' about xml. So any performance work in the filter or the application core  would be much more efficient then trying to speed up the xml parsing.

Since the usage of DOM is often criticized for its memory consumption, I also took a look at this. While I replaced most strings with 32 bit integer tokens I figured that the memory consumption of a DOM tree should not be greater than the xml stream it was parsed from. My expectation was that for most cases it should be even smaller.

The first measuring showed that it was actually 10 times more than its xml stream. After further investigation I found the source of the problem which is the overhead for each allocated instance from the memory manager.After converting attributes from individual instances to a single vector this dropped to 2x the size of the xml stream. For an impress document this is not a problem as the xml stream size for average documents is seldom more then 1 MB. For a writer document this may also be no problem as for example the huge OpenDocument specification has less than 10MB of xml streams. For calc this may be a problem as calc document with many cells can have xml streams of 100MB and more. For current office  workstations this may not be a problem at all, but if OOo runs on a server for multiple users or if OOo would be ported to small devices then this could become an issue.

Conclusion

While the prototype showed that this method of implementing an ODF import filter does result in a performance gain and the option to support load on demand, for an impress application the performance gain of only around 2% for real world documents is out weighted by the actual cost to implement this filter. I currently estimate an effort to at least 6 man month for implementation of only the impress import filter (this is without time for testing which is also crucial to find regressions). For a calc and writer filter these figures may differ.

Since writer and calc are more xml centric then impress documents, a prototype for these applications may show that the overall gain still does out weight the costs.




tags:

Posted by Christian Lippka on 20 Jul 2009  |  PermaLink |  Bookmark to Delicious To Delicious |  Digg this Digg this  |  Comments[8]

Comments

Lars Oppermann said:

Great research and nice writeup. Having once been involved in OOo XML myself I found this very interesting.

My understanding is that your prototype filter used the impress UNO API in order to push the document from the DOM represenation into the application core. Given that this is correct, would it be wothwhile to experiment with a filter that links directly to the application core in order to speed up load times? Would it be better to try and provide more eficient UNO APIs that better facilitate import (and export) of entire documents?

Cheers!
/lars

Posted by Lars Oppermann on July 20, 2009 at 04:56 PM CEST #

Mathias Bauer said:

Sure, better APIs always can improve load times. But that does not mean that linking to the core is necessary. The filter must convert between the ODF model and the core model. The code in between should be so close to one of those models that passing data through it does not have a considerable impact on speed. Whether the conversion between the models happens in the code parsing the XML (means: linking this code to the core) or behind an "ODF friendly" application API (that links to the core) is not relevant. Having an API close to the ODF view on the data that converts internally has the additional benefit that the core implementation can be changed later on to become more ODF like (and so dropping the conversion) without changing the filter code again. We used this strategy in the ODF filters already (by e.g. implementing autostyle support in the Writer core).

It should be considered that not in all cases a change of ODF like APIs has a real big impact on performance. Here the effort to change that is not justified.

Posted by Mathias Bauer on July 20, 2009 at 05:08 PM CEST #

Christian Lippka said:

Lars,

as Mathias already pointed out, one strategy to optimize this is to make the documents API more ODF like.

This could be followed to the extreme by linking the filter to the actual core. But this would lead to a situation that we already had with the binary filter where each core instance streamed itself directly to the file. So after a while the core itself could not be changed anymore since this would corrupt the binary format. This problem was solved by making a copy of the old code and born was the (infamous) binfilter module. We should not make this mistake again so an abstraction layer between the application core and the file format is actually a good idea. How thin this layer is, that is a tradeof between performance and flexibility...

Posted by Christian Lippka on July 20, 2009 at 05:16 PM CEST #

Gus said:

Truly surprising, I've had a bunch of bad experiences with XML parsing (DOM performance, SAX maintenance), so this is great news.

I remember writing a java plugin for ODT and UNO calls were VERY expensive. The first poster might be unto something... maybe keeping the UNO API but removing the corba-like infrastructure? Or maybe creating a an alternative thin API that is straight C++ going with the same functionality as UNO but straight into the core?

Posted by Gus on July 20, 2009 at 05:19 PM CEST #

Christian Lippka said:

Hi Gus,

if you are a c++ filter than UNO API is more or less straight virtual c++ calls. There is overhead that the query interface method and the ref counting is thread safe (not a bad thing at all). The amounts of ref counting and query interface could be reduced by multiple inheritance interfaces which are part of UNO for some years now.

The other problem is that the API for the applications was designed to be very generic ("One API for all office applications in the universe!") and not optimized for the serial kind of access you have with a filter. I say this often and I love to say it here again, UNO is not slow, the current UNO API implementation of the OOo applications might be.

Posted by Christian Lippka on July 20, 2009 at 05:38 PM CEST #

Lars Oppermann said:

@Gus: in-process C++ UNO calls are very efficient (single virtual function call IIRC). Hence, the infrastructure is not likely to be the problem. Given what Mathias and Christian explained, much more potential seems to lie in how the abstraction between core and API is actually implemented.

Maybe a dedicated "DocumentBuilder" API would have some merit... it might even facilitate some "delayed parsing" where one could just add a chunk of XML at some point (e.g. a slide object) that can be processed later instead of actually creating all the child objects right away

Posted by Lars Oppermann on July 20, 2009 at 05:48 PM CEST #

Mathias Bauer said:

Gus, of course API calls are expensive if done from Java as the calls have to be "translated" through a remote bridge, bringing down performance to the level of inter process calls.
The good thing with UNO is that it combines the versality of Corba and other middleware with in-process effectivity by treating UNO interfaces like "normal" C++ interfaces if they are done in the same process.

Posted by Mathias Bauer on July 20, 2009 at 10:11 PM CEST #

harry xu said:

VTD-XML may be a new option for solving XML performance issues

http://www.developer.com/java/other/article.php/3714051

Posted by harry xu on August 17, 2009 at 10:46 PM CEST #

Post a Comment:
Comments are closed for this entry.
« New: OOo-DEV 3.x... | Main | Prototype of New UI:... » GullFOSS