Wednesday, 27 Sep 2006
Welcome to the readers of the GullFOSS blog. The Writer team wants to keep you regularly informed about current development topics. My name is Frank Meies, and I have been a member of the Writer team since 2001.
My first blog entry is about the performance improvements achieved by the introduction of automatic text and paragraph styles in the Writer core. This is the second major performance improvement besides the implementation of “word count during idle time”, which has already given us significant performance gains when storing large documents (see http://www.openoffice.org/issues/show_bug.cgi?id=64985).
What are “automatic styles”? Let's assume some of your text is formatted using the “bold” and “underline” attributes. For this formatting, you can find an automatic style in the content.xml file of your ODT file:
<style:style style:name="T1" style:family="text">
<style:text-properties style:text-underline-style="solid" fo:font-weight="bold"/>
</style:style>
with all bold, underlined text portions referring to the automatic text style T1.
The file format uses automatic styles, but unfortunately the Writer core did not. If two distinct text portions in your document were formatted bold and underlined, each portion was associated with its own attribute set, and both sets contained a “bold” and an “underline” attribute. Storing a document therefore had to be performed in two passes: the first pass iterated over the text content to collect all applied automatic text and paragraph styles; the second pass exported the text content, establishing the link between each attributed text portion or paragraph and the appropriate automatic style.
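To make the two-pass approach concrete, here is a minimal sketch in Python. It is not the Writer core's actual code (which is C++); the function name `export_two_pass`, the `(text, attrs)` data model, and the `T1`, `T2`, ... naming scheme are assumptions chosen only for illustration.

```python
# Hypothetical model: every text portion owns a private attribute dict,
# so equal formatting is not recognizable without comparing the contents.

def export_two_pass(portions):
    """portions: list of (text, attrs) pairs, attrs being a dict of attributes."""
    # Pass 1: walk the whole content and collect the distinct attribute sets,
    # giving each one an automatic style name (T1, T2, ...).
    style_names = {}
    for _text, attrs in portions:
        if not attrs:
            continue
        key = frozenset(attrs.items())
        if key not in style_names:
            style_names[key] = f"T{len(style_names) + 1}"

    # Pass 2: walk the content again and link each portion to its style.
    spans = []
    for text, attrs in portions:
        name = style_names[frozenset(attrs.items())] if attrs else None
        spans.append((name, text))

    styles = {name: dict(key) for key, name in style_names.items()}
    return styles, spans
```

Note that the content is traversed twice and every attribute set is hashed in both passes, which is where the cost for large, heavily attributed documents comes from.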
The Writer core has been changed so that two text portions (or two paragraphs) that have the same attributes share the same attribute set. Collecting automatic styles while storing the document thus becomes unnecessary, which results in a massive performance improvement, especially for large, heavily attributed documents. Using automatic styles in the Writer core also has a positive effect when loading a document: instead of setting, for example, the two attributes “bold” and “underline” on a text portion, only the automatic style containing these two attributes has to be set.
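The idea of sharing attribute sets can be sketched as a small pool that hands out one shared object per distinct attribute combination. This is only an illustration in Python, not the Writer core's implementation; `AutoStyle`, `StylePool`, `intern`, and `export_single_pass` are hypothetical names, and the `T1`, `T2`, ... numbering is an assumption.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class AutoStyle:
    """A shared attribute set (hypothetical stand-in for an automatic style)."""
    name: str
    attrs: dict


class StylePool:
    """Returns the same AutoStyle object for equal attribute combinations."""

    def __init__(self):
        self._by_key = {}

    def intern(self, attrs):
        key = frozenset(attrs.items())
        style = self._by_key.get(key)
        if style is None:
            style = AutoStyle(f"T{len(self._by_key) + 1}", dict(attrs))
            self._by_key[key] = style
        return style

    def styles(self):
        return {s.name: s.attrs for s in self._by_key.values()}


def export_single_pass(pool, portions):
    """portions: list of (text, AutoStyle or None); one pass over the content."""
    spans = [(style.name if style else None, text) for text, style in portions]
    return pool.styles(), spans
```

Because the sharing happens when the formatting is applied, the export no longer needs a separate collection pass: each portion already points at its style. Loading benefits in the same way, since a portion is given one style reference instead of several individual attributes.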
Here are some results of my performance measurements. We compare the current OOo 2.0.4 to cws swautomatic01 (based on OOo 2.0.4), which implements the usage of automatic styles in the Writer core. Note that the results heavily depend on the document content:
| Document | Loading | Storing |
|---|---|---|
| 1000 paragraphs, many character attributes | 32.9 % | 47.0 % |
| 1000 paragraphs, many paragraph attributes | 16.3 % | 24.9 % |
| 1000 paragraphs, no attributes | 0.0 % | 18.5 % |
| OpenDocument specification | 7.1 % | 25.3 % |
That's all for today. Stay tuned for more interesting news from the Writer team.
Posted by Andrew Z on September 27, 2006 at 09:08 PM CEST #
The cws swautomatic01 is scheduled for OOo 2.1; see http://www.openoffice.org/issues/show_bug.cgi?id=65476
Regards, Frank
Posted by Frank Meies on September 28, 2006 at 08:35 AM CEST #