GullFOSS
OpenOffice.org Engineering at Sun
 
 
 
 
Tuesday, 21 Jul 2009
Saving sheets separately: Preliminary results
Niklas Nebel

Some months ago, I did an experiment with creating XML elements only for changed cells. That approach has two drawbacks:

So now I tried a similar approach, based on sheets instead of cells. It consists of the following:

So far, I have done the "automatic styles" part only for cell styles. With this version, I did some performance measurements using the old George Ou example, which contains text, numbers and dates on 16 sheets. The time for saving depends on how many sheets were modified:

Time for Saving

The red line is the measured time in the CWS, based on milestone m49. Later milestones contain separate optimizations for saving (see here, here and here in the wiki), m52 is shown in yellow. These optimizations will still help for the sheets that aren't copied, so after rebasing the expected result should be something like the green line. Compared to m52, that's half the time if only one sheet was modified.
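To make the idea of saving sheets separately a little more concrete, here is a minimal sketch, assuming each sheet carries a "modified" flag and that the XML written at the previous save can be kept and copied. The names `Sheet`, `serialize_sheet` and `SheetCache` are made up for this illustration and are not taken from the calcsheetdata CWS.

```python
from dataclasses import dataclass, field


@dataclass
class Sheet:
    name: str
    cells: dict = field(default_factory=dict)  # (row, col) -> value
    modified: bool = True                      # new sheets are written once


def serialize_sheet(sheet):
    """Stand-in for the expensive step: build the XML for one sheet.

    No XML escaping is done here; this is only an illustration.
    """
    rows = "".join(
        f'<cell row="{r}" col="{c}">{v}</cell>'
        for (r, c), v in sorted(sheet.cells.items())
    )
    return f'<table name="{sheet.name}">{rows}</table>'


class SheetCache:
    """Remembers the XML written for each sheet at the last save."""

    def __init__(self):
        self._xml = {}
        self.serialized = 0  # how often the expensive path was taken

    def save(self, sheets):
        parts = []
        for sheet in sheets:
            if sheet.modified or sheet.name not in self._xml:
                self._xml[sheet.name] = serialize_sheet(sheet)
                self.serialized += 1
                sheet.modified = False
            # Unmodified sheets are simply copied from the cache.
            parts.append(self._xml[sheet.name])
        return "".join(parts)
```

With 16 sheets and one modified sheet, only one call to `serialize_sheet` happens on the second save, which is the effect the measurement above is about.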

Handling of other style types (especially text and shape) is still missing, but shouldn't affect the time for simple-cell-content-only documents. Results for other types of documents will follow. Implementation is ongoing in CWS "calcsheetdata", and there's also a wiki page. Stay tuned.

Posted by Niklas Nebel on 21 Jul 2009

Monday, 20 Jul 2009
XML Performance, and now for something completely different...
Christian Lippka
While Armin Le Grand did a great job at improving the load/save performance for presentation documents by tweaking the application core itself, I took a step back and thought about improving performance by using different technologies. The first step was to look at other techniques for dealing with xml documents that would have one or more advantages over the current implementation. Since, according to Helmuth von Moltke, "No battle plan survives contact with the enemy", I had to test my assumptions, and so my theoretical work resulted in a prototype import filter implementation for impress. This is a short summary of the techniques I looked at, why and how I used them in the prototype, and the interesting results I got from it.

Mission Statement

The mission of this prototype was to gather data on how the use of new technology could enhance the performance of the native OpenDocument Format (ODF) filters for OpenOffice.org (OOo). The focus of this prototype was to first look at the import of impress documents and achieve the following three goals:

Goal 1 : improve overall filter performance

Goal 2 : make use of multiple cores/cpu through threading (where threading is not blocked constantly by calls to the OOo core which is currently not supporting multiple threads without blocking)

Goal 3 : support the implementation of load on demand (for example to display the first slides to the user and load the rest of the document in the background)

Current state

Currently ODF is imported with a SAX based filter that is accessed over the UNO API. The SAX parser pushes notifications about the xml elements to the ODF filter implementation in the order they appear in the xml stream. The filter itself has no control over which elements are parsed first, and it cannot postpone the parsing of the current element to a later time. This tight coupling between the SAX parser and the ODF filter also makes it nearly impossible to measure the time spent on the xml parsing alone.

XML streams from ODF documents are usually encoded as UTF-8. The UNO API uses strings in UTF-16 encoding. Therefore, the SAX parser converts all strings from the xml stream to UTF-16 which is used by the UNO string implementation. To identify xml element and attribute names, expensive string compares are conducted.

Between the ODF filter and the current OOo application core is another UNO API layer. The filter has to convert the xml events to something that can be sent over this layer to the OOo application core. In most cases the implementation of the API layer must also convert the given data to the format used in the application's core.

The detailed flow of data is currently as follows:
  1. SAX parser reads xml data of utf-8 streams inside zip storages
  2. SAX parser UNO implementation handles namespaces and converts element names, attribute names, attribute values and text content to utf-16 strings and feeds the ODF filter implementation
  3. ODF filter implementation transforms utf-16 xml data to UNO representations for the OOo UNO API
  4. The application's UNO API implementation transforms the UNO data to a core data representation
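The push model described above can be sketched with Python's standard `xml.sax` module. This is only an illustration of the calling direction, not the OOo UNO SAX interface; `RecordingHandler` and `parse_push` are names made up for this sketch.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """The parser calls these methods; the handler cannot pick the order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Names and attribute values arrive as already decoded strings.
        self.events.append(("start", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("end", name))


def parse_push(xml_bytes):
    """Run the parser to the end; events arrive strictly in stream order."""
    handler = RecordingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events
```

Note that `parse_push` only returns after the whole stream has been pushed through the handler. The handler has no way to say "give me the styles first" or "come back to this element later".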

Assumptions

Alternative technology

XML Pull Parser (XPP)

XPP is a streaming pull XML parser. Unlike sax where the sax parser calls the filter, the filter itself makes calls to the XPP parser to parse the next xml element.

+ In contrast to sax, the filter can interrupt parsing after any element and continue parsing later.
+ Pull instead of Push leads to a cleaner filter implementation (cleaner code is usually easier to service and improve).

- Performance is equal to a sax parser
- No random access
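For comparison, here is a small pull-style sketch using `xml.dom.pulldom` from the Python standard library as a stand-in for XPP. The element name `page` and the function `page_names` are assumptions for this example only.

```python
from xml.dom import pulldom


def page_names(xml_text):
    """Yield the name of each <page> element, one pull at a time."""
    stream = pulldom.parseString(xml_text)
    for event, node in stream:
        if event == pulldom.START_ELEMENT and node.tagName == "page":
            # Control goes back to the caller here; the loop only
            # continues when the caller asks for the next page.
            yield node.getAttribute("name")
```

The caller decides when to continue, for example `gen = page_names(text)`, then `next(gen)` for the first page, and the rest whenever it suits the application. Access is still strictly forward, which matches the "no random access" drawback listed above.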

Document Object Model (DOM)

A DOM is a parsed in-memory representation of an xml stream. This technology is used by modern browsers and the odftoolkit.org project.

+ Filter has random access to all xml elements
+ Random access leads to a cleaner filter implementation (cleaner code is usually easier to service and improve).

- A DOM has to store a complete copy of the xml stream + management in memory during the filter process.

Fast sax parser

I developed the fast sax parser during the initial implementation of the Office 12 XML filters. It is basically a sax parser but uses integer tokens to represent known namespaces, element names and attribute names. Tools like GNU gperf can be used to create a perfect hash function that transforms the xml names to integer tokens. Scripts can parse the dtd or relax ng schema of an xml format to automatically extract all the xml names that need to be convertible to integers. With utf-8 xml streams, the tokens can be created without the need to transform the strings to another encoding first. Namespaces can be combined with xml names so each element or attribute name with a namespace can be identified with just one integer compare.

+ Reduces string handling (encoding, comparing, storing)
+ Leads to cleaner filter implementation (f.e. switch statements can be used to identify child elements instead of if .. else .. blocks which use the string compare functions)
+ Usage of perfect hash algorithms which are automatically created during compile time

- No random access
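The token idea can be sketched as follows. A plain Python dict stands in for the gperf-generated perfect hash, and the namespace URIs, token values and function names are hypothetical and chosen only for this illustration.

```python
# Hypothetical namespace URIs and token values, for illustration only.
NS_DRAW = 1 << 16   # the namespace id lives in the high bits
NS_TEXT = 2 << 16

DRAW_PAGE = NS_DRAW | 1
DRAW_FRAME = NS_DRAW | 2
TEXT_P = NS_TEXT | 1
UNKNOWN = -1

# Keys are raw utf-8 bytes, so no conversion to another encoding is needed.
TOKENS = {
    (b"urn:example:draw", b"page"): DRAW_PAGE,
    (b"urn:example:draw", b"frame"): DRAW_FRAME,
    (b"urn:example:text", b"p"): TEXT_P,
}


def token_for(namespace_uri: bytes, local_name: bytes) -> int:
    """Map a namespaced xml name to a single integer token."""
    return TOKENS.get((namespace_uri, local_name), UNKNOWN)


def handle_child(token: int) -> str:
    """Identify the child with integer compares instead of string compares."""
    if token == DRAW_PAGE:
        return "import page"
    if token == DRAW_FRAME:
        return "import frame"
    if token == TEXT_P:
        return "import paragraph"
    return "skip unknown element"
```

Because namespace and local name are folded into one integer, `draw:page` and a `page` element from another namespace can never be confused, and the filter compares one integer per element.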

The Prototype

Since random access looks like the key to a filter that supports painless load on demand, I decided to go with a DOM solution. To minimize the memory footprint of the DOM tree, I used the fast sax parser to build the DOM tree and used the integer tokens for xml names instead of the strings from the stream. Parsing the xml stream itself does not need any interaction with the application core, so this could be done in a separate thread. Converting the xml representation of attribute values to an UNO representation is also something that could be done in almost all cases without the application core, so this should be done in the same thread.

The problem here is that a classic DOM is a generic and typeless representation of the xml data. The solution here is to use a technique we introduced in the odftoolkit.org project. Initially an xslt transformation was used to create dom node implementations for each element of the ODF format, with type safe access to its attributes, using only the relax ng schema. Maintaining xslt templates proved costly for such complex operations, so this was replaced with a code generator that I implemented in java, which uses simple templates and configuration files to transform a relax ng schema to code files.

This code generator also made it possible to create DOM tree element implementations for languages other than java, which is used in the odftoolkit.org project. Therefore I used the generator to create c++ source files for all ODF elements, and I adapted the configuration files to create types for the attributes that are equal to the UNO types that the filter needs to pass to the application core. (The key difference between the ODFDOM from odftoolkit.org and the DOM tree for the prototype is that the former uses ODF based types for the attributes and the latter uses UNO types.)

So in conclusion, a DOM tree builder is started in a worker thread and parses the xml stream by using the fast sax parser. (The prototype actually starts two worker threads, one to build the tree for the styles.xml stream and one for the content.xml stream.) It uses the sax events to create a tree where each element is an instance of a class that was specifically generated for that element from the relax ng schema. All attribute values are parsed and stored in an UNO Any, preferably with the same type as used in the UNO API of the OOo application. So for example if the attribute is a length value, then something like "12cm" is converted into an UNO Any containing an integer value of 12000 (12cm converted to 1/100th mm).
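As a worked example of the attribute conversion: 12 cm is 120 mm, which is 12000 in units of 1/100th mm. The sketch below does this conversion for a handful of units. The function name and the set of supported units are assumptions for this illustration, and a plain integer stands in for the UNO Any.

```python
import re
from fractions import Fraction

# Hundredths of a millimetre per unit; only a few units are covered here.
_PER_UNIT = {
    "mm": Fraction(100),
    "cm": Fraction(1000),
    "in": Fraction(2540),
    "pt": Fraction(2540, 72),
}
_LENGTH = re.compile(r"^\s*(-?\d+(?:\.\d+)?)(mm|cm|in|pt)\s*$")


def length_to_100th_mm(text: str) -> int:
    """Convert a length string such as "12cm" to integer 1/100th mm."""
    match = _LENGTH.match(text)
    if match is None:
        raise ValueError(f"not a supported length: {text!r}")
    value, unit = match.groups()
    # Fractions avoid binary floating point surprises before rounding.
    return round(Fraction(value) * _PER_UNIT[unit])
```

Doing this once in the tree builder thread means the filter later only passes ready-made integers to the application core.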

These worker threads never block or wait for the OOo thread. But this would not make sense if at the same time the office thread idles and waits for the tree builder to finish. So the tree builder notifies the filter as soon as an element has been fully parsed. For example, if the office:styles element is completely parsed, the filter thread can start and import the styles. If the filter gets notified that a slide has been completely parsed, it can check if the needed styles and master pages are already imported and then import this slide. If the filter is also executed in a separate thread, the office thread can paint the already imported slides to enhance the responsiveness of the application to the user, which results in a 'subjective' performance gain.
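The notification scheme can be sketched with a worker thread and an unbounded queue, so that the builder never has to wait for the consumer. This is a simplified model, assuming one stream and a single "styles" dependency. The names `tree_builder` and `import_document` are hypothetical, and a dict stands in for a parsed subtree.

```python
import queue
import threading

DONE = object()  # sentinel: the builder has reached the end of the stream


def tree_builder(parts, notifications):
    """Worker thread: build subtrees and announce each finished one."""
    for name, payload in parts:
        subtree = {"name": name, "payload": payload}  # stand-in for DOM nodes
        notifications.put(subtree)  # unbounded queue, so put() never blocks
    notifications.put(DONE)


def import_document(parts):
    """Filter side: import each element as soon as its needs are met."""
    notifications = queue.Queue()
    worker = threading.Thread(target=tree_builder, args=(parts, notifications))
    worker.start()

    styles_ready = False
    pending, imported = [], []
    while True:
        item = notifications.get()
        if item is DONE:
            break
        if item["name"] == "styles":
            styles_ready = True
            imported.append("styles")
            imported.extend(pending)  # slides that were waiting for styles
            pending.clear()
        elif styles_ready:
            imported.append(item["payload"])
        else:
            pending.append(item["payload"])
    worker.join()
    return imported
```

A slide that arrives before its styles is parked and imported right after the styles, so the import order respects the dependency without the builder ever waiting.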

To implement the filter I borrowed code from the existing ODF filter and transformed it to use the DOM tree instead of sax events. This mostly resulted in much less and cleaner code, as expected. For a reliable comparison with the existing filter I had to implement a minimum set of functionality so that for selected real world documents the prototype imports all the functionality available in that document.

I ended up implementing

Results

First I used an average real world document with 47 slides and lots of graphics and some chart ole. It showed that the prototype filter was around 2% faster than the original filter. This was less than expected.

Next I created an artificial document: a presentation with 188 slides and only formatted text, no graphics, no ole. Such a pure xml document would be uncommon for a presentation, but it is a good approximation of what the gain could be for a writer or calc document, where the xml to graphic/ole ratio is much higher. This led to a performance gain of 10%, which is not bad since the prototype itself is not yet profiled and optimized.

I tested this on a dual core 3 GHz system. Experiments with the original filter showed that we are cpu bound, and since we use only one core, the processor usage is only 50%. So I expected better results with the threaded prototype by making use of the second core. A quick look at a cpu monitor showed that this didn't happen; processor usage was still capped at 50%.

This made me suspect that the actual parsing, tree building and xml to UNO conversion accounted for only an insignificant part of the time the filter needs to import this document. After removing the threading I measured the time it took to parse the xml streams and to actually import the document. It turns out that building the DOM tree for the styles.xml and content.xml accounted for less than 2% of the overall time. So when opening a document that takes 10 seconds to load, the second core is used for only 0.2 seconds. Remember that this does not only include the actual xml parsing but also creating the DOM tree in memory and converting most of the attribute values to UNO types.
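The arithmetic behind this can be written down as a back-of-the-envelope bound. It assumes, optimistically, that parsing on the second core overlaps perfectly with the import on the first core; the function names are made up for this sketch.

```python
def seconds_on_second_core(total_seconds: float, parse_fraction: float) -> float:
    """Time the tree builder keeps the second core busy."""
    return total_seconds * parse_fraction


def best_case_elapsed(total_seconds: float, parse_fraction: float) -> float:
    """Elapsed time if parsing runs fully in parallel with the import."""
    parse = total_seconds * parse_fraction
    rest = total_seconds - parse
    return max(parse, rest)
```

For a 10 second load with a 2% parsing share, the second core is busy for 0.2 seconds and the best possible load time is 9.8 seconds, a gain of about 2%. That is in line with the gain measured for the real world document.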

The interesting finding here is that the overhead from the xml parsing is much less than the typical 'developer gut feeling' about xml suggests. So any performance work in the filter or the application core would be much more efficient than trying to speed up the xml parsing.

Since the use of DOM is often criticized for its memory consumption, I also took a look at this. As I had replaced most strings with 32 bit integer tokens, I figured that the memory consumption of a DOM tree should not be greater than that of the xml stream it was parsed from. My expectation was that for most cases it should be even smaller.

The first measurement showed that it was actually 10 times more than its xml stream. After further investigation I found the source of the problem, which is the overhead the memory manager adds to each allocated instance. After converting attributes from individual instances to a single vector, this dropped to 2x the size of the xml stream. For an impress document this is not a problem, as the xml stream size for average documents is seldom more than 1 MB. For a writer document this may also be no problem, as for example the huge OpenDocument specification has less than 10MB of xml streams. For calc this may be a problem, as a calc document with many cells can have xml streams of 100MB and more. For current office workstations this may not be a problem at all, but if OOo runs on a server for multiple users, or if OOo were ported to small devices, then this could become an issue.

Conclusion

While the prototype showed that this method of implementing an ODF import filter does result in a performance gain and the option to support load on demand, for an impress application the performance gain of only around 2% for real world documents is outweighed by the actual cost to implement this filter. I currently estimate an effort of at least 6 man-months for the implementation of only the impress import filter (this is without time for testing, which is also crucial to find regressions). For a calc and writer filter these figures may differ.

Since writer and calc documents are more xml centric than impress documents, a prototype for these applications may show that the overall gain does outweigh the costs.




Posted by Christian Lippka on 20 Jul 2009

Friday, 15 May 2009
Chart performance
Ingrid Halama

Recently I did some performance measurements on charts. I used a big line chart with 13 series each with 4000 data points. The whole cycle of editing was measured:

load the ods document (bright green)
enter the charts edit mode per double click (orange)
change the chart (red)
leave the edit mode (dark green)
and save the ods document (yellow).

The whole cycle needs about 4 minutes. Where does the time go?

Look at the chart on the right side, where some of the more expensive calls are identified. It turns out that there are some superfluous calls that consume the user's time without justification! They are marked with a fat black border.

The first is Issue 101925. A metafile replacement image is requested during edit mode. This is not necessary as the chart uses the SdrView for rendering while in edit mode.

The second problem identified is Issue 101928. Painting was actually performed twice instead of once - both while entering the edit mode and also after changing the chart.

Eliminating those calls sped up a single chart change within edit mode from 1:50 min to 0:19 min, which is roughly 5 to 6 times faster with this big document! :-)

As OLE objects are rendered via a replacement image while not in edit mode, the expensive creation of the replacement image now needs to happen at the end of editing. So leaving the edit mode takes longer now. But in sum we have saved time, even more so if one takes into account that usually more than one change is performed when editing a document. You will no longer need multiples of 1:50 min but only multiples of 19 seconds.

superfluous calls

Look at the chart below how the times of the different steps compare between OOo 3.0, dev300m47 and a changed dev300m47 including experimental fixes for the above identified issues:

performance comparison with OOo 3.0

Posted by Ingrid Halama on 15 May 2009

Friday, 08 May 2009
Handling sheets separately, part 1: Row Heights
Niklas Nebel

One of the possible approaches to improving Calc load performance is to separate the processing of individual sheets. The basic idea is that often a file with several sheets is loaded, but only some of the sheets are actually used for editing. A part of processing that can well be separated is the updating of automatic row heights, at least if no shapes have to be adjusted. With the implementation in CWS dr70, row height updates after loading are now limited to sheets with shapes, and the active sheet from the view settings. Other sheets are updated when the row heights are needed (activating the sheet, or printing). There is a progress bar in this case, so the user sees the reason for the delay.
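The deferred update can be pictured as a simple lazy calculation per sheet. This is a minimal sketch, assuming a per-sheet "not calculated yet" state; the class and function names are hypothetical and the height calculation is only a stand-in for the real text layout.

```python
class Sheet:
    def __init__(self, name, rows, has_shapes=False):
        self.name = name
        self.rows = rows              # list of cell texts, one per row
        self.has_shapes = has_shapes
        self._heights = None          # None: heights not calculated yet

    @property
    def heights_pending(self):
        return self._heights is None

    def row_heights(self):
        """Calculate on first use, e.g. when activating or printing."""
        if self._heights is None:
            # Stand-in for the expensive text layout of every row.
            self._heights = [1 + text.count("\n") for text in self.rows]
        return self._heights


def after_load(sheets, active_name):
    """Update only what must be correct right after loading."""
    for sheet in sheets:
        if sheet.has_shapes or sheet.name == active_name:
            sheet.row_heights()
```

All other sheets pay the cost only when `row_heights()` is first called for them, which is the moment where the progress bar mentioned above would appear.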

screen shot

The resulting improvement (in time before the active sheet is shown and ready for editing) depends a lot on the nature of the file. The chart below shows relative CPU time for a single-sheet file, the old George Ou example, a file with only date cells on many sheets, and the file from issue 85178.

performance chart

There is also a wiki page on the subject. Possible next steps are:

Posted by Niklas Nebel on 08 May 2009

Wednesday, 29 Apr 2009
Going Faster with Less Fuel
Eike Rathke

Recently, as a task of the OpenOffice.org Performance Project, I was working on refactoring the Calc spreadsheet area broadcasters that are used to notify formula cells listening to ranges such as A1:A256 whenever a cell's value is changed in that range. The results are overwhelming; I didn't expect such a big improvement.

When loading test case documents (admittedly heavily constructed for this purpose and rarely to be encountered in the wild) with lots of formulas referring to different areas in a certain constellation, load times went down to 55%-63% of the original implementation, accompanied by memory consumption going down to 75%. Recalculating a change that involves 65536 or 131072 different ranges that formula cells are listening to went down to 49% and 29%, respectively. Additionally, the time to close the larger document went down from ~10s to ~1s. Savings may be less in real life, because almost no document stresses the implementation as the test cases do, but the improvement is there.
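For readers not familiar with the concept, this is roughly what an area broadcaster does. The sketch below is a naive version written for illustration only; it is neither the original nor the refactored Calc implementation, and the class and method names are made up.

```python
from collections import defaultdict


class AreaBroadcaster:
    """Maps cell ranges to the formula cells that listen to them."""

    def __init__(self):
        # (col1, row1, col2, row2) -> set of listener names, zero based
        self._areas = defaultdict(set)

    def listen(self, area, listener):
        self._areas[area].add(listener)

    def cell_changed(self, col, row):
        """Return every listener whose range contains the changed cell."""
        hit = set()
        for (c1, r1, c2, r2), listeners in self._areas.items():
            if c1 <= col <= c2 and r1 <= row <= r2:
                hit |= listeners
        return hit
```

This naive version checks every registered range on every change. With 65536 or more distinct ranges that linear scan becomes the cost that matters, which is why the data structure behind the broadcasters has such an influence on load and recalculation times.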

The change is in CWS calcperf04, targeted to OOo3.2 and currently ready for QA.

For details, numbers and links to test case documents please visit the Refactoring Area Broadcasters wiki page.

Posted by Eike Rathke on 29 Apr 2009

Monday, 27 Apr 2009
writer load/save performance
Bjoern Michaelsen
Oliver Specht and I are currently working on improving the load/save performance of OpenOffice.org – especially for writer documents. How is that done?

Remembering what a great mind had to say about optimization:

“premature optimization is the root of all evil.”

– Donald Knuth, "Structured Programming with go to Statements"

we concluded that the first thing to do is to find the pieces of code that are relevant to the performance of load/save operations. Thus we started with profiling these operations with a set of typical documents (the ODF specification, a science thesis, a mailmerged letter and a manual for an electronic device). We used Intel vTune on Windows and Callgrind/Cachegrind on Linux. The results were interesting and were yet again a confirmation of the old saying that the bottlenecks are not where you expect them to be.

The results were very different depending on document and platform. Some innocent looking operations became very relevant for the performance of the whole operation:

None of these issues appeared equally relevant in all documents – actually all those issues dominated the contribution to total document save time in only one test document each. Fixing them resulted in a major performance improvement – but only for the document which was hit by the performance issue (and others like that one), while it remained irrelevant for the other test documents.

Now that we (os, mav and I) have been able to weed out these special cases in cws os128, we might find the general performance issues that should be relevant for all documents, but the gains there are probably less drastic than the ones achieved by fixing these special cases for certain documents.

The general performance is likely only improvable by optimizing core data structures. The SfxItemSet/SfxItemPool classes are the prime contributors to the instructions fetched for SaveAsOwnFormat (more than 15% measured with Cachegrind). Thus, those are the places one will need to go to for universal speedups. I am currently investigating a major rewrite of SfxItemSet. Depending on how much real life performance is gained by the reimplementation, it might be scrapped, integrated or extended to also look at SfxItemPool.

We have a wiki page describing the progress of the writer load/save performance analysis and improvements. If you are interested, you will find the callgrind/vTune generated profiles and the test documents there too.

Posted by Bjoern Michaelsen on 27 Apr 2009

Achievements for a better start up performance
Carsten Driesner

A couple of weeks ago I presented an abstract about the cold start performance analysis of OpenOffice.org. Today I want to give you an overview of the current state of our work. I am responsible for the start up improvements on Windows operating systems, so you will see our current achievements for this platform only. There are people optimizing the start up performance on the other supported platforms (Linux, Solaris and MacOS X). Please keep in mind that many improvements are also platform independent.

The cold start up performance analysis revealed that loading all necessary shared libraries makes up 52% of the time. Loading data files and accessing the file system makes up 28%. That means that about 80% of the whole cold start up time is used up by file I/O.

This result led to the following questions:

  1. How can we make loading libraries faster?

  2. How can we make reading data files faster?

  3. How can we reduce the number of libraries, data files?

1. How can we make loading libraries faster?

Library structure

The first question must be addressed for every operating system separately. The main ideas and concepts are shared between operating systems but implementation details vary.

The first idea to load libraries faster was based on the fact that Windows loads the needed parts of a library via on-demand paging. This means that the code/data from a library is loaded due to a page fault while accessing virtual memory. Based on the access pattern of an application, one could optimize the layout of the library to compact the part that is needed during start up. This would minimize the number of page faults and read operations. A simple prototype which uses the access data of OpenOffice.org during start up to optimize the layout of libraries revealed that up to 10% performance improvement is possible. Interestingly, the best cold start up performance improvement (>20%) could be reached by not rebasing any OpenOffice.org library, which forces Windows to rebase the libraries on demand. At first it sounds strange not to rebase any library, since many documents, including ones from Microsoft, strongly recommend rebasing. If you look more deeply into the loading process, it becomes clear why this is the most efficient way. A library which needs to be rebased must be loaded sequentially into memory. This synchronous and sequential loading is the most efficient way to read a file into memory, although it normally needs more memory. Unfortunately non-rebased libraries have several drawbacks, so we are currently unsure whether to use this approach or not.

  • Non-rebased libraries are not shared between processes (this makes this feature unpleasant for server scenarios).

  • Windows writes a rebased library into the page file for on-demand paging as the original library file cannot be used.

  • Virtual memory can be fragmented as the libraries are now placed into memory by Windows.

The following chart shows all measured cold start up times. A default DEV300m40 developer snapshot was used as a base resulting in 100%.


You can find more details about the main idea, the implementation, results and links on the following wiki page: http://wiki.services.openoffice.org/wiki/Performance/OOo31_LibrariesOnStartup
If you have deep knowledge of Microsoft Windows internals, it would be nice to know what other drawbacks can result from using non-rebased libraries. So please let us know and give feedback.

Library placement

The next idea arose from the fact that the Process Monitor log showed us that searching for a library can have a significant performance effect. Due to the new folder structure of OpenOffice.org 3.0, libraries are now spread over three different directories (base, brand and URE). Unfortunately Windows uses a search order which first tries to locate every library within folders where usually no OpenOffice.org library is installed. This, together with the huge number of libraries needed on start up, gives us a measurable performance hit. In the end this makes up about 10% of the cold start up time.

You can find more details here: http://wiki.services.openoffice.org/wiki/Performance/Library_and_directory_structure

2. How can we make reading data files faster?

An important improvement has been made for the RDB files which contain UNO type and service information. The I/O time could be dramatically reduced using memory mapped files and synchronous loading of the data. You can find this improvement, which is available for all supported platforms, in DEV300m45 and later versions. Around 5% cold start up improvement can be seen for this change.
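As a small illustration of the memory mapping technique, the sketch below reads a slice of a file through Python's standard `mmap` module. It is not the RDB reader; the function name and the idea of addressing a record by offset and length are assumptions for this example.

```python
import mmap


def read_record(path, offset, length):
    """Return `length` bytes at `offset` using a read-only memory mapping.

    The file must not be empty, since an empty file cannot be mapped.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Slicing the mapping lets the operating system page in
            # only the parts of the file that are actually touched.
            return bytes(view[offset:offset + length])
```

The application code then works on the mapping like on a byte array, and the file I/O is left to the paging mechanism of the operating system instead of many small explicit read calls.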

Looking at Process Monitor log files we detected that the language guessing feature is active during Writer start up. This is not necessary and degrades start up performance, especially the cold start up. Fixing this problem provides us 2% better cold start up time. This improvement will be part of one of the next DEV300 builds.

We are in the discussion phase about which accesses to other data files can be optimized, and how. Currently there are ideas to optimize the structure of and access to images and configuration data. If you want to join us or provide your ideas or suggestions, you can use the following web page to join the discussion on the mailing list: http://performance.openoffice.org/servlets/ProjectMailingListList

3. How can we reduce the number of libraries, data files?

This question needs much detailed work until one can be sure that code/data is not necessary on start up. It needs code reviews, detailed knowledge of the OpenOffice.org start up and tests to be sure that no feature depends on a certain code part. Therefore this part won't provide any dramatic improvements in the near future. Here we need to find many small pieces to get measurable improvements.

You will find more performance updates in the near future. If you are interested in the described improvements you can download the latest DEV300 developer snapshot builds. Much more information about the performance project and all the work in progress can be found on the performance wiki page.
http://wiki.services.openoffice.org/wiki/Performance

Posted by Carsten Driesner on 27 Apr 2009

Friday, 03 Apr 2009
OpenOffice.org User Survey 2009: Performance Findings
Frank Loehmann

Today I post the performance findings from the OpenOffice.org User Survey 2009 (OOoUS2009). The OOoUS2009 can be accessed via the registration landing page of OOo linking to our LimeSurvey tooling.

Currently more than 64K users have started the survey and more than 44K finally submitted their votes.

The survey has a performance part asking our users how satisfied they are with OOo's current performance. Performance is something that is perceived differently from person to person. It depends on the system environment used to run OOo, personal skills, the tasks that are performed with the software and external interferences like time pressure. Therefore we have also asked for the overall performance satisfaction with the computer system used by the user to have something that we can compare with OOo's findings.

We have asked our users to rate the following performance relevant tasks using a 5 point scale from very bad [(-)(-)] to very good [(+)(+)]:

In general the overall satisfaction in terms of OOo's performance is good. More than 3/4 rated the following tasks positive (+) and very positive (+)(+):

Furthermore very few people (4%-13%) rated negative (-) and very negative (-)(-). Neutral (o) ratings are hard to interpret, but I think we could say that those users are not (really) satisfied with OOo's performance either. Otherwise they would have chosen a clear positive rating.

Compared to the overall system performance rating we can identify the following tasks that are rated significantly worse:

  1.  Program start-up
  2.  Base
  3.  Math & Chart
  4.  Draw

Impress could be named as no. 5 but it is not really significant and 75% rated it good or very good.

Please see also the state of the Renaissance project presentation for March (performance part: 19ff).

A deeper analysis, i.e. finding out which tasks were performed by the people who voted negative on OOo's performance, requires additional tooling and some more time.

Feedback welcome!

Best regards,

Frank

An overview of  Project Renaissance presentations can be found at the OOo wiki.

Posted by Frank Loehmann on 03 Apr 2009

Tuesday, 10 Mar 2009
Performance #1 Database
Ocke Janssen
Moin,

As performance is really important for a database application, we had a look at how we fetch rows and whether improvement is necessary. Yes, we also have some issues where people complain about this, and yes, they are right.
I invested some time to find out why we are so slow, and I found that some code wasn't as fast as it could be. From the issues I extracted these test cases. Below is a short list:

For example, when looking at csv files with more than 65k rows, we fetched each complete row, which is very expensive when talking about 65k rows x 20 columns. But did we have to do this? No, each row has an exact position in the file which can be used as a bookmark. The advantage of the bookmark approach is that we don't have to hold those 65k rows x 20 columns in memory, which is much faster. I see you nodding ;-)
I created a nice report which shows the comparison of OOo 2.4 vs. OOo 3.0 vs. OOo 3.1 (to be) vs. cws dbaperf1, which includes my changes. The results which I achieved can be found here: http://wiki.services.openoffice.org/wiki/Base/Performance#Test_Results
The charts show normalized values: the slowest one has 100%, and a lower value means a faster office. And of course I used OOo Base to track the test data and the SRB to create the report, which will be faster as well in the upcoming version OOo 3.1. ;-)
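The bookmark idea for large csv files described above can be sketched like this: one pass records the byte offset of each row, and a row is only read and split when it is actually requested. The function names are made up for this illustration, and the naive split on commas ignores quoted fields.

```python
def build_bookmarks(path):
    """One pass over the file, remembering where each row starts."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
    return offsets


def fetch_row(path, offsets, index):
    """Seek to the bookmarked position and parse just that one row."""
    with open(path, "rb") as f:
        f.seek(offsets[index])
        line = f.readline().decode("utf-8").rstrip("\r\n")
    return line.split(",")  # naive: no support for quoted fields
```

For 65k rows this keeps 65k integers in memory instead of 65k x 20 column values, and the columns are only materialized for the rows that are really displayed.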

Things are now a little bit faster. Below you see the improvements for copying a table.

Copy a table in OpenOffice.org Base


Best regards,

Ocke


Posted by Ocke Janssen on 10 Mar 2009

Monday, 09 Mar 2009
Start up performance - something that always matters
Carsten Driesner

From a user perspective, start up can never be fast enough. If you are involved in the development of OpenOffice.org, you have definitely stumbled over many comments, tests and product reviews where this aspect played a role. A couple of weeks ago a new incubator project called performance was created, which concentrates on various aspects of OpenOffice.org performance (e.g. start up, loading/saving).

Today I want to show you what we have done so far for start up performance. We made a thorough analysis of start up performance under Windows and Linux. Powerful tools which provide data from the system level (e.g. Process Monitor from Sysinternals) made it easy to collect data and quantify the different aspects of start up. Based on this data we want to see where we can influence start up performance. The following summary concentrates on Windows.

The system used for these tests:

  • Notebook
  • 1,8 GHz Pentium M
  • 1024 MB RAM
  • 60 GB 2,5" disk drive, 5400 rpm
  • Windows XP SP3 with the latest updates

This is not an up-to-date system (it is about 3 years old), but many of its technical details are similar to the popular new class of notebooks called netbooks.

The following tables summarise the outcome of the analysis.

Cold start up OpenOffice.org Writer

The following table summarises the measured values for an OpenOffice.org Writer cold start up. Cold start means that OpenOffice.org was started after a reboot and after removing the prefetch file which is created by Windows XP. For more details about the prefetch feature of Windows XP, see the links at the end of this blog.

Aspect                                          Time       Percentage
Writer startup                                  24800ms    100,00%
Load OpenOffice.org libraries (file I/O)        13012ms     52,47%
Load Windows libraries (file I/O)                 242ms      0,97%
Data file/path/system file access (file I/O)     7045ms     28,41%
CPU time (according to Process Monitor)          2664ms     10,74%
Query status/Open/Close files/folders             305ms      1,23%
Registry access                                    42ms      0,17%
Write file access (file I/O)                        5ms      0,02%
Currently unknown                                1485ms      5,99%




File I/O clearly plays the most important role during cold start up: more than 80% of the start up time is spent reading libraries or data. Reading data files takes about half as long as reading all necessary libraries. It is also clear that raw CPU power does little for a quicker cold start, since CPU time accounts for only about 11% of the total.
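
The figures behind these statements can be recomputed from the cold start table. The following is a small sketch using the measured values from above; the variable and function names are chosen for this example only.

```python
# Measured cold start values in milliseconds, taken from the table above.
COLD_START_MS = {
    "Load OpenOffice.org libraries (file I/O)": 13012,
    "Load Windows libraries (file I/O)": 242,
    "Data file/path/system file access (file I/O)": 7045,
    "CPU time (according to Process Monitor)": 2664,
    "Query status/Open/Close files/folders": 305,
    "Registry access": 42,
    "Write file access (file I/O)": 5,
    "Currently unknown": 1485,
}
TOTAL_MS = 24800  # Writer startup


def share(ms, total=TOTAL_MS):
    """Return the percentage of the total start up time."""
    return 100.0 * ms / total


# Time spent reading libraries and data files (the "Load" and "Data" rows).
read_io_ms = sum(
    ms for name, ms in COLD_START_MS.items() if name.startswith(("Load", "Data"))
)
data_vs_libraries = (
    COLD_START_MS["Data file/path/system file access (file I/O)"]
    / COLD_START_MS["Load OpenOffice.org libraries (file I/O)"]
)

print(f"Reading libraries and data: {share(read_io_ms):.2f}% of start up")
print(f"Data files relative to OOo libraries: {data_vs_libraries:.2f}")
```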


Warm start up OpenOffice.org Writer

The next table summarises the values for an OpenOffice.org warm start up.

Aspect                                          Time       Percentage
Writer startup                                   3800ms    100,00%
Load OpenOffice.org libraries (file I/O)          286ms      7,53%
Load Windows libraries (file I/O)                  19ms      0,50%
Data file/path/system file access (file I/O)      279ms      7,34%
Query status/Open/Close files/folders              24ms      0,63%
CPU time (according to Process Monitor)          2440ms     64,21%
Registry access                                    21ms      0,55%
Write access (file I/O)                             3ms      0,08%
Currently unknown                                 728ms     19,15%



The picture has completely changed. CPU time is now the most important part and file I/O plays only a minor role.

Summary

OpenOffice.org cold start up takes about 6,5 times as long as warm start up (24800ms vs. 3800ms). Although warm start up performance could be better, it's not that bad. Cold start up performance is definitely not acceptable and must be improved considerably, so improving the cold start up scenario should be the top priority. As library loading is the biggest part of the cold start up time, we have to concentrate on this first. We are currently discussing several ideas to address this problem. You will find news on our progress in the next blog about start up performance.
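
To see where the difference between cold and warm start up comes from, the two tables can be compared aspect by aspect. This is a rough sketch based on the measured values above; the grouping into "libraries", "data" and "cpu" and all names are choices made for this example.

```python
# Measured values in milliseconds from the cold and warm start tables.
# "libraries" adds up the OpenOffice.org and Windows library rows.
COLD_MS = {"total": 24800, "libraries": 13012 + 242, "data": 7045, "cpu": 2664}
WARM_MS = {"total": 3800, "libraries": 286 + 19, "data": 279, "cpu": 2440}


def ratio(aspect):
    """Return how many times longer the cold start takes for one aspect."""
    return COLD_MS[aspect] / WARM_MS[aspect]


for aspect in COLD_MS:
    print(f"{aspect}: cold start takes {ratio(aspect):.1f}x the warm start time")
```

CPU time is nearly the same in both scenarios, while library loading and data file access differ by more than an order of magnitude, which fits the decision to look at library loading first.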

You can find a very detailed analysis about the start up performance of OpenOffice.org on Windows here:
http://wiki.services.openoffice.org/wiki/Performance/OOo31_LibrariesOnStartup

If you want to join us in making OpenOffice.org faster, you can mail us on the performance mailing list. If you have comments or want to provide data, just use the GullFOSS comments.

Posted by Carsten Driesner on 09 Mar 2009