Wednesday, 22 Jul 2009
A new version of ODFDOM - the OpenDocument Java API - has been released!
ODFDOM is an Apache 2 licensed Java library to easily create, access and manipulate ODF documents.
If you have never heard of ODFDOM, you might want to get a quick overview first.
With this release we made an enormous step forward!
It took us several months of continuous refactoring until we finally agreed to ship this new release. 'We' in this case are Benson Margulies, David Eisenberg, and a group of IBM and Sun developers.
Version 0.7 more than deserves to be called a release. It brings several notable new features, the best of which are listed below:
To ease the DOM handling, every element now has methods to create its child elements, taking the mandatory information as parameters. With this, users no longer have to look into the RelaxNG schema to verify whether an element is allowed as a child.
Furthermore, there are methods to access all possible attributes of the element, as well as enumerations for their possible values.
Aside from elements and attributes, there are now also classes for all data types listed in ODF 1.2 (mostly W3C schema types, which now come with validation and helper functions).
Finally, the convenience layer of ODFDOM shows examples of how easy a final API might look.
For instance, it is a simple three-liner to create a text document, add text to it, and save it:
OdfTextDocument odt = OdfTextDocument.newTextDocument();
odt.addText("My important text");
odt.save("MyExample.odt");
David Eisenberg has contributed new tutorials.
Our JavaDoc has been improved. Every single ODFDOM element/attribute class now links to an XHTML version of the spec, where the functionality of the XML node is described in detail. This spec is bundled with the JavaDoc (soon the spec will be split into smaller pieces to speed up loading times).
Last but not least, with the support of Benson Margulies, we were able to switch our build environment from Ant to Maven. By using Maven we made our dependencies declarative. We no longer add dependent JARs to our sources (e.g. the parser XercesImpl.jar), but are able to download them via Maven on demand. Following this idea of modularization, we were able to simplify the source structure of ODFDOM by moving our code generation into its own new project called relaxng2template.
Interesting challenges still lie ahead of us, and we hope this release sparks the interest of like-minded developers in joining our efforts!
I hope you all enjoy the new release!
Svante
tags: api java odf odfdom opendocument
Monday, 20 Jul 2009
Goal 1 : improve overall filter performance
Goal 2 : make use of multiple cores/CPUs through threading (wherever a thread is not constantly blocked by calls into the OOo core, which currently does not support multiple threads without blocking)
Goal 3 : support the implementation of load on demand (for example to display the first slides to the user and load the rest of the document in the background)
XPP is a streaming pull XML parser. Unlike SAX, where the parser calls the filter, here the filter itself calls the XPP parser to parse the next XML element.
+ In contrast to SAX, the filter can interrupt parsing after any element and continue later.
+ Pull instead of push leads to a cleaner filter implementation (cleaner code is usually easier to maintain and improve).
- Performance is only equal to that of a SAX parser
- No random access
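The pull model can be sketched in a few lines of Java. This is only an illustration under assumptions: it uses the StAX reader that ships with the JDK (javax.xml.stream) as a stand-in for XPP, and the class name and the sample element names are made up. It shows the filter asking for events, stopping after two elements, and resuming later on the same reader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: StAX stands in for XPP, names are illustrative only.
class PullParsingSketch {

    /** Opens a pull reader on an in-memory XML string. */
    static XMLStreamReader open(String xml) {
        try {
            return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Pulls events until 'limit' start elements were seen or the stream ends.
     * The reader keeps its position, so a later call continues where this one stopped.
     */
    static List<String> readSome(XMLStreamReader reader, int limit) {
        List<String> names = new ArrayList<>();
        try {
            while (names.size() < limit && reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        XMLStreamReader reader = open("<doc><slide/><slide/><slide/></doc>");
        System.out.println(readSome(reader, 2));  // prints [doc, slide]
        // ... the filter could do other work here ...
        System.out.println(readSome(reader, 10)); // prints [slide, slide]
    }
}
```

With a push parser the loop would live inside the parser and the filter could not simply return in the middle of the stream.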
A DOM is a parsed in-memory representation of an XML stream. This technology is used by modern browsers and the odftoolkit.org project.
+ Filter has random access to all XML elements
+ Random access leads to a cleaner filter implementation (cleaner code is usually easier to maintain and improve).
- A DOM has to store a complete copy of the XML stream plus management overhead in memory during the filter process.
I developed the fast SAX parser during the initial implementation of the Office 12 XML filters. It is basically a SAX parser but uses integer tokens to represent known namespaces, element names and attribute names. Tools like GNU gperf can be used to generate perfect hash code that transforms the XML names to integer tokens. Scripts can parse the DTD or Relax NG schema of an XML format to automatically extract all the XML names that need to be convertible to integers. With UTF-8 XML streams, the tokens can be created without the need to transform the strings to another encoding first. Namespaces can be combined with XML names, so each element or attribute name with a namespace can be identified with just one integer compare.
+ Reduces string handling (encoding, comparing, storing)
+ Leads to a cleaner filter implementation (e.g. switch statements can be used to identify child elements instead of if .. else .. blocks that use string compare functions)
+ Usage of perfect hash algorithms which are automatically created during compile time
- No random access
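The token idea can be illustrated with a small Java sketch. It is not the code of the fast SAX parser: the class, the token values and the bit layout are assumptions made for the example, and a plain HashMap takes the place of the perfect hash that a tool like gperf would generate. Only the namespace URIs and the element names draw:page, draw:frame and text:p come from ODF itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: token values and layout are invented for illustration.
class TokenSketch {
    // Namespace tokens live in the upper 16 bits ...
    static final int NS_DRAW = 1 << 16;
    static final int NS_TEXT = 2 << 16;
    // ... and name tokens in the lower 16 bits, so both fit into one int.
    static final int PAGE = 1;
    static final int FRAME = 2;
    static final int P = 3;
    static final int UNKNOWN = -1;

    // A HashMap stands in for the generated perfect hash function.
    private static final Map<String, Integer> NAMESPACES = new HashMap<>();
    private static final Map<String, Integer> NAMES = new HashMap<>();
    static {
        NAMESPACES.put("urn:oasis:names:tc:opendocument:xmlns:drawing:1.0", NS_DRAW);
        NAMESPACES.put("urn:oasis:names:tc:opendocument:xmlns:text:1.0", NS_TEXT);
        NAMES.put("page", PAGE);
        NAMES.put("frame", FRAME);
        NAMES.put("p", P);
    }

    /** Combines namespace and local name into a single integer token. */
    static int token(String namespaceUri, String localName) {
        Integer ns = NAMESPACES.get(namespaceUri);
        Integer name = NAMES.get(localName);
        return (ns == null || name == null) ? UNKNOWN : (ns | name);
    }

    /** The filter identifies an element with one integer compare per case. */
    static String describe(int token) {
        switch (token) {
            case NS_DRAW | PAGE:  return "slide";
            case NS_DRAW | FRAME: return "frame";
            case NS_TEXT | P:     return "paragraph";
            default:              return "unknown";
        }
    }

    public static void main(String[] args) {
        int t = token("urn:oasis:names:tc:opendocument:xmlns:drawing:1.0", "page");
        System.out.println(describe(t)); // prints slide
    }
}
```

The string lookup happens once in the parser; everything after that works on integers only.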
Since random access looks like the key to a filter that supports painless load on demand, I decided to go with a DOM solution. To minimize the memory footprint of the DOM tree, I used the fast SAX parser to build the tree and stored the integer tokens for XML names instead of the strings from the stream. Parsing the XML stream itself does not need any interaction with the application core, so it can be done in a separate thread. Converting the XML representation of attribute values to a UNO representation can also be done without the application core in almost all cases, so it should be done in the same thread.
The problem here is that a classic DOM is a generic and typeless representation of the XML data. The solution is to use a technique we introduced in the odftoolkit.org project. Initially, an XSLT transformation was used to create DOM node implementations for each element of the ODF format, with type-safe access to its attributes, using only the Relax NG schema. Maintaining XSLT templates proved costly for such complex operations, so this was replaced with a code generator that I implemented in Java, which uses simple templates and configuration files to transform a Relax NG schema into code files.
This code generator also made it possible to create DOM tree element implementations for languages other than Java, which is used in the odftoolkit.org project. I therefore used the generator to create C++ source files for all ODF elements, and I adapted the configuration files to create attribute types equal to the UNO types that the filter needs to pass to the application core. (The key difference between the ODFDOM from odftoolkit.org and the DOM tree for the prototype is that the former uses ODF-based types for the attributes and the latter uses UNO types.)
So in conclusion, a DOM tree builder is started in a worker thread and parses the XML stream using the fast SAX parser. (The prototype actually starts two worker threads, one to build the tree for the styles.xml stream and one for the content.xml stream.) It uses the SAX events to create a tree where each element is an instance of a class that was specifically generated for that element from the Relax NG schema. All attribute values are parsed and stored in a UNO Any, preferably with the same type as used in the UNO API of the OOo application. So, for example, if the attribute is a length value, then something like "12cm" is converted into a UNO Any containing an integer value of 12000 (12cm converted to 1/100th mm).
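As an illustration of such an attribute conversion, here is a minimal Java sketch that turns a length string into 1/100th mm. It is not the prototype's code (which is C++ and stores the result in a UNO Any); the class name, the method name and the small set of supported units are assumptions made for the example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch: names and the supported unit set are illustrative only.
class LengthSketch {

    /** Converts a length such as "2.5cm" into an integer number of 1/100th mm. */
    static int toHundredthMm(String length) {
        String s = length.trim();
        // Find the end of the numeric part; everything after it is the unit.
        int i = 0;
        while (i < s.length()
                && (Character.isDigit(s.charAt(i)) || s.charAt(i) == '.' || s.charAt(i) == '-')) {
            i++;
        }
        BigDecimal value = new BigDecimal(s.substring(0, i));
        String unit = s.substring(i);

        BigDecimal factor; // how many 1/100th mm one unit is
        switch (unit) {
            case "mm": factor = new BigDecimal("100");  break;
            case "cm": factor = new BigDecimal("1000"); break;
            case "in": factor = new BigDecimal("2540"); break; // 1 in = 25.4 mm
            default:
                throw new IllegalArgumentException("unsupported unit: " + unit);
        }
        return value.multiply(factor).setScale(0, RoundingMode.HALF_UP).intValueExact();
    }

    public static void main(String[] args) {
        System.out.println(toHundredthMm("2.5cm")); // prints 2500
        System.out.println(toHundredthMm("1in"));   // prints 2540
    }
}
```

Doing this once in the tree builder means the filter thread only ever sees ready-to-use integers.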
These worker threads never block or wait for the OOo thread. But this would make no sense if the office thread sat idle at the same time, waiting for the tree builder to finish. So the tree builder notifies the filter as soon as an importable element has been fully parsed. For example, once the office:styles element is completely parsed, the filter thread can start and import the styles. If the filter gets notified that a slide has been completely parsed, it can check whether the needed styles and master pages are already imported and then import this slide. If the filter is also executed in a separate thread, the office thread can paint the already imported slides, which makes the application more responsive to the user and results in a 'subjective' performance gain.
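The notification scheme can be sketched in Java with a queue between the two threads. This is a simplified model under assumptions, not the prototype: parsed parts are plain strings, the only dependency modelled is "slides need the styles", and all names are invented. The builder thread uses an unbounded queue so it never waits for the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: parts are strings, "styles" is the only dependency.
class NotifySketch {
    private static final String END = "<end>";

    /** Returns the order in which the filter imported the parsed parts. */
    static List<String> run(List<String> partsInParseOrder) {
        BlockingQueue<String> finished = new LinkedBlockingQueue<>();

        // Tree builder: announces every completely parsed part, never blocks.
        Thread builder = new Thread(() -> {
            for (String part : partsInParseOrder) {
                finished.add(part);
            }
            finished.add(END);
        });
        builder.start();

        // Filter: imports a part as soon as its dependencies are imported.
        List<String> imported = new ArrayList<>();
        List<String> waiting = new ArrayList<>();
        boolean stylesImported = false;
        try {
            while (true) {
                String part = finished.take();
                if (part.equals(END)) {
                    break;
                }
                if (part.equals("styles")) {
                    imported.add(part);
                    stylesImported = true;
                    imported.addAll(waiting); // slides that had to wait
                    waiting.clear();
                } else if (stylesImported) {
                    imported.add(part);
                } else {
                    waiting.add(part);
                }
            }
            builder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return imported;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("slide1", "styles", "slide2")));
        // prints [styles, slide1, slide2]
    }
}
```

In this model a slide that arrives before its styles is parked and imported right after them, which mirrors the check described above.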
To implement the filter, I borrowed code from the existing ODF filter and transformed it to use the DOM tree instead of SAX events. As expected, this mostly resulted in much less, and cleaner, code. For a reliable comparison with the existing filter, I had to implement a minimum set of functionality so that, for selected real-world documents, the prototype imports all the functionality available in those documents.
I ended up implementing
First I used an average real-world document with 47 slides, lots of graphics and some chart OLE objects. It showed that the prototype filter was around 2% faster than the original filter. This was less than expected.
Next I created an artificial document: a presentation with 188 slides and only formatted text, no graphics, no OLE objects. Such a pure XML document would be uncommon for a presentation, but it is a good approximation of what the gain could be for a writer or calc document, where the XML to graphic/OLE ratio is much higher. This led to a performance gain of 10%, which is not bad since the prototype itself has not yet been profiled and optimized.
I tested this on a dual-core 3 GHz system. Experiments with the original filter showed that we are CPU bound, and since we use only one core, the processor usage is only 50%. So I expected better results with the threaded prototype by making use of the 3 GHz from the second core. A quick look at a CPU monitor showed that this didn't happen; processor usage was still capped at 50%.
This made me suspect that the actual parsing, tree building and XML to UNO conversion accounted for only an insignificant amount of the time the filter needs to import this document. After removing the threading, I measured the time it took to parse the XML streams and to actually import the document. It turns out that building the DOM tree for styles.xml and content.xml accounted for less than 2% of the overall time. So when opening a document that takes 10 seconds to load, the second core is used for only 0.2 seconds. Remember that this includes not only the actual XML parsing but also creating the DOM tree in memory and converting most of the attribute values to UNO types.
The interesting finding here is that the overhead from the XML parsing is much less than the typical 'developer gut feeling' about XML suggests. So any performance work in the filter or the application core would be much more efficient than trying to speed up the XML parsing.
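The same conclusion follows from Amdahl's law, which bounds the overall speedup by the fraction of the run time that is actually made faster. The small Java sketch below plugs in the 2% measured above as an upper limit for the parsing share; the class and method names are made up for the example.

```java
// Hypothetical sketch: applies Amdahl's law to the measured 2% parsing share.
class AmdahlSketch {

    /** Overall speedup if a fraction p of the run time becomes s times faster. */
    static double speedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    /** Best possible overall speedup if the fraction p took no time at all. */
    static double maxSpeedup(double p) {
        return 1.0 / (1.0 - p);
    }

    public static void main(String[] args) {
        double parsingShare = 0.02; // tree building was less than 2% of the load time
        double loadSeconds = 10.0;  // example document from the text

        // Even with infinitely fast parsing the load takes about 9.8 of the 10 seconds.
        System.out.println(loadSeconds / maxSpeedup(parsingShare));
        // Doubling the parsing speed saves only about 0.1 seconds.
        System.out.println(loadSeconds - loadSeconds / speedup(parsingShare, 2.0));
    }
}
```

Applied to the other 98% of the load time, the same formula shows how much more there is to gain in the filter and the application core.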
Since the usage of DOM is often criticized for its memory consumption, I also took a look at this. Since I replaced most strings with 32-bit integer tokens, I figured that the memory consumption of a DOM tree should not be greater than the XML stream it was parsed from. My expectation was that in most cases it should be even smaller.
The first measurement showed that it was actually 10 times the size of its XML stream. After further investigation I found the source of the problem: the memory manager's overhead for each allocated instance. After converting attributes from individual instances to a single vector, this dropped to 2x the size of the XML stream. For an impress document this is not a problem, as the XML stream size for average documents is seldom more than 1 MB. For a writer document this may also be no problem; for example, the huge OpenDocument specification has less than 10 MB of XML streams. For calc this may be a problem, as calc documents with many cells can have XML streams of 100 MB and more. For current office workstations this may not be a problem at all, but if OOo runs on a server for multiple users, or if OOo were ported to small devices, then this could become an issue.
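The change from individual attribute instances to a single vector can be illustrated with a small Java sketch. It is only an analogy under assumptions: the prototype is C++ and the overhead there came from the memory manager, while in Java the comparable cost is the per-object header. The class and its methods are invented for the example, and the values themselves are still separate objects here.

```java
import java.util.Arrays;

// Hypothetical sketch: one pair of arrays per element instead of one object per attribute.
class AttributeStoreSketch {
    private int[] tokens = new int[4];        // attribute name tokens
    private Object[] values = new Object[4];  // converted attribute values
    private int size;

    /** Appends an attribute; the arrays grow together when they are full. */
    void add(int token, Object value) {
        if (size == tokens.length) {
            tokens = Arrays.copyOf(tokens, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        tokens[size] = token;
        values[size] = value;
        size++;
    }

    /** Looks up a value by token with a linear scan; elements have few attributes. */
    Object get(int token) {
        for (int i = 0; i < size; i++) {
            if (tokens[i] == token) {
                return values[i];
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        AttributeStoreSketch attributes = new AttributeStoreSketch();
        attributes.add(17, 2500);     // e.g. a width in 1/100th mm
        attributes.add(42, "Title");  // e.g. a name
        System.out.println(attributes.get(17)); // prints 2500
    }
}
```

The point is that an element with n attributes causes a constant number of allocations for its attribute list instead of n.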
While the prototype showed that this method of implementing an ODF import filter does result in a performance gain and the option to support load on demand, for an impress application the performance gain of only around 2% for real-world documents is outweighed by the actual cost of implementing this filter. I currently estimate an effort of at least 6 man-months for the implementation of the impress import filter alone (this is without time for testing, which is also crucial to find regressions). For a calc and writer filter these figures may differ.
Since writer and calc documents are more XML centric than impress documents, a prototype for these applications may show that the overall gain still outweighs the costs.
tags: api import impress odf openoffice.org performance uno xml
Sunday, 11 Jan 2009
For those who haven't already been informed by the plugin update mechanism of NetBeans: we have uploaded a new version 2.0.3 of our plugin to the update center of NetBeans 6.5. It's a micro release with minor bugfixes and better, or final, support for NetBeans 6.5.
The most interesting features are:
tags: api extension netbeans openoffice
Wednesday, 06 Aug 2008
We have released a new version of the OpenOffice.org API plugin for NetBeans. The new version 1.1.3 is mainly a bugfix release, and we have tested it with NetBeans 6.1 and the developer snapshot of NetBeans 6.5. It now supports OpenOffice.org 3.0 and, of course, it is the first version that supports the new OpenOffice.org platform Mac OS X.
What's new:
Try it out and give us feedback.
tags: api netbeans openoffice.org
Thursday, 08 May 2008
tags: api architecture code development netbeans odf opendocument software specification sun xml
Friday, 25 Apr 2008
ODFDOM is the name of the upcoming free OpenDocument framework sponsored by Sun Microsystems Inc.
It will be the next evolutionary step after AODL and Odf4j. It is being designed together with their architects with the intent to provide an easy, lightweight programming API for the ODF developer community. ODFDOM is meant to be portable to any object-oriented language.
The first pre-version of the Java 5 reference implementation of ODFDOM is planned to become available under LGPL3 in May 2008.
Please find further detailed information in the OOo Wiki.
tags: api architecture code development netbeans odf opendocument software specification sun xml
Monday, 18 Feb 2008
The OpenOffice.org Developer's Guide is now available online in the OpenOffice.org Wiki. The main reasons for moving the guide into the wiki are easier maintenance and the hope of getting more contributions. We also hope to get localized versions of the guide to reach more users and developers all over the world.
But before anybody starts with localization, I would suggest waiting for the official go, because the guide requires some guidelines to keep it well formed in general and especially for later post-processing (e.g. export of the whole wiki book/guide to PDF). The guidelines will also document some extensions, like the IDL tags that are useful to link directly into the IDL reference and that are used to generate cross references from the IDL reference back into the DevGuide. Anyway, you will get more information soon, and I am looking forward to a faster growing, improved and always up-to-date guide.
Overview of the IDL tags
tags: api developer's guide
Monday, 11 Feb 2008
We still have one little issue that is no real error but it is definitely not nice. But the plugin works and is completely functional and because of many requests we decided to provide an intermediate version 1.1.1 on api.openoffice.org.
The problem is that the NetBeans editor complains about unknown types. These types are generated from IDL types, and we generate Java class files directly for backward compatibility reasons. The cool NetBeans editor feature (indeed nice) requires Java source files for all generated and used types in a project (background compilation etc.). Anyway, you can build your projects and everything works. If you prefer to get rid of this annoying editor message, you can add a further library dependency to your project.
<project_node> -> context menu -> Libraries -> Add Jar/Folder -> <project_dir>/dist/IDL_types.jar
We will try to fix this problem soon but we can't say when. So feel free to try the new version with NetBeans 6 and please report any kind of problems.
tags: api extensions netbeans openoffice.org plugin sdk
Thursday, 20 Dec 2007
To all NetBeans users and users of our OpenOffice.org API plugin,
currently our OpenOffice.org API plugin is not available via the normal NetBeans 6 Update Center. The reason is quite simple: we have to fix some minor problems first. We are not really happy with this situation and we will try to fix it as soon as possible. The problem is that we have limited time at the moment because of some other projects, but stay tuned, we will support NetBeans 6 soon.
The key point for me behind all the questions (when will the plugin be available for NB 6, will you support NB 6, etc.) is that the plugin is really used. This little fact is really motivating, and it helps us to convince our managers that we can do more. You know, if the demand is high we can probably spend more time on it ...
Anyway, there are a lot of things that we can improve, and we have a lot of other ideas for improving the plugin. It is also important that you, our users, give us feedback. Share your ideas with us and we will see if and how we can realize them in the future development of the plugin. If you have experience with NetBeans plugin development and are interested in joining this project, please send me an email.
I wish you our plugin users, all OpenOffice.org users and developers and of course all GullFOSS readers a merry Christmas and a happy new year.
Juergen 
PS: the animated gifs are from http://www.fg-a.com/christmas.htm
Friday, 16 Nov 2007
Currently, several solver components for Calc are available (with different capabilities and different licenses), and OOo 3.0 is scheduled to contain another one in the default installation. Some areas, like non-linear models, are still open, so more solver implementations may appear in the future.
Instead of having a variety of “Solver xyz” entries in the “Tools” menu, each with their own dialog, it seems better to select between the implementations in a single dialog. So that's what we are doing: There's going to be a UNO service “com.sun.star.sheet.Solver” for the core implementation, without UI, and a dialog within the “sc” module to specify model parameters, and select between different service implementations. Future components will be able to implement just that service, and be called from the common dialog. The dialog design isn't final yet, but it will look similar to this (the images in the specification will be updated if the design is changed):
The “Options” button will open an options dialog where you can select an implementation and the options needed for that implementation.
The Solver service will be something like this, plus an XPropertySet for the options (note: even more preliminary than the dialog design):
enum SolverConstraintOperator
{
    LESS_EQUAL,
    GREATER_EQUAL,
    EQUAL,
    INTEGER,
    BINARY
};
struct SolverConstraint
{
    com::sun::star::table::CellAddress Left;
    SolverConstraintOperator Operator;
    any Right;
};
interface XSolver: com::sun::star::uno::XInterface
{
    [attribute] XSpreadsheetDocument Document;
    [attribute] com::sun::star::table::CellAddress Objective;
    [attribute] sequence< com::sun::star::table::CellAddress > Variables;
    [attribute] sequence< SolverConstraint > Constraints;
    [attribute] boolean Maximize;
    void solve();
    [attribute, readonly] boolean Success;
    [attribute, readonly] double ResultValue;
    [attribute, readonly] sequence< double > Solution;
};
service Solver: XSolver;
As an added benefit, since the service implementation is available from outside the solver dialog, you will be able to easily use solver functionality from macros or other components.
Implementation will be in the child workspace “calcsolver”, and there's also a wiki page. Feedback is welcome on the sc-dev mailing list.
tags: api calc openoffice.org spreadsheet