OpenOffice.org Engineering Lars Oppermann's Weblog

Tuesday Oct 30, 2007

After seven exciting years at Sun, I have decided to move on, and November 15th will be my last day at Sun. As much as I'm looking forward to my new challenge, I will be missing you folks a lot. I want to thank everyone at Sun for making this company such a cool place to work at over the past years. If you want to stay in contact, please do so at the usual places (i.e. Xing or LinkedIn).

Tuesday Sep 26, 2006

In his recent blog entry, Michael Brauer provides some insight into the possibilities of a toolkit for the OpenDocument format (ODF) used by OpenOffice.org and other open-source and commercial office productivity applications.

Michael proposes a language-agnostic approach to the specification of such an API, a direction that I strongly second.

I would, however, caution against using an interface description technique that compiles interfaces to the target environment according to a fixed language binding. Just look at what happened to the XML DOM, which was also specified with an IDL. It is a pain to use, since it cannot leverage features of the target language efficiently (e.g. native collections).
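To illustrate the collection problem in Python (the post itself names no language), this sketch walks the same paragraphs twice. The first pass uses only the members the IDL-specified DOM defines for a NodeList (`length` and `item()`), via the standard library's `xml.dom.minidom`. The second pass uses `xml.etree.ElementTree`, an API designed for Python. The sample XML and function names are made up for illustration. Note that minidom already softens the problem by making its NodeList a Python list; the first function deliberately sticks to the IDL-defined members to show the style a fixed binding imposes.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

DOC = "<body><p>one</p><p>two</p><p>three</p></body>"  # made-up sample


def texts_dom_style(xml_text: str) -> list:
    """Collect paragraph texts using only the IDL-defined NodeList members."""
    dom = xml.dom.minidom.parseString(xml_text)
    nodes = dom.getElementsByTagName("p")
    result = []
    # The DOM IDL specifies NodeList as length + item(i), not as a collection,
    # so a literal binding forces an index loop on every language.
    for i in range(nodes.length):
        result.append(nodes.item(i).firstChild.data)
    return result


def texts_native_style(xml_text: str) -> list:
    """Same traversal with an API built around Python's own iteration."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("p")]
```

Both functions return the same list; the difference is only in how much of the language the API lets you use.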

ODF describes a far more complex data model than the plain XML infoset addressed by DOM. So while I am all for specifying the model, and the operations to be performed on it, in a language-agnostic way, I caution against using an IDL technique as the tool to do that. The main model should be more abstract, and language bindings should be created on the basis of that abstract model. It is then very well possible to create an IDL binding for that model, which can be used to generate bindings for languages that have an IDL binding but no specific binding for the abstract ODF model.

The whole thing could be viewed like this...

+------------------+---------------+
|  Language A      | Language B    |  
|                  +---------------+
+------------------| IDL Language B|
|                  | Binding       |
|Specific Language +---------------+
|Binding           | IDL Binding   |
|                  |               |
+------------------+---------------+
|        Abstract ODF Model        |
+----------------------------------+
|         ODF XML Schema           |
+----------------------------------+
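A minimal Python sketch of this layering, under heavy simplification: the abstract model is reduced to an ordered list of paragraph strings, and all class and method names (`ModelDocument`, `IdlDocumentBinding`, `PythonDocumentBinding`, `getCount`, `getByIndex`, `insertByIndex`) are hypothetical. Both bindings sit on the same model. The flat one mimics what a generic IDL binding would offer every language alike; the specific one adapts the model to Python's native sequence protocols.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ModelDocument:
    """Hypothetical abstract model: a document as an ordered list of paragraphs."""
    paragraphs: List[str] = field(default_factory=list)


class IdlDocumentBinding:
    """Hypothetical flat binding, the same shape in every target language."""

    def __init__(self, model: ModelDocument) -> None:
        self._model = model

    def getCount(self) -> int:
        return len(self._model.paragraphs)

    def getByIndex(self, index: int) -> str:
        if not 0 <= index < self.getCount():
            raise IndexError(index)
        return self._model.paragraphs[index]

    def insertByIndex(self, index: int, text: str) -> None:
        self._model.paragraphs.insert(index, text)


class PythonDocumentBinding:
    """Hypothetical language-specific binding over the same abstract model."""

    def __init__(self, model: ModelDocument) -> None:
        self._model = model

    def __len__(self) -> int:
        return len(self._model.paragraphs)

    def __getitem__(self, index):
        # Delegating to the list gives negative indices and slicing for free.
        return self._model.paragraphs[index]

    def __iter__(self) -> Iterator[str]:
        return iter(self._model.paragraphs)

    def append(self, text: str) -> None:
        self._model.paragraphs.append(text)
```

Because the specific binding delegates to a Python list, `for` loops, `len()`, negative indices and slices work without extra design effort, which is the kind of target-language leverage the flat interface cannot express.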

I recently had a debate with Dave Pawson on the OpenDocument mailing list about the handling of whitespace. Dave blogged a few things about our exchange, on which I would like to comment. Maybe I can also write down a few more useful bits about whitespace handling in ODF. Dave writes:

The proposition seems to be (for ODF 1.1) that applications (XML applications ala xml 1.1 definitions) can screw with an authors whitespace to their hearts content. If it's at the start of an element, dispose of it. If it's mid element replace it with markup. To me it seems weird. To implementors (IBM Sun and so on) it apparently seems the thing to do... the only rationale I've heard is that it makes XML presentation easier.

Now, this is unfortunately not entirely true. An ODF application should by no means screw around with an author's whitespace. Whitespace in an ODF document just isn't always represented by literal whitespace in the XML representation of the ODF document.

It is useful to think of literal whitespace in the XML as being a word delimiter in ODF text content (e.g. in a paragraph). A sequence of blanks is represented by markup such as <text:s text:c="10"/>, which would represent 10 blanks in the actual document. This is described in the ODF specification (p. 35, p. 85).
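A simplified Python sketch of reading such a paragraph back into document text, assuming these rules: a run of literal XML whitespace (space, tab, CR, LF) counts as one word delimiter; that delimiter is dropped at the start of the paragraph or directly after other whitespace; and `<text:s text:c="n"/>` stands for n blanks (assumed to be one when `text:c` is absent). The specification covers more cases (for example `<text:tab/>` and `<text:line-break/>`), so treat this as an illustration rather than a conforming implementation. The function name `paragraph_text` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
S_TAG = f"{{{TEXT_NS}}}s"
C_ATTR = f"{{{TEXT_NS}}}c"
XML_WS = re.compile(r"([ \t\r\n]+)")  # XML whitespace only, not e.g. U+00A0


def paragraph_text(p: ET.Element) -> str:
    """Recover the visible text of a <text:p> under the simplified rules."""
    out = []

    def emit_literal(chunk):
        if not chunk:
            return
        for piece in XML_WS.split(chunk):
            if not piece:
                continue
            if XML_WS.fullmatch(piece):
                # A literal run is a single delimiter, dropped at paragraph
                # start or when the output already ends in a blank.
                if out and not out[-1].endswith(" "):
                    out.append(" ")
            else:
                out.append(piece)

    def walk(elem):
        emit_literal(elem.text)
        for child in elem:
            if child.tag == S_TAG:
                # <text:s text:c="n"/> contributes n blanks (1 if c is absent).
                out.append(" " * int(child.get(C_ATTR, "1")))
            else:
                walk(child)  # e.g. <text:span>: descend into its mixed content
            emit_literal(child.tail)

    walk(p)
    return "".join(out)


sample = (f'<text:p xmlns:text="{TEXT_NS}">\n   Hello   <text:span>big</text:span>'
          '\n   world<text:s text:c="3"/>end</text:p>')
print(repr(paragraph_text(ET.fromstring(sample))))  # 'Hello big world   end'
```

The indentation and line breaks in `sample` only separate words, while the three blanks encoded by `<text:s>` survive intact.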

The rationale behind this is that the literal whitespace in the physical XML representation of the ODF document can be used to format (pretty-print) the markup without changing the actual ODF document represented by the markup. Personally, I find this very practical. An ODF implementation is in no way encouraged to "screw around" with spaces represented in that way.
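The writing side can be sketched the same way. This hypothetical `make_paragraph` helper (Python, standard library only) keeps one literal blank as the word delimiter and encodes any further blanks in a run as `<text:s>`. A run at the very start of the paragraph is encoded entirely as `<text:s>`, on the assumption that leading literal whitespace would be dropped on reading. It handles only U+0020 blanks; tabs and line breaks, which ODF represents with their own elements, are left out as a simplifying assumption.

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("text", TEXT_NS)


def make_paragraph(text: str) -> ET.Element:
    """Build a <text:p> for plain text, encoding extra blanks as <text:s>."""
    p = ET.Element(f"{{{TEXT_NS}}}p")
    last = None  # most recent child element; following text goes in its tail

    def append_text(s: str) -> None:
        if not s:
            return
        if last is None:
            p.text = (p.text or "") + s
        else:
            last.tail = (last.tail or "") + s

    pos = 0
    for m in re.finditer(r" +", text):
        append_text(text[pos:m.start()])
        run = len(m.group())
        # Keep one literal blank as the word delimiter, except at paragraph
        # start, where a literal blank would not survive reading.
        literal = 0 if m.start() == 0 else 1
        append_text(" " * literal)
        extra = run - literal
        if extra > 0:
            last = ET.SubElement(p, f"{{{TEXT_NS}}}s")
            if extra > 1:  # text:c omitted for a single blank
                last.set(f"{{{TEXT_NS}}}c", str(extra))
        pos = m.end()
    append_text(text[pos:])
    return p


print(ET.tostring(make_paragraph("Hello     world"), encoding="unicode"))
```

For "Hello     world" this yields the text "Hello ", one `<text:s>` with `text:c="4"`, and the tail "world", so the author's five blanks are carried by markup rather than by literal whitespace.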