GullFOSS
OpenOffice.org Engineering at Sun
 
 
 
 
Friday, 05 Jan 2007
ODF is designed for the future, not the past.
Michael Brauer

A month ago, Bob Sutor commented on IBM's "no" vote for the ECMA Office Open XML file format in his blog as follows: "ODF is about the future, Open XML is about the past. We voted for the future." (see here for the full blog entry).

This comment reminded me of a similar statement I have made myself a couple of times in the past: "ODF is designed for the future, not the past."

What do I mean by this? It is best explained by going back to the year 1999. At that time, OpenOffice.org did not yet exist, but I was already working on "StarOffice", which, as you know, was open-sourced in 2000 as "OpenOffice.org", and is still available as Sun's OpenOffice.org offering. StarOffice at that time had a binary file format like any other office suite, but we decided to create a new file format. Why? Because we wanted to make StarOffice, and later on OpenOffice.org, more interoperable, and we felt that just documenting the binary format would not be sufficient.

One design decision for the new format was made quickly: it should be based on XML. XML was new in 1999, but it was clear that it would spread quickly. But how to continue? How do you create a new office XML file format?

One option we had was to take StarOffice's binary formats as the basis. From a technical perspective the binary formats worked well, so we could have just mapped the records and binary fields of these formats into XML. Microsoft seems to have designed Office Open XML this way 7 years later, so wouldn't this have been a good idea? Well, it would have been easy to implement, and it would have improved interoperability compared to the situation we had with the binary formats. But the binary formats were close to the StarOffice implementation. The impact this would have had on the new file format can be seen in Office Open XML: for instance, it contains implementation-specific data structures, has different schemas for text documents, spreadsheets and presentations, and even uses different measurement units in the different application types. In WordProcessingML, tab stop positions are specified in twips (section 2.3.1.37 of Office Open XML Part 4 - Markup Language Reference), but in DrawingML they are specified in EMUs (section 5.1.5.2.12).
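To make the unit mismatch concrete, here is a minimal sketch of what a consumer of such documents ends up writing just to compare a tab stop in a text document with one in a drawing. It assumes the usual definitions of the two units (1440 twips per inch, 914400 EMUs per inch); the function names are made up for this illustration and are not taken from any specification or library.

```python
from fractions import Fraction

# Assumed unit definitions (not quoted from the post):
# a twip is 1/20 of a point, and there are 72 points per inch.
TWIPS_PER_INCH = 1440
# EMU = English Metric Unit.
EMU_PER_INCH = 914400

# 914400 / 1440 divides evenly, so twips -> EMU is exact in integers.
EMU_PER_TWIP = EMU_PER_INCH // TWIPS_PER_INCH  # 635


def twips_to_emu(twips: int) -> int:
    """Convert a length in twips to EMUs (hypothetical helper)."""
    return twips * EMU_PER_TWIP


def emu_to_twips(emu: int) -> Fraction:
    """Convert a length in EMUs to twips.

    The result is a Fraction because most EMU values are not a
    whole number of twips.
    """
    return Fraction(emu * TWIPS_PER_INCH, EMU_PER_INCH)


# A tab stop half an inch from the margin, expressed in both units.
half_inch_in_twips = 720
half_inch_in_emu = twips_to_emu(half_inch_in_twips)
print(half_inch_in_emu)                # 457200
print(emu_to_twips(half_inch_in_emu))  # 720
```

The conversion itself is trivial. The point is that everyone who processes the files has to know that it is needed, and where.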

Had we used StarOffice's binary formats as the basis for our new XML file format, we would have ended up with similar issues. And it didn't take us long to conclude that we could achieve our interoperability goal much better if we ignored our binary formats and internal data structures and instead created a truly new file format, based on existing standards. That means we didn't just polish up our past (the binary formats), but made a hard break and analyzed what is needed for office documents in the future. The file format we created was the OpenOffice.org XML file format, which evolved into the OASIS OpenDocument file format (ODF) and is now also ISO/IEC 26300.

What are the benefits of this decision for users? Suppose we had taken the binary formats and internal data structures as the basis, as seems to be the case for Office Open XML. Everyone who wants to work with our office documents at the file level would then have to deal with OpenOffice.org's internal data structures. Everyone would have to deal with the multiple implementations, and therefore multiple file formats, that we have for some objects for legacy reasons. In short, our users would have to deal with our legacy, the past of us, the implementors. And not just once, but over and over again.

By designing a new file format, we took on some more work, but we solved these legacy issues ourselves, once. We did not load this burden onto our users. They can concentrate on designing their solutions, for their future, instead of dealing with our past.

And what are the consequences of our decision for us, the OpenOffice.org developers? We initially had some more work. But because ODF abstracts from OpenOffice.org's internal models, we can now change these models without having to worry about the file format. We can, for instance, replace implementations, or merge implementations where we still have several for legacy reasons. In short, keeping the OpenOffice.org code up to date has become much easier. Obviously, it is not only we who benefit from this, but the OpenOffice.org users, too.
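As a rough illustration of this decoupling (not OpenOffice.org code, and not real ODF markup), consider two hypothetical internal paragraph models, one "legacy" and one "new", that are written through the same format-level description. All class, function, element, and attribute names below are invented for the sketch.

```python
import xml.etree.ElementTree as ET


class LegacyParagraph:
    """Hypothetical old internal model: indent stored in twips."""

    def __init__(self, text: str, indent_twips: int):
        self.text = text
        self.indent_twips = indent_twips

    def describe(self):
        # Translate the internal unit into the format's neutral unit (cm).
        # 1440 twips per inch and 2.54 cm per inch.
        return self.text, self.indent_twips / 1440 * 2.54


class NewParagraph:
    """Hypothetical replacement model: indent stored in centimetres."""

    def __init__(self, text: str, indent_cm: float):
        self.text = text
        self.indent_cm = indent_cm

    def describe(self):
        return self.text, self.indent_cm


def write_paragraph(paragraph) -> str:
    """Serialize any model that offers describe() to illustrative XML.

    The element and attribute names are placeholders, not ODF.
    """
    text, indent_cm = paragraph.describe()
    element = ET.Element("paragraph", {"indent": f"{indent_cm:.2f}cm"})
    element.text = text
    return ET.tostring(element, encoding="unicode")


old = write_paragraph(LegacyParagraph("Hello", 720))
new = write_paragraph(NewParagraph("Hello", 1.27))
print(old)
print(old == new)
```

Because the writer only sees the neutral description, swapping the legacy model for the new one leaves the file on disk unchanged, which is the property described above.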

And what about legacy documents? Did we run into any problems with the legacy binary documents because we did not take the binary data/internal data structure approach? A clear no. Why should we? The feature set of OpenOffice.org did not change, so we can just load them and save them in the XML file format. Direct processing of binary files with XML implementations, and vice versa, is not possible anyway, so the ability to load (and maybe save) these binary documents is sufficient, and we already had the code for this!

Posted by Michael Brauer on 05 Jan 2007

Comments:

I think it's important to have a standard for document formats to begin with. Starting with a format that creates a common ground to work from, rather than relying on implementation-specific data structures and legacy formats, makes a lot of sense to me. This is what I would have loved to have when I was at university. Often I would get documents from professors in one format or another, and if I had a more recent version of the application than they did, they could not even read my document. My question is: because this format creates a common ground for different applications to communicate, will it allow an older application to degrade its rendering of a file from a newer application somewhat gracefully? Will it be completely transparent altogether? I'm quite unfamiliar with this format, as you can see... but it has piqued my interest. Cheers

Posted by Ean Bowman on January 06, 2007 at 09:32 PM CET #

Putting the cart before the horse is how we have been working on computers, with the application being more important than the data. I am glad the world is starting to see that it is all about the data. I saw that many years ago. But I hope we will see the freedom of information through the standardization of the file formats we store our data in.

Posted by Ron Knapper on January 07, 2007 at 11:26 PM CET #
