Friday, 05 Jan 2007
A month ago, Bob Sutor commented on IBM's "no" vote on the Ecma Office Open XML file format in his blog as follows: "ODF is about the future, Open XML is about the past. We voted for the future." (see here for the full blog entry).
This comment reminded me of a similar statement I have made myself a couple of times in the past: "ODF is designed for the future, not the past."
What do I mean by this? It is best explained by going back to the year 1999. At that time, OpenOffice.org did not yet exist, but I was already working on "StarOffice", which, as you know, was open-sourced in 2000 as "OpenOffice.org" and is still available as Sun's OpenOffice.org offering. StarOffice at that time had a binary file format like any other office suite, but we decided to create a new file format. Why? Because we wanted to make StarOffice, and later on OpenOffice.org, more interoperable, and we felt that just documenting the binary format would not be sufficient.
One design decision for the new format was made quickly: it should be based on XML. XML was new in 1999, but it was clear that it would catch on fast. But how to continue? How do you create a new office XML file format?
One option we had was to take StarOffice's binary formats as a basis. From a technical perspective the binary formats worked well, so we could have just mapped the records and binary fields of these formats into XML. Microsoft seems to have designed Office Open XML this way seven years later, so wouldn't this have been a good idea? Well, it would have been easy to implement, and it would have improved interoperability compared to the situation we had with the binary formats. But the binary formats were close to the StarOffice implementation. The impact this would have had on the new file format can be seen in Office Open XML: for instance, it contains implementation-specific data structures, has different schemas for text documents, spreadsheets and presentations, and even uses different measurement units in the different application types. In WordprocessingML, tab stop positions are specified in twips (section 2.3.1.37 of Office Open XML Part 4 - Markup Language Reference), but in DrawingML they are specified in EMUs (section 5.1.5.2.12).
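To make the unit mismatch concrete, here is a minimal sketch of the conversion a consumer of both vocabularies would have to carry around. It assumes the commonly documented definitions of the two units (1 twip = 1/1440 inch, 1 EMU = 1/914400 inch); these are not quoted from the sections cited above, and the function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Assumed unit definitions (not taken from the cited sections):
TWIPS_PER_INCH = 1440    # 1 twip = 1/20 point = 1/1440 inch
EMUS_PER_INCH = 914400   # 1 EMU  = 1/914400 inch

# 914400 / 1440 divides evenly, so one twip is a whole number of EMUs.
EMUS_PER_TWIP = EMUS_PER_INCH // TWIPS_PER_INCH


def twips_to_emus(twips: int) -> int:
    """Convert a length in twips to EMUs (always exact)."""
    return twips * EMUS_PER_TWIP


def emus_to_twips(emus: int) -> Fraction:
    """Convert a length in EMUs to twips.

    The result is a Fraction because most EMU values do not fall
    on a whole twip.
    """
    return Fraction(emus, EMUS_PER_TWIP)


# A tab stop half an inch from the margin, written in each unit.
half_inch_in_twips = TWIPS_PER_INCH // 2
half_inch_in_emus = twips_to_emus(half_inch_in_twips)
print(half_inch_in_twips, half_inch_in_emus)
```

The same half-inch tab stop ends up as two different numbers depending on which part of the format it appears in, and every tool that touches both parts has to know that.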
Had we used StarOffice's binary formats as the basis for our new XML file format, we would have run into similar issues. And it didn't take us long to come to the conclusion that we could achieve our interoperability goal much better if we ignored our binary formats and internal data structures and instead created a really new file format, based on existing standards. That means we didn't just polish up our past (the binary formats), but made a hard break and analyzed what is needed for office documents in the future. The file format we created was the OpenOffice.org XML file format, which evolved into the OASIS OpenDocument file format (ODF) and is now also ISO/IEC 26300.
What are the benefits of this decision for users? Let's assume we had taken the binary formats and internal data structures as a basis, as seems to be the case for Office Open XML. Everyone who wants to work with our office documents on the file level would then have to deal with OpenOffice.org's internal data structures. Everyone would have to deal with the multiple implementations, and therefore multiple file formats, that we have for some objects for legacy reasons. In short, our users would have to deal with the legacy, the past, of us, the implementors. And not just a single time, but over and over again.
By designing a new file format, we had some more work, but solved these legacy issues ourselves a single time. We did not load this burden onto our users. They can concentrate on designing their solutions, for their future, instead of dealing with our past.
And what are the consequences of our decision for us, the OpenOffice.org developers? We initially had some more work. But because ODF abstracts from OpenOffice.org's internal models, we can now change these models without having to care about the file format. We can, for instance, replace implementations, or merge implementations where we still have multiple ones for legacy reasons. In short, keeping the OpenOffice.org code up to date has become much easier. Obviously, not only we ourselves benefit from this, but the OpenOffice.org users, too.
And what about legacy documents? Did we get any problems with the legacy binary documents because we did not take the binary data/internal data structure approach? A clear no. Why should we? The feature set of OpenOffice.org did not change, so we can just load them and save them in the XML file format. Direct processing of binary files with XML implementations, and vice versa, is not possible anyway, so the ability to load (and maybe save) these binary documents is sufficient, and we had the code for this already!