GullFOSS
OpenOffice.org Engineering at Sun
 
 
 
 
Wednesday, 15 Aug 2007
Office Open XML (OOXML) Filters
Michael Brauer
If you read the weekly schedule for Sun's OpenOffice.org development team that Dieter Loeschky posts on GullFOSS every week, you may already have noticed that it has included OOXML-related tasks for months now. OOXML? Isn't that Office Open XML? The file format that Microsoft Office 2007 uses, that has been approved by ECMA as ECMA-376, and that is currently in a fast-track process to become an ISO standard like ODF? The file format that, although its name is very similar to Open Office XML, has nothing to do with OpenOffice.org or ODF? And you may have wondered: What are Sun's OpenOffice.org developers doing with OOXML, and why? And when will we have an OOXML filter in OpenOffice.org?

I will try to give answers to these questions.

First of all, why is Sun's OpenOffice.org team developing OOXML filters, or rather participating in their development? Well, it has always been in our strong interest that OpenOffice.org users can seamlessly interact with multiple file formats, including the binary formats of MS Office. So it is only natural that we care about OOXML now, too.

OOXML is the file format of Microsoft Office 2007. That means, sooner or later, people that use Microsoft Office 2007 and want to migrate to OpenOffice.org will look for ways to get existing OOXML documents into OpenOffice.org. And at some point in time, OpenOffice.org users will receive OOXML documents, because Microsoft Office 2007 users will start sending them out, assuming that everyone can read them.

However, Microsoft Office 2007 is new. Very new. Someone who has just spent a considerable amount of money on a Microsoft Office 2007 license will not migrate to OpenOffice.org immediately. In addition, not many OOXML documents are being distributed yet, probably because users can't be sure who at the other end can already open OOXML files. Thus, we do not expect high initial demand for OOXML filters. But most likely it will increase over time. And the 6000-page OOXML specification cannot be implemented over a weekend (nor in a month, or two, or three, or six). So it is advisable to start early. That's what we did.

How far have we got with the development of OOXML filters? First of all, Sun's OpenOffice.org developers are only working on import filters, that is, filters that read OOXML documents into OpenOffice.org. We are not working on export filters, that is, filters that save OOXML documents. The simple reason is that both situations I've described above only require OOXML import filters. For saving documents, we have ODF. I will come back to ODF later.

For the import filters, we have made some progress, but they are still at an early stage. Everyone who has participated in the development of a complex filter knows that the first steps are the easiest, and that it does not take much time until the basic information of a document can be displayed. The result at this stage may already look promising. But the real work only starts at this stage. One may compare this with the implementation of an arbitrary XML file format. How easy it is to display the text that is in the document! And how promising it looks once the text is displayed! But at this stage, you haven't even started to look at all the other information in the document, and that is in fact the interesting part, because it is what distinguishes the XML file format from a plain text file format.
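To make the "first steps are the easiest" point concrete, here is a minimal sketch in Python. It is not code from the filters discussed here; the sample markup is hand-written for illustration and the function names are hypothetical. It pulls the plain text out of a small WordprocessingML fragment and then counts the formatting property elements that this text-only pass never looks at.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace as published in ECMA-376.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical, hand-written sample; not taken from a real document.
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t>Hello </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
    </w:p>
    <w:p>
      <w:pPr><w:jc w:val="center"/></w:pPr>
      <w:r><w:t>Second paragraph</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def extract_paragraph_text(xml_source):
    """Return the plain text of each paragraph, ignoring all formatting."""
    root = ET.fromstring(xml_source)
    paragraphs = []
    for para in root.iterfind(".//w:p", NS):
        # Join the text runs; bold, alignment and so on are silently dropped.
        text = "".join(t.text or "" for t in para.iterfind(".//w:t", NS))
        paragraphs.append(text)
    return paragraphs


def count_ignored_properties(xml_source):
    """Count the w:rPr and w:pPr elements that the text-only pass skips."""
    root = ET.fromstring(xml_source)
    run_props = sum(1 for _ in root.iter(f"{{{W_NS}}}rPr"))
    para_props = sum(1 for _ in root.iter(f"{{{W_NS}}}pPr"))
    return run_props + para_props


if __name__ == "__main__":
    print(extract_paragraph_text(SAMPLE))
    print(count_ignored_properties(SAMPLE))
```

Even in this tiny sample, the bold run and the centered paragraph are lost. Everything a real import filter has to map into the application's document model (styles, layout, tables, fields, drawings) lives in the parts this sketch ignores.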

And if you are a file format developer, you know what will happen if you show these early results to your boss. He will say: “Wow! Impressive. So, when will you be ready? In a week?”. And you will have to say: “No, no. It's only the beginning, it will take a year or so”. And your boss will ask: “Hey, why does it take you so long?”. I think every software developer knows how this story continues. So, we should not raise wrong expectations (not with our boss, who luckily understands that filter development is a complex task, but with the OpenOffice.org users). For that reason, the current filter prototypes will not be included in the official OpenOffice.org distribution.

Actually, there is another issue that makes it difficult to say where we are. OOXML has been approved by ECMA as ECMA-376. So far, so good. But it is no secret that during the review of ECMA-376 by the ISO national member bodies, which is still ongoing, a lot of comments and objections have been raised. And the question is, how will Microsoft and ECMA respond, and what impact will this have on OOXML? Will there be a new version of OOXML? Will it be compatible or incompatible? How soon will they update Microsoft Office? In short: It is unclear whether an OOXML filter implementation based on the current specification will be sufficient for migrating from MS Office to OOXML, and how much effort it will be to adapt it, if necessary.

To be clear: I'm not saying this to worry anyone. I'm one hundred percent sure that Microsoft will provide a good migration path for the customers of their products in case they have to adapt OOXML, and I do not expect unsolvable problems for anyone else. I would further like to emphasize that it is not uncommon for reviews of specifications to result in comments, and for specifications to be changed in response to such comments. I also do not expect answers to the above questions at the moment. The review of OOXML is still ongoing, and the results are not known. So it would be too early for anyone in that situation, not only for Microsoft, to say how to respond. I'm mentioning these uncertainties around OOXML only because they may mean that OOXML implementations have to be changed. That would require some extra work, and that again makes it even more difficult to say how far we have got with the development of the OOXML import filters. That's all.

Actually, comments that have to be processed by a technical committee or standards organization are an essential part of any standardization process. Standardization processes usually include internal or public reviews for the purpose of ensuring the high quality of a specification, even though this sometimes requires that the specification be changed. So, this is not unique to the development of OOXML, but applies also to ODF. However, the ODF specification may have had an advantage in this regard, because more time was spent developing the specification although it is much smaller than OOXML, and because OASIS rules required a public review of the ODF specification before it could be approved as an OASIS standard, whereas the ECMA rules did not require that for OOXML. Also, since the ODF specification is much shorter than the OOXML specification, it is easier to review.

But let's go back to the OOXML filters. Does the development of OOXML import filters mean that we have changed our mind regarding ODF? A clear, a very clear, not to say a crystal clear: No. We strongly believe that ODF is the only file format that provides the level of interoperability and choice of products that our customers want. ODF is the file format whose development and standardization we are actively supporting at OASIS, and it remains the native file format of OpenOffice.org. And with the Sun ODF Converter software, we make ODF available even to Microsoft Office users. So, nothing has changed here.



Posted by Michael Brauer on 15 Aug 2007  |  Comments[15]

Comments

Thomas said:

Hi Michael,
thanks for your info. Will the Sun ODF Converter software be updated on the basis of the new code of the OOXML import filters to improve the file conversion (when the import filters are ready some day)?

And are the enhancements of ODF 1.2 necessary for good OOXML filters (like the experimental daVinci plugin needs some particular "features" in ODF 1.2 and OOo)? Or will the filters be developed based on ODF 1.0/1.1, which is (almost) already supported in OOo?

Posted by Thomas on August 15, 2007 at 01:46 PM CEST #

JZA said:

Well I think there are two different situations, one is having OpenOffice.org support to the MS Office product and very similar is the imfamity that M$ is working out this file format.

Posted by JZA on August 15, 2007 at 06:29 PM CEST #

Michael Brauer said:

Hi Thomas,

I can't comment on the Sun ODF Converter software, but only on the OOXML filters.

The OOXML filters do not require ODF. They are C++ components that directly translate OOXML into the OpenOffice.org model. They are therefore independent of ODF, or a particular ODF version.

However, ODF 1.2 contains some enhancements that will allow the OpenOffice.org community to implement new "features" for OpenOffice.org which so far existed only in Microsoft Office. That in turn means that we can also support these "features" in the OOXML filters. Please note that I have put the term "features" in double quotes, because the things I'm talking about here are mostly not entirely new features, but small enhancements to existing ones.

Well, I should be more precise anyway: We could of course implement these "features" without the ODF 1.2 enhancements as well, and we could also support them in the OOXML filters. But the ODF 1.2 enhancements allow us to actually store these features in ODF. So, the OOXML filters are still independent of ODF, but the ODF 1.2 enhancements will allow us to store the new "features" in ODF.

Posted by Michael Brauer on August 16, 2007 at 05:08 PM CEST #

Dave said:

I thought I remembered hearing somewhere that OOo was going to use the import/export filters that Novell has already completed. It looks like you're developing them on your own instead, from scratch, or rather an import filter only... why?

Posted by Dave on August 17, 2007 at 01:02 AM CEST #

Mathias Bauer said:

Hi Dave, perhaps you confused two things.

Novell developers have worked on the "clever age" converter project that was started by Microsoft. They also created some wrapper code for OOo that used the external converter application as part of a "filter" to import and export OOXML documents. The converter itself is slow, and using it as an external conversion application in OOo is even slower, way slower than a "real" filter.

Also the *overall* conversion quality is comparatively low, though in some areas it works quite well. The overall quality is not as good as e.g. the quality of the MSOffice binary filters of OOo. This is true especially for the OOXML->ODF conversion that IMHO is more important for OOo than the opposite direction. Due to the fact that this converter is XSLT based and so will fail in all cases where some advanced logic or layout information is needed for the conversion, I'm afraid that it will never reach the necessary quality. At least not the quality expected by those users who nowadays write bug reports for all the nice, more or less subtle problems in our current binary filters. ;-)

We, the Sun OOo developers, started to implement "real" (meaning: integrated and optimized) filters some time ago, and Novell developers later joined this effort. It's these filters Michael Brauer was talking about.

Posted by Mathias Bauer on August 17, 2007 at 12:32 PM CEST #

Plan-B for Software Documentation said: [Trackback] Michael Brauer reports that OpenOffice development works on support for MS-OOXML aka ECMA-376. He states that the work is only at its start and that OpenOffice.org will continue to use ODF (ISO 26300) as its standard format. ...

Posted by Plan-B for Software Documentation on August 17, 2007 at 03:57 PM CEST #

Waiting for I/O said: [Trackback] The technical debate around whether OOXML (ECMA 376) should be approved as an ISO standard (fast track or otherwise) has been raging for a while. Numerous organizations and individuals have expressed technical and legal concerns. The latest summary o...

Posted by Waiting for I/O on August 19, 2007 at 11:26 PM CEST #

OesterBlog said: [Trackback] Seit OpenDocument 1.1 im Frühjahr bei der OASIS als offizielles Dokument verabschiedet worden ist, habe ich die Mailinglisten der ODF-Macher verfolgt und bin etwas kritischer geworden, was die Entwicklung der Interoperabilität von ODF angeht. Was i...

Posted by OesterBlog on August 21, 2007 at 09:25 PM CEST #

Juan Carlos Girardi said:

why?
I'm not saying it is bad that OpenOffice.org interacts with MS-OOXML, but I ask: why do you do the job?
I don't get the point. And if I do get it, I would prefer not to. You have ODF, which is "really" open, "really" standard, free and well designed, and you work on that MS-OOXML, which is neither XML nor OPEN and which nobody asked for. It's the first time I have seen in technology that the solution comes before the problem.

Posted by Juan Carlos Girardi on August 24, 2007 at 12:27 AM CEST #

Matthew Flaschen said:

I think there will be demand for OOXML export filters eventually. Though only ODF is a true, multiparty open document standard, OpenOffice.org users are still going to want to send files in the latest Office 2007 native format, and for the foreseeable future that will be OOXML.

Posted by Matthew Flaschen on August 28, 2007 at 11:14 AM CEST #

Justas said:

I think it is good not to do OOXML export filters. Import filters are necessary, but there is no need for export filters for a broken standard, and leaving them out will only be better for people.

Posted by Justas on August 28, 2007 at 02:28 PM CEST #

Gustavo Guillermo Perez said:

I agree, there is no need to support exporting, because that is not a problem for OOo users, and there are always going to be export problems between formats.

Posted by Gustavo Guillermo Perez on August 30, 2007 at 12:25 AM CEST #

Răzvan Sandu said:

Even if OOXML files are used only in Microsoft environments, I think OOXML filters are *always* necessary in OpenOffice.org. If they're missing, they only limit OOo users' options, not Microsoft's.

I also think that a more urgent task is to seamlessly integrate the Sun ODF converter in Microsoft Word & Excel and to develop one for PowerPoint, too. I mean letting stubborn MS Office users save ODF documents *by default*, without additional warning boxes, as naturally as they save in .doc or .docx. As someone noted on a blog, the vast majority of non-technical users don't give a damn about the file format itself - they just want to open & save the file as quickly as possible. What they actually notice (and some of them dislike) is the difference between MS Office's and OOo's look and feel.

We also need a quick and more polished *standalone* converter program for Microsoft formats, one-way (Microsoft -> ODF), in both graphical UI and command-line versions. One shouldn't have to install the full OOo suite to batch-convert thousands of Microsoft files in complex directory structures. If such a tool exists, maybe we can go further: we can use a piece of code from it in a plugin for amavisd-new (http://www.ijs.si/software/amavisd/), to convert on-the-fly Microsoft e-mail attachments that pass through mail servers... ;-)

I find it *extremely important* that ODF should be supported in *as many well-known applications as possible* - to quickly create a huge mass of ODF documents on the market. This will degrade Microsoft's market position in an exponential manner, both for Office applications and the OS itself.

Regards,
Răzvan

Posted by Răzvan Sandu on August 31, 2007 at 04:54 PM CEST #

Răzvan Sandu said:

I almost forgot...

*Many* users read office documents, sent as e-mail attachments, on mobile phones or Palms. It is critical to have at least a viewer for ODF files on these mobile devices. For example, it's a pity that advanced smartphones such as Nokia E61i (Symbian), Palms, Motorola Q and so on *only implement support for Word & Excel formats* in their tiny office applications.

Răzvan

Posted by Răzvan Sandu on August 31, 2007 at 05:05 PM CEST #

PhilW said:

I hear a lot of people who believe that Sun/OOo, in developing OOXML importers, are "supporting" or "endorsing" Microsoft's development of competing, perhaps proprietary, XML-based document formats.

Then those people get a document from someone who has saved the file in DOCX or some other OOXML format. An example conversation might go:

RECEIVER: "Hey, could you send me that document in an earlier Word format?"

SENDER: "Huh?"

R: "I can't read the document you sent me."

S: "It looks good on my computer - here, I'll send you another copy."

R: "No no no. My word processor won't read the format that Word 2007 is saving by default."

S: "Aren't you using Word 2007?"

R: "No - I am using OpenOffice."

S: "Why?"

R: "Because it's free."

S: "Oh, that's right, you don't have a job."

R: "That has nothing to do with it. Why would I pay for something that I can get for free?"

S: "Because you have a job."

R: "You mean you are paying for copies of Office 2007 for all of your family's computers, and then dealing with the hassle of re-activation whenever you re-install Windows?"

S: "No, my company paid for copies of Office 2007 for all of my computers. See what I mean about having a job? Oh - and I haven't ever had to re-install Windows."

R: "Grrrrr."

S: "Listen, I'm not trying to be difficult. I really like the idea of free office productivity software. But if your OpenOffice is so great then why can't it read the latest Office 2007 formats? And by the way, you could always download Word Viewer 2003 from Microsoft (for free) and then download the Office 2007 Compatibility Pack (also free) and read the docs I send you with a double-click."

R: "Grrrrr. Microsoft - evil - monopoly - substitute dollar sign for s to show how I feel - grrrr."

S: "I know - it's a pain, but OpenOffice will really only continue its success if it keeps up with current competing technology. This is how Microsoft beat monopolies like WordPerfect and NetWare (probably before your time.) Sooner or later most people like me will be in a position in which we aren't getting free Microsoft Office from work, and we'll only switch to OpenOffice if we feel like we will still be able to read the stuff that anyone sends us. You can't beat the competition through obstinance and verbal manipulation - unless of course you have the monopoly."

Posted by PhilW on September 10, 2007 at 07:49 PM CEST #
