GullFOSS
OpenOffice.org Engineering at Sun
 
 
 
 
Monday, 27 Nov 2006
The New Microsoft Word Filter for Writer
Henning Brinkmann

Let me introduce myself first. My name is Henning Brinkmann. I am working for Sun in the StarOffice/OpenOffice.org team in Hamburg, Germany.

My current responsibility is the Microsoft Word filter in Writer (the text processing application in OpenOffice.org). This includes maintaining the current filter as well as the filter component we are developing at the moment.

This weblog entry will give a brief overview of the new filter for Microsoft Word formats to be used in Writer. It covers the motives for implementing a new filter and the concept used. Further entries will deal with more details of the concept.

Why a new filter?

There are several file formats that originate from the core of Microsoft Word and are supported by StarOffice/OpenOffice.org:

  • binary Word formats (sw/source/filter/ww8)

  • RTF (sw/source/filter/rtf)

  • WordProcessingML in Microsoft Office 2003 (via XSLT)

For each of these formats there is a separate implementation in OpenOffice.org 2.0. The consequences are quite clear. The supported feature sets differ. If a new feature is to be implemented, it has to be implemented in several filters. Bugs can appear in each of the filters.

Investigating further reveals that all of the formats mentioned above share common structures, i.e. the structures of the Word core. This fact led us to the concept described in the following. It shows how we want to solve the problem of implementing common properties several times in the new filter.

The Concept

In principle a filter maps between domains. These domains are the document spaces containing all possible documents of the formats that the filter maps between. In this terminology a filter can also be called a domain mapper.

The core idea in our concept for the new filter is shown in the figure below. Instead of having a domain mapper for each source format, there is an intermediate format that is the input for only one domain mapper.




The domain mapper gets its input from a tokenizer specific to the source format. The tokenizers abstract from the source formats and deliver the document's content in the intermediate format to the domain mapper. Having one domain mapper means implementing the mapping of each feature only once, which reduces the maintenance effort.
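To make the idea a bit more tangible, here is a minimal sketch in Python. It is not the code of the new filter. The `Token` class, the two toy tokenizers and `map_to_runs` are hypothetical names invented for this illustration, and the "formats" are drastically simplified stand-ins for RTF and an XML format that only know bold text. The point is only that both tokenizers deliver the same intermediate tokens, so the bold property is mapped in exactly one place.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Token:
    """One item of the (hypothetical) intermediate format."""
    kind: str        # "prop" for a property change, "text" for content
    name: str = ""   # property name, e.g. "bold"
    value: str = ""  # property value or the text itself


def tokenize_toy_rtf(source: str) -> Iterator[Token]:
    """Toy RTF-like tokenizer: understands only \\b, \\b0 and plain text."""
    for match in re.finditer(r"\\b(0?) ?|([^\\]+)", source):
        if match.group(2) is not None:
            yield Token("text", value=match.group(2))
        else:
            # "\b" switches bold on, "\b0" switches it off.
            yield Token("prop", "bold", "false" if match.group(1) == "0" else "true")


def tokenize_toy_xml(source: str) -> Iterator[Token]:
    """Toy XML tokenizer: <r b="1">text</r> is a bold run, <r>text</r> is not."""
    for run in ET.fromstring(source).iter("r"):
        yield Token("prop", "bold", "true" if run.get("b") == "1" else "false")
        yield Token("text", value=run.text or "")


def map_to_runs(tokens: Iterable[Token]) -> List[Tuple[str, bool]]:
    """The single domain mapper: turns tokens into (text, is_bold) runs.

    The list of runs stands in for the target document.
    """
    bold = False
    runs: List[Tuple[str, bool]] = []
    for token in tokens:
        if token.kind == "prop" and token.name == "bold":
            bold = token.value == "true"
        elif token.kind == "text":
            runs.append((token.value, bold))
    return runs


rtf_runs = map_to_runs(tokenize_toy_rtf(r"\b Hello\b0  world"))
xml_runs = map_to_runs(tokenize_toy_xml('<doc><r b="1">Hello</r><r> world</r></doc>'))
```

In this sketch, supporting a further property such as italics would mean teaching each tokenizer to emit one more token, while the mapping into the document is still written only once in `map_to_runs`.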

The document in StarOffice/OpenOffice.org is generated using a new import API. This way it is possible to uncouple the new filter from the core and deploy it as a UNO component.

The path marked in blue in the figure shows where we are concentrating our current efforts. The other paths are planned to follow later.

Where Do I Get the New Filter?

If you want to have a look at the current state of the filter, you can find information on how to retrieve and install the filter component here:

http://wiki.services.openoffice.org/wiki/WriterFilter

How to Help

If you would like to participate in our efforts for the new filter, please feel free to contact me at

Henning.Brinkmann@sun.com




Posted by Henning Brinkmann on 27 Nov 2006

Comments

Solveig Haugland said: Hi Henning, The Word filter is such a huge part of OpenOffice.org/StarOffice adoption! Keep up the good work. I'm glad to hear about a new filter and look forward to trying it. Solveig

Posted by Solveig Haugland on November 29, 2006 at 07:29 PM CET #
