GullFOSS
OpenOffice.org Engineering at Sun
 
 
 
 
Wednesday, 01 Aug 2007
Completing PDF support in OOo (Part II)
Kai Ahrens

OK, now that we've let the cat out of the bag, my inbox is filling up with mail asking for more information on the PDF import filter we're going to implement. So I'd like to share the details known so far; they are still open to discussion if somebody comes up with a better idea:

  • As already mentioned in my comment on the initial blog entry, importing the PDF content into a Writer document, with its flowing text and therefore flowing layout, is not an option for us. So we decided to write a filter that imports the PDF content as an OOo Draw/Impress document.
    With this solution, we get the full benefit of a page-oriented, fixed layout. All graphical elements will sit at the fixed positions given in the PDF file, and text portions will be combined as far as possible into text shapes, ensuring that they keep their exact positions but remain editable by the user.
    The challenge with this solution is 'just' to find a common bounding box for text portions that can be grouped together in one text shape. But this is nothing compared to the 'impossible', lifelong task of reconstructing or guessing the whole layout of the original document from which the PDF was created. As you know, PDF files generally don't contain such structural information, apart from some tagged PDF files, which we can't rely on.

  • The next question for development is which parser to use for reading the basic content of the PDF file. There is a well-known and widely used framework for this: the XPDF library and its derivatives such as Poppler. That would be a great, well-tested framework for us, but unfortunately its license is not compatible with the OOo code licensing, at least at the moment. So we'll have to write our own parser for this task, which is not bad at all, since XPDF still lacks some features we would have to implement in either case.

  • The filter itself will be available as a downloadable extension to the standard OOo release. This fits perfectly into our roadmap towards a more modular OOo package, consisting of several 'standalone' components that can be reused in other contexts.

  • The most interesting question that came up is the timeline for this implementation. Please expect the product version of the filter to be ready for the OOo 3.0 release at the latest. A detailed release plan for OOo 3.0 is not yet known. But, as already mentioned, I expect to have first results available within a few months, so that most of you will be able to play around with a pre-release of this filter by the end of this year. We will definitely need your feedback on this first release and the following ones to add missing parts, fix bugs etc.

  • Some of you asked whether there will be some additional goodies around the whole PDF story in OOo. The answer is 'Yes, there will be more than just the import and export filters'. One example is support for PDF/A, a feature currently being implemented by community member Giuseppe Castagno.
    Another example is support for creating PDF documents that contain the original ODF document itself, so that any ODF-enabled application can read the original content without loss.
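
To make the text-grouping idea from the first point a bit more concrete, here is a minimal Python sketch. It is not the filter's actual algorithm (that is still to be written); it only assumes that each text portion comes with a position and size, and it merges portions that are left-aligned and sit directly below each other into one text shape with a common bounding box. The names TextPortion, TextShape and group_portions, as well as the two thresholds, are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class TextPortion:
    x: float       # left edge
    y: float       # top edge, growing downwards on the page
    width: float
    height: float
    text: str


@dataclass
class TextShape:
    x: float
    y: float
    width: float
    height: float
    lines: list


def group_portions(portions, max_gap=4.0, max_indent=2.0):
    """Merge vertically adjacent, left-aligned portions into text shapes.

    max_gap and max_indent are hypothetical tuning values, not taken
    from the filter.
    """
    shapes = []
    for p in sorted(portions, key=lambda p: (p.y, p.x)):
        for s in shapes:
            bottom = s.y + s.height
            close_below = 0 <= p.y - bottom <= max_gap
            aligned = abs(p.x - s.x) <= max_indent
            if close_below and aligned:
                # Grow the shape's bounding box to include the new portion.
                right = max(s.x + s.width, p.x + p.width)
                s.x = min(s.x, p.x)
                s.width = right - s.x
                s.height = p.y + p.height - s.y
                s.lines.append(p.text)
                break
        else:
            # No existing shape fits, so this portion starts a new one.
            shapes.append(TextShape(p.x, p.y, p.width, p.height, [p.text]))
    return shapes


portions = [
    TextPortion(10, 10, 100, 12, "First line"),
    TextPortion(10, 24, 80, 12, "second line"),
    TextPortion(10, 200, 50, 12, "Footer"),
]
shapes = group_portions(portions)
```

In this example the first two portions end up in one shape and the footer stays on its own. A real implementation would also have to look at fonts, columns and rotated text, which is exactly where the 'just' in the challenge comes from.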

I hope that this blog entry answers the most urgent questions for the moment. Please don't hesitate to add any comments, questions, suggestions etc. you have.


Posted by Kai Ahrens on 01 Aug 2007

Thursday, 26 Apr 2007
New MS Word Filter for Writer (Milestone 1)
Henning Brinkmann
We are announcing milestone 1 of the new MS Word Filter for StarOffice. It can be downloaded from here:

http://ooo.services.openoffice.org/pub/OpenOffice.org/cws/upload/writerfilter2/


Available platforms are:

  • Windows

  • Linux

  • Solaris(SPARC)


The following features are covered by this milestone:

  • reading text

  • reading character attributes

  • reading paragraph attributes

  • reading fields

  • reading styles


In the next milestone we plan to provide support for reading tables and section properties.


The following people are currently working on this filter:

  • Henning Brinkmann (Sun Microsystems)

  • Oliver Specht (Sun Microsystems)

  • Fridrich Strba (Novell)


Your feedback is welcome on this mailing list:

mailto:dev@sw.openoffice.org

You can subscribe to the mailing list here:

http://sw.openoffice.org/servlets/ProjectMailingListList


For more information about the filter project, please read this post:

The New Microsoft Word Filter for Writer

or visit the OpenOffice.org wiki page about the filter:

http://wiki.services.openoffice.org/wiki/WriterFilter



Posted by Henning Brinkmann on 26 Apr 2007

Tuesday, 27 Mar 2007
Improved text output in PDF export
Philipp Lohmann

With the now nominated CWS glyphadv (to be integrated in 680m207 or perhaps 680m208), an improvement in PDF text rendering becomes available. Overall text output wasn't bad already, of course (the current method is basically unchanged since OOo 1.1), but with the occasional strange font, ugly artifacts such as overlapping characters or an uneven right margin in justified text could sometimes occur. These artifacts always result from subtle differences between the glyph width that OOo assumes at rendering time and the width contained in the actual downloaded font, an effect that can grow quite a bit for fonts made artificially bold. The improved text output synchronizes these two possibly slightly different values so that the position of each glyph can be output more precisely.

As a bonus, the new text output saves some PDF code, reducing the size of the produced PDF file a little; in extreme cases (only PDF built-in fonts used, no images) by up to 30%. However, this holds only for text on the same baseline, so not for vertical text. Of course there is also a small drawback (isn't there always?): font files have to be accessed one additional time to get their precise metrics before we know which characters we will actually use from them in the course of producing the PDF file. This effect is not dramatic, however, and correct PDF files are preferable to slightly faster PDF generation.
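
For those who like to see the arithmetic, here is a small Python sketch of the effect described above. It is not the code from CWS glyphadv; it only assumes that we know, for each glyph of a line, the advance width the layout assumed and the advance width stored in the font, both in 1/1000 em units. The function names and the sample numbers are made up. The per-glyph correction is the kind of value that PDF's TJ operator accepts between glyphs (in thousandths of a text space unit, subtracted from the current position).

```python
from itertools import accumulate


def advance_corrections(layout_advances, font_advances):
    """Per-glyph correction so that each glyph ends where the layout expects.

    A positive value means the font's glyph is wider than the layout
    assumed, so the following glyph has to be pulled back by that amount.
    """
    if len(layout_advances) != len(font_advances):
        raise ValueError("one advance per glyph is required in both lists")
    return [font - layout
            for layout, font in zip(layout_advances, font_advances)]


def uncorrected_drift(layout_advances, font_advances):
    """Accumulated position error after each glyph if nothing is corrected."""
    return list(accumulate(advance_corrections(layout_advances, font_advances)))


# Hypothetical values for a four-glyph run, in 1/1000 em.
layout = [500, 500, 500, 500]
font = [520, 510, 500, 515]

corrections = advance_corrections(layout, font)   # [20, 10, 0, 15]
drift = uncorrected_drift(layout, font)           # [20, 30, 30, 45]
```

The drift list shows why the artifacts get worse towards the end of a line: small per-glyph differences add up, which is what produces the uneven right margin in justified text.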

Posted by Philipp Lohmann on 27 Mar 2007

Monday, 27 Nov 2006
The New Microsoft Word Filter for Writer
Henning Brinkmann

Let me introduce myself first. My name is Henning Brinkmann. I am working for Sun in the StarOffice/OpenOffice.org team in Hamburg, Germany.

My current responsibility is the Microsoft Word filter in Writer (the text processing application in OpenOffice.org). This includes maintaining the current filter as well as the filter component we are developing at the moment.

This weblog entry will give a brief overview of the new filter for Microsoft Word formats to be used in Writer. It covers the motives for implementing a new filter and the concept used. Further entries will deal with more details of the concept.

Why a new filter?

There are several file formats that originate from the core of Microsoft Word and are supported by StarOffice/OpenOffice.org:

  • binary Word formats (sw/source/filter/ww8)

  • RTF (sw/source/filter/rtf)

  • WordProcessingML in Microsoft Office 2003 (via XSLT)

For each of these formats there is one separate implementation in OpenOffice.org 2.0. The result of this is quite clear: the supported feature sets differ, a new feature has to be implemented in several filters, and bugs can appear in each of them.

Investigating further reveals that all of the formats mentioned above share common structures, i.e. the structures of the Word core. This fact led us to the concept described below. It shows how we want to avoid implementing common properties several times in the new filter.

The Concept

In principle a filter maps between domains. These domains are the document spaces containing all possible documents of the formats that the filter maps between. In this terminology a filter can also be called a domain mapper.

The core idea in our concept for the new filter is shown in the figure below. Instead of having a domain mapper for each source format, there is an intermediate format that is the input for only one domain mapper.




The domain mapper gets its input from a tokenizer specific to the source format. The tokenizers abstract from the source formats and deliver the document's content in the intermediate format to the domain mapper. Having one domain mapper means implementing the mapping of each feature only once, which reduces maintenance effort.
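
The following Python sketch illustrates the principle with a toy example. It is not code from the filter: the two input "formats", the Token class, and the functions tokenize_a and tokenize_b are invented for this illustration, and the real intermediate format is far richer. The point is only that both tokenizers emit the same kind of tokens, so the mapping of the "bold" property lives in exactly one place, the DomainMapper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str             # "prop" or "text"
    name: str = ""        # property name, used for "prop" tokens
    value: object = None  # property value or text content


def tokenize_a(records):
    """Tokenizer for hypothetical format A: a list of dicts."""
    for rec in records:
        yield Token("prop", "bold", bool(rec.get("b", 0)))
        yield Token("text", value=rec["t"])


def tokenize_b(source):
    """Tokenizer for hypothetical format B: 'style:text' lines."""
    for line in source.splitlines():
        style, _, text = line.partition(":")
        yield Token("prop", "bold", style == "bold")
        yield Token("text", value=text)


class DomainMapper:
    """Maps intermediate tokens to a (very simplified) target document."""

    def __init__(self):
        self.runs = []      # list of (text, is_bold) pairs
        self._bold = False

    def consume(self, tokens):
        for tok in tokens:
            if tok.kind == "prop" and tok.name == "bold":
                self._bold = tok.value
            elif tok.kind == "text":
                self.runs.append((tok.value, self._bold))
        return self.runs


doc_a = DomainMapper().consume(tokenize_a([{"t": "Hello", "b": 1}, {"t": "world"}]))
doc_b = DomainMapper().consume(tokenize_b("bold:Hello\nplain:world"))
```

Both doc_a and doc_b end up as the same list of runs, although they started from different source formats. Supporting a further format in this toy model means writing one more tokenizer and leaving the mapper untouched.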

The document in StarOffice/OpenOffice.org is generated using a new import API. This way it is possible to decouple the new filter from the core and deploy it as a UNO component.

The path marked in blue in the figure shows where we are concentrating our current efforts. The other paths are planned to follow later.

Where Do I Get the New Filter?

If you want to have a look at the current state of the filter, you can find information on how to retrieve and install the filter component here:

http://wiki.services.openoffice.org/wiki/WriterFilter

How to Help

If you would like to participate in our efforts for the new filter, please feel free to contact me at

Henning.Brinkmann@sun.com



Posted by Henning Brinkmann on 27 Nov 2006
