GullFOSS
OpenOffice.org Engineering at Sun
 
 
 
 
Friday, 14 Mar 2008
OOXML Import In Writer: A Shape Is a Shape, Is a Shape?
Henning Brinkmann

I am currently working on the import of shapes from OOXML into Writer. As the title of this post suggests, you might think that importing shapes should be straightforward.



And with ODF it is. For example, the following XML fragment encodes a rectangle:

<draw:rect text:anchor-type="paragraph" draw:z-index="0" draw:style-name="gr1"
    draw:text-style-name="P1" svg:width="1.8543in" svg:height="1.1567in"
    svg:x="1.2764in" svg:y="0.9744in">
    <text:p/>
</draw:rect>

This encoding is common for all of OpenOffice.org's applications. Thus, there is only one piece of code responsible for importing shapes in OpenOffice.org.


When we started importing shapes from OOXML, the Impress team was already able to import some shapes from OOXML files produced by PowerPoint 2007. Consequently, we thought we could just reuse their importer code with some adjustments, and that would be it. As one might guess, reality is different: if you insert a rectangle shape in PowerPoint 2007, you get XML like this:


<p:sp>
    ...
    <p:spPr>
        <a:xfrm>
            <a:off x="2000232" y="1500174"/>
            <a:ext cx="3429024" cy="2000264"/>
        </a:xfrm>
        <a:prstGeom prst="rect"><a:avLst/></a:prstGeom>
    </p:spPr>
    <p:style>
        <a:lnRef idx="2">
            <a:schemeClr val="accent1"><a:shade val="50000"/></a:schemeClr>
        </a:lnRef>
        <a:fillRef idx="1"><a:schemeClr val="accent1"/></a:fillRef>
        <a:effectRef idx="0"><a:schemeClr val="accent1"/></a:effectRef>
        <a:fontRef idx="minor"><a:schemeClr val="lt1"/></a:fontRef>
    </p:style>
    <p:txBody>...</p:txBody>
</p:sp>

This is DrawingML as described in chapter 5 of the Markup Language Reference for OOXML.
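To relate these numbers to the ODF example: DrawingML stores lengths in English Metric Units (EMU), with 914400 EMU per inch. The following is a minimal Python sketch, not the actual importer code, that reads the a:xfrm element from the sample above and converts it to inches. The function name and the rounding to four decimals are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

EMU_PER_INCH = 914400  # DrawingML lengths are English Metric Units
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# The a:xfrm element from the PowerPoint sample, with its namespace declared.
SAMPLE = f"""
<a:xfrm xmlns:a="{A_NS}">
    <a:off x="2000232" y="1500174"/>
    <a:ext cx="3429024" cy="2000264"/>
</a:xfrm>
"""


def xfrm_to_inches(xfrm_xml):
    """Return (x, y, width, height) in inches for an a:xfrm element."""
    xfrm = ET.fromstring(xfrm_xml)
    ns = {"a": A_NS}
    off = xfrm.find("a:off", ns)   # position of the shape
    ext = xfrm.find("a:ext", ns)   # extent (size) of the shape
    emu = (off.get("x"), off.get("y"), ext.get("cx"), ext.get("cy"))
    return tuple(round(int(v) / EMU_PER_INCH, 4) for v in emu)


print(xfrm_to_inches(SAMPLE))
```

The result is directly comparable to the svg:x, svg:y, svg:width and svg:height attributes that ODF uses.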


Copy the shape and paste it into a Word document, and you get this in the corresponding DOCX:

<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
    <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas">
        <lc:lockedCanvas xmlns:lc="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas">
        ...
            <a:sp>
                ...
                <a:spPr>
                    <a:xfrm>
                        <a:off x="2000232" y="1500174"/>
                        <a:ext cx="3429024" cy="2000264"/>
                    </a:xfrm>
                    <a:prstGeom prst="rect"><a:avLst/></a:prstGeom>
                </a:spPr>
                <a:txSp>...</a:txSp>
                <a:style>
                    <a:lnRef idx="2">
                        <a:schemeClr val="accent1"><a:shade val="50000"/> </a:schemeClr>
                    </a:lnRef>
                    <a:fillRef idx="1"><a:schemeClr val="accent1"/></a:fillRef>
                    <a:effectRef idx="0"><a:schemeClr val="accent1"/></a:effectRef>
                    <a:fontRef idx="minor"><a:schemeClr val="lt1"/></a:fontRef>
                </a:style>
            </a:sp>
        </lc:lockedCanvas>
    </a:graphicData>
</a:graphic>

Looks pretty similar to the PowerPoint XML. Our “one piece of code for one kind of thing” approach seems to hold. But if you use Word to insert a rectangle into a Word document (DOCX), you end up with this:

<w:pict>
    <v:rect
        id="_x0000_s1026"
        style="position:absolute;margin-left:83.65pt;margin-top:16.45pt;width:249.75pt;
            height:107.25pt;z-index:251659264"
        fillcolor="#4f81bd [3204]"
        strokecolor="#f2f2f2 [3041]"
        strokeweight="3pt">
        ...
    </v:rect>
</w:pict>

This is VML as described in chapter 6 of the Markup Language Reference for OOXML. The Markup Language Reference tags VML as a deprecated format in OOXML, which is only included in the standard for backward compatibility reasons. Despite this, Word 2007 uses VML to store shapes.
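Note that VML packs position and size into a CSS-like style attribute and uses points rather than EMU. Here is a minimal Python sketch of what reading that attribute involves. The helper names are hypothetical, and only the pt unit that appears in the sample is handled.

```python
def parse_vml_style(style):
    """Split a CSS-like VML style string into a dict of property -> value."""
    props = {}
    for item in style.split(";"):
        if ":" in item:
            name, value = item.split(":", 1)
            props[name.strip()] = value.strip()
    return props


def pt_to_inches(value):
    """Convert a length such as '249.75pt' to inches (72 pt per inch)."""
    if not value.endswith("pt"):
        # Assumption: other units are out of scope for this sketch.
        raise ValueError(f"unsupported unit in {value!r}")
    return float(value[:-2]) / 72.0


# The style attribute of the v:rect sample, joined into one string.
STYLE = ("position:absolute;margin-left:83.65pt;margin-top:16.45pt;"
         "width:249.75pt;height:107.25pt;z-index:251659264")

props = parse_vml_style(STYLE)
print(pt_to_inches(props["width"]), pt_to_inches(props["height"]))
```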


So what do we do? As VML is considered deprecated in OOXML, one might say: “Do not care about it. Use DrawingML.” If Word 2007 were only a beta release and the final version abandoned VML, that would be the approach to follow. But the XML above is produced by the current product version of Word 2007, and customers require that Word and Writer can be used interchangeably. It looks like we have to implement both: VML and DrawingML.


The example above is just one instance of a more general problem. The designers of ODF had a file format in mind that describes data. Hence, when the format describes data with the same semantics, it uses the same syntax. OOXML seems to be designed with the application model in mind: there may be different syntaxes for the same semantics if that fits the existing application model better. But if you want to create an alternative implementation of the format, this means additional effort.
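To illustrate the extra effort, here is a small Python sketch, not taken from the Writer filter, in which two separate readers are needed to arrive at the same answer, "this is a rectangle". DrawingML names the shape in the prst attribute of a:prstGeom, whereas VML encodes it in the element name. The function names and the dispatch logic are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
V_NS = "urn:schemas-microsoft-com:vml"


def kind_from_drawingml(sp_pr):
    """DrawingML: the shape kind is the prst attribute of a:prstGeom."""
    geom = sp_pr.find(f"{{{A_NS}}}prstGeom")
    return geom.get("prst") if geom is not None else None


def kind_from_vml(shape):
    """VML: the shape kind is the local element name, e.g. v:rect."""
    return shape.tag.split("}", 1)[1]


def shape_kind(element):
    """Pick the reader that matches the syntax of the given element."""
    if element.tag == f"{{{A_NS}}}spPr":
        return kind_from_drawingml(element)
    if element.tag.startswith(f"{{{V_NS}}}"):
        return kind_from_vml(element)
    raise ValueError(f"no reader for {element.tag}")


drawingml = ET.fromstring(
    f'<a:spPr xmlns:a="{A_NS}">'
    '<a:prstGeom prst="rect"><a:avLst/></a:prstGeom></a:spPr>'
)
vml = ET.fromstring(
    f'<v:rect xmlns:v="{V_NS}" id="_x0000_s1026" strokeweight="3pt"/>'
)

print(shape_kind(drawingml), shape_kind(vml))
```

With ODF, a single reader for draw:rect would cover the same ground.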



Posted by Henning Brinkmann on 14 Mar 2008

Wednesday, 01 Aug 2007
Completing PDF support in OOo (Part II)
Kai Ahrens

OK, now that we have let the cat out of the bag, my inbox is filling up with mails asking for more information on the PDF import filter we're going to implement. So I'd like to give you some details that are already known, but still open to discussion if somebody comes up with a better idea:

  • As already mentioned in my comment on the initial blog entry, it is not an option for us to import the PDF content into a Writer document, which contains flowing text and therefore has a flowing layout. So we decided to write a filter that imports the PDF content as an OOo Draw/Impress document.
    With this solution, we get the full benefit of a page-oriented, fixed layout. All graphical elements will be at the fixed positions given in the PDF file, and text portions will be combined as far as possible and anchored in text shapes, ensuring that they keep their exact positions but are still editable by the user.
    The challenge with this solution is 'just' to find the most common bounding box for text portions that can be grouped together in one text shape. But this is nothing compared to the 'impossible', lifetime task of reconstructing or guessing the whole layout of the original document the PDF was created from. As you know, PDF files generally don't contain such structural information, apart from some tagged PDF files, on which we can't rely.

  • The next question that arises for development is what kind of parser to use for reading the basic content of the PDF file. There is a well-known and widely used framework for this: the XPDF library and its derivatives like Poppler. Yes, that would be a great and well-tested framework for us, but unfortunately it is not compatible with the OOo code licensing, at least at the moment. So we'll have to write our own parser for this task, which is not bad at all, given that XPDF still lacks some features we would have to implement in either case.

  • The filter itself will be available as a downloadable extension to the standard OOo release. This fits perfectly into our roadmap to create a more modular OOo package, consisting of several 'standalone' components that are reusable in other contexts.

  • The most interesting question that came up is that of the timeline for this implementation. Please expect the product version of the filter to be ready by the OOo 3.0 release at the latest. A detailed release plan for OOo 3.0 is not known at the moment. But, as already mentioned, I expect to have first results available within a few months, so that most of you will be able to play around with a pre-release of this filter by the end of this year. We will definitely need your feedback on this first release and upcoming ones to add missing parts, fix bugs etc.

  • Some of you asked whether there will be some additional goodies around the whole PDF story in OOo. The answer to this question is 'Yes, there will be some more stuff around the pure import and export filters'. One example is support for PDF/A, a feature that is currently being implemented by community member Giuseppe Castagno.
    Another example is support for creating PDF documents that contain the original ODF document itself, allowing any ODF-enabled application to read the original content without loss.
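To make the bounding-box grouping from the first point a bit more concrete, here is a rough Python sketch of one possible heuristic. It is an assumption for illustration only, not the planned implementation: text portions are merged into one text shape when their boxes overlap horizontally and lie within a small vertical gap. The Portion type, the max_gap threshold and the top-down y axis are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Portion:
    """A run of text with its bounding box (y grows downward, in points)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def union(a, b):
    """Return a portion covering both boxes, with the texts joined."""
    return Portion(a.text + " " + b.text,
                   min(a.x0, b.x0), min(a.y0, b.y0),
                   max(a.x1, b.x1), max(a.y1, b.y1))


def group_portions(portions, max_gap=12.0):
    """Greedily merge portions that overlap in x and are close in y."""
    shapes = []
    for p in sorted(portions, key=lambda q: (q.y0, q.x0)):
        for i, s in enumerate(shapes):
            overlaps_x = p.x0 <= s.x1 and s.x0 <= p.x1
            close_y = p.y0 - s.y1 <= max_gap
            if overlaps_x and close_y:
                shapes[i] = union(s, p)   # grow the existing text shape
                break
        else:
            shapes.append(p)              # start a new text shape
    return shapes


portions = [
    Portion("line one", 72, 100, 300, 112),
    Portion("side note", 400, 100, 500, 112),
    Portion("line two", 72, 114, 300, 126),
]
for shape in group_portions(portions):
    print(shape)
```

In this example the two stacked lines end up in one text shape, while the side note stays in a shape of its own.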

I hope that this blog entry answers the most urgent questions for the moment. Please don't hesitate to add any comments, questions, suggestions etc. you have.


Posted by Kai Ahrens on 01 Aug 2007

Monday, 30 Jul 2007
Completing PDF support in OOo
Kai Ahrens

Having had a mature, well-working PDF export filter in OOo for several years now, it's time to take the final steps towards full PDF support. Yes, you're right, we're speaking of implementing a native PDF import filter for OOo within the Sun OOo Graphics development team.

As trivial as this task might look at the moment, there are several topics that need to be discussed in detail before development can start. This begins with the OOo application that the import filter will be written for, and definitely doesn't end with the parser that will be used to read the PDF content itself.

I don't want to go into the details of the current planning and development phase right now, but please be assured that the final solution is planned to be a full replacement for the tools you normally use in your everyday workflow, preserving the layout as well as possible and offering editing capabilities for the imported document, a feature that you don't get for free with most of your common tools. Sounds great, doesn't it?

I don't want to be too optimistic, but we're planning for the first prototype to be available within the next few months. Please stay tuned for more details from the development team members involved within the next few days...




Posted by Kai Ahrens on 30 Jul 2007

Thursday, 26 Apr 2007
New MS Word Filter for Writer (Milestone 1)
Henning Brinkmann
We are announcing milestone 1 of the new MS Word Filter for StarOffice. It can be downloaded from here:

http://ooo.services.openoffice.org/pub/OpenOffice.org/cws/upload/writerfilter2/


Available platforms are:

  • Windows

  • Linux

  • Solaris(SPARC)


The following features are covered by this milestone:

  • reading text

  • reading character attributes

  • reading paragraph attributes

  • reading fields

  • reading styles


In the next milestone we plan to provide support for reading tables and section properties.


The following people are currently working on this filter:

  • Henning Brinkmann (Sun Microsystems)

  • Oliver Specht (Sun Microsystems)

  • Fridrich Strba (Novell)


Your feedback is welcome on this mailing list:

mailto:dev@sw.openoffice.org

You can subscribe to the mailing list here:

http://sw.openoffice.org/servlets/ProjectMailingListList


For more information about the filter project, please read this post:

The New Microsoft Word Filter for Writer

or visit the OpenOffice.org wiki page about the filter:

http://wiki.services.openoffice.org/wiki/WriterFilter



Posted by Henning Brinkmann on 26 Apr 2007