Friday, 14 Mar 2008
I am currently working on the import of shapes from OOXML into Writer. As the title of this post suggests, you might think that importing shapes should be straightforward.
And with ODF it is. For example, the following XML fragment encodes a rectangle:
<draw:rect text:anchor-type="paragraph" draw:z-index="0" draw:style-name="gr1"
draw:text-style-name="P1" svg:width="1.8543in" svg:height="1.1567in"
svg:x="1.2764in" svg:y="0.9744in">
<text:p/>
</draw:rect>
This encoding is common for all of OpenOffice.org's applications. Thus, there is only one piece of code responsible for importing shapes in OpenOffice.org.
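To illustrate how little is needed on the ODF side, here is a minimal sketch in Python that reads the geometry of the rectangle above. It is not the OpenOffice.org importer code; the function name `read_odf_rect` is made up for this example, the namespace declarations are added so the fragment parses on its own, and only lengths given in inches are handled.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by ODF for the svg: prefix.
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"

# The rectangle from the post, with namespace declarations added so that
# the fragment can be parsed standalone.
ODF_RECT = """
<draw:rect xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
           xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
           xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
           text:anchor-type="paragraph" draw:z-index="0" draw:style-name="gr1"
           draw:text-style-name="P1" svg:width="1.8543in" svg:height="1.1567in"
           svg:x="1.2764in" svg:y="0.9744in">
  <text:p/>
</draw:rect>
"""


def read_odf_rect(xml_text):
    """Return position and size of a draw:rect in inches (hypothetical helper)."""
    elem = ET.fromstring(xml_text)

    def inches(name):
        value = elem.get("{%s}%s" % (SVG_NS, name))
        # Assumption: only the "in" unit is supported in this sketch.
        if value is None or not value.endswith("in"):
            raise ValueError("unsupported length for svg:%s: %r" % (name, value))
        return float(value[:-2])

    return {
        "x": inches("x"),
        "y": inches("y"),
        "width": inches("width"),
        "height": inches("height"),
    }


if __name__ == "__main__":
    print(read_odf_rect(ODF_RECT))
```

Because every OpenOffice.org application writes the same `draw:rect` element, a reader like this would only have to exist once.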
When we started importing shapes from OOXML, the Impress team was already able to import some shapes from OOXML files produced by PowerPoint 2007. Consequently, we thought we could just reuse their importer code, make a few adjustments, and that would be it. As one might guess, life is different. If you insert a rectangle shape in PowerPoint 2007, you get XML like this:
<p:sp>
...
<p:spPr>
<a:xfrm>
<a:off x="2000232" y="1500174"/>
<a:ext cx="3429024" cy="2000264"/>
</a:xfrm>
<a:prstGeom prst="rect"><a:avLst/></a:prstGeom>
</p:spPr>
<p:style>
<a:lnRef idx="2">
<a:schemeClr val="accent1"><a:shade val="50000"/></a:schemeClr>
</a:lnRef>
<a:fillRef idx="1"><a:schemeClr val="accent1"/></a:fillRef>
<a:effectRef idx="0"><a:schemeClr val="accent1"/></a:effectRef>
<a:fontRef idx="minor"><a:schemeClr val="lt1"/></a:fontRef>
</p:style>
<p:txBody>...</p:txBody>
</p:sp>
This is DrawingML as described in chapter 5 of the Markup Language Reference for OOXML.
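The offsets and extents in `a:xfrm` are given in English Metric Units (EMU), with 914400 EMU to the inch. A small sketch, again in Python and not taken from any real importer, shows how the values above translate into inches. The helper name `read_xfrm` is an assumption made for this example, and the `a:xfrm` element is parsed in isolation for brevity.

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace (the a: prefix).
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
EMU_PER_INCH = 914400

# The a:xfrm element from the PowerPoint example, parsed on its own.
XFRM = (
    '<a:xfrm xmlns:a="%s">'
    '<a:off x="2000232" y="1500174"/>'
    '<a:ext cx="3429024" cy="2000264"/>'
    "</a:xfrm>" % A_NS
)


def read_xfrm(xml_text):
    """Return position and size of an a:xfrm in inches (hypothetical helper)."""
    xfrm = ET.fromstring(xml_text)
    off = xfrm.find("{%s}off" % A_NS)
    ext = xfrm.find("{%s}ext" % A_NS)
    return {
        "x": int(off.get("x")) / EMU_PER_INCH,
        "y": int(off.get("y")) / EMU_PER_INCH,
        "width": int(ext.get("cx")) / EMU_PER_INCH,
        "height": int(ext.get("cy")) / EMU_PER_INCH,
    }


if __name__ == "__main__":
    for name, value in read_xfrm(XFRM).items():
        print("%s: %.4f in" % (name, value))
```

The rectangle is roughly 3.75 inches wide, so the geometry is easy to recover once the unit is known.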
Copy the shape and paste it into a Word document, and you get this in the corresponding DOCX:
<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
<a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas">
<lc:lockedCanvas xmlns:lc="http://schemas.openxmlformats.org/drawingml/2006/lockedCanvas">
...
<a:sp>
...
<a:spPr>
<a:xfrm>
<a:off x="2000232" y="1500174"/>
<a:ext cx="3429024" cy="2000264"/>
</a:xfrm>
<a:prstGeom prst="rect"><a:avLst/></a:prstGeom>
</a:spPr>
<a:txSp>...</a:txSp>
<a:style>
<a:lnRef idx="2">
<a:schemeClr val="accent1"><a:shade val="50000"/> </a:schemeClr>
</a:lnRef>
<a:fillRef idx="1"><a:schemeClr val="accent1"/></a:fillRef>
<a:effectRef idx="0"><a:schemeClr val="accent1"/></a:effectRef>
<a:fontRef idx="minor"><a:schemeClr val="lt1"/></a:fontRef>
</a:style>
</a:sp>
</lc:lockedCanvas>
</a:graphicData>
</a:graphic>
This looks pretty similar to the PowerPoint XML. Our “one piece of code for one kind of thing” approach seems to hold. But if you use Word to insert a rectangle into a Word document (DOCX), you end up with this:
<w:pict>
<v:rect
id="_x0000_s1026"
style="position:absolute;margin-left:83.65pt;margin-top:16.45pt;width:249.75pt;
height:107.25pt;z-index:251659264"
fillcolor="#4f81bd [3204]"
strokecolor="#f2f2f2 [3041]"
strokeweight="3pt">
...
</v:rect>
</w:pict>
This is VML as described in chapter 6 of the Markup Language Reference for OOXML. The Markup Language Reference tags VML as a deprecated format in OOXML that is included in the standard only for backward compatibility. Despite this, Word 2007 uses VML to store shapes.
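Note how differently VML carries the geometry: position and size live in a CSS-like `style` attribute and are given in points. The following Python sketch splits such a style string into properties and converts point values to inches (72 pt to the inch). The helper names `parse_vml_style` and `pt_to_inches` are invented for this illustration, and only the `pt` unit is handled.

```python
# The style attribute of the v:rect above, including its line break.
STYLE = (
    "position:absolute;margin-left:83.65pt;margin-top:16.45pt;width:249.75pt;\n"
    "height:107.25pt;z-index:251659264"
)


def parse_vml_style(style):
    """Split a CSS-like VML style string into a dict (hypothetical helper)."""
    props = {}
    for declaration in style.split(";"):
        if not declaration.strip():
            continue  # tolerate a trailing semicolon or blank entries
        name, _, value = declaration.partition(":")
        props[name.strip()] = value.strip()
    return props


def pt_to_inches(value):
    """Convert a length such as '249.75pt' to inches.

    Assumption: only the 'pt' unit is supported in this sketch.
    """
    if not value.endswith("pt"):
        raise ValueError("unsupported unit in %r" % value)
    return float(value[:-2]) / 72.0


if __name__ == "__main__":
    props = parse_vml_style(STYLE)
    for name in ("margin-left", "margin-top", "width", "height"):
        print("%s: %.4f in" % (name, pt_to_inches(props[name])))
```

So the same kind of shape needs a string parser here, where DrawingML needs integer attributes and ODF needs lengths with units.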
So what do we do? As VML is to be considered deprecated in OOXML, one might say: “Do not care about it. Use DrawingML.” If Word 2007 were only a beta release and the final version abandoned VML, that would be the approach to follow. But the XML above is produced by the current product version of Word 2007, and customers require that Word and Writer can be used interchangeably. It looks like we have to implement both VML and DrawingML.
The shape example is only one instance of a more general problem. The designers of ODF had a file format in mind that describes data. Hence, when the format describes data with the same semantics, it uses the same syntax. OOXML seems to be designed with the application model in mind: there may be different syntaxes for the same semantics if that fits the existing application model better. But if you want to create an alternative implementation of the format, this means additional effort.
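What "implementing both" means in practice can be sketched as two front ends feeding one shape model. The following Python is only an illustration under simplifying assumptions, not how the Writer filter is actually structured: the `Rect` class and the `import_rect` function are hypothetical, VML `margin-left`/`margin-top` are treated as the shape position, and only `pt` lengths are supported.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"  # DrawingML
V_NS = "urn:schemas-microsoft-com:vml"                           # VML
EMU_PER_INCH = 914400
PT_PER_INCH = 72.0


@dataclass
class Rect:
    """Common shape model (hypothetical): all values in inches."""
    x: float
    y: float
    width: float
    height: float


def _from_drawingml(sp):
    # Geometry is stored as integer EMU attributes below a:xfrm.
    xfrm = sp.find(".//{%s}xfrm" % A_NS)
    off = xfrm.find("{%s}off" % A_NS)
    ext = xfrm.find("{%s}ext" % A_NS)
    return Rect(
        int(off.get("x")) / EMU_PER_INCH,
        int(off.get("y")) / EMU_PER_INCH,
        int(ext.get("cx")) / EMU_PER_INCH,
        int(ext.get("cy")) / EMU_PER_INCH,
    )


def _from_vml(rect):
    # Geometry is stored in a CSS-like style string, here assumed to use pt.
    style = {}
    for declaration in rect.get("style", "").split(";"):
        if declaration.strip():
            name, _, value = declaration.partition(":")
            style[name.strip()] = value.strip()

    def inches(name):
        value = style[name]
        if not value.endswith("pt"):
            raise ValueError("unsupported unit in %r" % value)
        return float(value[:-2]) / PT_PER_INCH

    return Rect(inches("margin-left"), inches("margin-top"),
                inches("width"), inches("height"))


def import_rect(elem):
    """Map either syntax onto the same Rect; two code paths, one meaning."""
    if elem.tag == "{%s}sp" % A_NS:
        return _from_drawingml(elem)
    if elem.tag == "{%s}rect" % V_NS:
        return _from_vml(elem)
    raise ValueError("unsupported shape element: %s" % elem.tag)
```

Each syntax needs its own parsing code even though both end up in the same `Rect`, which is exactly the additional effort described above.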
tags: filter ooxml writer writerfilter
It is confusing and will take a while to figure out the transitional arrangements.
My sense of deprecated is different than yours. Deprecated does not mean the feature cannot appear in documents, but that it should not be used in new documents. (Apparently, the BRM changed deprecated to "transitional" which appears to be an improvement, sort of like HTML 4.01 Transitional as opposed to Strict.)
I think a more interesting question is whether you could replace the VML with DrawingML and have it be accepted by Word2007 (if saving as OOXML is being implemented in OOo), even though that is not the current default behavior.
Depending on how ODF evolves, similar things could happen with OOo features that are not covered in the current ODF spec and that end up being handled differently in some future spec (with at least namespace differences). You'd have to deprecate the OOo-specific approach in favor of the (then) standardized one. I don't have a specific example, but you probably have your eye on a few things.
The additional effort will be around for a while, it seems to me. Just like the added pain of supporting the Microsoft Office (name-your-version) binary formats for some time into the future.
Nice post. I admire the technical focus and customer consideration that this reflects.
Posted by orcmid on March 15, 2008 at 04:17 PM CET #
orcmid, it's not a matter of evolving specs and applications, it's a matter of how the spec is associated to applications and what relevance the spec has at all. I firmly believe that OOXML with the modifications that shall make it more acceptable to ISO won't be implemented verbatim by any application on this planet (and this of course includes Microsoft Office).
It doesn't help to move the controversial items of the OOXML specification out of the mandatory part by making them "deprecated" or "optional", in order to get it more "acceptable" as a standard. This is window-dressing and completely irrelevant for the practical work and the market. Henning's example is just one of the most striking ones.
Honestly - who needs an OOXML filter with only the mandatory parts of the spec? What people want are Word, Excel, PowerPoint filters - and these file formats are OOXML with "extensions" - all the problematic stuff that the people at ISO already refused to standardize. But technically inexperienced people (and that includes most decision makers) won't see the difference. Make of that what you see fit.
Besides that you are right that standards can evolve - they do, and so does ODF. But that's not what Henning was talking about. He talked about parts of a file format that never have been a mandatory part of its specification but are essential to get the files loaded that are created by the one and only application using this format. I hope I could make this more clear now.
Posted by Mathias Bauer on March 17, 2008 at 06:32 PM CET #
Thanks for the clear information, Henning. I'm sure there are many really interested people for whom this definitely helps to better understand 'the issue'.
Posted by Cor Nouws on March 20, 2008 at 07:41 PM CET #
What about the interesting question raised by Orcmid: whether you could replace the VML with DrawingML and have it be accepted by Word2007?
Thanks.
Posted by Bruno on March 25, 2008 at 05:29 PM CET #
You might want to consider deferring your work on VML in favor of DrawingML. Microsoft has pledged to update Office to conform to the revised version of DIS 29500 (OOXML), provided ISO accepts it as a standard. This would preclude the creation of new documents with VML content in Word (and Excel and PowerPoint) and thus lessen the need for VML support in OOo. Only legacy documents converted from the binary formats and OOXML files authored in the period between the release of Office 2007 and its adaptation to ISO standards would contain VML. In other words, the number of documents with VML content will be small, declining, and thus perhaps unworthy of additional resource investment by the OOo team.
Posted by anon on March 30, 2008 at 07:47 AM CEST #