GullFOSS
OpenOffice.org Engineering at Sun
 
Friday, 12 Oct 2007
New Extensible Metadata Support With ODF 1.2
Svante Schubert

As this is my first blog post on GullFOSS, let me quickly introduce myself. I am Svante Schubert. I have worked on OpenOffice.org since its beginning, I am co-lead of the XML project, and for the last year I have been working in the OpenDocument Metadata Subcommittee as co-editor of the metadata draft.

Now I am excited to let you know that the OASIS OpenDocument Technical Committee has approved extensible metadata support for the upcoming version of the OpenDocument Format, ODF 1.2.

By building on existing metadata standards such as the Resource Description Framework (RDF), it will be possible to describe the content or characteristics of a file by attaching arbitrary metadata to the document. Not only the document itself can be described, but also its most important parts: a group of about fifty element types, such as paragraph, table and bookmark.

Why am I so excited about this?

With this in place, future OpenOffice.org extensions will be able to annotate documents and content as being of a certain type, for example marking a document as an invoice or an invitation.

This will greatly increase the value of ODF documents in workflows and document searches.

As a simple example, imagine a company receiving ODF documents via email and dispatching them according to their type, for instance forwarding an ODF invoice directly to its accounting department.
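To make the dispatching idea concrete, here is a minimal sketch in Python. It assumes the document type is recorded as an `rdf:type` statement in an RDF/XML file inside the package; the file name `manifest.rdf`, the `example.com` vocabulary and the mailbox addresses are all assumptions made up for illustration, not something the draft or OpenOffice.org prescribes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document-type vocabulary and routing table (assumptions).
INVOICE = "http://example.com/doctypes#Invoice"
INVITATION = "http://example.com/doctypes#Invitation"
ROUTES = {INVOICE: "accounting@example.com", INVITATION: "calendar@example.com"}


def document_types(package_bytes, rdf_path="manifest.rdf"):
    """Return the set of rdf:type URIs found in the package's RDF file."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as pkg:
        if rdf_path not in pkg.namelist():
            return set()  # package carries no RDF metadata file
        root = ET.fromstring(pkg.read(rdf_path))
    types = set()
    for type_el in root.iter(f"{{{RDF_NS}}}type"):
        uri = type_el.get(f"{{{RDF_NS}}}resource")
        if uri:
            types.add(uri)
    return types


def route(package_bytes, default="inbox@example.com"):
    """Pick a destination mailbox from the first recognized document type."""
    for uri in sorted(document_types(package_bytes)):
        if uri in ROUTES:
            return ROUTES[uri]
    return default


def make_sample_package(type_uri):
    """Build a tiny in-memory zip package carrying one rdf:type statement."""
    rdf = (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}">'
        f'<rdf:Description rdf:about=""><rdf:type rdf:resource="{type_uri}"/>'
        "</rdf:Description></rdf:RDF>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as pkg:
        pkg.writestr("manifest.rdf", rdf)
    return buf.getvalue()
```

Calling `route(make_sample_package(INVOICE))` selects the accounting mailbox, while a package without a recognized type falls through to the default. A real mail filter would use a full RDF library rather than scanning the XML directly, since the same statements can be serialized in several ways.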

Because the content of a document can have metadata attached to it as well, metadata can describe individual elements, for example stating that a table cell contains the price of a product or, in the case of an invitation, where the event takes place. In a word: granular metadata, not only document-level metadata!

Advanced searches can benefit from this, as the user might query directly for the price of a certain product, and workflows can be enhanced through more interaction between applications.

Expanding our earlier mail example: you receive an ODF document by mail which is an invitation from one of your friends. The mailer, or an RDF-aware extension, could identify the document as an invitation, recognize from the address book that the sender is a friend of yours and, after validating the document signature, create a calendar entry for you by automatically copying the event location and time from the invitation into your calendar.

Metadata support will make it possible for different applications to work together more intelligently on the basis of ODF documents, and will make life easier for users.
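A small Python sketch may help picture element-level metadata. It assumes that an element is addressed through an `xml:id` attribute and that statements about it use a relative URI such as `content.xml#price1` as their subject. The element names are simplified stand-ins for the real ODF elements, and the predicates are a hypothetical vocabulary invented for this example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified stand-in for part of a content.xml; real files use the ODF
# table and text namespaces, which are left out here for brevity.
CONTENT = """
<doc>
  <cell xml:id="price1">19.99</cell>
  <p xml:id="where">Hamburg, Town Hall</p>
  <p>No metadata attached to this paragraph.</p>
</doc>
"""

# Statements as (subject, predicate, object) triples. The predicates and
# object URIs are hypothetical, made up for illustration.
STATEMENTS = [
    ("content.xml#price1", "http://example.com/shop#priceOf",
     "http://example.com/products/42"),
    ("content.xml#where", "http://example.com/event#location",
     "http://example.com/events/party"),
]


def index_by_id(xml_text):
    """Map each xml:id value to the text of the element carrying it."""
    root = ET.fromstring(xml_text)
    return {el.get(XML_ID): (el.text or "") for el in root.iter() if el.get(XML_ID)}


def find_values(xml_text, statements, predicate, content_path="content.xml"):
    """Return (object URI, element text) pairs for one predicate."""
    by_id = index_by_id(xml_text)
    prefix = content_path + "#"
    results = []
    for subject, pred, obj in statements:
        if pred != predicate or not subject.startswith(prefix):
            continue
        fragment = subject[len(prefix):]
        if fragment in by_id:
            results.append((obj, by_id[fragment]))
    return results
```

Asking `find_values(CONTENT, STATEMENTS, "http://example.com/shop#priceOf")` pairs the product URI with the text of the annotated cell, which is the kind of lookup a price search would build on.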

Think of this as the “visual” Web 2.0 approach where users have a sophisticated graphical interface to handle “tagging” of content. Tagging that is made more powerful by its basis in RDF.

If anyone is still not excited by the possibilities of metadata, let me list the following aspects:

  • Easy for Users
    Extensions can mask any complexity of metadata from the user by using common graphical user interfaces.

  • Trustworthy Metadata
    Content and metadata will be combined in one standardized package format, which can be made trustworthy using digital signatures.

  • Reuse of Software by building on Standards
    By building on top of existing metadata standards, all the software for RDF metadata can be reused.

  • Full Support of the Semantic Web Architecture
    The complete technology stack of the Semantic Web Architecture can be used within the ODF package.

  • Large User Base
    With OpenOffice.org 3.0, there will be over 100 million user installations supporting RDF metadata next year.

Do you now understand my excitement?

For more detailed information you may:

Posted by Svante Schubert on 12 Oct 2007

Comments:

hey svante, this is great. can you talk a little bit about how this is related to an expanded bibliographic function? and/or about timeframes for various implementations bits?

thanks,

matt

Posted by matt price on October 12, 2007 at 09:21 PM CEST #

Hi Matt,

the expanded bibliographic function has everything to do with this!

Citations are planned to be based on this metadata framework. Instead of using a set of new ODF XML elements & attributes, the OOoBib project leader Bruce D'Arcus decided to use an RDF vocabulary which makes the bibliographic data easy to extend and interchangeable with other applications.
This is no coincidence, as Bruce is also a member of the OASIS ODF Metadata SC, where he is one of its most progressive members.

The bibliographic data will be stored as RDF files in the package while in the ODF content (content.xml) it's planned to use a new metadata text field - which is part of the new framework. This field will be used by a new bibliographic extension to display the citations.
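As a rough illustration of this split between content and RDF files, the following Python sketch checks whether every statement in an RDF file still points at an element that exists in the content. The `field` element is a simplified stand-in for the new metadata text field, and the `example.com` bibliographic vocabulary is an assumption invented for this example, not the vocabulary the OOoBib project will use.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified content: one citation field carrying an xml:id (stand-in markup).
CONTENT = '<doc><field xml:id="cite1">(Doe 2007)</field><p>Body text.</p></doc>'

# RDF/XML with a hypothetical vocabulary; the second subject, cite2, has no
# matching element in CONTENT, as if its xml:id had been dropped on save.
BIB_RDF = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="http://example.com/bib#">
  <rdf:Description rdf:about="content.xml#cite1">
    <ex:cites rdf:resource="http://example.com/refs/doe2007"/>
  </rdf:Description>
  <rdf:Description rdf:about="content.xml#cite2">
    <ex:cites rdf:resource="http://example.com/refs/roe2006"/>
  </rdf:Description>
</rdf:RDF>
"""


def content_ids(content_xml):
    """Collect every xml:id value present in the content."""
    root = ET.fromstring(content_xml)
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}


def dangling_subjects(content_xml, rdf_xml, content_path="content.xml"):
    """List RDF subjects that reference an xml:id missing from the content."""
    ids = content_ids(content_xml)
    prefix = content_path + "#"
    dangling = []
    for desc in ET.fromstring(rdf_xml).iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about", "")
        if about.startswith(prefix) and about[len(prefix):] not in ids:
            dangling.append(about)
    return dangling
```

Here `dangling_subjects(CONTENT, BIB_RDF)` reports `content.xml#cite2`, the statement whose anchor is gone. The sketch only handles the `rdf:Description` form of RDF/XML; an implementation would rely on a proper RDF parser.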
If someone is interested in participating in the bibliographic extension, please visit:
http://bibliographic.openoffice.org/#Developers

Regarding the timeframe: we are now looking for potential extension creators who can help us mature a UNO API that wraps existing open-source RDF software (which one is something we as a community still have to work out) and provides an easy interface for extension developers.

Detailed timeframes are always hard to give; they depend on many factors, especially on how much we can work together on this powerful feature.
Nevertheless I believe everybody will agree that we all want this thrilling metadata feature in our OOo3!

Have a nice week-end,
Svante

Posted by Svante Schubert on October 12, 2007 at 10:40 PM CEST #

This does sound like a very important feature to get into ODF. Thanks to all the committee members for the time and effort to develop the spec.

Is there likely to be a big delay before OOo will be ready to handle files that use ODF 1.2?

Am I correct that OOo provides no UI for this, so the only way to access the metadata is through tools or extensions apart from OOo?

Has there been any planning as to what kind of UI (if any) that OOo might expose?

Posted by 69.248.236.96 on October 12, 2007 at 11:37 PM CEST #

I see no reason for a delay of OOo, although instead of discussing possible delays I propose that we focus on implementing the interesting features of OOo 3. ;-)

Regarding an OOo UI, there is nothing certain yet.
But when I talk about extensions, don't forget that extensions might be installed by default in OOo.

There can be different kinds of metadata-based extensions:
There might be extensions working on a predefined RDF vocabulary, e.g. a bibliographic extension, or generic extensions that allow the user to load an external RDF vocabulary and use it to annotate the content and the document.

Even extensions to add RDF statements by hand are imaginable, or extensions that provide Web 2.0 content/document tagging based on an RDF vocabulary describing tags.

In any case, our first step should be the ODF Metadata framework itself, which means providing the functionality via a UNO API.

Posted by Svante Schubert on October 12, 2007 at 11:57 PM CEST #

Hello Svante:

this seems like a very important feature which many people have been asking for (especially in the area of custom schema support within openoffice, which this feature would address in many ways). What would happen to the existing metadata mechanism based on UserDefinedAttributes ?

thanks
ashok

Posted by ashok on October 13, 2007 at 09:11 AM CEST #

Hi Ashok,

Existing metadata mechanisms will certainly remain in the office.

I assume, though, that users might want to migrate their existing UserDefinedAttributes-based scenarios to the new RDF-based framework, as it builds on existing metadata standards.
For instance, one advantage would be that their attributes can be extracted from the ODF document (e.g. by a user into RDF files, or by any agent collecting metadata) and reused by other RDF applications.

Bests,
Svante

Posted by Svante Schubert on October 13, 2007 at 09:59 AM CEST #

Will it support things like multi language metadata, multi language template names, fields?

KAMI

Posted by Kami on October 13, 2007 at 12:28 PM CEST #

The current user-defined key-value fields are a hack that the new metadata system is designed to supersede. I would hope that over time the current metadata properties UI gets revamped to use the new system. For example, imagine being able to add a field to the panel that allows you to select a Creative Commons license, or to set the status of the document, or to add subject tags from a pre-defined vocabulary. Those are all properties of the document that cannot be represented in the current metadata panel.

As for Kami's question, RDF, RDF Schema, and OWL all have rich multi-language support. So an OOo implementation certainly ought to exploit that.

Posted by Bruce D'Arcus on October 13, 2007 at 03:05 PM CEST #

Hi, Svante,

Rather than gushing over the possibilities, it would be more helpful if Sun were to speak to how OOo 3.0 is going to handle RDF metadata created by non-Sun applications. Under precisely what circumstances will Sun's apps destroy RDF metadata created by other applications? As you well know, this was an enormous issue on the TC, as was acknowledged by Bruce d'Arcus:

"The bottomline is, because we move so much of the RDF logic into the package, the xml:id attributes become crucial anchor points. In short, if an application removes, say, the xml:id from a text:meta-field or otherwise causes the URI binding to be invalid, the field will break. ***It would be bad for interoperability for applications to do this."***

<http://lists.oasis-open.org/archives/office/200708/msg00042.html>.

As the person who proposed that conformant ODF apps be allowed to destroy RDF metadata created by other applications, both Sun and you personally owe the world an explanation of the circumstances under which your company's products will destroy such metadata. I previously called for use cases to expose any need to do so. The only response I received was the use case of a user-initiated action, which could have been handled under a "shall preserve ... unless" grammatical construct rather than the permissive "should preserve" that you proposed and that was adopted by the TC with the only remaining justification to allow for unforeseen circumstances. See my post linked above for a summary of what happened on the TC and Metadata SC, with links to the relevant other posts.

Specifically, if we use RDF metadata in our MS Office plug-in to establish lossless interoperability between MS Office and ODF applications, would Sun's applications preserve the RDF metadata our plug-in creates for that purpose so the interop can work in both directions? See <http://opendocument.foundation.googlepages.com/home>. Please address your answer in the context of ISO/IEC JTC 1 Directives, which require that standards "specify clearly and unambiguously the conformity requirements that are essential to achieve the interoperability." <http://www.jtc1sc34.org/repository/0856rev.pdf>.

If you continue to evade such questions, why should the world regard as benign Sun's motives for the change in the ODF RDF metadata preservation requirements?

Posted by Marbux on October 13, 2007 at 08:03 PM CEST #

I'd ignore Marbux's bait Svante. It's a typically loaded question designed to trip you up. See recent discussions on this by Rob Weir, me, etc. Contrary to what Marbux says, I think it's just good sense to "evade" his questions (given, for example, that among other things he has threatened lawsuits). I'd rather see more productive discussion on how to deliver on the interoperability possibilities of the new metadata support without the ridiculous conspiracy rhetoric.

Yes, you should strive wherever possible to preserve xml:id's and their referential integrity to their URI mapping, as well as RDF files. But we need to start implementing the support before we can know all the little details that might crop up in that process.

Posted by Bruce D'Arcus on October 13, 2007 at 08:54 PM CEST #

Hi, Bruce,

Did you have in mind Rob Weir's post at http://www.robweir.com/blog/2007/10/cracks-in-foundation.html (?) The one where he responded to my comment by saying in part:

"As for conformance clauses, this is a fair point. OASIS recognizes the need for formal conformance clauses, and as of this summer all OASIS standards, even those not going to ISO, will require a formal conformance clause. So this is something we will need to write up for ODF 1.2. The contents of such a clause is a good thing to be debating in the TC, or even publicly."

Just as long as it's not Sun Microsystems staffers or Bruce d'Arcus participating in the debate, right? Heaven forbid that the chair of the OOo Bibliographic project should actually talk about ODF interoperability conformance requirements.

Loved your line about "we need to start implementing the support before we can know all the little details that might crop up in that process." It speaks volumes about where you're coming from. It all gets worked out between you and Sun, right? No other developers get a clue except by looking at the OOo source code. None of that nasty talk about interoperability conformance requirements in the ODF specification.

But in my world, developers implement *standards* with mandatory conformance requirements rather than standards being a blank check for the developer with the biggest market share to break interop with other developers' implementations. We wanted to implement high fidelity round trip ODF native file support in MS Office, which has the troubling potential to make MS Office the market leading ODF implementation and break Sun's death grip on ODF. What can I say? ISO/IEC Directives are on our side, not yours or Sun's. The more you guys stand silent when violations of the Directives are brought to your attention, the worse your position is going to look when the coming battle at ISO over ODF 1.2 takes place.

And I'll note that my promise of litigation came after more than a month of refusal by anyone on the ODF TC to discuss our high fidelity round-trip business process and migration use cases and a long history of Sun's abuse of the TC to break high fidelity interoperability, both with MS Office and among ODF applications. It was your evasion of the technical issues that caused the promise of litigation; this isn't a philosophical question of whether the chicken or the egg came first. As the U.S. Supreme Court said:

"Typically, private standard-setting associations, like the Association in this case, include members having horizontal and vertical business relations. See generally 7 P. Areeda, Antitrust Law § 1477, p. 343 (1986) (trade and standard-setting associations routinely treated as continuing conspiracies of their members). There is no doubt that the members of such associations often have economic incentives to restrain competition and that the product standards set by such associations have a serious potential for anticompetitive harm. See American Society of Mechanical Engineers, Inc. v. Hydrolevel Corp., 456 U.S. 556, 571 (1982). Agreement on a product standard is, after all, implicitly an agreement not to manufacture, distribute, or purchase certain types of products. Accordingly, private standard-setting associations have traditionally been objects of antitrust scrutiny. See, e. g., ibid.; Radiant Burners, Inc. v. Peoples Gas Light & Coke Co., 364 U.S. 656 (1961) (per curiam). See also FTC v. Indiana Federation of Dentists, 476 U.S. 447 (1986)."

http://laws.findlaw.com/us/486/492.html (reinstating jury award of treble damages under the Sherman Act). You're the one who made the decision to climb into bed with Sun, Bruce when you decided to give ground on the metadata preservation requirement despite recognizing that it was an anti-interop decision. It's easy to discard antitrust conspiracy law as "ridiculous conspiracy rhetoric" outside of court. It's not so easy once things wind up in court, especially when you play outside rules requiring interoperability. As the Sun-backed European Committee for Interoperable Systems said:

"Interoperability is a cornerstone of the ICT industry. In today's networked ICT environments, devices do not function purely on their own, but must interact with other programs and devices. A device that cannot interoperate with the other products with which consumers expect it to interoperate is essentially worthless. It is interoperability that drives competition on the merits and innovation. The ability of different computer products to interoperate allows consumers to choose among them. Because consumers can choose among them, interoperable products must compete with one another, and it is this competition that has driven innovation in the software industry."

http://www.ecis.eu/inter/index.html (.) What you and Sun did to ODF 1.2 RDF metadata was unquestionably anti-competitive, striking at the very heart of well-defined antitrust law on what is permissible in private industry standard setting organizations. So prove me wrong, and ask Svante to disclose how Sun's applications will handle the relevant preservation of RDF interoperability metadata issue. If you do not, your silence *is* the message.

Posted by 71.236.214.69 on October 13, 2007 at 10:47 PM CEST #

"So prove me wrong, and ask Svante to disclose how Sun's applications will handle the relevant preservation of RDF interoperability metadata issue. If you do not, your silence *is* the message."

I don't need to "prove" anything to you. The only thing you should read into my refusal to play your game is the utter lack of respect I have for you and your tactics. The only reason I engage with you at all in this forum is to build up some public record that contradicts your assertions. In many cases, even your facts are wrong (such as your claim that the decision on preservation here was a Sun position; ALL the main implementors held this position).

I don't think Svante should try to answer your question because a) I don't think there's anything he can say that will satisfy you, and b) I do not even believe even *you* can answer your own question in a way that would satisfy you and actually work.

But how about this: rather than ordering everyone else around, why don't *you* do something productive? Write up a blog post somewhere that lays out the precise conditions under which implementors should preserve metadata. This will have to be consistent with the fact that ODF does not mandate support for all features. So if you insist that all xml:id attributes be preserved, for example, you will have to account for the fact that not all implementors support the features encoded in the relevant elements. If you insist that RDF/XML files must be preserved, you will have to lay out exactly what that means, accounting for the fact that the same RDF can be serialized different ways into XML, that editing of application content might reasonably involve removal of metadata, etc., etc. Hell, it will have to account for the fact that not all ODF applications will have *any* metadata support.

So if you can do this in a compelling and clear way free of insults and baseless charges (e.g. just stick to the technical details), that might actually be helpful for implementors and contribute to realizing your goal.

As I said, though, I don't think you can do it.

Posted by 24.210.249.184 on October 14, 2007 at 12:37 AM CEST #

Bruce, before you changed your mind without ever offering a technical justification, you said yourself:

"I'm cc-ing this to the main TC list, because I think this question
about interoperability is critical to ODF; it's not just about the new
metadata support.

"The question is, what should the language be on preservation of xml:id
attributes in ODF 1.2. The xml:id attributes are critical to the
metadata proposal, but they could also be used for other purposes.

"The current proposal says they 'shall' be preserved. Michael and Svante
have suggested this be changed to the weaker 'should' or perhaps to
introduce some notion of 'metadata aware' ODF applications (effectively
a loop-hole).

"My position is that if we change it, we should make it a general
requirement to preserve attributes. We cannot allow applications to
arbitrarily throw out critical attributes."

"...

"... Preserving files and attributes is a trivial requirement. Not doing so will introduce large compatibility problems.

"Really, just to be clear: if applications do not preserve xml:id
attributes, fields will break, and any metadata about document
fragments will be made invalid. Is that really in anyone's interest?
They need not support metadata in any explicit way to do this."

http://lists.oasis-open.org/archives/office/200706/msg00125.html

The *only* use case for destruction of the metadata raised by anyone was as a result of a user-initiated edit. I proposed in response: "So thus far, we have 'SHALL preserve unless destroyed through user-initiated action.' Any other use cases that require exceptions?" http://lists.oasis-open.org/archives/office/200707/msg00024.html

And now you suggest that *I* should come up with the use cases exposing the need to destroy RDF metadata? You and your compatriots who railroaded this change through rapid-fire votes on the SC and TC are the ones who need to justify your proposal, not me. As the ISO/IEC JTC 1 Directives say, standards must "specify clearly and unambiguously the conformity requirements that are essential to achieve the interoperability." http://www.jtc1sc34.org/repository/0856rev.pdf (.) You are the guys who have to defend what you got through the TC, not me. My proposal to handle the situation by retaining the "SHALL preserve" language with itemized exceptions was ignored, just as the high fidelity interop use cases involving Microsoft-bound business processes and migration to ODF were ignored.

As to your suggestion that my method must account for applications that do not support metadata, we've been all over that on the TC and metadata SC. You deliberately conflate feature support with metadata preservation, which are two separate concepts. Here is Patrick Durusau's essay on the subject. http://lists.oasis-open.org/archives/office/200706/msg00137.html (.) As you said yourself, "[p]reserving files and attributes is a trivial requirement." And the choice is stark: between one-way and round-trip interoperability. The fact that one application does not need particular metadata does not mean that the next app in the processing chain does not.

In short, you are arguing with your own prior statements, not with me. But neither you nor anyone else has come forward with a technical justification for your change in position.

Your attempt to justify what was done by stating that "ALL the main implementors held this position" does no more than illustrate the problem. Standards do not exist to fortify the market positions of entrenched market leaders. They exist to encourage competition, including that from other developers. In our case, we wanted to add native file support for ODF to MS Office with our plug-in, with high fidelity conversions between the Microsoft legacy binary formats and ODF. We pulled it off, but we can't interoperate with OOo because OOo trashes metadata needed for the return trip. You helped close off another interop route by allowing Sun to trash RDF metadata.

Short version: international standards aren't just for the folk who already have implementations. They are for the world. OOo source code is not supposed to be the ODF standard. You envision a process where the entrenched implementers work out interop issues in private. That is not the way international standards work, as the ISO/IEC JTC 1 Directives testify. And as the U.S. Supreme Court has held, that's not the way any standards body located in the U.S. is supposed to work.

Posted by Marbux on October 14, 2007 at 02:44 AM CEST #

"As to your suggestion that my method must account for applications that do not support metadata, we've been all over that on the TC and metadata SC. You deliberately conflate feature support with metadata preservation, which are two separate concepts."

How can you possibly know my intention in raising this issue? I cannot believe you're a retired lawyer given the frequency with which you fall back on this sort of baseless rhetoric.

I don't think the two issues are completely separate; at least not from the perspective of the engineers. For example, say application X does not support tables. This means they have no place in their internal data model to map the ODF table content. How are they to preserve that content, including the attribute nodes?

Granted, I think this is less an issue with a full-featured application like OOo, and I would expect the implementation to preserve these attributes and associated metadata.

It's ironic of you to be lecturing everyone about how standards development ought to work. I recall you attending one Metadata SC meeting, and there only under the cloak of a pseudonym. You offered no constructive help in that work, only looking for holes to critique.

You have never once attended a TC meeting in my recollection, preferring instead to fire insults to a mailing list. So you were not at the TC call when this issue was discussed and resolved, and yet you constantly make proclamations about the outcome of that discussion that you state as authoritative fact but which are not. If this issue was so important to you, why weren't you there?

Unlike you, I do not assume if someone has a different view from mine they are part of some sinister conspiracy. I have a lot of respect for David Faure and Thomas Zander, for example. Thomas was on the record being adamantly opposed to requiring preservation. He agreed with the goal of preservation and interoperability, but did not believe it could be achieved in the spec in ways that did not yield negative unintended consequences. His explanations more-or-less lined up with what I heard from engineers at IBM and Sun.

You look to that consensus and see conspiracy. I look to that and see, well, reasonable consensus (which is in fact the basis on which standards development works). As a result, my views have changed a bit based on those discussions.

Posted by Bruce D'Arcus on October 14, 2007 at 09:45 PM CEST #

Svante, I encourage you and your ODF colleagues to consider participating in AIIM's Interoperable Enterprise Content Management (iECM) committee. I have agreed to lead a task team under the auspices of the iECM committee to identify existing standards and, as necessary, specify additional XML tags for all of the elements of metadata conceptually listed in ANSI/AIIM/ARMA TR48-2006. It would be good if those metadata were implemented in ODF.

Owen Ambur
Co-Chair Emeritus, xmlCoP
Co-Chair, StratML CoP
Member, FIRM Board
Member, AIIM iECM Committee
Former Project Manager, ET.gov

Posted by Owen Ambur on December 10, 2007 at 05:05 AM CET #
