The Sun BabelFish Blog

Don't panic !

Monday Aug 28, 2006

crystalizing rdf

Today I am going to coin a new term: "rdf crystalization". According to today's Wikipedia, a crystal

is a solid in which the constituent atoms, molecules, or ions are packed in a regularly ordered, repeating pattern extending in all three spatial dimensions.
Generally, crystals form when they undergo a process of solidification. Under ideal conditions, the result may be a single crystal, where all of the atoms in the solid fit into the same crystal structure.

As explained previously, rdf allows one to describe relations between objects. Since we live in a post-Einsteinian world, we know that there is no center of the world, no central object from which everything can be described. Every object can be taken as a center, and from there we can describe everything else. So we can just describe things one fact at a time, the way NTriples does:

<http://bblfish.net/people/henry/card#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://bblfish.net/people/henry/card#me> <http://www.w3.org/2000/10/swap/pim/contact#home> _:L28C17 .
_:L28C17 <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "48.404532" .
_:L28C17 <http://www.w3.org/2003/01/geo/wgs84_pos#long> "2.700448" .

(There is an XML equivalent for this called TriX btw.)
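
To make the "one fact at a time" point concrete, here is a rough Python sketch that reads the four lines above into a set of triples and checks that reversing the lines yields the same graph. The `parse_ntriples` helper and its regular expression are my own simplification for this example (IRIs, blank nodes and plain literals only), not a full NTriples parser.

```python
import re

# Simplified pattern: covers only IRIs, blank nodes and plain string literals
# as they appear in the example, not the full N-Triples grammar.
TERM = r'(<[^>]*>|_:\w+|"[^"]*")'
TRIPLE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def parse_ntriples(text):
    """Return the set of (subject, predicate, object) triples found in text."""
    triples = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.add(match.groups())
    return triples


DATA = """\
<http://bblfish.net/people/henry/card#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://bblfish.net/people/henry/card#me> <http://www.w3.org/2000/10/swap/pim/contact#home> _:L28C17 .
_:L28C17 <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "48.404532" .
_:L28C17 <http://www.w3.org/2003/01/geo/wgs84_pos#long> "2.700448" .
"""

# Put the same facts in the opposite order and compare the resulting graphs.
reordered = "\n".join(reversed(DATA.splitlines()))
same_graph = parse_ntriples(DATA) == parse_ntriples(reordered)
```

Since the graph is just a set of facts, `same_graph` comes out true: the order of the lines carries no meaning.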

Or we can write it out by taking one object as the root and trying to form a tree structure of relations from it, as I do in my N3 foaf file (N3 is a superset of NTriples). I don't completely succeed there, as there are two roots: one describing me and one describing the document. But anyway...

So what is rdf crystalization? Well, the above makes it clear how fluid rdf is. You can put your facts in any order, you can organise them in trees if that makes them easier to read, etc. But what xml people really need is a more deterministic structure so that they can easily parse their documents. They need something more solid. Crystalising an rdf graph therefore means serialising it into an xml format that has a RELAX NG schema (or something similar), and that is easy to manipulate using tools such as XSLT, XPath, XQuery or DOM.
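
As a rough illustration of the idea, here is a Python sketch that crystalises a small graph (a person with a home location) into one fixed xml shape, and then reads a value back with a plain path expression. The element and attribute names (`people`, `person`, `about`, `home`, `lat`, `long`) and the helper functions are invented for this example; a real crystalisation would take them from a schema.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
HOME = "http://www.w3.org/2000/10/swap/pim/contact#home"
LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
ME = "http://bblfish.net/people/henry/card#me"

# The graph as an unordered set of (subject, predicate, object) triples.
GRAPH = {
    (ME, RDF_TYPE, FOAF_PERSON),
    (ME, HOME, "_:L28C17"),
    ("_:L28C17", LAT, "48.404532"),
    ("_:L28C17", LONG, "2.700448"),
}


def value(graph, subject, predicate):
    """Return the first object (in sorted order) for subject/predicate, or None."""
    objects = sorted(o for s, p, o in graph if s == subject and p == predicate)
    return objects[0] if objects else None


def crystalise(graph):
    """Serialise every foaf:Person into a fixed, hypothetical xml shape."""
    root = ET.Element("people")
    persons = sorted(s for s, p, o in graph if p == RDF_TYPE and o == FOAF_PERSON)
    for person in persons:
        elem = ET.SubElement(root, "person", {"about": person})
        home = value(graph, person, HOME)
        if home is not None:
            ET.SubElement(elem, "home", {
                "lat": value(graph, home, LAT) or "",
                "long": value(graph, home, LONG) or "",
            })
    return root


doc = crystalise(GRAPH)
xml_text = ET.tostring(doc, encoding="unicode")
# Because the shape is fixed, a consumer can rely on a simple path.
latitude = doc.find("./person/home").get("lat")
```

The sorting is what makes the output deterministic: whatever order the facts arrive in, the same graph gives the same document, which is what a DOM or XPath consumer wants.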

There are two types of xml documents one can crystalise an rdf graph to.

  1. Ideally one will try to crystalise it to rdf/xml. But to do this one will need to specify a RELAX NG schema in addition to the rdf, in order to give the rdf/xml its structure. An excellent example of this is the RSS 1.1 spec. Doing this can be a good way to
    • make the rdf more widely consumable, since more people have access to DOM and XSLT tools,
    • make the xml easier to specify, since you now get well-defined semantics for free,
    • and make one's format predictably extensible: by carefully specifying RELAX NG extension points, it becomes clear how to interpret extensions to the xml.
  2. Practically, one will at present most often need to crystalise a graph to plain xml, like Atom. The advantage here is that one picks up all the consumers of that xml. One usually loses clear semantics and extensibility, but the data can always be extracted again using GRDDL.
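
To give a feel for the extraction step in the second case: GRDDL works by associating a transformation (typically XSLT) with the xml document, and that transformation yields the rdf. The Python sketch below only plays the role of such a transformation for one made-up plain xml format; the document shape, the `extract_triples` helper and the generated blank node labels are all assumptions of this example, not part of GRDDL.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
HOME = "http://www.w3.org/2000/10/swap/pim/contact#home"
LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"

# A hypothetical plain xml crystalisation; element names are invented here.
XML = (
    '<people>'
    '<person about="http://bblfish.net/people/henry/card#me">'
    '<home lat="48.404532" long="2.700448"/>'
    '</person>'
    '</people>'
)


def extract_triples(xml_text):
    """Recover (subject, predicate, object) triples from the plain xml."""
    triples = set()
    root = ET.fromstring(xml_text)
    for index, person in enumerate(root.findall("person")):
        subject = person.get("about")
        triples.add((subject, RDF_TYPE, FOAF_PERSON))
        home = person.find("home")
        if home is not None:
            node = f"_:home{index}"  # fresh blank node label per person
            triples.add((subject, HOME, node))
            triples.add((node, LAT, home.get("lat")))
            triples.add((node, LONG, home.get("long")))
    return triples


triples = extract_triples(XML)
```

The semantics live in the transformation rather than in the xml itself, which is why the plain xml route loses clarity but not the data.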
Now, as more people get a good understanding of both rdf and xml, I believe we will end up with a better understanding of what improvements one should make to rdf/xml in a version 2 of the spec to ease such crystalizations, i.e. to make rdf/xml crystalizations resemble something xml experts would be comfortable looking at as plain xml.

And funnily enough, I am told that people do calculate an rdf (radial distribution function) in studying crystallization :-)

Tools

There are few tools to help crystalize an rdf graph at present. Ian Davis has been working on RDFT, a template language that allows one to declaratively crystalize an rdf graph. This certainly looks like something every rdf developer needs in his tool chest. I look forward to an RDFT Java implementation.

Examples

  • The RSS 1.1 spec is a very good example of an rdf/xml crystalisation.
  • The French statistics institute INSEE has made its LDAP information available as xml. I don't quite agree with that article that we need a new query language, but it is a good example of how to think about crystalisations.
  • The INSEE has also made its geographical information available online in a crystalized form.

I have started a wiki page on this topic where people can post further good examples, ideas, and techniques.
