free web site hit counter

The Sun BabelFish Blog

Don't panic !

Thursday Mar 20, 2008

how binary relations beat tuples

Last week I was handed a puzzle by Francois Bry: "Why does RDF limit itself to binary relations? Why this deliberate lack of expressivity?".

Logical Equivalence Reply

My initial answer was that all tuples could be reduced to binary relations. So take a simple table like this:

User IDnameaddressbirthdaycoursehomepage
1234Henry Story21 rue Saint Honoré
Fontainebleau
France
29 Julyphilosophyhttp://bblfish.net/
1235Danny AyersLoc. Mozzanella, 7
Castiglione di Garfagnana
Lucca
Italy
14 Jansemwebhttp://dannyayers.com

The first row in the above column can be expressed as a set of binary relations as shown in this graph:

The same can clearly be done for the second row.

Since the two models express equivalent information I would opt aesthetically for the graph over the tuples, since it requires less primitives, which tends to make things simpler and clearer. Perhaps that can already be seen in the way the above table is screaming out for refactoring: a person may easily have more than one homepage. Adding a new homepage relation is easy, doing this in a table is a lot less so.

But this line of argument will not convince a battle worn database administrator. Both systems do the same thing. One is widely deployed, the other not. So that is the end of the conversation. Furthermore it seems clear that retrieving a row in a table is quick and easy. If you need chunks of information to be together that beats the join that seems to be required in the graph version above. Pragmatics beats aesthetics hands down it seems.

Global Distributed Open Data

The database engineer might have won the battle, but he will not win the war [1]. Wars are fought at a much higher level, on a global scale. The problem the Semantic Web is attacking is global data, not local data. On the Semantic Web, the web is the database and data is distributed and linked together. On the Semantic Web use case the data won't all be managed in one database by a few resource constrained superusers but distributed in different places and managed by the stake holder of that information. In our example we can imagine three stake holders of different pieces of information: Danny Ayers for his personal information, Me for mine, and the university for its course information. This information will then be available as resources on the web, returning different representations, which in one way or another may encode graphs such as the ones below. Note that duplication of information is a good thing in a distributed network.

By working with the most simple binary relations, it is easy to cut information up down to their most atomic unit, publish them anywhere on the web, distributing the responsibility to different owners. This atomic nature of relations also makes it easy to merge information again. Doing this with tuples would be unnecessarily complex. Binary relations are a consequence of taking the open world assumption seriously in a global space. By using Universal Resource Identifiers (URIs), it is possible for different documents to co-refer to the same entitities, and to link together entities in a global manner.

The Verbosity critique

Another line of attack similar to the first could be that rdf is just too verbose. Imagine the relation children which would relate a person to a list of their children. If one sticks just with binary relations this is going to be very awkward to write out. In a graph it would look like this.

image of a simple list as a graph

Which in Turtle would give something like this:

:Adam :children 
     [ a rdf:List;
       rdf:first :joe;
       rdf:rest [ a rdf:List;
            rdf:first :jane;
            rdf:rest rdf:nil ];
     ] .

which clearly is a bit unnecessarily verbose. But that is not really a problem. One can, and Turtle has, developed a notation for writing out lists. So that one can write much more simply:

:Adam :children ( :joe :jane ) .

This is clearly much easier to read and write than the previous way (not to speak about the equivalent in rdf/xml). RDF is a structure developed at the semantic level. Different notations can be developed to express the same content. The reason it works is because it uses URIs to name things.

Efficiency Considerations

So what about the implementation question: with tables oft accessed data is closely gathered together. This it seems to me is an implementation issue. One can easily imagine RDF databases that would optimize the layout in memory of their data at run time in a Just in Time manner, depending on the queries received. Just as the Java JIT mechanism ends up in a overwhelming number of cases to be faster than hand crafted C, because the JIT can take advantage of local factors such as the memory available on the machine, the type of cpu, and other issues, which a statically compiled C binary cannot do. So in the case of the list structure shown above there is no reason why the database could not just place the :joe and jane in an array of pointers.

In any case, if one wants distributed decentralised data, there is no other way to do it. Pragamatism does have the last word.

Notes

  1. Don't take the battle/war analogy too far please. Both DB technologies and Semantic Web ones can easily work together as demonstrated by tools such as D2RQ.

Wednesday Mar 19, 2008

Semantic Web for the Working Ontologist

I am really excited to see that Dean Allemang and Jim Hendler's book "Semantic Web for the Working Ontologist" is now available for pre-order on Amazon's web site. When I met Dean at Jazoon 2007 he let me have a peek at an early copy of this book[1]: it was exactly what I had been waiting a long time for. A very easy introduction to the Semantic Web and reasoning that does not start with the unnecessarily complex RDF/XML [2] but with the one-cannot-be-simpler triple structure of RDF, and through a series of practical examples brings the reader step by step to a full view of all of the tools in the Semantic Web stack, without a hitch, without a problem, fluidly. I was really impressed. Getting going in the Semantic Web is going to be a lot easier when this book is out. It should remove the serious problem current students are facing of having to find a way through a huge number of excellent but detailed specs, some of which are no longer relevant. One does not learn Java by reading the Java Virtual Machine specification or even the Java Language Specification. Those are excellent tools to use once one has read many of the excellent introductory books such as the unavoidable Java Tutorial or Bruce Eckel's Thinking in Java. Dean Allemang and Jim Hendler's books are going to play the same role for the Semantic Web. Help get millions of people introduced to what has to be the most revolutionary development in computer science since the development of the web itself. Go and pre-order it. I am going to do this right now.

Notes

  1. the draft I looked at 9 months ago had introductions to ntriples, turtle, OWL explained via rules, SPARQL, some simple well known ontologies such as skos and foaf, and a lot more.
  2. The W3C has recently published a new RDF Primer in Turtle in recognition of the difficulty of getting going when the first step requires understanding RDF/XML.

Friday Aug 17, 2007

Open Data: Information wants to be linked

With over 2 billion relations from the great web community data projects such as Wikipedia, Project Gutenberg, Music Brainz, and many more... the Linking Open Data initiative is tying together a vast pool of quality machine readable information on which one can run any of the over 500 Semantic Web tools. As the value of linked information increases much faster than the networks described by Metcalf's Law, the value of this must be tremendous.

By creating data browsing interfaces such as Tabulator, one has a very simple RESTful, Resource Oriented Architecture API to work with. With various SPARQL endpoints available or to be built, one can treat that information like a hugely powerful database.

Forget Web APIs: long live linked data!

Some of the projects listed are:

Search

Flickr Diary

www.flickr.com
This is a Flickr badge showing public photos from bblfish. Make your own badge here.

Recent Entries

Navigation

Referers