The Sun BabelFish Blog
Don't panic!
how binary relations beat tuples
Last week I was handed a puzzle by Francois Bry: "Why does RDF limit itself to binary relations? Why this deliberate lack of expressivity?".
Logical Equivalence Reply
My initial answer was that all tuples could be reduced to binary relations. So take a simple table like this:
User ID | name | address | birthday | course | homepage
1234 | Henry Story | 21 rue Saint Honoré, Fontainebleau, France | 29 July | philosophy | http://bblfish.net/
1235 | Danny Ayers | Loc. Mozzanella, 7, Castiglione di Garfagnana, Lucca, Italy | 14 Jan | semweb | http://dannyayers.com
The first row in the above table can be expressed as a set of binary relations, as shown in this graph:
The same can clearly be done for the second row.
Since the two models express equivalent information, I would aesthetically opt for the graph over the tuples: it requires fewer primitives, which tends to make things simpler and clearer. Perhaps that can already be seen in the way the above table is screaming out for refactoring: a person may easily have more than one homepage. Adding a new homepage relation is easy; doing this in a table is a lot less so.
But this line of argument will not convince a battle-worn database administrator. Both systems do the same thing. One is widely deployed, the other not, so that is the end of the conversation. Furthermore, it seems clear that retrieving a row in a table is quick and easy. If you need chunks of information to be together, that beats the join that seems to be required in the graph version above. Pragmatics beats aesthetics hands down, it seems.
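To make the reduction and the "join" concrete, here is a minimal Python sketch, not taken from any RDF library: a row becomes one triple per column, and getting the row back means collecting every triple with the same subject. The subject URI and the second homepage are hypothetical placeholders, and plain column names stand in for real property URIs.

```python
# Minimal sketch: a table row as a set of binary relations (triples).
# The subject URI and the second homepage are hypothetical placeholders.

Triple = tuple[str, str, str]


def row_to_triples(subject: str, row: dict[str, str]) -> set[Triple]:
    """Each column of the row becomes one (subject, property, value) triple."""
    return {(subject, column, value) for column, value in row.items()}


def triples_to_row(subject: str, triples: set[Triple]) -> dict[str, list[str]]:
    """The 'join': gather every triple about one subject into a row-like view.
    Each property maps to a list, since nothing limits it to a single value."""
    row: dict[str, list[str]] = {}
    for s, p, o in sorted(triples):
        if s == subject:
            row.setdefault(p, []).append(o)
    return row


henry = "http://example.org/user/1234"  # hypothetical URI for user 1234
graph = row_to_triples(henry, {
    "name": "Henry Story",
    "birthday": "29 July",
    "course": "philosophy",
    "homepage": "http://bblfish.net/",
})

# A second homepage is just one more triple: no schema change is needed.
graph.add((henry, "homepage", "http://example.org/~henry/"))

print(triples_to_row(henry, graph)["homepage"])
# ['http://bblfish.net/', 'http://example.org/~henry/']
```

Whether that grouping step costs more than reading a stored row depends on how the store indexes its triples, which is the implementation question raised further on.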
Global Distributed Open Data
The database engineer might have won the battle, but he will not win the war [1]. Wars are fought at a much higher level, on a global scale. The problem the Semantic Web is attacking is global data, not local data. On the Semantic Web, the web is the database, and data is distributed and linked together. In the Semantic Web use case, the data won't all be managed in one database by a few resource-constrained superusers, but distributed in different places and managed by the stakeholders of that information. In our example we can imagine three stakeholders of different pieces of information: Danny Ayers for his personal information, me for mine, and the university for its course information. This information will then be available as resources on the web, returning different representations, which in one way or another may encode graphs such as the ones below. Note that duplication of information is a good thing in a distributed network.
By working with the simplest binary relations, it is easy to cut information down to its most atomic units, publish them anywhere on the web, and distribute the responsibility among different owners. This atomic nature of relations also makes it easy to merge information again. Doing this with tuples would be unnecessarily complex. Binary relations are a consequence of taking the open world assumption seriously in a global space. By using Uniform Resource Identifiers (URIs), different documents can co-refer to the same entities and link entities together in a global manner.
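A small Python sketch of why merging is easy: if each stakeholder publishes a set of triples, merging is just set union, and a shared URI is what lets the pieces link up. All URIs below are hypothetical placeholders, `ex:attends` is an invented property, and prefixed names are kept as plain strings rather than expanded URIs.

```python
# Sketch: three stakeholders each publish their own triples; merging is set union.
# All URIs are hypothetical; "ex:attends" is an invented property.

henry = "http://example.org/henry#me"
danny = "http://example.org/danny#me"
semweb = "http://university.example/courses/semweb"

henrys_doc = {
    (henry, "foaf:name", "Henry Story"),
    (henry, "foaf:homepage", "http://bblfish.net/"),
    (henry, "foaf:knows", danny),  # points at a URI someone else manages
}
dannys_doc = {
    (danny, "foaf:name", "Danny Ayers"),
    (danny, "foaf:homepage", "http://dannyayers.com"),
}
university_doc = {
    (semweb, "rdfs:label", "semweb"),
    (danny, "ex:attends", semweb),
    (henry, "foaf:name", "Henry Story"),  # duplicated fact: harmless in a set
}

merged = henrys_doc | dannys_doc | university_doc


def describe(graph, subject):
    """Everything the merged graph says about one subject, as (property, value) pairs."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for prop, value in describe(merged, danny):
    print(prop, value)
```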
The Verbosity critique
Another line of attack, similar to the first, could be that RDF is just too verbose. Imagine the relation children, which would relate a person to a list of their children. If one sticks just with binary relations, this is going to be very awkward to write out. In a graph it would look like this.
Which in Turtle would give something like this:
:Adam :children
[ a rdf:List;
rdf:first :joe;
rdf:rest [ a rdf:List;
rdf:first :jane;
rdf:rest rdf:nil ];
] .
This is clearly unnecessarily verbose. But that is not really a problem: one can develop a notation for writing out lists, and Turtle has done so. One can then write much more simply:
:Adam :children ( :joe :jane ) .
This is clearly much easier to read and write than the previous way (not to speak of the equivalent in RDF/XML). RDF is a structure developed at the semantic level, and different notations can be developed to express the same content. This works because RDF uses URIs to name things.
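To see what the short `( :joe :jane )` form stands for, here is a Python sketch of its expansion into `rdf:first`/`rdf:rest` triples. The blank node labels are generated arbitrarily. Note also that, as I understand the Turtle specification, the collection syntax produces only `rdf:first` and `rdf:rest` triples, not the `a rdf:List` type triples written out in the long form above.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

_counter = itertools.count()


def fresh_bnode():
    """Return a new blank node label such as _:b0, _:b1, ..."""
    return f"_:b{next(_counter)}"


def expand_collection(items):
    """Return (head, triples): the RDF list encoding of a Turtle collection."""
    if not items:
        return NIL, []  # the empty collection () is just rdf:nil
    triples = []
    head = node = fresh_bnode()
    for i, item in enumerate(items):
        triples.append((node, FIRST, item))
        nxt = fresh_bnode() if i < len(items) - 1 else NIL
        triples.append((node, REST, nxt))
        node = nxt
    return head, triples


head, triples = expand_collection([":joe", ":jane"])
triples.insert(0, (":Adam", ":children", head))
for s, p, o in triples:
    print(s, p, o)
```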
Efficiency Considerations
So what about the implementation question: with tables, often-accessed data is closely gathered together. This, it seems to me, is an implementation issue. One can easily imagine RDF databases that would optimize the in-memory layout of their data at run time, in a Just-In-Time manner, depending on the queries received. Consider how the Java JIT compiler ends up, in an overwhelming number of cases, being faster than hand-crafted C: the JIT can take advantage of local factors such as the memory available on the machine, the type of CPU, and other issues, which a statically compiled C binary cannot do. So in the case of the list structure shown above, there is no reason why the database could not just place :joe and :jane in an array of pointers.
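A toy Python sketch of that idea, under obvious simplifying assumptions (one in-memory index, no updates, so no cache invalidation): the store walks an `rdf:first`/`rdf:rest` chain the first time a list is read, and afterwards serves the cached array. `ListCachingStore` is a hypothetical class, not the API of any existing RDF database.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


class ListCachingStore:
    """Hypothetical toy store: read-only, single index, no cache invalidation."""

    def __init__(self, triples):
        self.index = {}  # (subject, predicate) -> list of objects
        for s, p, o in triples:
            self.index.setdefault((s, p), []).append(o)
        self.list_cache = {}  # list head -> materialized Python list

    def value(self, subject, predicate):
        """Return the first object for (subject, predicate)."""
        return self.index[(subject, predicate)][0]

    def read_list(self, head):
        """Walk rdf:first/rdf:rest once, then serve the cached array."""
        if head not in self.list_cache:
            items, node = [], head
            while node != NIL:
                items.append(self.value(node, FIRST))
                node = self.value(node, REST)
            self.list_cache[head] = items
        return self.list_cache[head]


store = ListCachingStore([
    (":Adam", ":children", "_:l1"),
    ("_:l1", FIRST, ":joe"),
    ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, ":jane"),
    ("_:l2", REST, NIL),
])
children = store.value(":Adam", ":children")
print(store.read_list(children))  # [':joe', ':jane']
```

A real store that accepted updates would also have to drop or rebuild a cached array whenever one of the list's triples changed.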
In any case, if one wants distributed, decentralised data, there is no other way to do it. Pragmatism does have the last word.
Notes
- Don't take the battle/war analogy too far, please. Both DB technologies and Semantic Web ones can easily work together, as demonstrated by tools such as D2RQ.
Posted at 01:40PM Mar 20, 2008 [permalink/trackback] by Henry Story in SemWeb | Comments[11]
Semantic Web for the Working Ontologist
I am really excited to see that Dean Allemang and Jim Hendler's book "Semantic Web for the Working Ontologist" is now available for pre-order on Amazon's web site. When I met Dean at Jazoon 2007, he let me have a peek at an early copy of this book [1]: it was exactly what I had been waiting a long time for. It is a very easy introduction to the Semantic Web and reasoning that does not start with the unnecessarily complex RDF/XML [2] but with the one-cannot-be-simpler triple structure of RDF, and through a series of practical examples brings the reader step by step to a full view of all of the tools in the Semantic Web stack, without a hitch, without a problem, fluidly. I was really impressed. Getting going in the Semantic Web is going to be a lot easier when this book is out. It should remove the serious problem current students face of having to find their way through a huge number of excellent but detailed specs, some of which are no longer relevant. One does not learn Java by reading the Java Virtual Machine Specification or even the Java Language Specification. Those are excellent tools to use once one has read some of the excellent introductory books, such as the unavoidable Java Tutorial or Bruce Eckel's Thinking in Java. Dean Allemang and Jim Hendler's book is going to play the same role for the Semantic Web, helping to introduce millions of people to what has to be the most revolutionary development in computer science since the development of the web itself. Go and pre-order it. I am going to do so right now.
Notes
- The draft I looked at 9 months ago had introductions to N-Triples, Turtle, OWL explained via rules, SPARQL, some simple well-known ontologies such as SKOS and FOAF, and a lot more.
- The W3C has recently published a new RDF Primer in Turtle in recognition of the difficulty of getting going when the first step requires understanding RDF/XML.
Posted at 12:41PM Mar 19, 2008 [permalink/trackback] by Henry Story in SemWeb | Comments[3]
Opening Sesame with Networked Graphs
Simon Schenk just recently gave me an update to his Networked Graphs library for the Sesame RDF Framework. Even though it is in an early alpha state, the jars have already worked wonders on my Beatnik Address Book. With four simple SPARQL rules I have been able to tie together most of the loose ends that appear between foaf files, as each one often uses different ways to refer to the same individual.
Why inferencing is needed
So for example in my foaf file I link to Simon Phipps, Sun's very popular Open Source Officer, with the following N3:
:me foaf:knows [ a foaf:Person;
foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
foaf:name "Simon Phipps";
foaf:homepage <http://www.webmink.net/>;
rdfs:seeAlso <http://www.webmink.net/foaf.rdf>;
] .
For those who still don't know N3 (where have you been hiding?), this says that I know a foaf:Person named "Simon Phipps" whose homepage is specified and for which more information can be found in the http://www.webmink.net/foaf.rdf RDF file. Now the problem is that the person in question is identified by a '[', which represents a blank node, i.e. we don't have a name (URI) for Simon. So when the Beatnik Address Book gets Simon's foaf file by following the rdfs:seeAlso relation, it gets, among others, something like
[] a foaf:Person;
foaf:name "Simon Phipps";
foaf:nick "webmink";
foaf:homepage </>;
foaf:knows [ a foaf:Person;
foaf:homepage <http://www.buzzword-compliant.com/>;
rdfs:seeAlso <http://www.buzzword-compliant.com/foaf.rdf>;
] .
This file then contains at least two people. Which one is the same person? Well, a human being would guess that the person named "Simon Phipps" is the same in both cases. Networked Graphs helps Beatnik make a similar guess by noting that the foaf:homepage relation is an owl:InverseFunctionalProperty.
Some simple rules
After downloading Simon Phipps's foaf file and mine and placing the relations found in each in its own Named Graph, we can, in Sesame 2.0, create a merged view of both these graphs just by creating a graph that is the union of the triples of each.
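The idea of a union view over named graphs can be sketched in a few lines of Python, with no claim that this is how Sesame implements it: each graph name maps to its own set of triples, the merged view is their union, and the graph names remain available for provenance. The graph names are hypothetical, and the blank node labels were deliberately chosen distinct, since labels from different files must be kept apart when merging.

```python
# Sketch of a union view over named graphs (not Sesame's implementation).
# Graph names are hypothetical; blank node labels are kept distinct per file.

named_graphs = {
    "urn:example:henry-foaf": {
        (":me", "foaf:knows", "_:simon"),
        ("_:simon", "foaf:name", "Simon Phipps"),
        ("_:simon", "foaf:homepage", "http://www.webmink.net/"),
    },
    "urn:example:webmink-foaf": {
        ("_:s1", "foaf:name", "Simon Phipps"),
        ("_:s1", "foaf:nick", "webmink"),
        ("_:s1", "foaf:homepage", "http://www.webmink.net/"),
    },
}


def union_view(graphs):
    """All triples from all named graphs, with the graph names dropped."""
    merged = set()
    for triples in graphs.values():
        merged |= triples
    return merged


def provenance(graphs, triple):
    """Names of the graphs that assert a given triple."""
    return sorted(name for name, triples in graphs.items() if triple in triples)


view = union_view(named_graphs)
print(len(view))
print(provenance(named_graphs, ("_:s1", "foaf:nick", "webmink")))
```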
The Networked Graph layer can then do some interesting inferencing by defining a graph with the following SPARQL rule:
#foaf:homepage is inverse functional
grph: ng:definedBy """
CONSTRUCT { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b . }
WHERE {
?a <http://xmlns.com/foaf/0.1/homepage> ?pg .
?b <http://xmlns.com/foaf/0.1/homepage> ?pg .
FILTER ( ! SAMETERM (?a , ?b))
} """^^ng:Query .
This is simply saying that if two names for things have the same homepage, then these two names refer to the same thing. I could be more general by writing rules at the OWL level, but those would be more complicated, and I just wanted to test out the Networked Graph sail to start with. So the above will add a bunch of owl:sameAs relations to our NetworkedGraph view on the Sesame database.
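The homepage rule can be mimicked in plain Python to show what it computes; this is a sketch, not how Networked Graphs evaluates it. Subjects are grouped by homepage, and every ordered pair of distinct subjects sharing one gets an `owl:sameAs`, which is what the `FILTER ( ! SAMETERM (?a , ?b))` clause achieves in the SPARQL. The data assumes the relative `</>` in Simon's file has already been resolved against the file's URL, and the blank node labels are made up.

```python
from itertools import permutations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"


def homepage_same_as(triples):
    """Infer owl:sameAs between distinct subjects that share a foaf:homepage."""
    by_page = {}
    for s, p, o in triples:
        if p == FOAF_HOMEPAGE:
            by_page.setdefault(o, set()).add(s)
    # permutations(..., 2) yields both (a, b) and (b, a) but never (a, a),
    # mirroring the FILTER ( ! SAMETERM (?a , ?b) ) clause.
    return {(a, OWL_SAME_AS, b)
            for subjects in by_page.values()
            for a, b in permutations(subjects, 2)}


graph = {
    ("_:simon", FOAF_HOMEPAGE, "http://www.webmink.net/"),  # from my foaf file
    ("_:s1", FOAF_HOMEPAGE, "http://www.webmink.net/"),     # from Simon's file
    ("_:bz", FOAF_HOMEPAGE, "http://www.buzzword-compliant.com/"),
}
for triple in sorted(homepage_same_as(graph)):
    print(triple)
```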
The following two rules then just complete the information.
# owl:sameAs is symmetric
#if a = b then b = a
grph: ng:definedBy """
CONSTRUCT { ?b <http://www.w3.org/2002/07/owl#sameAs> ?a . }
WHERE {
?a <http://www.w3.org/2002/07/owl#sameAs> ?b .
FILTER ( ! SAMETERM(?a , ?b) )
} """^^ng:Query .
# indiscernibility of identicals
#two identical things have all the same properties
grph: ng:definedBy """
CONSTRUCT { ?b ?rel ?c . }
WHERE { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .
?a ?rel ?c .
FILTER ( ! SAMETERM(?rel , <http://www.w3.org/2002/07/owl#sameAs>) )
} """^^ng:Query .
They make sure that when two things are found to be the same, they have the same properties. I think these two rules should probably be hard-coded in the database itself, as they seem so fundamental to reasoning that there must be some very serious optimizations available.
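Here is a Python sketch that forward-chains the two rules until nothing new is produced, to show why running them together ties the two descriptions of Simon into one. It is an illustration, not Networked Graphs' evaluation strategy. Like the SPARQL version, it only copies properties in subject position; property names other than `owl:sameAs` are shortened to prefixed strings for readability.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def apply_same_as_rules(triples):
    """Forward-chain the symmetry and indiscernibility rules to a fixpoint."""
    graph = set(triples)
    while True:
        new = set()
        pairs = [(a, b) for a, p, b in graph if p == OWL_SAME_AS and a != b]
        for a, b in pairs:
            new.add((b, OWL_SAME_AS, a))  # owl:sameAs is symmetric
            for s, rel, c in graph:
                # copy a's properties to b, except owl:sameAs itself
                if s == a and rel != OWL_SAME_AS:
                    new.add((b, rel, c))
        if new <= graph:  # nothing new: fixpoint reached
            return graph
        graph |= new


closure = apply_same_as_rules({
    ("_:simon", OWL_SAME_AS, "_:s1"),  # e.g. produced by the homepage rule
    ("_:simon", "foaf:name", "Simon Phipps"),
    ("_:s1", "foaf:nick", "webmink"),
})
print(sorted(t for t in closure if t[0] == "_:simon"))
```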
Advanced rules
Anyway, the above illustrates just how simple it is to write some very clear inferencing rules. Those are just the simplest that I have bothered to write at present. Networked Graphs allows one to write much more interesting rules, which should help me solve the problems I explained in "Beatnik: change your mind", where I argued that even a simple client application like an address book needs to be able to make judgements on the quality of information. Networked Graphs would allow one to write rules that would amount to "only believe consequences of statements written by people you trust a lot". Perhaps this could be expressed in SPARQL as
CONSTRUCT { ?subject ?relation ?object . }
WHERE {
?g tr:trustlevel ?tl .
GRAPH ?g { ?subject ?relation ?object . }
FILTER ( ?tl > 0.5 )
}
Going from the above, it is easy to start imagining very interesting uses of Networked Graph rules. For example, we may want to classify some ontologies as trusted and only do reasoning on relations over those ontologies. The inverse functional rule could then be generalized to
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX : <https://sommer.dev.java.net/ontologies/beatnik#>
CONSTRUCT { ?a owl:sameAs ?b . }
WHERE {
GRAPH ?g { ?inverseFunc a owl:InverseFunctionalProperty . }
?g a :TrustedOntology .
?a ?inverseFunc ?pg .
?b ?inverseFunc ?pg .
FILTER ( ! SAMETERM (?a , ?b))
}
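Both ideas can be sketched in Python over a dict of named graphs, as an illustration rather than a translation of how Networked Graphs would evaluate them: `trusted_triples` keeps only triples from graphs whose trust level exceeds 0.5, and `same_as_from_trusted_ifps` only treats a property as inverse functional if a trusted ontology graph declares it so. The graph names, trust levels and the "dubious" ontology are all invented for the example, and prefixed names stand in for full URIs.

```python
from itertools import permutations

RDF_TYPE = "rdf:type"
OWL_IFP = "owl:InverseFunctionalProperty"
OWL_SAME_AS = "owl:sameAs"

# Hypothetical named graphs: two ontologies and two foaf files.
graphs = {
    "urn:example:foaf-ontology": {("foaf:homepage", RDF_TYPE, OWL_IFP)},
    "urn:example:dubious-ontology": {("foaf:name", RDF_TYPE, OWL_IFP)},  # wrong claim
    "urn:example:henry-foaf": {
        ("_:simon", "foaf:homepage", "http://www.webmink.net/"),
        ("_:simon", "foaf:name", "Simon Phipps"),
    },
    "urn:example:webmink-foaf": {
        ("_:s1", "foaf:homepage", "http://www.webmink.net/"),
        ("_:other", "foaf:name", "Simon Phipps"),  # a namesake, not the same person
    },
}


def trusted_triples(graphs, trust, threshold=0.5):
    """Triples from graphs whose trust level is above the threshold."""
    return {t for name, triples in graphs.items()
            if trust.get(name, 0.0) > threshold
            for t in triples}


def same_as_from_trusted_ifps(graphs, trusted_ontologies):
    """owl:sameAs only via properties declared inverse functional in trusted graphs."""
    ifps = {s for name in trusted_ontologies
            for s, p, o in graphs.get(name, ())
            if p == RDF_TYPE and o == OWL_IFP}
    by_key = {}  # (property, value) -> subjects sharing that value
    for triples in graphs.values():
        for s, p, o in triples:
            if p in ifps:
                by_key.setdefault((p, o), set()).add(s)
    return {(a, OWL_SAME_AS, b)
            for subjects in by_key.values()
            for a, b in permutations(subjects, 2)}


trust = {"urn:example:henry-foaf": 0.9, "urn:example:webmink-foaf": 0.4}
print(len(trusted_triples(graphs, trust)))
print(sorted(same_as_from_trusted_ifps(graphs, {"urn:example:foaf-ontology"})))
```

Trusting the dubious ontology as well would make `_:other` the same as `_:simon`, which is exactly the kind of conclusion a trust filter is meant to keep out.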
Building the So(m)mer Address Book
I will be trying these out later. But for the moment you can already see the difference inferencing brings to an application by downloading the Address Book from Subversion at sommer.dev.java.net and running the following commands (leave the password for the svn checkout blank):
> svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer --username guest
> cd sommer
> ant jar
> cd misc/AddressBook/
> ant run
Then you can just drag and drop the foaf file on this page into the address book, and follow the distributed social network by pressing the space bar to get foaf files. To enable inferencing you currently need to set it in the File > Toggle Rules menu. You will see things suddenly coming together when inferencing is on. There are still a lot of bugs in this software, but you are welcome to post bug reports or help out in any way you can.
Where this is leading
Going further, it seems clear to me that Networked Graphs is starting to realise what Guha, one of the pioneers of the semantic web, wrote about in his thesis "Contexts: A Formalization and Some Applications", on which I wrote a short note, "Keeping track of Context in Life and on the Web", a couple of years ago. That really helped me get a better understanding of the possibilities of the semantic web.
Posted at 05:40PM Mar 05, 2008 [permalink/trackback] by Henry Story in Java | Comments[10]
sparqling international calling codes
The other day I was looking for a list of international calling codes. Since most of them are listed in Wikipedia, it occurred to me that it would be easy to get all that information in a nice, easy-to-use format by querying DBPedia with SPARQL. So I wrote a very lightweight SPARQL client (source code available here). Download the jar and you can then run the following query:
hjs@bblfish:0$ java -jar Sparql.jar > results.n3
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT { ?cntry dbp:callingCode ?code ;
rdfs:label ?name .
} WHERE {
?cntry dbp:callingCode ?code .
OPTIONAL { ?cntry rdfs:label ?name . }
}
^d
That is, after typing the command line java -jar Sparql.jar > results.n3, I pasted the SPARQL query shown above and ended the input with control-d, which on Unix is the end-of-file character.
This sent the query to DBPedia and returned a long list of answers, which were placed in results.n3. The first set of them is
<http://dbpedia.org/resource/Abu_Dhabi_%28emirate%29> <http://www.w3.org/2000/01/rdf-schema#label> "\u963F\u5E03\u624E\u6BD4\u914B\u957F\u56FD\""@zh , "Abu Dhabi (emirate)"@en , "Abu Dhabi (emirato)"@it , "\u0410\u0431\u0443-\u0414\u0430\u0431\u0438 (\u044D\u043C\u0438\u0440\u0430\u0442)\""@ru ; <http://dbpedia.org/property/callingCode> "971-2"@en .
In the above case the calling code should probably not be tagged with @en, so the data still needs to be cleaned up a little at present. It would be nice to be able to quickly fix the data when one notices something like this. Most of the other results are in xsd:integer format, which I think is also not quite right: a literal string is a better representation of a calling code.
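A small Python sketch of such a clean-up, assuming the values arrive as N-Triples-style literals like the `"971-2"@en` above, and assuming (hypothetically) that the integer values look like `"44"^^<http://www.w3.org/2001/XMLSchema#integer>`: whatever the tag or datatype, the calling code is rewritten as a plain string literal. The regular expression is a simplification that handles one literal on its own, not full N-Triples or N3 parsing.

```python
import re

# One literal: "lexical form", optionally followed by @lang or ^^<datatype IRI>.
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?$')


def clean_calling_code(literal: str) -> str:
    """Drop any language tag or datatype, keeping the calling code as a plain string."""
    match = LITERAL.match(literal)
    if match is None:
        raise ValueError(f"not a literal: {literal!r}")
    return f'"{match.group(1)}"'


for raw in ['"971-2"@en', '"44"^^<http://www.w3.org/2001/XMLSchema#integer>', '"33"']:
    print(raw, "->", clean_calling_code(raw))
```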
Anyway the data is easy to clean up. And we have an example of a very simple but useful query.
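For readers curious about what such a client involves, here is a hypothetical Python equivalent; it is not the source of Sparql.jar. It follows the SPARQL Protocol's HTTP GET binding, where the query text is URL-encoded into a `query` parameter. The endpoint URL `http://dbpedia.org/sparql` and the `text/turtle` Accept header are assumptions; which result formats an endpoint actually serves is worth checking.

```python
import sys
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # assumed DBPedia SPARQL endpoint


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Encode the query as the 'query' parameter of an HTTP GET (SPARQL Protocol)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for Turtle, since a CONSTRUCT query returns a graph (support is assumed).
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})


def run(query: str) -> str:
    """Send the query and return the response body as text."""
    with urllib.request.urlopen(build_request(query)) as response:
        return response.read().decode("utf-8")


def main() -> None:
    # Like the jar above: read the query from stdin up to end-of-file (control-d).
    sys.stdout.write(run(sys.stdin.read()))
```

To use it the same way as the jar, call `main()` at the bottom of the file (for instance under an `if __name__ == "__main__":` guard) and run `python sparql.py > results.n3`, where `sparql.py` is just a hypothetical file name.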
Posted at 10:20PM Feb 28, 2008 [permalink/trackback] by Henry Story in Java | Comments[2]


