The Sun BabelFish Blog

Don't panic !

Saturday Dec 10, 2005

Syntax and semantics: the relation between XML and RDF


I have heard people wonder what the relation is between the XML technologies being developed at the W3C and the semantic technologies. Why does there seem to be so much duplication? Why is there an XQuery language and a SPARQL query language? What do they do differently?

The answer is, in part, reducible to the distinction between syntax and semantics. Syntax is the study of how signs can be put together to form sentences. Semantics is the study of how those signs relate to the world, so that a sentence can be given a truth value. Consider the sentence "Henry knows Tim". It refers to three things: a person named Henry, a person named Tim, and a relation between them. There are many ways of saying the same thing: one can make the same statement in German, in French, in Java or in XML. The thing in the world that all these syntaxes describe is the same.

XML is a syntax (or, more precisely perhaps, a document structure). RDF is about semantics. That is why there can be so many different ways of writing RDF: RDF/XML, N3, Turtle, N-Triples, and so on. Given this, the duplication among query languages is easy to understand.
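To make the distinction concrete, here is a small Python sketch. The two XML formats and the http://example.org/ URIs are made up purely for illustration. Two XML documents and one N-Triples line differ as text, yet each reduces to the same single fact: Henry knows Tim.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

# Two hypothetical XML encodings of "Henry knows Tim".
DOC_A = "<knows><who>Henry</who><whom>Tim</whom></knows>"
DOC_B = '<Person name="Henry"><knows name="Tim"/></Person>'

# The same statement written as one N-Triples line.
NTRIPLE = f"<{EX}Henry> <{EX}knows> <{EX}Tim> ."


def triples_from_a(xml_text):
    """Read the hypothetical <knows><who/><whom/></knows> format."""
    root = ET.fromstring(xml_text)
    return {(EX + root.findtext("who"), EX + "knows", EX + root.findtext("whom"))}


def triples_from_b(xml_text):
    """Read the hypothetical <Person name=...><knows name=.../></Person> format."""
    root = ET.fromstring(xml_text)
    subject = EX + root.get("name")
    return {(subject, EX + "knows", EX + k.get("name")) for k in root.findall("knows")}


def triples_from_ntriples(line):
    """Parse a single '<s> <p> <o> .' line (URIs only, no literals)."""
    s, p, o, dot = line.split()
    if dot != ".":
        raise ValueError("expected a terminating '.'")
    return {(s.strip("<>"), p.strip("<>"), o.strip("<>"))}


print(DOC_A == DOC_B)  # False: the syntax differs
print(triples_from_a(DOC_A) == triples_from_b(DOC_B) == triples_from_ntriples(NTRIPLE))  # True
```

A query over DOC_A's tree would not work on DOC_B, whereas a query over the triples works for all three.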

To boil the distinction down to one sentence: XQuery is a language to query a document; SPARQL is a tool to query the world.

An XQuery is written against the tree structure of particular documents. It can pull in several documents (with fn:doc or fn:collection), but it has to spell out by hand how each one's structure lines up with the others. RDF lets you extract the content of many documents as a set of facts, drop them into one big container, and query that container in one go, which makes it possible to relate facts stated in different documents. This is the fundamental difference between the two technologies, and why both are needed.


Let me expand a little on my example above. We start with two documents: one stating in XML who knows whom, and another giving information about each person.

So here is Doc1, with a made-up XML schema:
    <Person>
      <name>Henry Story</name>
      <mbox>henry.story@insead.edu</mbox>
      <knows>
        <Person>
          <name>Tim Bray</name>
          <mbox>Tim.Bray@eg.com</mbox>
        </Person>
        <Person>
          <name>Jonathan Story</name>
          <mbox>Jonathan.Story@eg.edu</mbox>
        </Person>
      </knows>
    </Person>

Next is Doc2, with a made-up XML schema specifying additional information about each person:
    <AddressBook>
      <Person>
        <name>Jonathan Story</name>
        <mbox>Jonathan.Story@eg.edu</mbox>
        <address><Country>France</Country></address>
      </Person>
      <Person>
        <name>Tim Bray</name>
        <mbox>Tim.Bray@eg.com</mbox>
        <address><Country>Canada</Country></address>
      </Person>
    </AddressBook>
    
Now, using a specification such as GRDDL, you can transform the above XML into a set of statements about relations, perhaps something like the following, expressed in N3. For Doc1:
 
[ a :Person;
  :name "Henry Story";
  :mbox <mailto:henry.story@insead.edu>;
  :knows [ a :Person;
           :name "Tim Bray";
           :mbox <mailto:Tim.Bray@eg.com>
         ];
  :knows [ a :Person;
           :name "Jonathan Story";
           :mbox <mailto:Jonathan.Story@eg.edu>
         ];
] .
      
which can be represented as the following graph:

The second document can be transformed into the following N3:

 
[ a :Person;
  :name "Tim Bray";
  :mbox <mailto:Tim.Bray@eg.com>;
  :address [ a :Address;
             :country "Canada"@en
           ]
] .
[ a :Person;
  :name "Jonathan Story";
  :mbox <mailto:Jonathan.Story@eg.edu>;
  :address [ a :Address;
             :country "France"@en
           ]
] .
      

which can be visualised as the following graph.
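GRDDL itself works by associating a transformation (typically an XSLT stylesheet) with a document. As a rough stand-in, here is a Python sketch of what such a transformation does for these two made-up schemas: walk the XML tree and emit (subject, predicate, object) triples, minting a fresh blank node for every Person and Address. For brevity, IRIs and literals are both plain strings and the @en language tags are dropped. The mailto normalisation helper is my own addition, not part of GRDDL.

```python
import itertools
import xml.etree.ElementTree as ET

# The post's two made-up XML documents, reproduced as strings.
DOC1 = """<Person>
  <name>Henry Story</name>
  <mbox>henry.story@insead.edu</mbox>
  <knows>
    <Person><name>Tim Bray</name><mbox>Tim.Bray@eg.com</mbox></Person>
    <Person><name>Jonathan Story</name><mbox>Jonathan.Story@eg.edu</mbox></Person>
  </knows>
</Person>"""

DOC2 = """<AddressBook>
  <Person><name>Jonathan Story</name><mbox>Jonathan.Story@eg.edu</mbox>
    <address><Country>France</Country></address></Person>
  <Person><name>Tim Bray</name><mbox>Tim.Bray@eg.com</mbox>
    <address><Country>Canada</Country></address></Person>
</AddressBook>"""

_counter = itertools.count(1)


def bnode():
    """Mint a fresh blank-node label: _:b1, _:b2, ..."""
    return f"_:b{next(_counter)}"


def mailto(address):
    """Build a mailto: URI, lower-casing the case-insensitive domain part."""
    local, _, domain = address.strip().partition("@")
    return f"mailto:{local}@{domain.lower()}"


def person_triples(elem, out):
    """Append the triples for one <Person> element to out; return its node."""
    node = bnode()
    out.append((node, "a", ":Person"))
    out.append((node, ":name", elem.findtext("name")))
    out.append((node, ":mbox", mailto(elem.findtext("mbox"))))
    for friend in elem.findall("knows/Person"):
        out.append((node, ":knows", person_triples(friend, out)))
    country = elem.findtext("address/Country")
    if country is not None:
        addr = bnode()
        out.append((node, ":address", addr))
        out.append((addr, "a", ":Address"))
        out.append((addr, ":country", country))
    return node


def to_triples(xml_text):
    """Gather triples from either document (a lone Person or an AddressBook)."""
    root = ET.fromstring(xml_text)
    people = [root] if root.tag == "Person" else root.findall("Person")
    out = []
    for person in people:
        person_triples(person, out)
    return out


g1, g2 = to_triples(DOC1), to_triples(DOC2)
print(len(g1), len(g2))  # 11 12
```

Because every call mints new labels, the two graphs never share a node by accident. Only the :mbox values can later tell us that two Tim Bray nodes describe the same person.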
Unlike, perhaps, the XML documents from which they stem, these graphs can be merged mechanically. If the mbox relation is also declared inverse functional (i.e. a mailbox, if it belongs to a Person, belongs to only one Person), the nodes describing the same person can be identified, giving the following graph, written in N3 as:

 
[ a :Person;
  :name "Henry Story";
  :mbox <mailto:henry.story@insead.edu>;
  :knows [ a :Person;
           :name "Tim Bray";
           :mbox <mailto:Tim.Bray@eg.com>;
           :address [ a :Address;
                      :country "Canada"@en
                    ]
         ];
  :knows [ a :Person;
           :name "Jonathan Story";
           :mbox <mailto:Jonathan.Story@eg.edu>;
           :address [ a :Address;
                      :country "France"@en
                    ]
         ];
] .
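The merge step can also be sketched in Python. This is what the FOAF community calls "smushing": take the union of the two graphs, then treat any two nodes sharing a value of an inverse functional property as one node. The triples below are transcribed by hand from the two N3 graphs. The blank-node labels are my own, language tags are dropped, and the union-find bookkeeping is just one straightforward way to do the identification.

```python
# The two N3 graphs above as (subject, predicate, object) triples.
# Blank-node labels are arbitrary but distinct per document.
G1 = [
    ("_:h", "a", ":Person"), ("_:h", ":name", "Henry Story"),
    ("_:h", ":mbox", "mailto:henry.story@insead.edu"),
    ("_:h", ":knows", "_:t1"), ("_:h", ":knows", "_:j1"),
    ("_:t1", "a", ":Person"), ("_:t1", ":name", "Tim Bray"),
    ("_:t1", ":mbox", "mailto:Tim.Bray@eg.com"),
    ("_:j1", "a", ":Person"), ("_:j1", ":name", "Jonathan Story"),
    ("_:j1", ":mbox", "mailto:Jonathan.Story@eg.edu"),
]
G2 = [
    ("_:t2", "a", ":Person"), ("_:t2", ":name", "Tim Bray"),
    ("_:t2", ":mbox", "mailto:Tim.Bray@eg.com"),
    ("_:t2", ":address", "_:a1"), ("_:a1", "a", ":Address"),
    ("_:a1", ":country", "Canada"),
    ("_:j2", "a", ":Person"), ("_:j2", ":name", "Jonathan Story"),
    ("_:j2", ":mbox", "mailto:Jonathan.Story@eg.edu"),
    ("_:j2", ":address", "_:a2"), ("_:a2", "a", ":Address"),
    ("_:a2", ":country", "France"),
]

# Assumption: :mbox is declared inverse functional in the (unnamed)
# vocabulary, e.g. as an owl:InverseFunctionalProperty.
INVERSE_FUNCTIONAL = {":mbox"}


def smush(*graphs):
    """Union the graphs, then merge nodes that share an inverse functional value."""
    triples = [t for g in graphs for t in g]
    parent = {}  # union-find forest over node labels

    def find(node):
        while parent.get(node, node) != node:
            node = parent[node]
        return node

    owner = {}  # (property, value) -> first node seen with that value
    for s, p, o in triples:
        if p in INVERSE_FUNCTIONAL:
            if (p, o) in owner:
                root_a, root_b = find(owner[(p, o)]), find(s)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                owner[(p, o)] = s
    # Rewrite every node to its representative; the set drops duplicates.
    return {(find(s), p, find(o)) for s, p, o in triples}


merged = smush(G1, G2)
people = {s for s, p, o in merged if p == "a" and o == ":Person"}
print(len(merged), len(people))  # 17 3
```

The 23 input triples collapse to 17, because the type, name and mbox triples for Tim and for Jonathan were stated in both documents.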
      

As I said earlier, semantics is about the relationship between syntax and the world. Since the world is one thing, one should always be able to map things said about the world into one big, unified, non-contradictory database. This is what cutting statements into simple relations allows us to do. We can now ask questions that cut across these two documents, such as:

PREFIX : <http://example.org/terms#>  # placeholder: the post never names its vocabulary
SELECT ?name ?mail
WHERE { [ a :Person;
          :name "Henry Story";
          :knows [ :name ?name;
                   :mbox ?mail;
                   :address [ a :Address;
                              :country "Canada"@en
                            ]
                 ]
        ] .
      }
  

In English: "Who does Henry know who lives in Canada, and what is their e-mail address?" This question can only be answered by aggregating data from both documents. XML query languages are not designed for this: they answer questions about the surface structure of a document, so combining these two would mean matching two unrelated schemas by hand.
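Roughly speaking, a query engine answers this by treating the query's blank nodes as unnamed variables and matching each triple pattern against the merged graph. Here is a toy Python sketch of that idea using naive backtracking. The merged triples are transcribed by hand from the merged N3 above (labels are arbitrary, language tags dropped), and this matcher is an illustration only, not how any particular SPARQL engine is implemented.

```python
# The merged graph above, transcribed by hand as a set of triples.
MERGED = {
    ("_:h", "a", ":Person"), ("_:h", ":name", "Henry Story"),
    ("_:h", ":mbox", "mailto:henry.story@insead.edu"),
    ("_:h", ":knows", "_:t"), ("_:h", ":knows", "_:j"),
    ("_:t", "a", ":Person"), ("_:t", ":name", "Tim Bray"),
    ("_:t", ":mbox", "mailto:Tim.Bray@eg.com"),
    ("_:t", ":address", "_:a1"), ("_:a1", "a", ":Address"),
    ("_:a1", ":country", "Canada"),
    ("_:j", "a", ":Person"), ("_:j", ":name", "Jonathan Story"),
    ("_:j", ":mbox", "mailto:Jonathan.Story@eg.edu"),
    ("_:j", ":address", "_:a2"), ("_:a2", "a", ":Address"),
    ("_:a2", ":country", "France"),
}

# The query's pattern, with its blank nodes written as variables ?h, ?p, ?a.
QUERY = [
    ("?h", "a", ":Person"),
    ("?h", ":name", "Henry Story"),
    ("?h", ":knows", "?p"),
    ("?p", ":name", "?name"),
    ("?p", ":mbox", "?mail"),
    ("?p", ":address", "?a"),
    ("?a", "a", ":Address"),
    ("?a", ":country", "Canada"),
]


def unify(term, value, binding):
    """Match one pattern term against one value, extending binding in place."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match(patterns, triples, binding=None):
    """Yield every binding that satisfies all patterns (naive backtracking)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        if all(unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match(rest, triples, candidate)


rows = [(b["?name"], b["?mail"]) for b in match(QUERY, MERGED)]
print(rows)  # [('Tim Bray', 'mailto:Tim.Bray@eg.com')]
```

Run against either document's triples alone, the same pattern finds nothing. The :knows facts live only in Doc1, and the :country facts live only in Doc2.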

Update: It does not always make equal sense to relate XML to RDF. In document formats such as the OpenOffice document format, for example, XML really acts as a markup language, which is what it was designed for, rather than as a data transmission language, which is the use the example above highlights. When acting as a markup language, XML seems much more independent and less in need of a relation to some semantics. One can always create a semantic mapping, of course, but it seems a lot more superfluous: the XML stands on its own. Marking something up as a title, as bold, or as a footnote does of course have a meaning, but the markup does not feel like a syntax standing in for facts about something else. When XML is used as a programming language, or as a data transmission language as it often is nowadays (this is how it is used in RSS or Atom, for example), it acts much more like a syntax, and giving it a semantics is a lot more useful and explanatory. This may give some indication of where XML is useful on its own, and where it is perhaps being used beyond what it was really intended for.
