The Sun BabelFish Blog
Don't panic !
syntax and semantics: the relation between xml and rdf
I have heard people wonder what the relation is between the XML technologies being developed at the W3C and the semantic technologies. Why does there seem to be so much duplication? Why is there an XQuery language and a SPARQL query language? What do they do differently?
The answer is in part reducible to the distinction between syntax and semantics. Syntax is the science of how signs can be put together to create sentences. Semantics is about how these signs relate to the world, so that one can give a sentence a truth value. Consider the sentence "Henry knows Tim". It refers to three things: a person named Henry, a person named Tim, and a relation between them. There are many ways of saying the same thing. One can make the same statement in German, in French, in Java or in XML. The thing in the world that is being described by all these syntaxes is the same.
XML is a syntax (or perhaps more precisely a document structure). RDF is about semantics. That is why there can be so many different ways of writing RDF: as RDF/XML, as N3, as Turtle, as N-Triples, and so on. Given this, it is easy to understand the duplication in query languages.
To boil the distinction down to one sentence: XQuery is a language to query a document, SPARQL is a tool to query the world.
An XQuery can only query one document. RDF helps you to extract the content of many documents, as a set of facts, drop them in one big container, and query that container in one go, essentially allowing one to make relations between facts stated in different documents. This is the fundamental difference between these two technologies, and why they are both needed.
So let me expand a little on my example above. Let us start with two documents: one stating in XML who knows whom, and another stating information about each person. Here is Doc1:
<Person>
  <name>Henry Story</name>
  <mbox>henry.story@bblfish.net</mbox>
  <knows>
    <Person>
      <name>Tim Bray</name>
      <mbox>Tim.Bray@eg.com</mbox>
    </Person>
    <Person>
      <name>Jonathan Story</name>
      <mbox>Jonathan.Story@eg.edu</mbox>
    </Person>
  </knows>
</Person>
Next is Doc2, with a made-up XML schema specifying additional information for each person:
<AddressBook>
  <Person>
    <name>Jonathan Story</name>
    <mbox>Jonathan.Story@eg.edu</mbox>
    <address><Country>France</Country></address>
  </Person>
  <Person>
    <name>Tim Bray</name>
    <mbox>Tim.Bray@eg.Com</mbox>
    <address><Country>Canada</Country></address>
  </Person>
</AddressBook>
Now, using a spec such as GRDDL, you can transform the above XML into a set of statements about relations. Perhaps something like the following, expressed in N3. For Doc1:
[ a :Person;
  :name "Henry Story";
  :mbox <mailto:henry.story@insead.edu>;
  :knows [ a :Person;
           :name "Tim Bray";
           :mbox <mailto:Tim.Bray@eg.com>
         ];
  :knows [ a :Person;
           :name "Jonathan Story";
           :mbox <mailto:Jonathan.Story@eg.edu>
         ];
] .
which can be represented as the following graph:
The second document can be transformed into the following N3:
[ a :Person;
  :name "Tim Bray";
  :mbox <mailto:Tim.Bray@eg.com>;
  :address [ a :Address;
             :country "Canada"@en
           ]
] .
[ a :Person;
  :name "Jonathan Story";
  :mbox <mailto:Jonathan.Story@eg.edu>;
  :address [ a :Address;
             :country "France"@en
           ]
] .
which can be visualised as the following graph.
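To make the XML-to-relations step above a little more concrete, here is a minimal Python sketch of the idea. It is not GRDDL itself (GRDDL typically points a document at an XSLT transformation); it only shows the shape of the output. The function names `person_to_triples` and `doc_to_triples`, the `_:b1` style blank node labels, and the plain-string literals without language tags are all my own assumptions for illustration.

```python
import itertools
import xml.etree.ElementTree as ET

# Doc1 from above, with the angle brackets as they appear in the XML.
DOC1 = """
<Person>
  <name>Henry Story</name>
  <mbox>henry.story@bblfish.net</mbox>
  <knows>
    <Person><name>Tim Bray</name><mbox>Tim.Bray@eg.com</mbox></Person>
    <Person><name>Jonathan Story</name><mbox>Jonathan.Story@eg.edu</mbox></Person>
  </knows>
</Person>
"""

_ids = itertools.count(1)


def new_blank_node():
    """Return a fresh, hypothetical blank node label such as '_:b1'."""
    return f"_:b{next(_ids)}"


def person_to_triples(person, triples):
    """Append (subject, predicate, object) triples for one <Person> element
    and return the blank node that stands for that person."""
    node = new_blank_node()
    triples.append((node, "a", ":Person"))
    triples.append((node, ":name", person.findtext("name")))
    triples.append((node, ":mbox", "mailto:" + person.findtext("mbox")))
    # Doc1 style: people nested under <knows>
    for friend in person.findall("knows/Person"):
        triples.append((node, ":knows", person_to_triples(friend, triples)))
    # Doc2 style: an optional <address><Country> child
    country = person.findtext("address/Country")
    if country is not None:
        address = new_blank_node()
        triples.append((node, ":address", address))
        triples.append((address, "a", ":Address"))
        triples.append((address, ":country", country))
    return node


def doc_to_triples(xml_text):
    """Turn either document shape (a single Person, or an AddressBook) into triples."""
    root = ET.fromstring(xml_text)
    triples = []
    people = [root] if root.tag == "Person" else root.findall("Person")
    for person in people:
        person_to_triples(person, triples)
    return triples


doc1_triples = doc_to_triples(DOC1)
for triple in doc1_triples:
    print(triple)
```

Each printed tuple corresponds to one arc in the graph: a subject node, a relation, and either another node or a literal value.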
These graphs can, unlike perhaps the XML documents from which they stem, be merged mechanically into the following graph, especially if the mbox relation is stated as being inverse functional (i.e. a mailbox, if it refers to a Person, refers to only one Person). The merged graph is written in N3 as:
[ a :Person;
  :name "Henry Story";
  :mbox <mailto:henry.story@insead.edu>;
  :knows [ a :Person;
           :name "Tim Bray";
           :mbox <mailto:Tim.Bray@eg.com>;
           :address [ a :Address;
                      :country "Canada"@en
                    ]
         ];
  :knows [ a :Person;
           :name "Jonathan Story";
           :mbox <mailto:Jonathan.Story@eg.edu>;
           :address [ a :Address;
                      :country "France"@en
                    ]
         ];
] .
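The mechanical merge can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples, the node labels (`_:tim1`, `_:tim2`, and so on) are hypothetical, and the function `merge_on_inverse_functional` is my own name, not part of any RDF library. The idea is simply that two nodes sharing the same value for an inverse functional relation must be the same thing, so the second node is renamed to the first.

```python
def merge_on_inverse_functional(triples, ifp=":mbox"):
    """Merge nodes that share a value for an inverse functional property.

    A simplified sketch: it does one pass, so it does not chase chains of
    renames, and it assumes literal values never look like node labels.
    """
    canonical = {}  # ifp value -> first node seen with that value
    rename = {}     # duplicate node -> node it is merged into
    for s, p, o in triples:
        if p == ifp:
            keeper = canonical.setdefault(o, s)
            if keeper != s:
                rename[s] = keeper

    merged = []
    for s, p, o in triples:
        triple = (rename.get(s, s), p, rename.get(o, o))
        if triple not in merged:  # drop statements that are now duplicates
            merged.append(triple)
    return merged


# A cut-down version of the two graphs above (type statements omitted).
doc1 = [
    ("_:henry", ":name", "Henry Story"),
    ("_:henry", ":knows", "_:tim1"),
    ("_:tim1", ":name", "Tim Bray"),
    ("_:tim1", ":mbox", "mailto:Tim.Bray@eg.com"),
]
doc2 = [
    ("_:tim2", ":name", "Tim Bray"),
    ("_:tim2", ":mbox", "mailto:Tim.Bray@eg.com"),
    ("_:tim2", ":address", "_:addr"),
    ("_:addr", ":country", "Canada"),
]

merged = merge_on_inverse_functional(doc1 + doc2)
for triple in merged:
    print(triple)
```

After the merge, the address that was only stated in the second document hangs off the same node that Henry knows in the first.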
As I said earlier, semantics is about the relationship between syntax and the world. Since the world is one thing, one should always be able to map things said about the world into one big, unified, non-contradictory database. This is what cutting statements into simple relations allows us to do. We can now ask questions that cut across these two documents, such as:
SELECT ?name ?mail
WHERE { [ a :Person;
          :name "Henry Story";
          :knows [ :name ?name;
                   :mbox ?mail;
                   :address [ a :Address;
                              :country "Canada"@en;
                            ]
                 ]
        ].
}
In English: "Who does Henry know who lives in Canada, and what is their e-mail address?" This question can only be answered by aggregating data from both documents. This is not something that can be done using the XML query languages, which can only answer questions on the surface of the document.
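For readers who do not have a SPARQL engine at hand, the same question can be asked of the merged facts with a few lines of plain Python. This is a hand-rolled sketch, not how a real SPARQL implementation works: the triples are tuples, the node labels are hypothetical, the type statements and language tags are left out, and the helpers `objects`, `subjects` and `friends_in` are names I made up for the illustration.

```python
# The merged graph, flattened into (subject, predicate, object) tuples.
triples = [
    ("_:henry", ":name", "Henry Story"),
    ("_:henry", ":knows", "_:tim"),
    ("_:henry", ":knows", "_:jonathan"),
    ("_:tim", ":name", "Tim Bray"),
    ("_:tim", ":mbox", "mailto:Tim.Bray@eg.com"),
    ("_:tim", ":address", "_:a1"),
    ("_:a1", ":country", "Canada"),
    ("_:jonathan", ":name", "Jonathan Story"),
    ("_:jonathan", ":mbox", "mailto:Jonathan.Story@eg.edu"),
    ("_:jonathan", ":address", "_:a2"),
    ("_:a2", ":country", "France"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is stated."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is stated."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def friends_in(person_name, country):
    """Names and mailboxes of the people that person_name knows in country."""
    results = []
    for person in subjects(":name", person_name):
        for friend in objects(person, ":knows"):
            lives_there = any(
                country in objects(address, ":country")
                for address in objects(friend, ":address")
            )
            if lives_there:
                for name in objects(friend, ":name"):
                    for mail in objects(friend, ":mbox"):
                        results.append((name, mail))
    return results


answer = friends_in("Henry Story", "Canada")
print(answer)  # [('Tim Bray', 'mailto:Tim.Bray@eg.com')]
```

The `:knows` facts come from the first document and the `:address` facts from the second, so neither document alone could have produced the answer.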
Update: The relation between XML and RDF is perhaps not one that it always makes equal sense to draw. For document formats such as the Open Office document format, for example, XML is really acting as a markup language, which is what it is meant for, rather than as a data transmission language, which is what the example above highlights. When acting as a markup language, XML seems to be much more independent, and less in need of a relation to some semantics. One can always create a semantic mapping of course, but it seems a lot more superfluous: XML stands more on its own. Marking something up as being a title, bold, or a footnote has a semantics of course, but one does not have the feeling that it is very syntactical. When XML is used as a programming language or as a data transmission language, as it often is nowadays (this is how it is used in RSS or Atom for example), then it is acting much more like a syntax, for which having a semantics is a lot more useful and explanatory. This may give some indication of where XML is useful on its own, and where perhaps it is being used beyond what it is really intended for.
Posted at 01:05AM Dec 10, 2005 [permalink/trackback] by Henry Story in SemWeb |
