The Sun BabelFish Blog
Don't panic !
semantics of invalid passports
For those travelers out there who are into XML and semantics, the question is: how does one specify, semantically, that a valid passport has a passport number in it?
In OWL one can specify, through a restriction on a class, that a relation has a cardinality. For example, the mortal:parent relation, with domain and range mortal:Human, would usually be given a cardinality of 2 for humans. So whenever we have something that is a human, we can deduce that it has two parents, even if we don't know who they are.
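The deduction can be mimicked in a few lines. This is a minimal Python sketch, assuming triples are plain string tuples and that a hypothetical `CARDINALITY` table plays the role of an OWL cardinality restriction. A real OWL reasoner reasons about the existence of the parents rather than materializing blank nodes like this; the sketch only reproduces the conclusion.

```python
import itertools

Triple = tuple[str, str, str]
TYPE = "rdf:type"

# Hypothetical stand-in for an owl:cardinality restriction:
# (class, property) -> exact number of values every member must have.
CARDINALITY = {("mortal:Human", "mortal:parent"): 2}

_counter = itertools.count()


def fresh_blank_node() -> str:
    """Mint a new blank-node label such as '_:b0'."""
    return f"_:b{next(_counter)}"


def infer_existentials(graph: set[Triple]) -> set[Triple]:
    """Return the graph plus blank-node values implied by CARDINALITY."""
    result = set(graph)
    for (cls, prop), n in CARDINALITY.items():
        members = {s for (s, p, o) in graph if p == TYPE and o == cls}
        for m in members:
            known = {o for (s, p, o) in graph if s == m and p == prop}
            # Values we know must exist but cannot name become blank nodes.
            # (Simplification: if n or more are known, nothing is added.)
            for _ in range(n - len(known)):
                result.add((m, prop, fresh_blank_node()))
    return result


g = {("ex:henry", TYPE, "mortal:Human"), ("ex:henry", "mortal:parent", "ex:mum")}
parents = {o for (s, p, o) in infer_existentials(g) if p == "mortal:parent"}
print(len(parents))  # 2: ex:mum plus one blank node
```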
Note that this is not expressible in a DTD or even in RelaxNG. The best one could say is that a <Person> element could have 0 to 2 <parent> elements. So one could have something like this:
    <Person>
      <name>Henry Story</name>
      <parent><Person>...</Person></parent>
      <parent><Person>...</Person></parent>
    </Person>
One cannot say that it must have exactly 2 <parent> elements without thereby specifying documents that are necessarily infinitely long, since each parent is a Person, and so would itself have to have parent elements, and so on. With XML we are stuck at the level of syntax.
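The infinite-regress argument can be checked mechanically. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`; `has_exactly_two_parents` is a hypothetical validator (not a DTD or RelaxNG construct), and the parent names A and B are placeholders.

```python
import xml.etree.ElementTree as ET

DOC = """
<Person>
  <name>Henry Story</name>
  <parent><Person><name>A</name></Person></parent>
  <parent><Person><name>B</name></Person></parent>
</Person>
"""


def has_exactly_two_parents(person: ET.Element) -> bool:
    """Hypothetical rule: every Person has exactly two <parent> children."""
    parents = person.findall("parent")
    if len(parents) != 2:
        return False
    # Each parent is itself a Person, so the same rule applies to it.
    for p in parents:
        child = p.find("Person")
        if child is None or not has_exactly_two_parents(child):
            return False
    return True


root = ET.fromstring(DOC)
# Any finite document bottoms out in a Person without parents, so it fails.
print(has_exactly_two_parents(root))  # False
```

Only an infinitely deep document could satisfy the rule, which is why a schema can at best allow "0 to 2" parents.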
Working at the level of syntax does have some advantages: it is obvious how to query for the existence of information. One just searches the document with an XQuery. So say I have an XML passport and I want to find out whether it contains a number: it would be easy to find out by searching for the passportNumber attribute, for example. The disadvantage is that the query will only succeed on certain types of XML documents, namely those that put the information in that spot. It won't work with information written in other XML passport formats, or with real paper passports.
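A minimal Python sketch of such a syntax-level query, using `xml.etree.ElementTree` as a stand-in for XQuery. Both passport formats and the `has_passport_number` helper are hypothetical; the point is that the same information in a different layout is simply not found.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML passport formats carrying the same information.
FORMAT_A = '<passport holder="Henry Story" passportNumber="X123"/>'
FORMAT_B = '<doc type="passport"><holder>Henry Story</holder><number>X123</number></doc>'


def has_passport_number(xml_text: str) -> bool:
    """Syntax-level check: is there a passportNumber attribute anywhere?"""
    root = ET.fromstring(xml_text)
    # Element.iter() visits the root element itself and all its descendants.
    return any("passportNumber" in el.attrib for el in root.iter())


print(has_passport_number(FORMAT_A))  # True
print(has_passport_number(FORMAT_B))  # False, although the number is there
```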
Now how does one specify that a passport has a number printed in it? We don't want to say that a doc:Passport has a relation doc:passportNumber with a cardinality of 1. Though that seems correct, it would not help us find invalid passports, ones that did not have a number printed on them, since
- an OWL reasoner would infer the relation to a blank node anyway, by following the OWL cardinality rule.
- there could be a statement about the passport number written down somewhere else entirely, which might have been merged with the information about the passport. A passport with a passport number written on a separate piece of paper won't help you cross the border...
- The passport might have had a passport number in it until I cut that information out of the passport, or until a mischievous border guard erased it. The government databases would still attribute a passport number to my passport, so as soon as I asked them what it is, I would have correct knowledge of my passport, yet my passport would still be invalid.
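These objections can be illustrated with toy triple sets, all names hypothetical. Once the passport's own content is merged with another source, or with a blank node that a cardinality-1 axiom would license, asking the merged graph whether the passport has a number always succeeds. Only the passport's own content reveals the defect.

```python
Triple = tuple[str, str, str]

PASSPORT = "ex:myPassport"

# What is actually printed in the passport: the number has been cut out.
passport_content: set[Triple] = {(PASSPORT, "rdf:type", "doc:Passport")}

# Another source, e.g. a government database, still records a number.
government_db: set[Triple] = {(PASSPORT, "doc:passportNumber", "X123")}

# What a cardinality-1 axiom would let a reasoner add: an unnamed value.
cardinality_inference: set[Triple] = {(PASSPORT, "doc:passportNumber", "_:b0")}


def has_number(graph: set[Triple], passport: str) -> bool:
    """Does the graph give the passport some doc:passportNumber value?"""
    return any(s == passport and p == "doc:passportNumber" for (s, p, _) in graph)


print(has_number(passport_content | government_db, PASSPORT))          # True
print(has_number(passport_content | cardinality_inference, PASSPORT))  # True
print(has_number(passport_content, PASSPORT))                          # False
```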
Here is the solution presented by Tim Berners-Lee:
OWL cardinality might say that a person must have at least one passport number, but it can NOT say that a document about a person contains information about at least one passport number.
N3 rules can, with log:semantics (which relates a document to the graph you get from parsing it) and log:includes, and nested graphs:
    @prefix log: <http://www.w3.org/2000/10/swap/log#> .
    @prefix : <#> .
    @forAll :x, :p, :g1 .
    { :x a :Person; :passport :p .
      :p log:semantics :g1 .
      :g1 log:notIncludes { :x :passportNumber [] } } => { :p a :InvalidPassport } .
On the semantic web, as anyone can in principle say anything about anything, you can never make statements about how many passportNumber statements there are without specifying the document in question.
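To see what the rule does operationally, here is a minimal Python sketch, not cwm's implementation. A hypothetical `semantics` dictionary stands in for log:semantics (mapping each passport to the graph its content yields), and `includes` is a rough stand-in for log:includes in which `None` plays the role of the blank node `[]`. The crucial point is that the negative test runs against the passport's own graph, never against everything we know.

```python
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]


def includes(graph: set[Triple], pattern: set[Pattern]) -> bool:
    """Rough log:includes: each pattern triple matches some graph triple.

    None is a wildcard, like [] in the rule. Simplification: wildcards in
    different pattern triples need not match the same node.
    """
    def matches(t: Triple, q: Pattern) -> bool:
        return all(qi is None or qi == ti for ti, qi in zip(t, q))

    return all(any(matches(t, q) for t in graph) for q in pattern)


def invalid_passports(kb: set[Triple], semantics: dict[str, set[Triple]]) -> set[Triple]:
    """Flag passport p of Person x when p's own graph lacks {x :passportNumber []}."""
    flagged = set()
    for (x, pred, p) in kb:
        if pred != ":passport" or (x, "rdf:type", ":Person") not in kb:
            continue
        g1 = semantics.get(p)  # stand-in for: p log:semantics g1
        if g1 is not None and not includes(g1, {(x, ":passportNumber", None)}):
            flagged.add((p, "rdf:type", ":InvalidPassport"))
    return flagged


# Everything we know, including a number learned from some other source.
kb = {
    (":henry", "rdf:type", ":Person"),
    (":henry", ":passport", ":pp1"),
    (":henry", ":passportNumber", "X123"),
}
# What the passport itself says once parsed: no number in it.
semantics = {":pp1": {(":henry", ":name", "Henry Story")}}

print(invalid_passports(kb, semantics))  # flags :pp1 despite the number in kb
```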
A passport is quite clearly both a document and an object. As an object it can have properties such as being in your pocket. As a document it tells us something about the world, including information about the owner of the passport. There are many sources of information in the world. If one wants to find out which possible worlds a particular source of information describes, one has to limit one's query to that source of information.
Note: Semantics of Graphs
Following David Lewis, I like to think of a graph as the set of possible worlds that satisfy the patterns of the graph. Tim Berners-Lee's formula is saying: find me the set of possible worlds that correctly interpret the passport I have. If this set includes worlds where I don't have a passport number, then my passport is invalid. That is because such worlds can only be part of my interpretation of my passport if the number is not on my passport.
This interpretation of graphs must be a little too strong semantically, as it leads to the following problems:
How does one query for documents about mathematical truths? If a document says that "2+2=4", it will be true in all possible worlds, just like a document that says "1+1=2", so a query for the one will also match the other. Perhaps here one has to query literally, namely for the string or regex "2+2=4".
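Both the validity test and the problem with mathematical truths can be played out in a toy model. This is a minimal Python sketch under strong simplifying assumptions, all hypothetical: a finite set of atomic facts, a world is any subset of them, a document is modelled as a condition on worlds, and the existential `[]` is replaced by two candidate numbers. It illustrates the idea; it is not Lewis's framework.

```python
import re
from itertools import chain, combinations

# Hypothetical finite vocabulary; a possible world is any subset of it.
ATOMS = (
    "henry name 'Henry Story'",
    "henry passportNumber X123",
    "henry passportNumber Y456",
)
NUMBER_FACTS = frozenset(a for a in ATOMS if "passportNumber" in a)
WORLDS = [
    frozenset(c)
    for c in chain.from_iterable(combinations(ATOMS, r) for r in range(len(ATOMS) + 1))
]


def models(holds) -> set:
    """The set of worlds in which a document's content holds."""
    return {w for w in WORLDS if holds(w)}


def graph_condition(graph: frozenset):
    """A graph holds in a world when every fact it states is in that world."""
    return lambda w: graph <= w


def invalid(passport_graph: frozenset) -> bool:
    """Invalid if some world interpreting the passport has no passport number."""
    return any(not (w & NUMBER_FACTS) for w in models(graph_condition(passport_graph)))


# Mathematical truths hold in every world, so their model sets coincide.
DOCS = {
    "docA": ("2+2=4", lambda w: 2 + 2 == 4),
    "docB": ("1+1=2", lambda w: 1 + 1 == 2),
}


def literal_query(regex: str) -> set:
    """Fall back to matching the document text itself."""
    return {name for name, (text, _) in DOCS.items() if re.search(regex, text)}


print(invalid(frozenset({"henry name 'Henry Story'"})))                               # True
print(invalid(frozenset({"henry name 'Henry Story'", "henry passportNumber X123"})))  # False
print(models(DOCS["docA"][1]) == models(DOCS["docB"][1]))                            # True
print(literal_query(re.escape("2+2=4")))                                              # {'docA'}
```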
Posted at 09:15PM Jun 15, 2007 by Henry Story in SemWeb

Posted by Taylor on June 15, 2007 at 10:42 PM CEST
Posted by Henry Story on June 16, 2007 at 05:17 PM CEST
You say: Tim Berners Lee's formula is saying: find me the set of possible worlds that correctly interpret the passport I have. If this set includes worlds where I don't have a passport number, then my passport is invalid.
Since log:notIncludes is a builtin of CWM, the Closed World Machine, it's probably better (and simpler) paraphrased as: find me the documents that do not include (and from which one cannot infer) a number for my passport...
And yes, it is sometimes said that cwm isn't really based on the closed world assumption, but as I understand it, this is because it can load additional data from the web, and that's a question unrelated to the point here.
Posted by Valentin on June 17, 2007 at 02:54 PM CEST