free web site hit counter

The Sun BabelFish Blog

Don't panic !

Monday Apr 21, 2008

FOAF & SSL: creating a global decentralised authentication protocol

Following on my previous post RDFAuth: sketch of a buzzword compliant authentication protocol, Toby Inkster came up with a brilliantly simple scheme that builds very neatly on top of the Secure Sockets Layer of https. I describe the protocol shortly here, and will describe an implementation of it in my next post.

Simple global ( passwordless if using a device such as the Aladdin USB e-Token ) authentication around the web would be extremely valuable. I am currently crumbling under the number of sites asking me for authentication information, and for each site I need to remember a new id and password combination. I am not the only one with this problem as the data portability video demonstrates. OpenId solves the problem but the protocol consumes a lot of ssl connections. For hyperdata user agents this could be painfully slow. This is because they may need access to just a couple of resources per server as they jump from service to service.

As before we have a very simple scenario to consider. Romeo wants to find out where Juliette is. Juliette's hyperdata Address Book updates her location on a regular basis by PUTing information to a protected resource which she only wants her friends and their friends to have access to. Her server knows from her foaf:PersonalProfileDocument who her friends are. She identifies them via dereferenceable URLs, as I do, which themselves usually (the web is flexible) return more foaf:PersonalProfileDocuments describing them, and pointing to further such documents. In this way the list of people able to find out her location can be specified in a flexible and distributed manner. So let us imagine that Romeo is a friend of a friend of Juliette's and he wishes to talk to her. The following sequence diagram continues the story...

sequence diagram of RDF+SSL

The stages of the diagram are listed below:

  1. First Romeo's User Agent HTTP GETs Juliette's public foaf file located at http://juliette.net/. The server returns a representation ( in RDFa perhaps ) with the same semantics as the following N3:

    @prefix : <#> . 
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix todo: <http://eg.org/todo#> .
    @prefix openid: <http://eg.org/openid/todo#> .
    
    <> a foaf:PersonalProfileDocument;
       foaf:primaryTopic :juliette ;
       openid:server <https://aol.com/openid/service>; # see The Openid Sequence Diagram .
    
    :juliette a foaf:Person;
       foaf:name "Juliette";
       foaf:openid <>;
       foaf:blog </blog>;    
       rdfs:seeAlso <https://juliette.net/protected/location>; 
       foaf:knows <http://bblfish.net/people/henry/card#me>,
                  <http://www.w3.org/People/Berners-Lee/card#i> .
    
    <https://juliette.net/protected/location> a todo:LocationDocument .
    

    Romeo's user agent receives this representation and decides to follow the https protected resource because it is a todo:LocationDocument.

  2. The todo:LocationDocument is at an https URL, so Romeo's User Agent connects to it via a secure socket. Juliette's server, who wishes to know the identity of the requestor, sends out a Certificate Request, to which Romeo's user agent responds with an X.509 certificate. This is all part of the SSL protocol.

    In the communication in stage 2, Romeo's user agent also passes along his foaf id. This can be done either by:

    • Sending in the HTTP header of the request an Agent-Id header pointing to the foaf Id of the user. Like this:
      Agent-Id: http://romeo.net/#romeo
      
      This would be similar to the current From: header, but instead of requiring an email address, a direct name of the agent would be required. (An email address is only an indirect identifier of an agent).
    • The Certificate could itself contain the Foaf ID of the Agent in the X509v3 extensions section:
              X509v3 extensions:
                 ...
                 X509v3 Subject Alternative Name: 
                                 URI:http://romeo.net/#romeo
      

      I am not sure if it would be correct use of the X509 Alternative names field. So this would require more standardization work with the X509 community. But it shows a way where the two communities could meet. The advantage of having the id as part of the certificate is that this could add extra weight to the id, depending on the trust one gives the Certificate Authority that signed the Certificate.

  3. At this point Juliette's web server knows of the requestor (Romeo in this case):
    • his alleged foaf Id
    • his Certificate ( verified during the ssl session )

    If the Certificate is signed by a CA that Juliette trusts and the foaf id is part of the certificate, then she will trust that the owner of the User Agent is the entity named by that id. She can then jump straight to step 6 if she knows enough about Romeo that she trusts him.

    Having Certificates signed by CA's is expensive though. The protocol described here will work just as well with self signed certificates, which are easy to generate.

  4. Juliette's hyperdata server then GETs the foaf document associated with the foaf id, namely <http://romeo.net/> . Romeo's foaf server returns a document containing a graph of relations similar to the graph described by the following N3:
    @prefix : <#> . 
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix wot: <http://xmlns.com/wot/0.1/> .
    @prefix wotodo: <http://eg.org/todo#> .
    
    <> a foaf:PersonalProfileDocument;
        foaf:primaryTopic :romeo .
    
    :romeo a foaf:Person;
        foaf:name "Romeo";
        is wot:identity of [ a wotodo:X509Certificate;
                             wotodo:dsaWithSha1Sig """30:2c:02:14:78:69:1e:4f:7d:37:36:a5:8f:37:30:58:18:5a:
                                                 f6:10:e9:13:a4:ec:02:14:03:93:42:3b:c0:d4:33:63:ae:2f:
                                                 eb:8c:11:08:1c:aa:93:7d:71:01""" ;
                           ] ;
        foaf:knows <http://bblfish.net/people/henry/card#me> .
    
  5. By querying the semantics of the returned document with a SPARQL query such as
    PREFIX wot: <http://xmlns.com/wot/0.1/> 
    PREFIX wotodo: <http://eg.org/todo#> 
    
    SELECT { ?sig }
    WHERE {
        [] a wotodo:X509Certificate;
          wotodo:signature ?sig;
          wot:identity <http://romeo.net/#romeo> .
    }
    

    Juliette's web server can discover the certificate signature and compare it with the one sent by Romeo's user agent. If the two are identical, then Juliette's server knows that the User Agent who has access to the private key of the certificate sent to it, and who claims to be the person identified by the URI http://romeo.net/#romeo, is in agreement as to the identity of the certificate with the person who has write access to the foaf file http://romeo.net/. So by proving that it has access to the private key of the certificate sent to the server, the User Agent has also proven that it is the person described by the foaf file.

  6. Finally, now that Juliette's server knows an identity of the User Agent making the request on the protected resource, it can decide whether or not to return the representation. In this case we can imagine that my foaf file says that
     @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    
     <http://bblfish.net/people/henry/card#me> foaf:knows <http://romeo.net/#romeo> .  
     
    As a result of the policy of allowing all friends of Juliette's friends to be able to read the location document, the server sends out a document containing relations such as the following:
    @prefix contact: <http://www.w3.org/2000/10/swap/pim/contact#> .
    @prefix : <http://juliette.org/#> .
    
    :juliette 
        contact:location [ 
              contact:address [ contact:city "Paris";
                                contact:country "France";
                                contact:street "1 Champs Elysees" ]
                         ] .
    

Todo

  • Create an ontology for X509 certificates.
  • test this. Currently there is some implementation work going on in the so(m)mer repository in the misc/FoafServer directory.
  • Can one use the Subject Alternative name of an X509 certificate as described here?
  • For self signed certificates, what should the X509 Distinguished Name (DN) be? The DN is really being replaced here by the foaf id, since that is where the key information about the user is going to be located. Can one ignore the DN in a X509 cert, as one can in RDF with blank nodes? One could I imagine create a dummy DN where one of the elements is the foaf id. These would at least, as opposed to DN, be guaranteed to be unique.
  • what standardization work would be needed to make this

Discussion on the Web

Friday Apr 18, 2008

The OpenId Sequence Diagram

OpenId very neatly solves the global identity problem within the constraints of working with legacy browsers. It is a complex protocol though as the following sequence diagram illustrates, and this may be a problem for automated agents that need to jump around the web from hyperlink to hyperlink, as hyperdata agents tend to do.

The diagram illustrates the following scenario. Romeo wants to find the current location of Juliette. So his semantic web user agent GET's her current foaf file. But Juliette wants to protect information about her current whereabouts and reveal it only to people she trusts, so she configures her server to require the user agent to authenticate itself in order to get more information. If the user agent can prove that is is owned by one of her trusted friends, and Romeo in particular, she will deliver the information to it (and so to him).

The steps numbered in the sequence diagram are as follows:

  1. A User Agent fetches a web page that requires authentication. OpenId was designed with legacy web browsers in mind, for which it would return a page containing an OpenId login box such as the one to the right. openid login box In the case of a hyperdata agent as in our use case, the agent would GET a public foaf file, which might contain a link to an OpenId authentication endpoint. Perhaps with some rdf such as the following N3:
    <> openid:login </openidAuth.cgi> .
    
    Perhaps some more information would indicate which resources were protected.
  2. In current practice a human user notices the login box and types his identifying URL in it, such as http://openid.sun.com/bblfish This is the brilliant invention of OpenId: getting hundreds of millions of people to find it natural to identify themselves via a URL, instead of an email. The user then clicks the "Login button".
    In our semantic use case the hyperdata agent would notice the above openid link and would deduce that it needs to login to the site to get more information. Romeo's Id ( http://romeo.net/ perhaps ) would then be POSTed to the /openidAuth.cgi authentication endpoint.
  3. The OpenId authentication endpoint then fetches the web page by GETing Romeo's url http://romeo.net/. This returned representation contains a link in the header of the page pointing Romeo's OpenId server url. If the representation returned is html then this would contain the following in the header
     <link rel="openid.server" href="https://openid.sun.com/openid/service" />
    
  4. The representation returned in step 3, could contain a lot of other information too. A link to a foaf file may not be a bad idea as I described in foaf and openid. The returned representation in step 3 could even be RDFa extended html, in which case this step may not even be necessary. For a hyperdata server the information may be useful, as it may suggest a connection Romeo could have to some other people that would allow it to decide whether it wishes to continue the login process.
  5. Juliette's OpenId authentication endpoint then sends a redirect to Romeo's user agent, directing it towards his OpenId Identity Provider. The redirect also contains the URL of the OpenId authentication cgi, so that in step 8 below the Identity Provider can redirect a message back.
  6. Romeo user agent dutifully redirects romeo to the identity provider, which then returns a form with a username and password entry box.
  7. Romeo's user agent could learn to fill the user name password pair in automatically and even skip the previous step 6 . In any case given the user name and password, the Identity Provider then sends back some cryptographic tokens to the User Agent to have it redirect to the OpenId Authentication cgi at http://juliette.net/openidAuth.cgi.
  8. Romeo's Hyperdata user agent then dutifully redirects back to the OpenId authentication endpoint
  9. The authentication endpoint sends a request to the Openid Identity provider to verify that the cryptographic token is authentic. If it is, a conventional answer is sent back.
  10. The OpenId authentication endpoint finally sends a response back with a session cookie, giving access to various resources on Juliette's web site. Perhaps it even knows to redirect the user agent to a protected resource, though that would have required some information concerning this to have been sent in stage 2.
  11. Finally Romeo's user agent can GET Juliette's protected information if Juliette's hyperdata web server permits it. In this case it will, because Juliette loves Romeo.

All of the steps above could be automatized, so from the user's point of view they may not be complicated. The user agent could even learn to fill in the user name and password required by the Identity Provider. But there are still a very large number of connections between the User Agent and the different services. If these connections are to be secure they would need to protected by SSL (as hinted at by the double line arrows). And SSL connections are not cheap. So the above may be unacceptably slow. On the other hand it would work with a protocol that is growing fast in acceptance.

It is is certainly worth comparing this sequence diagram with the very light weight one presented in "FOAF & SLL: creating a global decentralised authentication protocol".

Thanks again to Benjamin Nowack for bringing the discussion on RDFAuth to thinking about using the OpenId protocol directly as described above. See his post on the semantic web mailing list. Benjamin also pointed to the HTTP OpenID Authentication proposal, which shows how some of the above can be simplified if certain assumptions about the capabilities of the client are made. It would be worth making a sequence diagram of that proposal too.

Friday Mar 28, 2008

RDFAuth: sketch of a buzzword compliant authentication protocol

Here is a proposal for an authentication scheme that is even simpler than OpenId ( see sequence diagram ), more secure, more RESTful, with fewer points of failure and fewer points of control, that is needed in order to make Open Distributed Social Networks with privacy controls possible.

Update

The following sketch led to the even simpler protocol described in Foaf and SSL creating a global decentralized authentication protocol. It is very close to what is proposed here but builds very closely on SSL, so as to reduce what is new down to nearly nothing.

Background

Ok, so now I have your attention, I would like to first mention that I am a great fan of OpenId. I have blogged about it numerous times and enthusiastically in this space. I came across the idea I will develop below, not because I thought OpenId needed improving, but because I have chosen to follow some very strict architectural guidelines: it had to satisfy RESTful, Resource oriented hyperdata constraints. With the Beatnik Address Book I have proven - to myself at least - that the creation of an Open Distributed Social Network (a hot topic at the moment, see the Economist's recent article on Online social network) is feasible and easy to do. What was missing is a way for people to keep some privacy, clearly a big selling point for the large Social Network Providers such as Facebook. So I went on the search of a solution to create a Open Distributed Social Network with privacy controls. And initially I had thought of using OpenId.

OpenId Limitations

But OpenId has a few problems:

  • First it is really designed to work with the limitations of current web browsers. It is partly because of this that there is a lot of hopping around from the service to the Identity Provider with HTTP redirects. As the Tabulator, Knowee or Beatnik.
  • Parts of OpenId 2, and especially the Attribute Exchange spec really don't feel very RESTful. There is a method for PUTing new property values in a database and a way to remove them that does not use either the HTTP PUT method or the DELETE method.
  • The OpenId Attribute Exchange is nice but not very flexible. It can keep some basic information about a person, but it does not make use of hyperdata. And the way it is set up, it would only be able to do so with great difficulty. A RESTfully published foaf file can give the same information, is a lot more flexible and extensible, whilst also making use of Linked Data, and as it happens also solves the Social Network Data Silo problems. Just that!
  • OpenId requires an Identity Server. There are a couple of problems with this:
    • This server provides a Dynamic service but not a RESTful one. Ie. the representations sent back and forth to it, cannot be cached.
    • The service is a control point. Anyone owning such a service will know which sites you authenticate onto. True, you can set up your own service, but that is clearly not what is happening. The big players are offering their customers OpenIds tied to particular authentication servers, and that is what most people will accept.
As I found out by developing what I am here calling RDFAuth, for want of a better name, none of these restrictions are necessary.

RDFAuth, a sketch

So following my strict architectural guidelines, I came across what I am just calling RDFAuth, but like everything else here this is a sketch and open to change. I am not a security specialist nor an HTTP specialist. I am like someone who comes to an architect in order to build a house on some land he has, with some sketch of what he would like the house to look like, some ideas of what functionality he needs and what the price he is willing to pay is. What I want here is something very simple, that can be made to work with a few perl scripts.

Let me first present the actors and the resources they wish to act upon.

  • Romeo has a Semantic Web Address Book, his User Agent (UA). He is looking for the whereabouts of Juliette.
  • Juliette has a URL identifier ( as I do ) which returns a public foaf representation and links to a protected resource.
  • The protected resource contains information she only wants some people to know, in this instance Romeo. It contains information as to her current whereabouts.
  • Romeo also has a public foaf file. He may have a protected one too, but it does not make an entrance in this scene of the play. His public foaf file links to a public PGP key. I described how that is done in Cryptographic Web of Trust.
  • Romeo's Public key is RESTfully stored on a server somewhere, accessible by URL.

So Romeo wants to find out where Juliette is, but Juliette only wants to reveal this to Romeo. Juliette has told her server to only allow Romeo, identified by his URL, to view the site. She could have also have had a more open policy, allowing any of her or Romeo's friends to have access to this site, as specified by their foaf file. The server could then crawl their respective foaf files at regular intervals to see if it needed to add anyone to the list of people having access to the site. This is what the DIG group did in conjunction with OpenId. Juliette could also have a policy that decides Just In Time, as the person presents herself, whether or not to grant them access. She could use the information in that person's foaf file and relating it to some trust metric to make her decision. How Juliette specifies who gets access to the protected resource here is not part of this protocol. This is completely up to Juliette and the policies she chooses her agent to follow.

So here is the sketch of the sequence of requests and responses.

  1. First Romeo's user Agent knows that Juliette's foaf name is http://juliette.org/#juliette so it sends an HTTP GET request to Juliette's foaf file located of course at http://juliette.org/
    The server responds with a public foaf file containing a link to the protected resource perhaps with the N3
      <> rdfs:seeAlso <protected/juliette> .
    
    Perhaps this could also contain some relations describing that resource as protected, which groups may access it, etc... but that is not necessary.
  2. Romeo's User Agent then decides it wants to check out protected/juliette. It sends a GET request to that resource but this time receives a variation of the Basic Authentication Scheme, perhaps something like:
    HTTP/1.0 401 UNAUTHORIZED
    Server: Knowee/0.4
    Date: Sat, 1 Apr 2008 10:18:15 GMT
    WWW-Authenticate: RdfAuth realm="http://juliette.org/protected/*" nonce="ILoveYouToo"
    
    The idea is that Juliette's server returns a nonce (in order to avoid replay attacks), and a realm over which this protection will be valid. But I am really making this up here. Better ideas are welcome.
  3. Romeo's web agent then encrypts some string (the realm?) and the nonce with Romeo's private key. Only an agent trusted by Romeo can do this.
  4. The User Agent then sends a new GET request with the encrypted string, and his identifier, perhaps something like this
    GET /protected/juliette HTTP/1.0
    Host: juliette.org
    Authorization: RdfAuth id="http://romeo.name/#romeo" key="THE_REALM_AND_NONCE_ENCRYPTED"
    Content-Type: application/rdf+xml, text/rdf+n3
    
    Since we need an identifier, why not just use Romeos' foaf name? It happens to also point to his foaf file. All the better.
  5. Because Juliette's web server can then use Romeo's foaf name to GET his public foaf file, which contains a link to his public key, as explained in "Cryptographic Web of Trust".
  6. Juliette's web server can then query the returned representation, perhaps meshed with some other information in its database, with something equivalent to the following SPARQL query
    PREFIX wot: <http://xmlns.com/wot/0.1/>
    SELECT ?pgp
    WHERE {
         [] wot:identity <http://romeo.name/#romeo>;
            wot:pubkeyAddress ?pgp .
    } 
    
    The nice thing about working at the semantic layer, is that it decouples the spec a lot from the representation returned. Of course as usage grows those representations that are understood by the most servers will create a de facto convention. Intially I suggest using RDF/XML of course. But it could just as well be N3, RDFa, perhaps even some microformat dialect, or even some GRDDLable XML, as the POWDER working group is proposing to do.
  7. Having found the URL of the PGP key, Juliette's server, can GET it - and as with much else in this protocol cache it for future use.
  8. Having the PGP key, Juliette's server can now decrypt the encrypted string sent to her by Romeo's User Agent. If the decrypted string matches the expected string, Juliette will know that the User Agent has access to Romeo's private key. So she decides this is enough to trust it.
  9. As a result Juliette's server returns the protected representation.
Now Romeo's User Agent knows where Juliette is, displays it, and Romeo rushes off to see her.

Advantages

It should be clear from the sketch what the numerous advantages of this system are over OpenId. (I can't speak of other authentication services as I am not a security expert).

  • The User Agent has no redirects to follow. In the above example it needs to request one resource http://juliette.org/ twice (2 and 4) but that may only be necessary the first time it accesses this resource. The second time the UA can immediately jump to step 3. [but see problem with replay attacks raised in the comments by Ed Davies, and my reply] Furthermore it may be possible - this is a question to HTTP specialists - to merge step 1 and 2. Would it be possible for a request 1. to return a 20x code with the public representation, plus a WWWAuthenticate header, suggesting that the UA can get a more detailed representation of the same resource if authenticated? In any case the redirect rigmarole of OpenId, which is really there to overcome the limitations of current web browsers, in not needed.
  • There is no need for an Attribute Exchange type service. Foaf deals with that in a clear and extensible RESTful manner. This simplifies the spec dramatically.
  • There is no need for an identity server, so one less point of failure, and one less point of control in the system. The public key plays that role in a clean and simple manner
  • The whole protocol is RESTful. This means that all representations can be cached, meaning that steps 5 and 7 need only occur once per individual.
  • As RDF is built for extensibility, and we are being architecturally very clean, the system should be able to grow cleanly.

Contributions

I have been quietly exploring these ideas on the foaf and semantic web mailing lists, where I received a lot of excellent suggestions and feedback.

Finally

So I suppose I am now looking for feedback from a wider community. PGP experts, security experts, REST and HTTP experts, semantic web and linked data experts, only you can help this get somewhere. I will never have the time to learn these fields in enough detail by myself. In any case all this is absolutely obviously simple, and so completely unpatentable :-)

Thanks for taking the time to read this

Tuesday Aug 28, 2007

My Bloomin' Friends

Closed Social Networks are blossoming all over the place. They provide a semblance of protection, at a price: lock in. Locked into the social network provider you get convenience in the form of tools to make conversation easier (video, email, chat boards, ...), some form of privacy protection (if you trust the provider), introductions to 'like minded' people, and other niceties.

Some of us work in the open air: we have to set standards in public view; we stand by what we say; we accept criticism from wherever it comes; and we can't choose our friends based on their social network provider. We describe ourselves in our foaf files where we can specify what we do, how to contact us, our interests, and links to who we know by pointing to their Universal Identifiers. There is no trouble linking between people who are open in this way. We are happy to reference each other: it strenghtens the exposure of our work and the quality of the web. This is how I link to Paul Gearon:

:me foaf:knows  [ = <http://web.mac.com/thegearons/people/PaulGearon/foaf.rdf#me>; 
                  a foaf:Person;
                  foaf:name "Paul Gearon" ] .
I could just point to his URL, but the little extra duplicate information can make life easier for people/robots browsing the data web. It can help people notice inconcistencies and help me correct them.

But not everyone lives in the open the same way, and not everyone wants to make the same amount of information about themselves public. There are a number of different ways to deal with this. I want to discuss a few of them here.

Content Negotition

How much someone says about themselves is up to them, and so is how they protect their information. The same URL that identifies someone, could return more or less information depending on who is asking. I could set up my foaf file so only friends who log in via openid can see my friends. Others would just get default information about me. I could be even more clever. I could allow any friend of my friend who logs in via their openid to see my full foaf file; others would see information about me, and a select group of open friends. Closed Social Networks could open up by making it convenient to specify these policies, and providing the right infrastructure to do so.

Indirect Identification

By directly identifying someone via a URL (as I do) we can leave a lot of the policy of what they make visible up to them. But those that don't have a foaf name, need to be identified indirectly. We can do that by identifying them via some property such as their blog, their home page, their email address, or their openid. I am very open about my email addresses. They are published and visible to all.
 <http://bblfish.net/people/henry/card#me>     <http://xmlns.com/foaf/0.1/mbox> <mailto:henry.story@bblfish.net> .
I value it more that people can contact me easily - living as I do in the middle of nowhere and often living nowhere in particular - than the pain of spammers. Too many people are lazy about security, using virus filled Windoze computers, obvious passwords, cracked software for me to be under any illusion that hiding my email is going to prevent the bad guys from getting it.

However I can't assume that everyone else will accept me applying this argument to their email address. For this there is a nice mathematical technique: I can encrypt their email address using the SHA1 hash function. This create a close to unique string that cannot be dissasembled. You cannot go from the sha1 sum of an email address back to the email. But you can always calculate the same sha1sum from an email. This is how I identify Simon Phipps, Sun's Open source Officer:

:me foaf:knows [ a foaf:Person;
                 foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                 foaf:name "Simon Phipps";
                 rdfs:seeAlso <http://www.webmink.net/foaf.rdf>
               ].

If you know Simon's email, then you will know that I know him. "What use is that?" I can hear someone ask. It's all about Working with People on the Internet. Imagine you are reading email on a newsgroup with a foaf enabled mail tool linked to a foaf enabled Address Book (such as Beatnik). You come on an email by Simon saying something interesting about how Sun has changed its stock ticker to JAVA for example. My logo and perhaps that of a couple of other people appears on the mail reader in a way that indicates to you that we know Simon. The post is no longer anonymous for you, and so has more trust value. You feel part of a community.[1]

So spammers can not use that information to spam. Either they already know your email address, and so they are probably already spamming you, or they don't, and this won't help them. They can only [2] learn about social network claims: who claims to know who. They could use this, it is true to introduce themselves as an aquaintance of a friend of yours. A bit of a risky strategy that could quickly get them on a black list. Currently being black listed may not be an expensive proposition. But in a cryptographic web of trust this will be both much easier to notice, and more damaging for the infringers.

Fuzzy Identification

I can directly and indirectly identify a lot of people in my Address Book as described above. This is perfectly acceptable for people who have an open life, like I do, and a large portion of the Open Source community, bloggers, standard setters, etc... But on last count I had over 700 people in my AddressBook. It is a lot of work to identify all of them individuall, and to decide how much visibility I should give them. I may not even want people to know how many people I know this way. Also I may want deniability: there are people one may know, but one may not want to highlight that, and one may want to be able to deny that one knows them to some people. The foaf:sha1sum gives me a way to identify someone, but if some nozy person comes to me and asks me about that person's life after having identified the corresponding email address, there is no escape route other than refusing any conversation, which by itself can easily be taken to be significant. What we need is a way to fuzily identify a group of one's aquaintance.

Bloom Filters

This is what Bloom Filters enable one to do. Originally used in times when memory was expensive, they allowed the whole vocabulary of a language to be condensed into a reasonably short string. Here we can use it to group all the email addresses of our friends together in one opaque string. I could express as follows in RDF (bear in mind that the rdf vocabulary has not been settled on):
:me foaf:bloomMbox [ a bloom:Bloom;
                     bloom:base64String """"
            IAOgQgSAAAICCAADAoQgDABAAiQKgIABgyAIBEhAAAAIUKBACCYAABAAaEkGQAGIEAHRUAgAAQUw 
            hCgwACJNQxQAAggAgCIgAAAAKgICEKAAAABCQiB0JCAAAIkgDASAYiAAAEIQAAIAABDCEAZACOpA 
            ICEEMAGAEGEAxIA=""";
                     bloom:hashNum 4;
                     bloom:length 1000 ] .

Given the above Bloom someone can query it with an email address using the inverse algorithm and the Bloom will answer either that I may know that person, or that it can't tell. The loaf project explains some of the advantages of having this in more detail.

The best way to get a feel for how it works is to try it. Here I have written a little java applet [3] that allows you to test my Bloom for people I know, and to create your own bloom [4].

Your browser is completely ignoring the <APPLET> tag! Go to java.com to download the latest.

Some emails you can try with positive results are tbray attextuality dot C O M, or bill at dehora dot net (suitably transformed of course). The applet lowercases all email addresses when creating and when testing the bloom.

To create your own bloom just click the "Create Bloom" tab. An easy way to extract all your email addresses from an OSX Address Book is to run the following on the command line:

hjs@bblfish:0$ osascript -e 'tell application "Address Book" to get the value of every email of every person' | perl -pe 's/,+ /\n/g' | sort | uniq | pbcopy

You should now be able to paste the list of all your contacts in the applet. To restrict the Addresses to on of your groups named "foaf" for example replace the relevant section above with tell application "Address Book" to get the value of every email of every person in foaf.

You will need to choose the number of hashes and the maximal size of the bucket you wish to fill. The greater the number of hashes and the greater the size of the bucket, the more precision you get and the less deniability.[5]

Conclusion

None of the above tools are by themselves the complete solution for creating an Open Social Network that will satisfy everyone. But for people willing to live in the open, the correct and astute use of them should satisfy most of people's requirements. Access Control on URLs can make it possible to reveal more or less information depending on who is looking; indirect identification can allow one to name people even without direct identification; sha1sums allows one to partially hide sensitve identifying information; and Blooms allow one to make fuzzy statements of set membership. All of these can be combined in different ways. So one can make statements about sha1sum identified people on the open web, or one can do so behind an access controlled file that only friends logged in with OpenId can see. There are bound to be more fun things to be discovered here. But this should make clear just how much can be done in this space.

Notes

  1. For the link from email addresses to sha1sums to work, it helps to canonicalise the emails to all lowercase. This should probably be made more explict in the foaf:mbox_sha1sum definition.
  2. "They can 'only' learn about social network claims", is quite a lot more than some people are willing to accept. See the article by Mark Wahl "Organizing principles for identity systems: Attacks on anonymized social networks and fudging oracles" which contains some very good pointers. For people who want to retain complete anonymity, and this is what people subscribe to when they answer public surveys, any leakage of information is too much leakage. The problem is that because of Metcalf's Law it is nearly impossible to stop information combining itself: Information wants to be linked. So I think, when we are not tied to stringent laws, we should accept this rather than fight it, and use it to our advantage when hunting down spammers: the law holds for them too.
  3. You can get the source code for the applet on the so(m)mer repository in the misc/Bloom subdirectory. I used the pt.tumba.spell.BloomFilter class which I adapted a little for my needs. This was just the first one I found out there. It is probably not the most efficient one, as it uses an array of booleans, when it could use an byte array. If you know of other libraries please let me know.
    The code was put together really quickly and may well contain bugs. Feedback and patches and contributions are welcome.
  4. the advantage of Java Applets over server side code is really obvious here:
    • I don't need a server with a fixed port number to show you this
    • someone can't easily start a denial of service attack to bring the server down
    • You email addresses never leave your computer, so there is no fear of loss of privacy.
    On the last point it would be nice if browser vendors made it easier to get info about the exact restrictions a Java Applet had. I would like to be able to click on an Applet and verify or set it to "no network communication whatever". This would increase trust even more in cases like this.
  5. More info on the load site. Apparently one needs more than 1/4 deniability if one is to preserve some measure of privacy, according to the paper "the price of privacy and the limits of LP decoding" by Cynthia Dwork, Frank McSherry and Kunal Talwar (Microsoft Research) who suggests that
    ... any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
    Thanks again to Mark Wahl for these references.
  6. Thanks a lot to Dan Brickley for working together with me on this last Friday, and pointing me to many of the important work done here. Dan also wrote a little python script to do something similar. Some of the sites I came across during our discussion: Not having studied bloom filters in detail, I am not sure how compatible the blooms of each of these libraries are. The super simple ruby bloom library does not seem to specify the number of hashes that were used to create a Bloom.
  7. Nick Lothian reminded me in a comment to this that he has written a Bloom Filter demo for facebook. I don't have a facebook account (because I am already on LinkedIn, and I can't really be bothered to move all my information, and because I don't like closed networks), so I was not able to use it. Perhaps I should get a facebook account just for this... Let me know.

Thursday Aug 09, 2007

cryptographic web of trust

As our identity moves more and more onto the Web our ability to have people trust that what we write has not been altered is becoming increasingly important. As our home page becomes our OpenId, linking to our CV, blogs and foaf information, it will become more important for services to be able to trust that the information they see really is what I wrote. Otherwise the following scenario can become all to easy to imagine. Someone breaks into my web server and changes the OpenId link on my home page to point to a server they control. They then go and post comments around the web using my openid as identity. Their server of course always authenticates them. People who receive the posts then click on the OpenId URL which happens to be my home page, read information about me, and seeing that information trust that the comment really came from me.

What is needed is some way to increase the methods people can have to trust information I state. Here I describe how one can use cryptography to increase that trust level. Starting from my creation of a PGP key, I show how I can describe my public signature in my foaf file, used PGP to sign that file and then link to the signature so that people can detect a tampered file. From this basis I show how one can build a very solid cryptographically enhanced web of trust.

Creating your PGP public key

After reading the first part of "SafariBooks Online, and feeling comfortable that I understood the basics, I decided it was high time for me to create myself a Public PGP key. Reading the GPG Manual and a few other HOWTOs on the web, using the gnu GPG library, I managed to make myself one quite easily.

Linking to the Public Key from your foaf file

Having this it was just a matter of placing it on my web site and using the Web Of Trust ontology developed by Dan Brickley to point to it and describe it, with the following triples:

@prefix wot: <http://xmlns.com/wot/0.1/> .
@prefix : <http://bblfish.net/people/henry/card#> .

:me  is wot:identity of [ a wot:PubKey;
                        wot:pubkeyAddress <http://bblfish.net/people/henry/henry.pubkey.asc>;
                        wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7";
                        wot:hex_id "4CAE10D7";
                        wot:length 1024 ] .

The is wot:identity of construct is a nice N3 short hand for refering to the inverse relation of wot:identity, without having to name one. This states that my public key can be found at http://bblfish.net/people/henry/henry.pubkey.asc, and describe its fingerprint, key length and its hex id. I am not sure why the wot:PubKey resource has to be a blank node, and can't be the URL of the public key itself, which would make for the following N3:

:me  is wot:identity of [ = < http://bblfish.net/people/henry/henry.pubkey.asc>
                          a wot:PubKey;
                          wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7" ] .

Perhaps simply because it is quite likely that one would want to put copies of one's public key in dfiferent places? owl:sameAs could have done the trick there too though...

Signing your foaf file

Anyway, once that is done, I want to be able to sign my foaf file. Of course it would be pretty tricky to sign the foaf file and put the signature into the foaf file simultaneously, as that would change the content of the foaf file and make the signature invalid. So the easiest solution is to simply have the foaf file point to the signature with something like this [1]

<> wot:assurance <card.asc>

The problem with this solution is that my foaf file at http://bblfish.net/people/henry/card currently returns two different representations: a rdf/xml one and an N3 one depending one how it is called. (More on this in I have a Web 2.0 name!). Now a signature is valid only for a sequence of bits, and the rdf sequence of bits is different from the N3 sequence, so they can't both have the same sig. There is a complicated solution developed by Jeremy Carroll in his paper Signing RDF Graphs, which proposes to sign a canonicalised RDF graph. The problem is that the algorithm to create such a graph does take some time to compute, only works on a subset of all RDF graphs, but mostly that no software currently implements that algorithm.
Luckily there is a simple solution, which I got by inspiration from my work on creating an ontology for Atom, Atom OWL, and that is to link my card explicity to their alternate representation (the <link type="alternate" href="...">) [2] and to sign those alternate representations. This gives me the following triples:

@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix awol: <http://bblfish.net/work/atom-owl/2006-06-06/#> .

<http://bblfish.net/people/henry/card>   a foaf:PersonalProfileDocument;
     iana:alternate <http://bblfish.net/people/henry/card.rdf>,
                    <http://bblfish.net/people/henry/card.n3> .

<http://bblfish.net/people/henry/card.rdf>
       wot:assurance <http://bblfish.net/people/henry/asc.card.rdf.asc> ;
       awol:type "application/rdf+xml" .
<http://bblfish.net/people/henry/card.n3>
       wot:assurance &lhttp://bblfish.net/people/henry/asc.card.n3.asc> ;
       awol:type "text/rdf+n3" .

Here I am saying that my card has two alternate representations, an rdf and an n3 one, what their mime type is, and where I can find the signature for each. Simple.

So if I put all of the above together I get the following extract from my N3 file:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix wot: <http://xmlns.com/wot/0.1/> .
@prefix awol: <http://bblfish.net/work/atom-owl/2006-06-06/#> .
@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix : <http://bblfish.net/people/henry/card#> .

<http://bblfish.net/people/henry/card>   a foaf:PersonalProfileDocument;
     foaf:maker :me;
     foaf:title "Henry Story's FOAF file";
     foaf:primaryTopic :me ;
     iana:alternate <http://bblfish.net/people/henry/card.rdf>,
                    <http://bblfish.net/people/henry/card.n3> .

<http://bblfish.net/people/henry/card.rdf>
       wot:assurance <http://bblfish.net/people/henry/asc.card.rdf.asc> ;
       awol:type "application/rdf+xml" .
<http://bblfish.net/people/henry/card.n3>
       wot:assurance <http://bblfish.net/people/henry/asc.card.n3.asc> ;
       awol:type "text/rdf+n3" .

:me    a foaf:Person;
       foaf:title "Mr";
       foaf:family_name "Story";
       foaf:givenname "Henry";
       foaf:openid <http://openid.sun.com/bblfish> ;
       foaf:openid <http://bblfish.videntity.org/> ;
       is wot:identity of [ a wot:PubKey;
                            wot:pubkeyAddress <ttp://bblfish.net/people/henry/henry.pubkey.asc>;
                            wot:fingerprint "0DF560B5DADF6D348CC99EA0FD76F60D4CAE10D7";
                            wot:hex_id "4CAE10D7";
                            wot:length 1024 ];

Which can be graphical represented as follows:

Building a web of Trust

From here is is easy to see how I can use the wot ontology to sign other files, link to my friends public signatures, sign their public signatures, how they could sign mine, etc, etc. and thereby create a cryptographically enhanced Web of Trust built on decentralised identity. Of course this is still a little complicated to put together by hand, but it should be really easy to automate, by incorporating it into the Beatnik Address Book for example. To illustrate this I will show I linked up to Dan Brickley and Tim Berners Lee's signature.

Dan Brickley has a foaf file which links to his public key and to the signature of his file. The relevant part of the graph is:

   <http://danbri.org/foaf.rdf>     a foaf:PersonalProfileDocument;
               wot:assurance <http://danbri.org/foaf.rdf/foaf.rdf.asc> .

   <http://danbri.org/foaf.rdf#danbri> 
               foaf:pubkeyAddress <http://danbri.org/danbri-pubkey.txt>

(foaf:pubkeyAddress is a relation that is not defined in foaf yet. Dan, one of the co-creators of foaf, is clearly experimenting here) Now with this information I can download Danbri's rdf, the public key and the signature and test that the document has not been tampered with. On the command line I do it like this:

bblfish$ gpg --import danbri.pubkey.asc
bblfish$ curl http://danbri.org/foaf.rdf > danbri.foaf.rdf
bblfish$ curl http://danbri.org/foaf.rdf.asc > danbri.foaf.rdf.asc
bblfish$ gpg --verify danbri.foaf.rdf.asc danbri.foaf.rdf

Of course this assumes that someone has not broken onto his server and changed all those files. As it happens I chatted with Dan over Skype (which I got from his foaf file!) and he sent me his public key that way. It certainly felt like I had Dan on the other side of the line, so I trust this public key enough. But why not publish what public key I am relying on? Then Dan and others can know what signature I am using and correct me if I am wrong. So I added the following to my foaf file

:me foaf:knows    [ = <http://danbri.org/foaf.rdf#danbri>;
                    a foaf:Person;
                    foaf:name  "Dan Brickley";
                    is wot:identity of
                          [ a wot:PubKey;
                            wot:pubkeyAddress <http://danbri.org/danbri-pubkey.txt>;
                            wot:hex_id "B573B63A" ]
                  ] .

In the above I am linking to Dan's public key on his server, the one that might have been compromised. Note that I specify the wot:hex_id of the public key though. Finding another public key with the same hex would be tremendously difficult I am told. But who knows. I have not done the maths on this. But I can make it even more difficult by signing his public key with my key, and placing that signature on my server.

bblfish$ gpg -a --detach-sign danbri.pubkey.asc

Then I can make that signature public by linking to it from my foaf file

<http://danbri.org/danbri-pubkey.txt> wot:assurance <danbri.pubkey.asc.asc> .

Now people who read my foaf file will know how I verify Dan's information, and detect if something does not fit between what I say and what Dan's says. If they then do the same when linking to my foaf file - that is mention in it where one is to find my public key, it's hash and sign my public key with their signature, placing that on their server - then anyone who would want to compromise any of us consistently to the outside world, would have to compromise all of the information on each of our servers consistently. As more people are added to the network, and link up to each other, the complexity of doing this grows exponentially. As there are more ways to tie information together there are more ways people can find the information, so the information becomes more valuable (as described in RDF and Metcalf's Law), and there are more ways to find inconsistencies, thereby making the information more reliable, thereby making it more valuable, and so on, and so on...

One can even make things more complicated for a wannabe hacker by placing other people's public keys one one's server, thereby duplicating information. Tim Berners Lee does not link to his public key, but I got it over irc, and published it on my web server. Then I can add that information to my foaf file:

:me foaf:knows [ = <http://www.w3.org/People/Berners-Lee/card#i>;
                    a foaf:Person;
                    foaf:name "Tim Berners Lee";
                    is wot:identity of
                           [ a wot:Pubkey;
                             wot:pubkeyAddress <timbl.pubkey.asc> ;
                             wot:hex_id "9FC3D57E" ];
                  ] .

By making public what public keys I use, I get the following benefits:

  • people can contact me and let me know if I am being taken for a ride,
  • it helps them locate public keys
  • the metadata associated with a public key grows
  • more people link into the cryptography network, making it more likely that more tools will build into this system

Encrypting parts of one's foaf file

Now that you have my public signature you can send me encrypted mail or other files using my public signature, verify signatures I place around the web and read files I encrypt with my private key. Of the files that I can encrypt one interesting one is my own foaf file. Of course encrypting the whole foaf file is not so helpful, as it breaks down the web of trust piece. But there may well be pieces of a foaf file that I would like to encrypt for only some people to see. This can be done quite easily as described in "Encrypting Foaf Files" [3]. Note that the encrypted file can be encrypted for a number of different users simultaneously. This would be simple to do, and would be of great help in making current practice available in an intelligent and coherent way to the semantic web. One thing I could do is encrypt it for all of my friend whose public key I know. I am not sure what kind of information I would want to do this with, but well, it's good to know its possible.

Notes

  1. This is how "PGP Signing FOAF Files" describes the procedure.
  2. The iana alternate (http://www.iana.org/assignments/relation/) relation is not dereferenceable. It would be very helpful if iana made ontologies for each of their relations available.
  3. Thanks to Stephen Livingstone for pointing that way in a reply to this blog post on the openid mailing list.
  4. A huge numbers of resources on the subject can be found on the Semantic Web Trust and Security Resource Guide, which I found via the Wikipedia Web of Trust page.
  5. This can be used to build a very simple but powerful web authentication protocol as described in my March 2008 article RDFAuth: sketch of a buzzword compliant authentication protocol

Search

Flickr Diary

www.flickr.com
This is a Flickr badge showing public photos from bblfish. Make your own badge here.

Recent Entries

Navigation

Referers