
The Sun BabelFish Blog

Don't panic !

Wednesday May 07, 2008

BOF-5911: Building a Web 3.0 Address Book

To give everyone a chance to try out the So(m)mer Address Book, I have made it available via Java Web Start: just click on the picture to the right, and try it out.

The Address Book is currently demoware: it shows how one can virally build an open distributed social network client that solves the social network data silo problem (video). There is no need to have an account on every social networking site on which you have friends, and so to maintain your data on each one. You can simply belong to one network and link to all your friends wherever they are. With one click of a button you can publish your social network to your own web server, using ftp, scp, WebDAV, or even Atom. You can then link to other people, whether or not they have a foaf file. When you press the space bar with a friend selected, the Address Book will then GET their file. So you can browse your social network.

To get going you can explore my social network by dragging my foaf file icon onto the first pane of the application.

In BOF-5911, which I will be presenting on Thursday at 7:30pm, I will describe the social networking problem, demonstrate how the So(m)mer Address Book solves it, and show in detail how it is built, what the problems are, and what work remains. I will also discuss how this can be used to create global single sign on based on a network of trust.

Friday Mar 28, 2008

RDFAuth: sketch of a buzzword compliant authentication protocol

Here is a proposal for an authentication scheme that is even simpler than OpenId ( see sequence diagram ), more secure, more RESTful, with fewer points of failure and fewer points of control, that is needed in order to make Open Distributed Social Networks with privacy controls possible.

Update

The following sketch led to the even simpler protocol described in Foaf and SSL creating a global decentralized authentication protocol. It is very close to what is proposed here but builds very closely on SSL, so as to reduce what is new down to nearly nothing.

Background

Ok, so now that I have your attention, I would like to first mention that I am a great fan of OpenId. I have blogged about it numerous times and enthusiastically in this space. I came across the idea I will develop below, not because I thought OpenId needed improving, but because I have chosen to follow some very strict architectural guidelines: it had to satisfy RESTful, Resource oriented hyperdata constraints. With the Beatnik Address Book I have proven - to myself at least - that the creation of an Open Distributed Social Network (a hot topic at the moment, see the Economist's recent article on Online social network) is feasible and easy to do. What was missing is a way for people to keep some privacy, clearly a big selling point for the large Social Network Providers such as Facebook. So I went in search of a solution to create an Open Distributed Social Network with privacy controls. And initially I had thought of using OpenId.

OpenId Limitations

But OpenId has a few problems:

  • First it is really designed to work with the limitations of current web browsers. It is partly because of this that there is a lot of hopping around from the service to the Identity Provider with HTTP redirects. As the Tabulator, Knowee or Beatnik.
  • Parts of OpenId 2, and especially the Attribute Exchange spec really don't feel very RESTful. There is a method for PUTing new property values in a database and a way to remove them that does not use either the HTTP PUT method or the DELETE method.
  • The OpenId Attribute Exchange is nice but not very flexible. It can keep some basic information about a person, but it does not make use of hyperdata. And the way it is set up, it would only be able to do so with great difficulty. A RESTfully published foaf file can give the same information, is a lot more flexible and extensible, whilst also making use of Linked Data, and as it happens also solves the Social Network Data Silo problems. Just that!
  • OpenId requires an Identity Server. There are a couple of problems with this:
    • This server provides a Dynamic service but not a RESTful one. Ie. the representations sent back and forth to it, cannot be cached.
    • The service is a control point. Anyone owning such a service will know which sites you authenticate onto. True, you can set up your own service, but that is clearly not what is happening. The big players are offering their customers OpenIds tied to particular authentication servers, and that is what most people will accept.
As I found out by developing what I am here calling RDFAuth, for want of a better name, none of these restrictions are necessary.

RDFAuth, a sketch

So following my strict architectural guidelines, I came across what I am just calling RDFAuth, but like everything else here this is a sketch and open to change. I am not a security specialist nor an HTTP specialist. I am like someone who comes to an architect in order to build a house on some land he has, with some sketch of what he would like the house to look like, some ideas of what functionality he needs and what the price he is willing to pay is. What I want here is something very simple, that can be made to work with a few perl scripts.

Let me first present the actors and the resources they wish to act upon.

  • Romeo has a Semantic Web Address Book, his User Agent (UA). He is looking for the whereabouts of Juliette.
  • Juliette has a URL identifier ( as I do ) which returns a public foaf representation and links to a protected resource.
  • The protected resource contains information she only wants some people to know, in this instance Romeo. It contains information as to her current whereabouts.
  • Romeo also has a public foaf file. He may have a protected one too, but it does not make an entrance in this scene of the play. His public foaf file links to a public PGP key. I described how that is done in Cryptographic Web of Trust.
  • Romeo's Public key is RESTfully stored on a server somewhere, accessible by URL.

So Romeo wants to find out where Juliette is, but Juliette only wants to reveal this to Romeo. Juliette has told her server to only allow Romeo, identified by his URL, to view the site. She could also have had a more open policy, allowing any of her or Romeo's friends to have access to this site, as specified by their foaf file. The server could then crawl their respective foaf files at regular intervals to see if it needed to add anyone to the list of people having access to the site. This is what the DIG group did in conjunction with OpenId. Juliette could also have a policy that decides Just In Time, as a person presents themselves, whether or not to grant them access. She could use the information in that person's foaf file and relate it to some trust metric to make her decision. How Juliette specifies who gets access to the protected resource here is not part of this protocol. This is completely up to Juliette and the policies she chooses her agent to follow.

So here is the sketch of the sequence of requests and responses.

  1. First Romeo's user Agent knows that Juliette's foaf name is http://juliette.org/#juliette so it sends an HTTP GET request to Juliette's foaf file located of course at http://juliette.org/
    The server responds with a public foaf file containing a link to the protected resource perhaps with the N3
      <> rdfs:seeAlso <protected/juliette> .
    
    Perhaps this could also contain some relations describing that resource as protected, which groups may access it, etc... but that is not necessary.
  2. Romeo's User Agent then decides it wants to check out protected/juliette. It sends a GET request to that resource but this time receives a variation of the Basic Authentication Scheme, perhaps something like:
    HTTP/1.0 401 UNAUTHORIZED
    Server: Knowee/0.4
    Date: Sat, 1 Apr 2008 10:18:15 GMT
    WWW-Authenticate: RdfAuth realm="http://juliette.org/protected/*", nonce="ILoveYouToo"
    
    The idea is that Juliette's server returns a nonce (in order to avoid replay attacks), and a realm over which this protection will be valid. But I am really making this up here. Better ideas are welcome.
  3. Romeo's web agent then encrypts some string (the realm?) and the nonce with Romeo's private key. Only an agent trusted by Romeo can do this.
  4. The User Agent then sends a new GET request with the encrypted string, and his identifier, perhaps something like this
    GET /protected/juliette HTTP/1.0
    Host: juliette.org
    Authorization: RdfAuth id="http://romeo.name/#romeo", key="THE_REALM_AND_NONCE_ENCRYPTED"
    Accept: application/rdf+xml, text/rdf+n3
    
    Since we need an identifier, why not just use Romeo's foaf name? It happens to also point to his foaf file. All the better.
  5. Juliette's web server can then use Romeo's foaf name to GET his public foaf file, which contains a link to his public key, as explained in "Cryptographic Web of Trust".
  6. Juliette's web server can then query the returned representation, perhaps meshed with some other information in its database, with something equivalent to the following SPARQL query
    PREFIX wot: <http://xmlns.com/wot/0.1/>
    SELECT ?pgp
    WHERE {
         [] wot:identity <http://romeo.name/#romeo>;
            wot:pubkeyAddress ?pgp .
    } 
    
    The nice thing about working at the semantic layer is that it decouples the spec a lot from the representation returned. Of course as usage grows those representations that are understood by the most servers will create a de facto convention. Initially I suggest using RDF/XML of course. But it could just as well be N3, RDFa, perhaps even some microformat dialect, or even some GRDDLable XML, as the POWDER working group is proposing to do.
  7. Having found the URL of the PGP key, Juliette's server can GET it - and as with much else in this protocol cache it for future use.
  8. Having the PGP key, Juliette's server can now decrypt the encrypted string sent to her by Romeo's User Agent. If the decrypted string matches the expected string, Juliette will know that the User Agent has access to Romeo's private key. So she decides this is enough to trust it.
  9. As a result Juliette's server returns the protected representation.
Now Romeo's User Agent knows where Juliette is, displays it, and Romeo rushes off to see her.
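
The challenge and response at the heart of steps 2, 3 and 8 can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the key pair is a toy textbook RSA pair built from two small known primes (nowhere near secure, and not PGP), the "encryption with the private key" is modelled as a bare RSA transform of a SHA-256 digest of realm and nonce, and the names `ProtectedServer`, `sign_challenge` and `challenge_digest` are made up for this sketch. Fetching the foaf file and the key (steps 5 to 7) is left out entirely.

```python
import hashlib
import secrets

# Toy RSA key pair built from two known Mersenne primes. It is far too weak
# for real use and only stands in for Romeo's PGP key pair (an assumption).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def challenge_digest(realm: str, nonce: str, modulus: int) -> int:
    """Hash realm and nonce into an integer smaller than the modulus."""
    raw = hashlib.sha256(f"{realm}\n{nonce}".encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % modulus


def sign_challenge(realm: str, nonce: str) -> int:
    """Step 3: Romeo's agent transforms the challenge with the private key."""
    return pow(challenge_digest(realm, nonce, N), D, N)


class ProtectedServer:
    """Juliette's side: issues nonces (step 2) and checks replies (step 8)."""

    def __init__(self, realm: str):
        self.realm = realm
        self.pending = set()           # nonces issued but not yet used

    def issue_nonce(self) -> str:
        nonce = secrets.token_hex(16)
        self.pending.add(nonce)
        return nonce

    def check(self, nonce: str, signature: int, public_key=(E, N)) -> bool:
        if nonce not in self.pending:  # unknown or already used: reject
            return False
        self.pending.discard(nonce)    # each nonce is accepted at most once
        e, n = public_key
        expected = challenge_digest(self.realm, nonce, n)
        return pow(signature, e, n) == expected


server = ProtectedServer("http://juliette.org/protected/*")
nonce = server.issue_nonce()
reply = sign_challenge(server.realm, nonce)
print(server.check(nonce, reply))      # first use of the nonce
print(server.check(nonce, reply))      # same reply sent again
```

Because the server forgets a nonce as soon as it has been presented, sending the same reply a second time is refused, which is the point of including the nonce in the first place.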

Advantages

It should be clear from the sketch what the numerous advantages of this system are over OpenId. (I can't speak of other authentication services as I am not a security expert).

  • The User Agent has no redirects to follow. In the above example it needs to request one resource http://juliette.org/ twice (2 and 4) but that may only be necessary the first time it accesses this resource. The second time the UA can immediately jump to step 3. [but see problem with replay attacks raised in the comments by Ed Davies, and my reply] Furthermore it may be possible - this is a question to HTTP specialists - to merge step 1 and 2. Would it be possible for a request 1. to return a 20x code with the public representation, plus a WWW-Authenticate header, suggesting that the UA can get a more detailed representation of the same resource if authenticated? In any case the redirect rigmarole of OpenId, which is really there to overcome the limitations of current web browsers, is not needed.
  • There is no need for an Attribute Exchange type service. Foaf deals with that in a clear and extensible RESTful manner. This simplifies the spec dramatically.
  • There is no need for an identity server, so one less point of failure, and one less point of control in the system. The public key plays that role in a clean and simple manner.
  • The whole protocol is RESTful. This means that all representations can be cached, meaning that steps 5 and 7 need only occur once per individual.
  • As RDF is built for extensibility, and we are being architecturally very clean, the system should be able to grow cleanly.
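
The caching point above (steps 5 and 7 need only occur once per individual) can be illustrated with a small Python sketch. Everything here is hypothetical: `CachingKeyResolver` and `fake_fetch` are names invented for the example, and the fetch function is a placeholder for the real HTTP GETs of the foaf file and the key. A real server would also want to honour HTTP cache headers rather than cache forever.

```python
from typing import Callable, Dict


class CachingKeyResolver:
    """Look up a public key once per foaf name and reuse it afterwards."""

    def __init__(self, fetch_key: Callable[[str], str]):
        self._fetch_key = fetch_key    # hypothetical: performs steps 5 to 7
        self._cache: Dict[str, str] = {}
        self.fetch_count = 0           # how often the network was really used

    def key_for(self, foaf_name: str) -> str:
        if foaf_name not in self._cache:
            self.fetch_count += 1
            self._cache[foaf_name] = self._fetch_key(foaf_name)
        return self._cache[foaf_name]


def fake_fetch(foaf_name: str) -> str:
    # Placeholder for the real HTTP GETs; returns a made-up key string.
    return "PUBLIC-KEY-OF " + foaf_name


resolver = CachingKeyResolver(fake_fetch)
resolver.key_for("http://romeo.name/#romeo")
resolver.key_for("http://romeo.name/#romeo")   # served from the cache
print(resolver.fetch_count)
```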

Contributions

I have been quietly exploring these ideas on the foaf and semantic web mailing lists, where I received a lot of excellent suggestions and feedback.

Finally

So I suppose I am now looking for feedback from a wider community. PGP experts, security experts, REST and HTTP experts, semantic web and linked data experts, only you can help this get somewhere. I will never have the time to learn these fields in enough detail by myself. In any case all this is absolutely obviously simple, and so completely unpatentable :-)

Thanks for taking the time to read this

Wednesday Mar 19, 2008

Semantic Web for the Working Ontologist

I am really excited to see that Dean Allemang and Jim Hendler's book "Semantic Web for the Working Ontologist" is now available for pre-order on Amazon's web site. When I met Dean at Jazoon 2007 he let me have a peek at an early copy of this book[1]: it was exactly what I had been waiting a long time for. A very easy introduction to the Semantic Web and reasoning that does not start with the unnecessarily complex RDF/XML [2] but with the one-cannot-be-simpler triple structure of RDF, and through a series of practical examples brings the reader step by step to a full view of all of the tools in the Semantic Web stack, without a hitch, without a problem, fluidly. I was really impressed. Getting going in the Semantic Web is going to be a lot easier when this book is out. It should remove the serious problem current students are facing of having to find a way through a huge number of excellent but detailed specs, some of which are no longer relevant. One does not learn Java by reading the Java Virtual Machine specification or even the Java Language Specification. Those are excellent tools to use once one has read one of the many excellent introductory books such as the unavoidable Java Tutorial or Bruce Eckel's Thinking in Java. Dean Allemang and Jim Hendler's book is going to play the same role for the Semantic Web: it will help get millions of people introduced to what has to be the most revolutionary development in computer science since the development of the web itself. Go and pre-order it. I am going to do this right now.

Notes

  1. the draft I looked at 9 months ago had introductions to ntriples, turtle, OWL explained via rules, SPARQL, some simple well known ontologies such as skos and foaf, and a lot more.
  2. The W3C has recently published a new RDF Primer in Turtle in recognition of the difficulty of getting going when the first step requires understanding RDF/XML.

Friday Feb 15, 2008

Proof: Data Portability requires Linked Data

Data Portability requires Linked Data. To show this let me take a concrete and topical example that is the core use case of the Data Portability movement: Jane wants to move her account from social network A to social network B. And she wants to do this in a way that entails the minimal loss of information.

Let us suppose Jane wants to make a rich copy, and that she wants to do this without hyperdata. Ideally she would like to have exactly the same information in the new space as she had in the old space. So if Jane had a network of friends in social network A she would like to have the same network of friends in B. But this implies moving all the information about all her friends from A to B, including their social network too. For after all the great thing about one's friends is how they can help us make new friends. But then would one not want to move all the social network of one's friends too? Where does it stop? As William Blake said so well in Auguries of Innocence

        To see a world in a grain of sand,
        And a heaven in a wild flower,
        Hold infinity in the palm of your hand,
        And eternity in an hour.
the problem is that everything is linked in some way, and so it is impossible to move one thing and all its relations from one place to another using just copy by value, without moving everything. A full and rich copy is therefore impossible.

So what about pragmatically limiting ourselves to some subset of the information? We have to reduce our ambitions. So let us limit the data Jane can move to just her personal data and closest social network. So she copies some subset of the information about her friends over to network B. Nice, but who is going to keep that information up to date? When Jane's friend Jack moves house, how is Jane going to know about this in her new social network? Would Jack not have to keep his information on social Network B up to date too? And now if every one of Jack's 1000 friends moves to a different social network, won't he have to now keep 1000 identities up to date on each of those networks? Making it easy for Jane to move social network is going to make life hell for Jack it seems. Well of course not: Jack is never going to keep the information about himself up to date on these other social networks, however limited it is going to be. And so if Jane moves social network she is going to have to leave her friends behind.

The solution of course is not to try to copy the information about one's friends from one social network to another, but rather to move one's own information over and then link back to one's friends in their preferred social network. By linking by reference to one's friends identity one reduces to a minimum the information that needs to be ported whilst maintaining all the relationships that existed previously. Thus one can move one's identity without loss.

The rest follows nearly immediately from these observations. Since the only way to refer to resources in a global namespace is via URIs ( and the most practical way currently is to do this with URLs ), URIs will play the role of pointers in our space. This is the key architectural decision of the semantic web. So by giving people URLs as names we can point to our friends wherever they are, and even move our data without loss. All we need to do when we move our foaf file is to have the web server serve up an HTTP redirect message at the old URL, and all links to our old file will be redirected to our new home.
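
As a rough illustration of that last step, here is a minimal Python sketch of an "old home" server that answers requests for a moved foaf file with a redirect. The paths, the new URL, and the names `MOVED`, `resolve`, `OldHomeHandler` and `serve` are all made up for the example, and the choice of status 301 (Moved Permanently) is my assumption; the text above only asks for some HTTP redirect.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from old paths to the new home of each foaf file.
MOVED = {"/foaf.rdf": "http://new-home.example/people/jane/foaf.rdf"}


def resolve(path: str):
    """Return (status, location) for a request path on the old server."""
    if path in MOVED:
        return 301, MOVED[path]        # permanent redirect to the new home
    return 404, None                   # nothing known about this path


class OldHomeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.end_headers()

    do_HEAD = do_GET                   # same headers, and no body either way


def serve(port: int = 8000) -> None:
    """Run the redirecting server until interrupted."""
    HTTPServer(("", port), OldHomeHandler).serve_forever()
```

Calling `serve()` starts the server; any client that follows redirects then ends up at the new location without the links pointing at the old URL having to change.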

Tuesday Jan 15, 2008

Data Portability: The Video

Here is an excellent video to explain the problem faced by Web 2.0 companies and what Data Portability means. It is amazing how a good video can express something so much more powerfully, so much more directly than words can. Sit back and watch.


DataPortability - Connect, Control, Share, Remix from Smashcut Media on Vimeo.

Feeling better? You are gripped by the problem? Good. You should now find that my previous year's posts start making a lot more sense :-)

Will the Data Portability group get the best solution together? I don't know. The problem with the name they have chosen is that it is so general, one wonders whether XML is not the solution to their problem. Won't XML make data portability possible, if everyone agrees on what they want to port? Of course getting that agreement on all the topics in the world is a never ending process.... Had they retained the name of the original group this stemmed from, Social Network Portability then one could see how to tackle this particular issue. And this particular issue seems to be the one this video is looking at.

But the question is also whether portability is the right issue. Well in some ways it is. Currently each web site has information locked up in html formats, in natural language, or even sometimes in jpegs (see the previous story of Scoble and Facebook), in order to make it difficult to export the data, which each service wants to hold onto as if it was theirs to own.

Another way of looking at this is that the Data Portability group cannot so much be about technology as policy. The general questions it has to address are questions of who should see what data, who should be able to copy that data, and what they should be able to do with it. This does indeed involve identity technology insofar as all of the above questions turn around questions of identity ("who?"). Now if every site requires one to create a new identity in order to access one's data one has the nightmare scenario depicted in the video, where one has to maintain one's identity across innumerable sites. As a result the policy issue of Data Portability does require one to solve the technical problem of distributed identity: how can people maintain the minimum number of identities on the web (ie not one per site)? Another issue that follows right upon the first is that if one wants information to only be visible to a select group of people - the "who sees what" part of the question - then one also needs a distributed way to be able to specify group membership, be it friendship based or other. The video again makes very clear why having to recreate one's social network on every site is impractical.

What may be misleading about the term Data Portability is that it may lead one to think that what one wants is to copy one's social information from one social service to another. That would just automate the job of what the video illustrates people having to do by hand currently. But that is not a satisfactory solution, because one cannot extract a graph of information from one space to another without loss. If I extract my friends from LinkedIn into Facebook, it is quite certain that Facebook will not recognise a large number of the people I know on LinkedIn. Furthermore the ported information on Facebook would soon be out of date, as people updated their network and profiles on LinkedIn. Unless of course Facebook were able to make a constant copy of the information on LinkedIn. But that's impossible, right? Wrong! That is the difference between copy by value and copy by reference. If Facebook can refer to people on LinkedIn, then the data will always be as up to date as it can be. So this is how one moves from DataPortability to Linked Data, also known as hyper data.
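
The difference between copy by value and copy by reference can be made concrete with a toy Python example. The dictionary `web` is a made-up stand-in for the web (URL to current profile), and the URL and profile fields are invented for illustration.

```python
import copy

# A made-up stand-in for the web: URL -> the profile currently served there.
web = {"http://linkedin.example/jack#me": {"name": "Jack", "city": "London"}}

jack = "http://linkedin.example/jack#me"

# Copy by value: the second network stores a snapshot of Jack's profile.
friends_by_value = [copy.deepcopy(web[jack])]

# Copy by reference: the second network stores only Jack's URL.
friends_by_reference = [jack]

# Jack moves house and updates the one profile he actually maintains.
web[jack]["city"] = "Paris"

# The snapshot still says where Jack used to live ...
stale_city = friends_by_value[0]["city"]
# ... while dereferencing the URL gives whatever Jack publishes now.
fresh_city = web[friends_by_reference[0]]["city"]

print(stale_city, fresh_city)
```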

Sunday Jan 06, 2008

2008: The Rise of Linked Data

Here is my one prediction for 2008. Social Networking's breakdown will lead to the rise of Linked Data. Here is the logic:

  1. Social Networking sites have grown tremendously over the last few years fuelled by huge profits from advertising dollars. When I worked at AltaVista it was well known that the more you knew about your users the more valuable an ad became. If you know all the friends, interests, habits of someone, and you know what they are doing right now, you can suggest exactly the right product at the right time to them. The cost of a simple ad on AltaVista was $5 per thousand page views. If you knew a lot about what someone was looking for the value could go up to $50.
  2. The allure of profit is leading to an ever increasing number of players in this space. See the Social Networking 3.0 talk at Stanford earlier in 2007.
  3. This in turn leads to a fracturing of the Social Networking space. As more players enter the space, each ends up with a smaller and partial view of the whole graph of social relations.
  4. Which is leading to the need for Social Network Portability, and more generally Data Portability. Users such as Scoble want to use their data on their own computer and link it together. Social Network Providers such as Plaxo or Facebook have a financial interest in helping their users move with their social network to their service. Facebook helps users extract all the information from GMail. Plaxo wants to help users extract all the information from every other social network.
  5. Privacy concerns will mount tremendously as a result. To defend its position, each social network will stoke in its users the fear of giving their data over to other "spamming" services. But to do this they will make it more and more difficult to extract the data from their service, annoying their users and so going against their users' desire to link their information. This will seem more and more like an issue for anti trust involvement as the ire of more and more people mounts.

The force of the above logic will release the energy needed for an investment in Linked Data tools such as Beatnik, since it solves all the problems mentioned above - at the expense of killing the dream some investors may have had of owning, Nineteen Eighty-Four like, the world.

Data Portability: Scoble Right or Wrong and beyond

Scoble explains Video

In this video Scoble explains how he got thrown off Facebook.

Here is a short summary, but the video is well worth watching as the emotions come through much better...
Facebook, which asks its users for their Gmail password in order to extract all the contacts someone has from their mail history and build up a possible list of friends, Facebook which scans the web for information to suggest friendships you may have, that same Facebook does not want anyone, including YOU, to be able to extract the data in your account on their web site even were it only into your own electronic address book. To do this they encode all email addresses as images, which makes them very difficult for a computer to decode, and so makes it tedious to move and use that information. So when Scoble tried to extract his 5000 friends using Optical Character Recognition - an idea suggested by Plaxo which wants to be a hub of people information - Facebook noticed this and cut off his account. (I think he may have been reinstated now - but whether there is a point in belonging to such a service is a serious question now). As a result Scoble and others have asked people to join the conversation on the Data Portability group.

This clearly is a very important issue. But his solution to the issue was not the best one. By using Plaxo - which wants to be the social graph hub of the web - to extract his data, he would have been able to do what clearly he should be able to do, namely add his contact information easily to Outlook. But he did this at the cost of allowing a third entity to gather a lot of information about him and his contacts. CNET's The Scoble scuffle: Facebook, Plaxo at odds over data portability, touches on the issue. Allowing a third service provider to extract all your data in order to give you access to it, is not improving your freedom. It is just giving another commercial entity access to a huge network of information about you. And the more a company knows about its users the more valuable the advertising it sells becomes. There is no mystery here as to why Social Networking sites have had so much money pumped into them over the last few years. So you have jumped out of the frying pan right into the fire here. Clearly if you are concerned with security of your information - with Facebook you had one commercial entity that had a lot more information about you than it should - now you have two.

Really what you want is the following:

  1. Selectivity in who gets what information about you:
    • Strangers should be able to see the minimum information I want to make public.
    • acquaintances should see more
    • family should see other information
    • ... these policies should be flexible and determinable by the owner of the information, by the person making the speech act of affirming it.
    And even though some people may be happy for a service provider to maintain this data, you may not even wish to allow them access to it. It should be possible to have this information on your server at home controlled only by you.
  2. Link to friends wherever they are. After all if you have to go through one central aggregator of relationship information, then that aggregator will have a view of all the relationship information available, giving one actor complete and overwhelming advantage as opposed to everyone else. You need distributed data, also known as linked data or hyperdata.
  3. An Open Data structure so as to allow ecosystems to grow and use that information. I want the tools on my computer to all be able to work with my social network information.
  4. A way to determine trust

Allowing different people to see more or less information (point 1 above) should be quite easy to set up by having the server return different representations depending on who is viewing the information, determined by their having logged in to your site with something like OpenId. Linking information in a distributed way is easy using Semantic Web technologies, and is demonstrated by tools such as Beatnik. Beatnik is just one of the tools that could use such information on my desktop (thereby fulfilling point 3 above).
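
To make point 1 a little more concrete, here is a minimal Python sketch of a server choosing a representation based on who is asking. The profile contents, the tiers, the policy table and the function `representation_for` are all invented for illustration; how the viewer's identifier gets authenticated (OpenId or otherwise) is deliberately left out.

```python
# Hypothetical profile, split into tiers of visibility.
PROFILE = {
    "public": {"name": "Jane"},
    "acquaintance": {"email": "jane@example.org"},
    "family": {"whereabouts": "at home"},
}

# Which tiers each kind of viewer may see. This is the owner's policy and
# can be anything; here family sees other information, not simply more.
POLICY = {
    "stranger": ["public"],
    "acquaintance": ["public", "acquaintance"],
    "family": ["public", "family"],
}

# Hypothetical mapping from an already authenticated identifier to a group.
GROUPS = {"http://romeo.name/#romeo": "family"}


def representation_for(viewer_id=None):
    """Merge the tiers the viewer is allowed to see into one representation."""
    group = GROUPS.get(viewer_id, "stranger")   # unknown viewers are strangers
    visible = {}
    for tier in POLICY[group]:
        visible.update(PROFILE[tier])
    return visible


print(representation_for())                             # anonymous viewer
print(representation_for("http://romeo.name/#romeo"))   # known family member
```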

What you say, out loud or on your web site, is a speech act. All information is the speech act of some one, and it is this that allows us to determine our level of trust in it. This is also why one should try to say less rather than more, since every piece of information one publishes is information one may have to defend. It is therefore much better if we have a system where everyone can look after a small part of the graph of information they have a responsibility for and defend it. They can then point to information maintained by other people, who will have to defend their piece. But since pointing to information maintained by others is a vote of confidence in them, an economy of links will emerge whereby people want to increase the number of quality links to them, which will only happen if they are deemed trustworthy. So the system allows for distributed trust. For a simple but excellent example see the Distributed Information Group wiki's policy for allowing people to post.

Wednesday Dec 19, 2007

Hyperdata in Sao Paulo

In the past week I gave a couple of presentations on Hyperdata, illustrating the concept with demos of the Tabulator and Beatnik, the Hyper Address Book I am currently working on.

The first talk I gave at the University of Sao Paulo, which was called at the last minute by Professor Imre Simon, who had led the Yochai Benkler talk the week before. It was a nice turnout of over 20 people, and I spoke at a more theoretical level of the semantic web, how it related to Metcalfe's law, as explained in more detail in a recently published paper by Prof. James Hendler, and how an application like Beatnik could give a powerful social meaning to all of this. I also looked at some of the interesting problems related to trust and belief revision that come up in a simple application like Beatnik, which touched a chord with Renata Wassermann who has written extensively on that field of the Semantic Web.
Many thanks to Prof Simon, for allowing me to speak. For a view from the audience see Rafael Ferreira's blog (in English) and Professor Ewout's blog (in Portuguese).

Yesterday I gave a more Java oriented technical talk at GlobalCode, an evening learning center in Sao Paulo, with a J2EE project on dev.java.net. I touched on how one may be able to use OpenId and foaf to create a secure yet open social network.
About 25 people attended which must be a really good turnout for a period so close to Christmas, when everyone is looking forward to the surf board present from Santa Claus, getting into their swimming trunks and paddling off to catch the next big wave. Well the really big wave that everyone in the know will be preparing for is the hyperdata wave. And to catch it one needs to practice one's skills. And a good way to do this is to help out with a simple application like Beatnik.
Thanks to Vinicius and Yara Senger for organising this.

Saturday Dec 15, 2007

James Gosling has a foaf name

And so does Tim Bray, Greg Papadopoulos, Jonathan Schwartz, Sun Microsystems, and Java. All thanks to the great work of the DBPedia people, a loose network of highly skilled distributed self selected avant garde force de frappe, who are extracting all the metadata possible from Wikipedia and making it available as hyperdata, ready to be linked to. :-)

You can browse their information on the web, or with the Tabulator generic data browser which will merge information it finds into one large graph as you explore it. As a result of this I can now add Tim Bray and James Gosling to my foaf file (foaf icon), by adding the following N3 statements:

:me foaf:knows [ = <http://dbpedia.org/resource/James_Gosling>;
                    a foaf:Person;
                    foaf:name "James Gosling" ],
               [ = <http://dbpedia.org/resource/Tim_Bray>;
                    a foaf:Person;
                    foaf:name "Tim Bray" ] .

It is worth looking at how DBPedia works. http://dbpedia.org/resource/James_Gosling is now a Uniform Resource Identifier for James Gosling. You cannot fetch James because he is not an information resource, i.e. he is not a document, though he is very resourceful, and full of interesting information. You can tell that James is not an information resource because you can't copy him easily. So when you do an HTTP GET on that URI you get the following:

hjs@bblfish:0$ curl -I http://dbpedia.org/resource/James_Gosling
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 17:57:54 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Location: http://dbpedia.org/page/James_Gosling
Content-Type: text/plain
Content-Length: 90

i.e. you get a redirect to the page about James Gosling. This is because curl by default sends "Accept: */*", and the server then answers with the html representation of the resource. Had you specified that you wanted the machine readable rdf/xml representation you would get a redirect to another resource:

hjs@bblfish:0$ curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/James_Gosling
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 18:01:10 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Location: http://dbpedia.openlinksw.com:8890/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FJames_Gosling%3E
Content-Type: text/plain
Content-Length: 210

Here you get a redirect to a SPARQL query to DESCRIBE James Gosling. To get the full content, in N3 try:

hjs@bblfish:0$ curl -L -H "Accept: text/rdf+n3" http://dbpedia.org/resource/James_Gosling 

the -L flag follows all the redirects...
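The same 303 dance can be tried offline. What follows is only a minimal sketch in Python: it starts a toy server on localhost that redirects a "thing" URI to a document chosen from the Accept header, and a client that follows the redirect as curl -L does. The /page/ and /data/ paths, the document bodies, and the rule "any Accept value containing rdf gets the data document" are assumptions made for the illustration; they are not how DBPedia is actually implemented.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical documents standing in for the HTML page and the RDF description.
DOCUMENTS = {
    "/page/James_Gosling": ("text/html", b"<html>James Gosling</html>"),
    "/data/James_Gosling": ("text/rdf+n3", b"<#jg> a <#Person> ."),
}


class NegotiatingHandler(BaseHTTPRequestHandler):
    """Toy server: 303-redirects the non-document URI to a document."""

    def do_GET(self):
        if self.path == "/resource/James_Gosling":
            accept = self.headers.get("Accept", "")
            # Assumed rule: anything mentioning rdf gets data, the rest gets html.
            target = "/data/James_Gosling" if "rdf" in accept else "/page/James_Gosling"
            self.send_response(303)
            self.send_header("Location", target)
            self.send_header("Vary", "Accept")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path in DOCUMENTS:
            content_type, body = DOCUMENTS[self.path]
            self.send_response(200)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url, accept):
    """GET url with the given Accept header, following redirects like curl -L."""
    # An empty ProxyHandler makes sure the localhost request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"Accept": accept})
    with opener.open(request) as response:
        return response.geturl(), response.headers.get_content_type(), response.read()


def demo():
    """Start the toy server, ask for html and for N3, and return both results."""
    server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        html = fetch(base + "/resource/James_Gosling", "text/html")
        rdf = fetch(base + "/resource/James_Gosling", "text/rdf+n3")
    finally:
        server.shutdown()
        server.server_close()
    return html, rdf


if __name__ == "__main__":
    for final_url, content_type, body in demo():
        print(final_url, content_type, body)
```

The one URI for James ends up at two different documents depending on what the client asked for, and neither of them is James himself.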

Friday Dec 07, 2007

Yochai Benkler: The Wealth of Networks

This afternoon I attended a teleconference at the University of Sao Paulo where Yochai Benkler talked from the Berkman Center for Internet and Society at Harvard, about his now famous book "The Wealth of Networks" (available online) and answered questions from the audience. Yochai talked about the impact of open source and peer to peer modes of co-operative production on economics, politics, arts and education. The book has many excellent and illuminating examples of how massively parallel and distributed use of human resources can outperform large centrally organised Taylorist production methods. He does point out that this won't work in every field of endeavour, but more naturally in knowledge based ones, where the cost of reproduction is close to zero. More details in the freely available book.

The conference was organised by Imre Simon from the Institute of Advanced Studies of the University of Sao Paulo. A web site in Portuguese is dedicated to this talk, and it was broadcast live on the web.

At the end of the talk, as the last question from the floor, I asked about what research had been done into applying Metcalfe's law to networks as powerful as the Semantic Web, and so how this would affect questions on the wealth of networks. Yochai seemed to think that the Semantic Web was too much about data, and not about people. Of course Beatnik, the semantic address book I am working on right now, is going to show how this dichotomy is completely illusory, and how the distributed, decentralised world of hyperdata should fit perfectly into the central thesis of the book. :-)

Wednesday Dec 05, 2007

Life is Champagne

It's full of bubbles

VIDEO CLIP: Here comes another bubble, performed by the Richter Scales (also viewable directly on youtube).

Thanks to Matt Hempey and The Richter Scales

Friday Oct 19, 2007

Twine: Organising *your* information

Nova Spivack's company Radar Networks today unveiled at the Web2.0 summit in SF, the new service Twine. Nova Spivack has no trouble pronouncing the phrase "Semantic Web", and has built this whole service on those technologies. He describes this in a detailed podcast: Twine: A social network built on the semantic web. One quote I liked: "Whereas Google's mission is to organize the world's information, Twine's mission is to organize your information."

This looks very interesting. Nova Spivack and Lew Tucker presented the Semantic Web Birds Of a Feather at this year's JavaOne. This is the service they were speaking about.

More on the web:

Monday Oct 08, 2007

Open Data Licences

The amount of Open Data is growing fast. The idea that data may need protection in an Open Society is bizarre enough, but in Europe at least a whole set of laws have been put in place for this purpose. For those who wish to add data to the Commons, so that it may better contribute to the value of the network as predicted by Metcalfe's law, current Open licences will not do, it seems. This is, as I understand, because copyright licences do not cover data well, since a set of relations can be serialized in any number of ways: order does not matter, it is easy to refactor data, or combine it with other data. (I wonder then why this was not a problem for source code?)
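The point about serialization can be made concrete. In this small Python sketch two texts differ character for character, yet describe exactly the same set of relations. The one-triple-per-line format and the parse_triples helper are simplifications invented for the example; this is not a real N3 or N-Triples parser.

```python
def parse_triples(text):
    """Parse a toy format with one 'subject predicate object .' per line."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the closing dot, then split into exactly three parts.
        subject, predicate, obj = line.rstrip(" .").split(None, 2)
        triples.add((subject, predicate, obj))
    return frozenset(triples)


# Two serializations of the same relations: different order, different spacing.
serialization_a = (
    ':me foaf:knows :tim .\n'
    ':tim foaf:name "Tim Bray" .\n'
)
serialization_b = (
    ':tim   foaf:name   "Tim Bray" .\n'
    '\n'
    ':me foaf:knows :tim .\n'
)

same_text = serialization_a == serialization_b
same_data = parse_triples(serialization_a) == parse_triples(serialization_b)

if __name__ == "__main__":
    print("same text:", same_text)
    print("same data:", same_data)
```

A licence that attaches to the expression (the text) has little to hold on to when the data can be rewritten this freely without changing what it says.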

To help resolve these issues, Talis, a leading Semantic Web company, helped fund research into this area which resulted in the Open Data Licence project, which is now seeking feedback on its proposals. From my quick reading of it this licence seems to have a GNU feel to it, but I may be wrong.

Sunday Oct 07, 2007

Why Web 3.0?

As Tim O'Reilly admitted recently, the Web 2.0 meme was created to help businesses get over the dotcom crash. There was no way of getting investors to put money in the web, so it was important to rebrand. Mike Bergman - and many others - may not like it, and quite reasonably so, but this was probably a business necessity. The web of course never died and clearly never will. But since "Web" got associated with bust, Crash, 911 and what not, it was important to emphasize that everything did not end, there was a new beginning. There was a life after Pet Shop stores. This is the point of Web 2.0, and Tim O'Reilly did a great job with his article "What is Web 2.0?" in emphasizing this evolution.

The rebranding was extremely successful. But with success often comes conceit, and it became obvious that major evolutions were being left out of the Web 2.0 sphere. And, as this recent article by Tim indicates, "Today's Web 3.0 Nonsense Blogstorm", the key proponents of 2.0 do not feel like allowing those technologies in, either because they don't understand them, or because they have enough on their plate, or because they find it difficult to speak about them to their investors, or a combination of all of those. It is difficult to explain, since as I showed in a recent article Semantic Web technologies very nicely complement O'Reilly's Web 2.0 patterns. Whatever the reasons for this rejection, it is clear that there is an after Web 2.0 building up, and so the best way to name it is Web 3.0. For some reason this after seems to be unpleasant to the 2.0 folks. Of course: it probably limits the capital they have access to. Competition does that. But it is a limit that they are imposing on themselves. Was it because it was easier for them to build momentum for their ideas? Starting small is a good strategy. But no one can own the whole future. It evolves, an idea that Nova Spivack defends very clearly, and for which he is rewarded by having some clever investors.

In fact we should be glad there is the Web 2.0 crowd and that Tim manages to argue so well at keeping them there, and frightening them from coming over here. Without this boundary it would be much more difficult to explain what is new, and we would end up being overwhelmed by a me too crowd intent on latching onto the latest. (There are a few of those already here, btw.) So yes. Web 3.0 is the future, but it is a risky one. On the other hand as the Web 2.0 space fills up, life will be getting more and more difficult in the red ocean of intense competition, witness the never ending new social networking startups. Inevitably the risks of going three are going to be outweighed by the difficulty of staying in me 2 land.

But if all of this still makes the hair rise up on your head, I suggest using the web n+1 shorthand. That puts you at the bleeding edge always, in a politically correct way. And for the whole thing explained with a lot more humour, see Web 3.0 I$ About Money.

Friday Oct 05, 2007

Doap Bean available

I have just made the NetBeans Doap Bean available on the plugin portal. Just download it onto your desktop and install it in a version of NetBeans 6 (check Tools > Plugins in the menu).

This is the module I demonstrated at James Gosling's 'fun things' presentation on NetBeans day in San Francisco. I have updated the code to make it easy to understand for people who would wish to emulate and enhance it. It is easy to do that. Install the plugin, and go to the https://sommer.dev.java.net/ project. Then drag the blue button next to the URL from your browser (I have checked that it works with Safari and Firefox on OSX) onto the DOAP button on the toolbar. This will fetch the information from the web page and pop up a window with a human readable representation of the RDF. This window should look like this:

window describing the so(m)mer project

Clicking on the other tabs will show you the original RDF/XML or an easier to read Turtle representation of the data. It is really important to show these tabs so that you can distinguish good from bad doap. Of course one can also go to the W3C Validator for an independent opinion.
In any case if the source code is available via a CVS or Subversion repository, you should be able to download it with just a click on the "download" button. (Make sure that NetBeans knows where your svn command line tool is though, by going to the menu Versioning > Subversion > Checkout... )
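To give an idea of what such a "download" button has to do, here is a rough Python sketch that reads a DOAP description and works out the svn command to run. It is not the plugin's code. The sample document, its project name and its repository URL are invented, and walking the XML tree like this only copes with one particular way of writing the RDF/XML; anything serious would go through a proper RDF parser.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "doap": "http://usefulinc.com/ns/doap#",
}

# Hypothetical DOAP description: the project name and URL are made up.
SAMPLE_DOAP = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project>
    <doap:name>example-project</doap:name>
    <doap:repository>
      <doap:SVNRepository>
        <doap:location rdf:resource="https://svn.example.org/example-project/trunk"/>
      </doap:SVNRepository>
    </doap:repository>
  </doap:Project>
</rdf:RDF>"""


def svn_checkout_command(doap_xml):
    """Return the svn command (as an argument list) for the described project."""
    root = ET.fromstring(doap_xml)
    project = root.find("doap:Project", NS)
    if project is None:
        raise ValueError("no doap:Project found")
    name = project.findtext("doap:name", default="project", namespaces=NS)
    location = project.find(
        "doap:repository/doap:SVNRepository/doap:location", NS
    )
    if location is None:
        raise ValueError("no Subversion repository listed")
    # rdf:resource is a namespaced attribute, so it needs the {uri}name form.
    url = location.get("{%s}resource" % NS["rdf"])
    return ["svn", "checkout", url, name]


if __name__ == "__main__":
    # Only print the command; running it is left to the reader.
    print(" ".join(svn_checkout_command(SAMPLE_DOAP)))
```

The command is returned as a list so that it could be handed to subprocess.run without any shell quoting worries.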

If you want to try dropping other projects onto the button go to DoapSpace, they have put together a large collection of doap files for all the projects on SourceForge, Freshmeat and PyPi.

As I mentioned, this is really only version 0.1 of the doap integration of NetBeans. Clearly one could do a lot more, such as:

  • Having it produce Doap for a project automatically
  • Tying it into NetBeans's Project panel
  • Describing the relationships between a project and the others it depends on
  • Linking bug reports to information gleaned from the doap:bugdatabase relation
  • Perhaps seeing if one can set things up so that one can immediately find the javadoc online for a doap project one has information on
  • Finding a way to view source on a jar, by relating jars to source code repositories... (more difficult this one)
  • and a lot more...

Now you may wonder: How is one going to know that there is a doap link on some project's source page? Searching for the doap link seems a lot of work, right? Well to get an idea of how things will integrate you can install the Firefox Semantic Radar plugin, and go to the So(m)mer project again. You will then see displayed at the bottom of your browser an icon of square smiley faces, as shown on the following screenshot:

semantic radar icon in Firefox

I should probably add this icon to the Doap button come to think of it...
The Doap button is in the So(m)mer repository, which is all published under the very generous BSD licence, so you are welcome to help out and add your own features... I may be having to work on a few other things next, so I won't be getting in your way :-)
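For the curious, the kind of autodiscovery a tool like Semantic Radar does can be sketched in a few lines of Python. The assumption here, and it is only an assumption, is that the project page advertises its RDF with a link element of type application/rdf+xml in the head; the sample page and its URLs are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical project page advertising a DOAP file in its head.
SAMPLE_PAGE = """<html><head>
<title>Example project</title>
<link rel="stylesheet" type="text/css" href="style.css">
<link rel="meta" type="application/rdf+xml" title="DOAP" href="doap.rdf" />
</head><body><p>Welcome</p></body></html>"""


class RdfLinkFinder(HTMLParser):
    """Collect (title, absolute URL) for every RDF/XML link element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if attributes.get("type") == "application/rdf+xml" and href:
            # Resolve relative links against the page's own address.
            self.links.append((attributes.get("title"), urljoin(self.base_url, href)))


def find_rdf_links(page, base_url):
    finder = RdfLinkFinder(base_url)
    finder.feed(page)
    return finder.links


if __name__ == "__main__":
    print(find_rdf_links(SAMPLE_PAGE, "https://example.org/project/"))
```

A browser extension or an IDE button can run this sort of check on every page it sees, which is why nobody should have to go hunting for the doap link by hand.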

Thursday Sep 20, 2007

hyperdata

I just came across a recent post by Nova Spivack, "The Semantic Web, Collective Intelligence and Hyperdata", where he defines a couple of very useful words: hyperdata and folktologies. The one I'd like to look at here is the very important concept of hyperdata:

One might respond [...] by noting that there is already a lot of data on the Web, in XML and other formats -- how is the Semantic Web different from that? What is the difference between "Data on the Web" and the idea of "The Data Web?"

The best answer to this question that I have heard was something that Dean Allemang said at a recent Semantic Web SIG in Palo Alto. Dean said, "Sure there is data on the Web, but it's not actually a web of data." The difference is that in the Semantic Web paradigm, the data can be linked to other data in other places, it's a web of data, not just data on the Web.

I call this concept of interconnected data, "Hyperdata." It does for data what hypertext did for text. I'm probably not the originator of this term, but I think it is a very useful term and analogy for explaining the value of the Semantic Web.

Nova's Hyperdata article was written in response to Tim O'Reilly's recent post Economist Confused about the Semantic Web. Tim correctly points out that the word Semantic is often used to cover technologies that are closer to Web2.0, data silo technologies. But to get out of the data silos, we need hyperdata which is, like the web, and contra Tim, a social/folk/community enterprise.

The original Economist article was published on August 28 2007: The web: some antics

For a very good tutorial introduction to hyperdata see How to Publish Linked Data on the Web by Chris Bizer, Richard Cyganiak, and Tom Heath.

Tuesday Sep 18, 2007

M2N: building Swing apps with N3

At the Triple-I conference in Graz, I came across one very interesting demo by the M2N Intelligence Management company, where they showed a Development environment powered by N3, the powerful, easy to read notation for the Semantic Web. Using a Visual Editor that mapped UML diagrams to N3, instead of the usual limited and difficult to understand OMG MOF family of standards, we could see how one could build a complete User Interface application, including logic in a visual way. The same could be done manually in vi by editing the N3 directly, for those proficient enough. I think they describe parts of this very generally on their solutions page.

It is a pity that M2N does not open source this library, as that would allow one to get a better idea as to the advantages of doing things this way. Sebastian Schaffert - who works at a research company in Salzburg, and was looking at their demo with me - was quite enthusiastic about the idea. There was a lot one could do with such a tool, he thought, such as being able to SPARQL query one's user interface, test it for constraints, etc...

It would be nice to have some feedback from people who had used this on the pros and the cons of their implementation, or of the general idea.

Microsoft Media Manager

David Seth reports today on Microsoft's Interactive Media Manager, based on RDF and OWL, two key semantic web technologies.

This RDF model allows companies to add nuance and intelligence to media management beyond what is possible with traditional metadata.

While I am on media, I might as well mention, for those who don't already know, that Joost, which I believe is somehow related to Skype, and that is working on Peer to Peer video, is also using RDF. Not sure how, but since Dan Brickley - one of the brains behind foaf - is working there, it is quite likely going to be very interesting.

Intel® Mash Maker: Mashups for the Masses

Intel® Mash Maker is a new service that makes it easy to create screen scrapers to extract data from normal web pages and create mashups. The tag lines on its front page are:

  • "Browse don't program": the mashmaker will suggest mashups as you browse the web
  • "View the internet not just a web page": it combines many web pages into one view
  • "Enter the semantic Web via the back door": draws on the wisdom of the community to understand the structure and semantics of information on the web

It comes with a Firefox plugin with which one can create screen scrapers using XPath Queries, to extract data which one can then save on their server - and which I think then belongs to Intel®. There are a number of videos that show how this works on their site.
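To show what an XPath based screen scraper boils down to, here is a small Python sketch using only the standard library. It says nothing about how Mash Maker itself works: the page, the class names and the scrape helper are all made up, ElementTree only understands a subset of XPath, and it needs well-formed markup, which real web pages rarely are.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page with some repeated structure to extract.
SAMPLE_PAGE = """<html><body>
<h1>Talks</h1>
<ul>
<li class="talk"><span class="title">Beatnik</span> <span class="city">Sao Paulo</span></li>
<li class="talk"><span class="title">Doap Bean</span> <span class="city">San Francisco</span></li>
<li class="note">Not a talk</li>
</ul>
</body></html>"""


def scrape(page, row_path, field_paths):
    """Return one dict per element matching row_path.

    field_paths maps a field name to an XPath evaluated relative to each row.
    """
    root = ET.fromstring(page)
    return [
        {name: row.findtext(path) for name, path in field_paths.items()}
        for row in root.findall(row_path)
    ]


talks = scrape(
    SAMPLE_PAGE,
    ".//li[@class='talk']",
    {"title": "span[@class='title']", "city": "span[@class='city']"},
)

if __name__ == "__main__":
    for talk in talks:
        print(talk["title"], "in", talk["city"])
```

The weakness is plain to see: the queries are tied to the page layout, so the scraper breaks as soon as the site changes its markup, which is exactly what published hyperdata avoids.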

This is clearly very similar to the Piggy Bank Firefox plugin from MIT. Is the novelty here that Intel is hosting the mashups, and perhaps even cleaning them up? Or that everything then belongs to them? [ not so: the licence on the contributions is very light, though without an obligation to share ]

Notes

Thanks to Josef Holy for pointing me to this, enthusiastic at the visible spread of the Semantic Web meme.

Danny Ayers has looked a little further into this: There is a Front Door

Sunday Sep 09, 2007

Language is a Virus

That is key to understanding the development of the Semantic Web. Open the door and listen to this classic 1986 song by Laurie Anderson, "Language is a Virus" (lyrics):

VIDEO CLIP: "Language is a Virus" by Laurie Anderson (also viewable directly on youtube).

Then if you wish to explore this in more detail you can read the philosophical papers of Ruth G. Millikan.
