The Sun BabelFish Blog
Don't panic !
Webifying Integrated Development Environments
IDEs should be browsers of code on a Read Write Web. A whole revolution in how to build code editors is, I believe, hidden in those words. So let's imagine it. Fiction anticipates reality.
Imagine your favorite IDE, a future version of NetBeans perhaps or IntelliJ, which would make downloading a new project as easy as dragging and dropping a project url onto your IDE. The project home page would point to a description of the location of the code, the dependencies of this project on other projects, described themselves via URL references, which themselves would be set up in a similar manner. Let's imagine further: instead of downloading all the code from CVS, think of every source code document as having a URL on the web. ( Subversion is in fact designed like this, so this is not so far fetched at all.) And let's imagine that NetBeans thinks about each software component primarily via this URL.
Since every piece of code and every library has a URL, the IDE would be able to use RESTful architectural principles of the web. A few key advantages of this are
- Caching: a key property of web architecture is the ability to cache information on the network or locally without ambiguity. This is how your web browser works ( though it could work better ). To illustrate: once a day Google changes its banner image. Your browser and every browser on earth only fetches that picture once a day, even if you do 100 searches. Does Google serve one image to each browser? No! Numerous caches (company, country, or other) cache that picture and send it to the browser without sending the request all the way to the search engine, reducing the load on Google's servers very significantly.
- Universal names: since every resource has a URL, any resource can relate in one way or another to any other resource wherever it is located. This is what enables hypertext and what is enabling hyperdata.
- No need to download libraries twice: if you have been working on open source projects at all frequently you must have noticed how often the same libraries are found in each of the projects you have downloaded. Apache logging is a good example.
- No need to download source code: it's on the web! You don't therefore need a local cache of code you have never looked at. Download what you need when you need it (and then cache it!): the Just in Time principle.
- Describe things globally: Since you have universal identifiers you can now describe how source code relates to documentation, to people working on the code, or anything else in a global way, that will be valid for all. Just describe the resources. There's a framework around just for that, that is very easy to use with the right introduction.
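To make the caching and Just in Time points a little more concrete, here is a minimal Python sketch of a cache keyed by URL. The `UrlCache` class, the `fetch` callable and the cache layout are assumptions made up for illustration; a real IDE would also have to honour HTTP cache headers, which this sketch ignores.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


class UrlCache:
    """Hypothetical local cache keyed by URL (illustration only)."""

    def __init__(self, root: Path, fetch: Callable[[str], bytes]):
        self.root = root
        self.fetch = fetch  # e.g. a function that does one HTTP GET
        self.root.mkdir(parents=True, exist_ok=True)

    def path_for(self, url: str) -> Path:
        # The URL is the universal name; the local file name is derived from it.
        return self.root / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> bytes:
        path = self.path_for(url)
        if not path.exists():  # cache miss: fetch just in time
            path.write_bytes(self.fetch(url))
        return path.read_bytes()


# Usage with a fake fetcher that records how often the "network" is hit.
calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"contents of " + url.encode("utf-8")


cache = UrlCache(Path(tempfile.mkdtemp()), fake_fetch)
jar = "http://project.eg/svn/lib/junit-4.0.jar"
first = cache.get(jar)
second = cache.get(jar)  # served from the local copy, no second fetch
```

Because the cache is keyed by URL alone, two projects that depend on the same library would share one local copy.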
The above advantages may seem rather insignificant. After all, real developers are tough. They use vi. (And I do). So why should they change? Well, notice that they also use Adobe Air or Microsoft Silverlight. So productivity considerations do in fact play a very important role in the software ecosystem.
Don't normal developers just work on a few pieces of code? Well speaking for myself here, I have 62 different projects in my /Users/hjs/Programming directory, and in each of these I often have a handful of project branches. As more and more code is open source, and owned and tested by different organizations, the number of projects available on the web will continue to explode, and due to the laziness principle the number of projects using code from other projects will grow further. Already whole operating systems consisting of many tens of thousands of different modules can be downloaded and compiled. The ones I have downloaded are just the ones I have had the patience to get. Usually this means jumping through a lot of hoops:
- I have to find the web site of the code. And I may only have a jar name to go by. So Google helps. But that is a whole procedure in itself that should be unnecessary. If you have an image in your browser you know where it is located by right-clicking over it and selecting the URL. Why not so with code?
- Then I have to browse a web page, which may not be written in my language, and find the repository of the source code
- Then I have to find the command line to download the source code, or the command in the IDE and also somehow guess which version number produced the jar I am using.
- Once downloaded, and this can take some time, I may have to find the build procedure. There are a few out there. Luckily ant and maven are catching on. But some of these files can be very complicated to understand.
- Then I have to link the source code on my local file system to the jar on my local file system my project is using. In NetBeans this is exceedingly tedious - sometimes I have found it to be close to impossible even. IntelliJ has a few little tricks to automate some of this, but it can be pretty nasty too, requiring jumping around different forms. Especially if a project has created a large number of little jar files.
- And then all that work is only valid for me. Because all references are to files on my local file system, they cannot be published. NetBeans is a huge pain here in that it often creates absolute file URLs in its properties files. By replacing them with relative URLs one can publish some of the results, but at the cost of copying every dependency into the local repository. And working out what is local and what is remote can take up a lot of time. It will work on my system, but not on someone else's.
- Once that project is downloaded one may discover that it depends on yet another project, and so we have to go back to step 1.
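The loop in the last step (download, discover yet another dependency, start over) is the kind of thing an IDE could automate if every project described its dependencies by URL. A rough Python sketch follows; the `dependencies_of` callable and the example URLs are hypothetical, standing in for whatever machine-readable project description would actually be published.

```python
from collections import deque
from typing import Callable, Iterable, List


def resolve(project_url: str,
            dependencies_of: Callable[[str], Iterable[str]]) -> List[str]:
    """Breadth-first walk returning every reachable project URL exactly once."""
    seen = [project_url]
    queue = deque([project_url])
    while queue:
        current = queue.popleft()
        for dep in dependencies_of(current):
            if dep not in seen:  # a shared library is visited only once
                seen.append(dep)
                queue.append(dep)
    return seen


# Made-up project descriptions: each URL lists the URLs it depends on.
DESCRIPTIONS = {
    "http://app.example/": ["http://lib-a.example/", "http://logging.example/"],
    "http://lib-a.example/": ["http://logging.example/"],
    "http://logging.example/": [],
}

order = resolve("http://app.example/", lambda url: DESCRIPTIONS.get(url, []))
```

In this toy example the logging library is needed by two projects but appears only once in the result.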
So why do we have to tie together all the components on our local file system? This is because the IDEs are not referring to the resources with global identifiers. The owner of the junit project should say somewhere, in his doap file perhaps, that:
@prefix java: <http://java.net/ont/java#> . #made this up
@prefix code: <http://todo.eg/#> .
<http://project.eg/svn/lib/junit-4.0.jar> a java:Jar;
code:builtFrom <http://junit.sourceforge.net/> .
#what would be needed here needs to be worked out more carefully. The point is that we don't
#at any point refer to any local file.
Because this future IDE we are imagining together will then know that it has stored a local copy of the jar somewhere on the local file system, and because it will know where it placed the local copy of the source code, it will know how the cached jar relates to the cached source code, as illustrated in the diagram above. So just as when you click on a link in your web browser you don't have to do any maintenance to find out where the images and html files are cached on your hard drive, and how one resource (your local copy of an image) relates to the web page, so we should not have to do any of this type of work in our Development Environment either.
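Here is a small Python sketch of that lookup, under heavy assumptions: the triples mirror the made-up `code:builtFrom` statement above, and `local_copy` is a hypothetical stand-in for whatever cache layout the IDE would use.

```python
import hashlib
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]

# Full URL of the made-up code:builtFrom property from the example above.
BUILT_FROM = "http://todo.eg/#builtFrom"

TRIPLES = [
    ("http://project.eg/svn/lib/junit-4.0.jar",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://java.net/ont/java#Jar"),
    ("http://project.eg/svn/lib/junit-4.0.jar",
     BUILT_FROM,
     "http://junit.sourceforge.net/"),
]


def source_of(jar_url: str, triples: Iterable[Triple]) -> Optional[str]:
    """Return the URL the jar was built from, if the description says so."""
    for subject, predicate, obj in triples:
        if subject == jar_url and predicate == BUILT_FROM:
            return obj
    return None


def local_copy(url: str, cache_root: str = "/tmp/ide-cache") -> str:
    """Hypothetical mapping from a global URL to a path in the local cache."""
    return f"{cache_root}/{hashlib.sha256(url.encode('utf-8')).hexdigest()}"


jar = "http://project.eg/svn/lib/junit-4.0.jar"
src = source_of(jar, TRIPLES)
# The IDE can now relate the cached jar to the cached source on its own.
link = (local_copy(jar), local_copy(src))
```

The relation between jar and source is stated once, globally; only the last line involves local paths, and those are derived rather than configured by hand.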
From here many other things follow. A couple of years ago I showed how this could be used to link source code to bugs, to create a distributed bug database. Recently I showed how one could use this to improve build scripts. Why even download a whole project if you are stepping through code? Why not just fetch the code that you need when you need it from the web? One HTTP GET at a time. The list of functional improvements is endless. I welcome you to list some that you come up with in the comments section below.
If you want to make a big impact in the IDE space, that will be the way to go.
Posted at 12:45PM Jun 24, 2008 [permalink/trackback] by Henry Story in Java | Comments[22]
BOF-5911: Building a Web 3.0 Address Book
To give everyone a chance to try out the So(m)mer Address Book, I have made it available via Java Web Start: just click on the picture to the right, and try it out.
The Address Book is currently demoware: it shows how one can build virally an open distributed social network client that solves the social network data silo problem (video). No need to have an account on every social networking site on which you have friends, and so maintain your data on each one. You can simply belong to one network and link to all your friends wherever they are. With one click of a button you can publish your social network to your own web server, using ftp, scp, WebDAV, or even Atom. You can then link to other people who have (or not in fact), a foaf file. By pressing the space bar when selecting a friend, the Address Book will then GET their file. So you can browse your social network.
To get going you can explore my social network by dragging my foaf file icon onto the first pane of the application.
In BOF-5911, which I will be presenting on Thursday at 7:30pm, I will describe the social networking problem, demonstrate how the So(m)mer Address Book solves it, and show in detail how it is built, what the problems are, and what work remains. I will also discuss how this can be used to create global single sign on based on a network of trust.
Update
An improved version of the presentation I gave is now available online with audio as Building Secure, Open and Distributed Social Network Applications
Posted at 12:50AM May 07, 2008 [permalink/trackback] by Henry Story in Java | Comments[5]
RDFAuth: sketch of a buzzword compliant authentication protocol
Here is a proposal for an authentication scheme that is even simpler than OpenId ( see sequence diagram ), more secure, more RESTful, with fewer points of failure and fewer points of control, that is needed in order to make Open Distributed Social Networks with privacy controls possible.
Update
The following sketch led to the even simpler protocol described in Foaf and SSL creating a global decentralized authentication protocol. It is very close to what is proposed here but builds very closely on SSL, so as to reduce what is new down to nearly nothing.
Background
OK, so now that I have your attention, I would like to first mention that I am a great fan of OpenId. I have blogged about it numerous times and enthusiastically in this space. I came across the idea I will develop below, not because I thought OpenId needed improving, but because I have chosen to follow some very strict architectural guidelines: it had to satisfy RESTful, Resource oriented hyperdata constraints. With the Beatnik Address Book I have proven - to myself at least - that the creation of an Open Distributed Social Network (a hot topic at the moment, see the Economist's recent article on Online social network) is feasible and easy to do. What was missing is a way for people to keep some privacy, clearly a big selling point for the large Social Network Providers such as Facebook. So I went in search of a solution to create an Open Distributed Social Network with privacy controls. And initially I had thought of using OpenId.
OpenId Limitations
But OpenId has a few problems:
- First it is really designed to work with the limitations of current web browsers. It is partly because of this that there is a lot of hopping around from the service to the Identity Provider with HTTP redirects. As the Tabulator, Knowee or Beatnik.
- Parts of OpenId 2, and especially the Attribute Exchange spec really don't feel very RESTful. There is a method for PUTing new property values in a database and a way to remove them that does not use either the HTTP PUT method or the DELETE method.
- The OpenId Attribute Exchange is nice but not very flexible. It can keep some basic information about a person, but it does not make use of hyperdata. And the way it is set up, it would only be able to do so with great difficulty. A RESTfully published foaf file can give the same information, is a lot more flexible and extensible, whilst also making use of Linked Data, and as it happens also solves the Social Network Data Silo problems. Just that!
- OpenId requires an Identity Server. There are a couple of problems with this:
  - This server provides a Dynamic service but not a RESTful one. I.e. the representations sent back and forth to it cannot be cached.
  - The service is a control point. Anyone owning such a service will know which sites you authenticate onto. True, you can set up your own service, but that is clearly not what is happening. The big players are offering their customers OpenIds tied to particular authentication servers, and that is what most people will accept.
RDFAuth, a sketch
So following my strict architectural guidelines, I came across what I am just calling RDFAuth, but like everything else here this is a sketch and open to change. I am not a security specialist nor an HTTP specialist. I am like someone who comes to an architect in order to build a house on some land he has, with some sketch of what he would like the house to look like, some ideas of what functionality he needs and what the price he is willing to pay is. What I want here is something very simple, that can be made to work with a few perl scripts.
Let me first present the actors and the resources they wish to act upon.
- Romeo has a Semantic Web Address Book, his User Agent (UA). He is looking for the whereabouts of Juliette.
- Juliette has a URL identifier ( as I do ) which returns a public foaf representation and links to a protected resource.
- The protected resource contains information she only wants some people to know, in this instance Romeo. It contains information as to her current whereabouts.
- Romeo also has a public foaf file. He may have a protected one too, but it does not make an entrance in this scene of the play. His public foaf file links to a public PGP key. I described how that is done in Cryptographic Web of Trust.
- Romeo's Public key is RESTfully stored on a server somewhere, accessible by URL.
So Romeo wants to find out where Juliette is, but Juliette only wants to reveal this to Romeo. Juliette has told her server to only allow Romeo, identified by his URL, to view the site. She could also have had a more open policy, allowing any of her or Romeo's friends to have access to this site, as specified by their foaf file. The server could then crawl their respective foaf files at regular intervals to see if it needed to add anyone to the list of people having access to the site. This is what the DIG group did in conjunction with OpenId. Juliette could also have a policy that decides Just In Time, as the person presents herself, whether or not to grant them access. She could use the information in that person's foaf file, relating it to some trust metric, to make her decision. How Juliette specifies who gets access to the protected resource here is not part of this protocol. This is completely up to Juliette and the policies she chooses her agent to follow.
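Although the policy is outside the protocol, a tiny Python sketch may help picture the first two options. Everything here is an assumption for illustration: the `may_access` function, the `friends_of` mapping (imagined as the result of crawling foaf files), and the URLs for Mercutio and Paris.

```python
from typing import Dict, Set


def may_access(visitor: str,
               allowed: Set[str],
               friends_of: Dict[str, Set[str]],
               open_to_friends: bool = False) -> bool:
    """Decide whether `visitor` (a foaf name) may see the protected resource."""
    if visitor in allowed:
        return True
    if open_to_friends:
        # More open policy: friends of anyone already on the list get in too.
        return any(visitor in friends_of.get(person, set()) for person in allowed)
    return False


JULIETTE = "http://juliette.org/#juliette"
ROMEO = "http://romeo.name/#romeo"
MERCUTIO = "http://mercutio.example/#me"  # hypothetical friend of Romeo
PARIS = "http://paris.example/#me"        # hypothetical stranger

# As it might look after crawling the public foaf files.
friends_of = {ROMEO: {MERCUTIO}, JULIETTE: set()}
allowed = {JULIETTE, ROMEO}

strict = may_access(MERCUTIO, allowed, friends_of)
relaxed = may_access(MERCUTIO, allowed, friends_of, open_to_friends=True)
stranger = may_access(PARIS, allowed, friends_of, open_to_friends=True)
```

A Just In Time policy would replace the set lookups with a call to some trust metric, which is deliberately left out here.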
So here is the sketch of the sequence of requests and responses.
1. First, Romeo's User Agent knows that Juliette's foaf name is http://juliette.org/#juliette so it sends an HTTP GET request to Juliette's foaf file, located of course at http://juliette.org/
   The server responds with a public foaf file containing a link to the protected resource, perhaps with the N3
      <> rdfs:seeAlso <protected/juliette> .
   Perhaps this could also contain some relations describing that resource as protected, which groups may access it, etc... but that is not necessary.
2. Romeo's User Agent then decides it wants to check out protected/juliette. It sends a GET request to that resource but this time receives a variation of the Basic Authentication Scheme, perhaps something like:
      HTTP/1.0 401 UNAUTHORIZED
      Server: Knowee/0.4
      Date: Sat, 1 Apr 2008 10:18:15 GMT
      WWW-Authenticate: RdfAuth realm="http://juliette.org/protected/*", nonce="ILoveYouToo"
   The idea is that Juliette's server returns a nonce (in order to avoid replay attacks), and a realm over which this protection will be valid. But I am really making this up here. Better ideas are welcome.
3. Romeo's web agent then encrypts some string (the realm?) and the nonce with Romeo's private key. Only an agent trusted by Romeo can do this.
4. The User Agent then sends a new GET request with the encrypted string, and his identifier, perhaps something like this:
      GET /protected/juliette HTTP/1.0
      Host: juliette.org
      Authorization: RdfAuth id="http://romeo.name/#romeo", key="THE_REALM_AND_NONCE_ENCRYPTED"
      Accept: application/rdf+xml, text/rdf+n3
   Since we need an identifier, why not just use Romeo's foaf name? It happens to also point to his foaf file. All the better.
5. Juliette's web server can then use Romeo's foaf name to GET his public foaf file, which contains a link to his public key, as explained in "Cryptographic Web of Trust".
6. Juliette's web server can then query the returned representation, perhaps meshed with some other information in its database, with something equivalent to the following SPARQL query:
      PREFIX wot: <http://xmlns.com/wot/0.1/>
      SELECT ?pgp
      WHERE {
        [] wot:identity <http://romeo.name/#romeo>;
           wot:pubkeyAddress ?pgp .
      }
   The nice thing about working at the semantic layer is that it decouples the spec a lot from the representation returned. Of course as usage grows those representations that are understood by the most servers will create a de facto convention. Initially I suggest using RDF/XML of course. But it could just as well be N3, RDFa, perhaps even some microformat dialect, or even some GRDDLable XML, as the POWDER working group is proposing to do.
7. Having found the URL of the PGP key, Juliette's server can GET it - and as with much else in this protocol cache it for future use.
8. Having the PGP key, Juliette's server can now decrypt the encrypted string sent to her by Romeo's User Agent. If the decrypted string matches the expected string, Juliette will know that the User Agent has access to Romeo's private key. So she decides this is enough to trust it.
9. As a result Juliette's server returns the protected representation.
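The challenge and response at the heart of this sequence can be sketched in a few lines of Python. This is a toy under loud assumptions: it uses a textbook RSA key pair with tiny primes (61 and 53), which is in no way secure, in place of a real PGP key; the `public_keys` dictionary stands in for the steps where Juliette's server fetches Romeo's foaf file and public key; and the function names are made up. What the text calls encrypting with the private key is what is usually called signing.

```python
import hashlib
import re
from typing import Dict, Tuple

# Toy RSA key pair (textbook primes 61 and 53). Far too small to be secure.
N, E, D = 3233, 17, 2753


def challenge_digest(realm: str, nonce: str) -> int:
    """Reduce realm + nonce to a number smaller than the toy modulus."""
    raw = hashlib.sha256(f"{realm} {nonce}".encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % N


def parse_params(header_value: str) -> Dict[str, str]:
    """Extract name="value" pairs from an RdfAuth header value."""
    return dict(re.findall(r'(\w+)="([^"]*)"', header_value))


def answer_challenge(www_authenticate: str, user_id: str) -> str:
    """Romeo's agent applies his private key to the realm and nonce."""
    params = parse_params(www_authenticate)
    key = pow(challenge_digest(params["realm"], params["nonce"]), D, N)
    return f'RdfAuth id="{user_id}", key="{key}"'


def check_answer(authorization: str, realm: str, nonce: str,
                 public_keys: Dict[str, Tuple[int, int]]) -> bool:
    """Juliette's server applies Romeo's public key and compares."""
    params = parse_params(authorization)
    n, e = public_keys[params["id"]]  # stand-in for GETting foaf file and key
    return pow(int(params["key"]), e, n) == challenge_digest(realm, nonce)


REALM = "http://juliette.org/protected/*"
NONCE = "ILoveYouToo"
ROMEO = "http://romeo.name/#romeo"

challenge = f'RdfAuth realm="{REALM}", nonce="{NONCE}"'
authorization = answer_challenge(challenge, ROMEO)
granted = check_answer(authorization, REALM, NONCE, {ROMEO: (N, E)})
```

Only the holder of the private exponent can produce a `key` value that checks out against the public key, which is the property Juliette's server relies on before returning the protected representation.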
Advantages
It should be clear from the sketch what the numerous advantages of this system are over OpenId. (I can't speak of other authentication services as I am not a security expert).
- The User Agent has no redirects to follow. In the above example it needs to request one resource http://juliette.org/ twice (2 and 4) but that may only be necessary the first time it accesses this resource. The second time the UA can immediately jump to step 3. [but see problem with replay attacks raised in the comments by Ed Davies, and my reply] Furthermore it may be possible - this is a question to HTTP specialists - to merge step 1 and 2. Would it be possible for a request 1. to return a 20x code with the public representation, plus a WWW-Authenticate header, suggesting that the UA can get a more detailed representation of the same resource if authenticated? In any case the redirect rigmarole of OpenId, which is really there to overcome the limitations of current web browsers, is not needed.
- There is no need for an Attribute Exchange type service. Foaf deals with that in a clear and extensible RESTful manner. This simplifies the spec dramatically.
- There is no need for an identity server, so one less point of failure, and one less point of control in the system. The public key plays that role in a clean and simple manner.
- The whole protocol is RESTful. This means that all representations can be cached, meaning that steps 5 and 7 need only occur once per individual.
- As RDF is built for extensibility, and we are being architecturally very clean, the system should be able to grow cleanly.
Contributions
I have been quietly exploring these ideas on the foaf and semantic web mailing lists, where I received a lot of excellent suggestions and feedback.
- In January I asked on the foaf dev list how one should cut up a foaf file in order to be able to protect parts of the information, in a thread entitled "for more information please log in". This led to an initial proposal by Dave Brondsema which I summarized on Jan 18.
- This week I started out the conversation again, and extended it to the semantic web mailing list to get some wider interest with a thread entitled "privacy and open data".
- The above thread led me to sketch out more clearly the functioning of this protocol, with a post entitled "RDFAuth: an initial sketch", that developed into a very useful thread. I tried to take into account some of the suggestions put forward there in writing this post. Other suggestions, such as the idea by Renato Gollin to work into this a three way challenge response, are very interesting and should be looked into, but are way over my head.
- I had a very useful discussion with Benjamin Nowack (a.k.a. bengee) on #swig where he pointed me to some initial work he had done on the same subject. He had sketched this out on the swig wiki and called it RDFAuth. Since this was clearly going in the same direction I took us to be working on the same project. The next day I found that we may have slightly different views on how this should go. Bengee seems to think we need a token server. I hope we really don't. The big advantage of using Public Key cryptography is that it massively simplifies the protocol. I still think I can convince him :-), so I have kept the name.
- Toby Inkster suggested a way to link this in with HTTPS which would be fabulous. I missed the post, and he reminded me by summarising it here. Not being an https expert (yet) I can't comment. I have been reading up on this and it does seem to be an even better solution. See the thread on the HTTP-WG mailing list. It is really a brilliant idea. I am working on this and will post an update as soon as I have something working.
- Peter Williams suggested looking at RFC 2617: on Basic and Digest Authentication and the less successful RFC 2693 SPKI Certificate Theory.
Finally
So I suppose I am now looking for feedback from a wider community. PGP experts, security experts, REST and HTTP experts, semantic web and linked data experts, only you can help this get somewhere. I will never have the time to learn these fields in enough detail by myself. In any case all this is absolutely obviously simple, and so completely unpatentable :-)
Thanks for taking the time to read this
