The Sun BabelFish Blog

Don't panic !

Friday Mar 28, 2008

RDFAuth: sketch of a buzzword compliant authentication protocol

Here is a proposal for an authentication scheme that is even simpler than OpenId (see sequence diagram), more secure, more RESTful, and with fewer points of failure and fewer points of control. Something like it is needed to make Open Distributed Social Networks with privacy controls possible.

Update

The following sketch led to the even simpler protocol described in Foaf and SSL creating a global decentralized authentication protocol. It is very close to what is proposed here but builds very closely on SSL, so as to reduce what is new down to nearly nothing.

Background

Ok, so now that I have your attention, I would like to first mention that I am a great fan of OpenId. I have blogged about it numerous times, and enthusiastically, in this space. I came across the idea I will develop below, not because I thought OpenId needed improving, but because I have chosen to follow some very strict architectural guidelines: it had to satisfy RESTful, Resource oriented hyperdata constraints. With the Beatnik Address Book I have proven - to myself at least - that the creation of an Open Distributed Social Network (a hot topic at the moment, see the Economist's recent article on Online social network) is feasible and easy to do. What was missing is a way for people to keep some privacy, clearly a big selling point for the large Social Network Providers such as Facebook. So I went in search of a way to create an Open Distributed Social Network with privacy controls. And initially I had thought of using OpenId.

OpenId Limitations

But OpenId has a few problems:

  • First it is really designed to work with the limitations of current web browsers. It is partly because of this that there is a lot of hopping around from the service to the Identity Provider with HTTP redirects. Hyperdata user agents such as the Tabulator, Knowee or Beatnik need not share those limitations.
  • Parts of OpenId 2, and especially the Attribute Exchange spec really don't feel very RESTful. There is a method for PUTing new property values in a database and a way to remove them that does not use either the HTTP PUT method or the DELETE method.
  • The OpenId Attribute Exchange is nice but not very flexible. It can keep some basic information about a person, but it does not make use of hyperdata. And the way it is set up, it would only be able to do so with great difficulty. A RESTfully published foaf file can give the same information, is a lot more flexible and extensible, whilst also making use of Linked Data, and as it happens also solves the Social Network Data Silo problems. Just that!
  • OpenId requires an Identity Server. There are a couple of problems with this:
    • This server provides a Dynamic service but not a RESTful one, i.e. the representations sent back and forth to it cannot be cached.
    • The service is a control point. Anyone owning such a service will know which sites you authenticate onto. True, you can set up your own service, but that is clearly not what is happening. The big players are offering their customers OpenIds tied to particular authentication servers, and that is what most people will accept.
As I found out by developing what I am here calling RDFAuth, for want of a better name, none of these restrictions are necessary.

RDFAuth, a sketch

So following my strict architectural guidelines, I came across what I am just calling RDFAuth, but like everything else here this is a sketch and open to change. I am not a security specialist nor an HTTP specialist. I am like someone who comes to an architect in order to build a house on some land he has, with some sketch of what he would like the house to look like, some ideas of what functionality he needs and what price he is willing to pay. What I want here is something very simple, that can be made to work with a few Perl scripts.

Let me first present the actors and the resources they wish to act upon.

  • Romeo has a Semantic Web Address Book, his User Agent (UA). He is looking for the whereabouts of Juliette.
  • Juliette has a URL identifier ( as I do ) which returns a public foaf representation and links to a protected resource.
  • The protected resource contains information she only wants some people to know, in this instance Romeo. It contains information as to her current whereabouts.
  • Romeo also has a public foaf file. He may have a protected one too, but it does not make an entrance in this scene of the play. His public foaf file links to a public PGP key. I described how that is done in Cryptographic Web of Trust.
  • Romeo's Public key is RESTfully stored on a server somewhere, accessible by URL.

So Romeo wants to find out where Juliette is, but Juliette only wants to reveal this to Romeo. Juliette has told her server to only allow Romeo, identified by his URL, to view the site. She could also have had a more open policy, allowing any of her or Romeo's friends to have access to this site, as specified by their foaf files. The server could then crawl their respective foaf files at regular intervals to see if it needed to add anyone to the list of people having access to the site. This is what the DIG group did in conjunction with OpenId. Juliette could also have a policy that decides Just In Time, as a person presents themselves, whether or not to grant them access. She could use the information in that person's foaf file and relate it to some trust metric to make her decision. How Juliette specifies who gets access to the protected resource here is not part of this protocol. This is completely up to Juliette and the policies she chooses her agent to follow.

So here is the sketch of the sequence of requests and responses.

  1. First Romeo's user Agent knows that Juliette's foaf name is http://juliette.org/#juliette so it sends an HTTP GET request to Juliette's foaf file located of course at http://juliette.org/
    The server responds with a public foaf file containing a link to the protected resource perhaps with the N3
      <> rdfs:seeAlso <protected/juliette> .
    
    Perhaps this could also contain some relations describing that resource as protected, which groups may access it, etc... but that is not necessary.
  2. Romeo's User Agent then decides it wants to check out protected/juliette. It sends a GET request to that resource but this time receives a variation of the Basic Authentication Scheme, perhaps something like:
    HTTP/1.0 401 UNAUTHORIZED
    Server: Knowee/0.4
    Date: Tue, 01 Apr 2008 10:18:15 GMT
    WWW-Authenticate: RdfAuth realm="http://juliette.org/protected/*", nonce="ILoveYouToo"
    
    The idea is that Juliette's server returns a nonce (in order to avoid replay attacks), and a realm over which this protection will be valid. But I am really making this up here. Better ideas are welcome.
  3. Romeo's web agent then encrypts some string (the realm?) and the nonce with Romeo's private key. Only an agent trusted by Romeo can do this.
  4. The User Agent then sends a new GET request with the encrypted string, and his identifier, perhaps something like this
    GET /protected/juliette HTTP/1.0
    Host: juliette.org
    Authorization: RdfAuth id="http://romeo.name/#romeo", key="THE_REALM_AND_NONCE_ENCRYPTED"
    Accept: application/rdf+xml, text/rdf+n3
    
    Since we need an identifier, why not just use Romeo's foaf name? It happens to also point to his foaf file. All the better.
  5. Juliette's web server can then use Romeo's foaf name to GET his public foaf file, which contains a link to his public key, as explained in "Cryptographic Web of Trust".
  6. Juliette's web server can then query the returned representation, perhaps meshed with some other information in its database, with something equivalent to the following SPARQL query
    PREFIX wot: <http://xmlns.com/wot/0.1/>
    SELECT ?pgp
    WHERE {
         [] wot:identity <http://romeo.name/#romeo>;
            wot:pubkeyAddress ?pgp .
    } 
    
    The nice thing about working at the semantic layer is that it decouples the spec a lot from the representation returned. Of course as usage grows those representations that are understood by the most servers will create a de facto convention. Initially I suggest using RDF/XML of course. But it could just as well be N3, RDFa, perhaps even some microformat dialect, or even some GRDDLable XML, as the POWDER working group is proposing to do.
  7. Having found the URL of the PGP key, Juliette's server can GET it - and, as with much else in this protocol, cache it for future use.
  8. Having the PGP key, Juliette's server can now decrypt the encrypted string sent to her by Romeo's User Agent. If the decrypted string matches the expected string, Juliette will know that the User Agent has access to Romeo's private key. So she decides this is enough to trust it.
  9. As a result Juliette's server returns the protected representation.
Now Romeo's User Agent knows where Juliette is, displays it, and Romeo rushes off to see her.
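
The header handling in steps 2, 4 and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the function names are made up, the two callables `sign` and `check` are hypothetical stand-ins for the PGP operations of steps 3 and 8, and the comma between header parameters follows the usual HTTP authentication convention (the parser accepts the parameters with or without it).

```python
import re

# Matches key="value" pairs, whether or not they are separated by commas.
_PARAM = re.compile(r'(\w+)="([^"]*)"')

def parse_auth_header(header_value):
    """Split 'RdfAuth realm="..." nonce="..."' into (scheme, params)."""
    scheme, _, rest = header_value.strip().partition(" ")
    return scheme, dict(_PARAM.findall(rest))

def build_authorization(foaf_name, challenge, sign):
    """Client side (steps 3 and 4): build the Authorization header value.

    `sign` is a hypothetical callable standing in for "encrypt with
    Romeo's private key"; it takes a str and returns a str.
    """
    token = sign(challenge["realm"] + challenge["nonce"])
    return 'RdfAuth id="%s", key="%s"' % (foaf_name, token)

def verify(authorization, expected, check):
    """Server side (step 8): decide whether the request is authentic.

    `check(foaf_name, key, expected)` is a hypothetical callable that
    fetches the public key for `foaf_name` (steps 5 to 7) and tests the
    received key against the expected realm-plus-nonce string.
    """
    scheme, params = parse_auth_header(authorization)
    return scheme == "RdfAuth" and check(params["id"], params["key"], expected)
```

A real agent would plug a PGP library into `sign` and `check`, and would encode the resulting bytes (for instance in base64) so that they are safe inside a quoted header value.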

Advantages

It should be clear from the sketch what the numerous advantages of this system are over OpenId. (I can't speak of other authentication services as I am not a security expert).

  • The User Agent has no redirects to follow. In the above example it needs to request one resource, http://juliette.org/protected/juliette, twice (steps 2 and 4) but that may only be necessary the first time it accesses this resource. The second time the UA can immediately jump to step 3. [but see problem with replay attacks raised in the comments by Ed Davies, and my reply] Furthermore it may be possible - this is a question to HTTP specialists - to merge step 1 and 2. Would it be possible for a request 1. to return a 20x code with the public representation, plus a WWW-Authenticate header, suggesting that the UA can get a more detailed representation of the same resource if authenticated? In any case the redirect rigmarole of OpenId, which is really there to overcome the limitations of current web browsers, is not needed.
  • There is no need for an Attribute Exchange type service. Foaf deals with that in a clear and extensible RESTful manner. This simplifies the spec dramatically.
  • There is no need for an identity server, so one less point of failure, and one less point of control in the system. The public key plays that role in a clean and simple manner.
  • The whole protocol is RESTful. This means that all representations can be cached, meaning that steps 5 and 7 need only occur once per individual.
  • As RDF is built for extensibility, and we are being architecturally very clean, the system should be able to grow cleanly.

Contributions

I have been quietly exploring these ideas on the foaf and semantic web mailing lists, where I received a lot of excellent suggestions and feedback.

Finally

So I suppose I am now looking for feedback from a wider community. PGP experts, security experts, REST and HTTP experts, semantic web and linked data experts, only you can help this get somewhere. I will never have the time to learn these fields in enough detail by myself. In any case all this is absolutely obviously simple, and so completely unpatentable :-)

Thanks for taking the time to read this.

Friday Feb 15, 2008

Proof: Data Portability requires Linked Data

Data Portability requires Linked Data. To show this let me take a concrete and topical example that is the core use case of the Data Portability movement: Jane wants to move her account from social network A to social network B. And she wants to do this in a way that entails the minimal loss of information.

Let us suppose Jane wants to make a rich copy, and that she wants to do this without hyperdata. Ideally she would like to have exactly the same information in the new space as she had in the old space. So if Jane had a network of friends in social network A she would like to have the same network of friends in B. But this implies moving all the information about all her friends from A to B, including their social network too. For after all the great thing about one's friends is how they can help us make new friends. But then would one not want to move all the social network of one's friends too? Where does it stop? As William Blake said so well in Auguries of Innocence

        To see a world in a grain of sand,
        And a heaven in a wild flower,
        Hold infinity in the palm of your hand,
        And eternity in an hour.
the problem is that everything is linked in some way, and so it is impossible to move one thing and all its relations from one place to another using just copy by value, without moving everything. A full and rich copy is therefore impossible.

So what about pragmatically limiting ourselves to some subset of the information? We have to reduce our ambitions. So let us limit the data Jane can move to just her personal data and closest social network. So she copies some subset of the information about her friends over to network B. Nice, but who is going to keep that information up to date? When Jane's friend Jack moves house, how is Jane going to know about this in her new social network? Would Jack not have to keep his information on social Network B up to date too? And now if every one of Jack's 1000 friends moves to a different social network, won't he now have to keep 1000 identities up to date, one on each of those networks? Making it easy for Jane to move social network is going to make life hell for Jack it seems. Well of course not: Jack is never going to keep the information about himself up to date on these other social networks, however limited it is going to be. And so if Jane moves social network she is going to have to leave her friends behind.

The solution of course is not to try to copy the information about one's friends from one social network to another, but rather to move one's own information over and then link back to one's friends in their preferred social network. By linking by reference to one's friends' identities one reduces to a minimum the information that needs to be ported whilst maintaining all the relationships that existed previously. Thus one can move one's identity without loss.

The rest follows nearly immediately from these observations. Since the only way to refer to resources in a global namespace is via URIs (and the most practical way currently is to do this with URLs), URIs will play the role of pointers in our space. This is the key architectural decision of the semantic web. So by giving people URLs as names we can point to our friends wherever they are, and even move our data without loss. All we need to do when we move our foaf file is to have the web server serve up an HTTP redirect message at the old URL, and all links to our old file will be redirected to our new home.
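
As a rough illustration of that last step, here is a minimal Python sketch of a server at the old address that answers with a permanent redirect. The path, the new URL and the names `MOVED`, `redirect_for` and `MovedHandler` are all made up for the example; any web server's redirect configuration would do the same job.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from old paths on this server to the file's new home.
MOVED = {"/foaf.rdf": "http://new-home.example/people/jane/foaf.rdf"}

def redirect_for(path):
    """Return (status, headers) for a GET on `path` at the old server."""
    target = MOVED.get(path)
    if target is None:
        return 404, {}
    # 301 tells clients the move is permanent, so they can update their links.
    return 301, {"Location": target}

class MovedHandler(BaseHTTPRequestHandler):
    """Answers every GET with the redirect (or 404) computed above."""

    def do_GET(self):
        status, headers = redirect_for(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
```

To try it locally one could run `HTTPServer(("", 8000), MovedHandler).serve_forever()` after importing `HTTPServer` from `http.server`.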

Tuesday Jan 15, 2008

Data Portability: The Video

Here is an excellent video to explain the problem faced by Web 2.0 companies and what Data Portability means. It is amazing how a good video can express something so much more powerfully, so much more directly than words can. Sit back and watch.


DataPortability - Connect, Control, Share, Remix from Smashcut Media on Vimeo.

Feeling better? You are gripped by the problem? Good. You should now find that my previous year's posts start making a lot more sense :-)

Will the Data Portability group get the best solution together? I don't know. The problem with the name they have chosen is that it is so general, one wonders whether XML is not the solution to their problem. Won't XML make data portability possible, if everyone agrees on what they want to port? Of course getting that agreement on all the topics in the world is a never ending process.... Had they retained the name of the original group this stemmed from, Social Network Portability then one could see how to tackle this particular issue. And this particular issue seems to be the one this video is looking at.

But the question is also whether portability is the right issue. Well in some ways it is. Currently each web site has information locked up in html formats, in natural language, or even sometimes in jpegs (see the previous story of Scoble and Facebook), in order to make it difficult to export the data, which each service wants to hold onto as if it was theirs to own.

Another way of looking at this is that the Data Portability group cannot so much be about technology as policy. The general questions it has to address are questions of who should see what data, who should be able to copy that data, and what they should be able to do with it. This does indeed involve identity technology insofar as all of the above questions turn around questions of identity ("who?"). Now if every site requires one to create a new identity in order to access one's data one has the nightmare scenario depicted in the video, where one has to maintain one's identity across innumerable sites. As a result the policy issue of Data Portability does require one to solve the technical problem of distributed identity: how can people maintain the minimum number of identities on the web (i.e. not one per site)? Another issue that follows right upon the first is that if one wants information to only be visible to a select group of people - the "who sees what" part of the question - then one also needs a distributed way to be able to specify group membership, be it friendship based or other. The video again shows very clearly why having to recreate one's social network on every site is impractical.

What may be misleading about the term Data Portability is that it may lead one to think that what one wants is to copy one's social information from one social service to another. That would just automate the job that the video shows people currently having to do by hand. But that is not a satisfactory solution, because one cannot extract a graph of information from one space to another without loss. If I extract my friends from LinkedIn into Facebook, it is quite certain that Facebook will not recognise a large number of the people I know on LinkedIn. Furthermore the ported information on Facebook would soon be out of date, as people updated their network and profiles on LinkedIn. Unless of course Facebook were able to make a constant copy of the information on LinkedIn. But that's impossible right? Wrong! That is the difference between copy by value and copy by reference. If Facebook can refer to people on LinkedIn, then the data will always be as up to date as it can be. So this is how one moves from Data Portability to Linked Data, also known as hyperdata.
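
The difference between the two kinds of copy can be shown with a toy Python model. Everything in it is an assumption made for illustration: the `web` dictionary stands in for the real web, the URL and profile fields are invented, and `dereference` replaces what would be an HTTP GET.

```python
import copy

# A hypothetical stand-in for the web: each URL maps to the profile
# that its owner maintains at that address.
web = {"http://linkedin.example/jack#me": {"name": "Jack", "city": "Paris"}}

# Copy by value: the second network stores its own snapshot of Jack.
snapshot = copy.deepcopy(web["http://linkedin.example/jack#me"])

# Copy by reference: the second network stores only Jack's URL.
reference = "http://linkedin.example/jack#me"

def dereference(url):
    """Look the profile up at its source (a real agent would do an HTTP GET)."""
    return web[url]

# Jack moves house and updates only the profile he maintains himself.
web[reference]["city"] = "London"

print(snapshot["city"])                # the copied value is now stale
print(dereference(reference)["city"])  # the reference sees the update
```

The snapshot still says Paris while the dereferenced profile says London, which is the staleness problem described above in miniature.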

Sunday Jan 06, 2008

2008: The Rise of Linked Data

Here is my one prediction for 2008. Social Networking's breakdown will lead to the rise of Linked Data. Here is the logic:

  1. Social Networking sites have grown tremendously over the last few years fuelled by huge profits from advertising dollars. When I worked at AltaVista it was well known that the more you knew about your users the more valuable an ad became. If you know all the friends, interests, habits of someone, and you know what they are doing right now, you can suggest exactly the right product at the right time to them. The cost of a simple ad on AltaVista was $5 per thousand page views. If you knew a lot about what someone was looking for the value could go up to $50.
  2. The allure of profit is leading to an ever increasing number of players in this space. See the Social Networking 3.0 talk at Stanford earlier in 2007.
  3. This in turn leads to a fracturing of the Social Networking space. As more players enter the space, each ends up with a smaller and partial view of the whole graph of social relations.
  4. Which is leading to the need for Social Network Portability, and more generally Data Portability. Users such as Scoble want to use their data on their own computer and link it together. Social Network Providers such as Plaxo or Facebook have a financial interest in helping their users move with their social network to their service. Facebook helps users extract all the information from GMail. Plaxo wants to help users extract all the information from every other social network.
  5. Privacy concerns will mount tremendously as a result. To defend its position, each social network will stoke in its users the fear of giving their data over to other "spamming" services. But to do this they will make it more and more difficult to extract the data from their service, annoying their users and going against their desire to link their information. This will seem more and more like an issue for anti trust involvement as the ire of more and more people mounts.

The force of the above logic will release the energy needed for an investment in Linked Data tools such as Beatnik, since Linked Data solves all the problems mentioned above - at the expense of killing the dream some investors may have had of a Nineteen Eighty-Four like world that they own.

Data Portability: Scoble Right or Wrong and beyond

Scoble explains Video

In this video Scoble explains how he got thrown off Facebook.

Here is a short summary, but the video is well worth watching as the emotions come through much better...
Facebook, which asks its users for their Gmail password in order to extract all the contacts someone has from their mail history and build up a possible list of friends, Facebook which scans the web for information to suggest friendships you may have, that same Facebook does not want anyone, including YOU, to be able to extract the data in your account on their web site even were it only into your own electronic address book. To do this they encode all email addresses as images, which makes them very difficult for a computer to decode, and so makes it tedious to move and use that information. So when Scoble tried to extract his 5000 friends using Optical Character Recognition - an idea suggested by Plaxo which wants to be a hub of people information - Facebook noticed this and cut off his account. (I think he may have been reinstated now - but whether there is a point in belonging to such a service is a serious question now). As a result Scoble and others have asked people to join the conversation on the Data Portability group.

This clearly is a very important issue. But his solution to the issue was not the best one. By using Plaxo - which wants to be the social graph hub of the web - to extract his data, he would have been able to do what clearly he should be able to do, namely add his contact information easily to Outlook. But he did this at the cost of allowing a third entity to gather a lot of information about him and his contacts. CNET's The Scoble scuffle: Facebook, Plaxo at odds over data portability, touches on the issue. Allowing a third service provider to extract all your data in order to give you access to it, is not improving your freedom. It is just giving another commercial entity access to a huge network of information about you. And the more a company knows about its users the more valuable the advertising it sells becomes. There is no mystery here as to why Social Networking sites have had so much money pumped into them over the last few years. So you have jumped out of the frying pan right into the fire here. Clearly if you are concerned with security of your information - with Facebook you had one commercial entity that had a lot more information about you than it should - now you have two.

Really what you want is the following:

  1. Selectivity in who gets what information about you:
    • Strangers should be able to see the minimum information I want to make public.
    • Acquaintances should see more.
    • Family should see other information.
    • ... these policies should be flexible and determinable by the owner of the information, by the person making the speech act of affirming it.
    And even though I may be happy for a service provider to maintain this data, you may not even wish to allow them access to it. It should be possible to have this information on your server at home controlled only by you.
  2. Link to friends wherever they are. After all if you have to go through one central aggregator of relationship information, then that aggregator will have a view of all the relationship information available, giving one actor complete and overwhelming advantage as opposed to everyone else. You need distributed data, also known as linked data or hyperdata.
  3. An Open Data structure so as to allow ecosystems to grow and use that information. I want the tools on my computer to all be able to work with my social network information.
  4. A way to determine trust

Allowing different people to see more or less information (point 1 above) should be quite easy to set up by having the server return different representations depending on who is viewing the information, determined by their having logged in to your site with something like OpenId. Linking information in a distributed way is easy using Semantic Web technologies, and is demonstrated by tools such as Beatnik. Beatnik is just one of the tools that could use such information on my desktop (thereby fulfilling point 3 above).

What you say, out loud or on your web site, is a speech act. All information is the speech act of someone, and it is this that allows us to determine our level of trust in it. This is also why one should try to say less rather than more, since every piece of information one publishes is information one may have to defend. It is therefore much better if we have a system where everyone can look after a small part of the graph of information they have a responsibility for and defend it. They can then point to information maintained by other people, who will have to defend their piece. But since pointing to information maintained by others is a vote of confidence in them, an economy of links will emerge whereby people want to increase the number of quality links to them, which will only happen if they are deemed trustworthy. So the system allows for distributed trust. For a simple but excellent example see the Distributed Information Group wiki's policy for allowing people to post.

Thursday Jan 03, 2008

Scoble gets thrown off Facebook

picture of current version of Beatnik

Scoble, who became very famous for getting blogging started at Microsoft, got ejected from Facebook for crawling his network of friends. This is the problem with closed social networks and data silos in general. He seems to think the solution is data portability. More than that: the solution is Open Social Networks. You should be able to use a simple web server and just link up to your friends' friend of a friend (foaf) files, whichever service they are using, be it their own machine located in their basement, a service provider, a government owned machine, and so on. Just as I can link from this blog to any blog. This would allow people to own their piece of the network, like they can own their blogs.

This is what Beatnik, a friend of a friend browser, which I described in this email to the social network portability group, will make it easy for anyone to do.

Everyone is welcome to help on this open source project: artists, documenters, Swing experts, testers, RESTafarians, ...

Saturday Dec 15, 2007

James Gosling has a foaf name

And so does Tim Bray, Greg Papadopoulos, Jonathan Schwartz, Sun Microsystems, and Java. All thanks to the great work of the DBPedia people, a loose network of highly skilled distributed self selected avant garde force de frappe, who are extracting all the metadata possible from Wikipedia and making it available as hyperdata, ready to be linked to. :-)

You can browse their information on the web, or with the Tabulator generic data browser which will merge information it finds into one large graph as you explore it. As a result of this I can now add Tim Bray and James Gosling to my foaf file (foaf icon), by adding the following N3 statements:

:me foaf:knows [ = <http://dbpedia.org/resource/James_Gosling>;
                    a foaf:Person;
                    foaf:name "James Gosling" ],
               [ = <http://dbpedia.org/resource/Tim_Bray>;
                    a foaf:Person;
                    foaf:name "Tim Bray" ] .

It is worth looking at how DBPedia works. http://dbpedia.org/resource/James_Gosling is now a Uniform Resource Identifier for James Gosling. You cannot fetch James because he is not an information resource, i.e. he is not a document, though he is very resourceful, and full of interesting information. You can tell that James is not an information resource because you can't copy him easily. So when you make an HTTP request on that URI (curl -I sends a HEAD request and prints only the response headers) you get the following:

hjs@bblfish:0$ curl -I http://dbpedia.org/resource/James_Gosling
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 17:57:54 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Location: http://dbpedia.org/page/James_Gosling
Content-Type: text/plain
Content-Length: 90

i.e. you get a redirect to the page about James Gosling. This is because curl by default sends Accept: */*, and the server then chooses the html representation. Had you specified that you wanted the machine readable rdf/xml representation you would get a redirect to another resource:

hjs@bblfish:0$ curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/James_Gosling
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 18:01:10 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Location: http://dbpedia.openlinksw.com:8890/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FJames_Gosling%3E
Content-Type: text/plain
Content-Length: 210

Here you get a redirect to a SPARQL query to DESCRIBE James Gosling. To get the full content, in N3 try:

hjs@bblfish:0$ curl -L -H "Accept: text/rdf+n3" http://dbpedia.org/resource/James_Gosling 

the -L flag follows all the redirects...
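
The same content negotiation can be done from a program. The following is a small sketch using Python's standard urllib; the function names `describe_request` and `fetch` are made up for the example, and what the server returns today may well differ from the 2007 responses shown above.

```python
import urllib.request

def describe_request(uri, media_type="application/rdf+xml"):
    """Build a GET request asking for a given representation of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": media_type})

def fetch(uri, media_type="application/rdf+xml"):
    """Send the request and return (final_url, body).

    urlopen follows redirects such as 303 See Other on its own,
    which is what the -L flag asks curl to do.
    """
    with urllib.request.urlopen(describe_request(uri, media_type)) as response:
        return response.geturl(), response.read()
```

Calling `fetch("http://dbpedia.org/resource/James_Gosling", "text/rdf+n3")` would then play the role of the last curl command, with the first element of the result showing where the redirects ended up.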

Friday Dec 07, 2007

Yochai Benkler: The Wealth of Networks

This afternoon I attended a teleconference at the University of Sao Paulo where Yochai Benkler talked from the Berkman Center for Internet and Society at Harvard, about his now famous book "The Wealth of Networks" (available online) and answered questions from the audience. Yochai talked about the impact of open source and peer to peer modes of co-operative production on economics, politics, arts and education. The book has many excellent and illuminating examples on how massively parallel and distributed use of human resources can outperform large centrally organised Taylorist production methods. He does point out that this won't work in every field of endeavour, but more naturally in knowledge based ones, where the cost of reproduction is close to zero. More details in the freely available book.

The conference was organised by Imre Simon from the Institute of Advanced Studies of the University of Sao Paulo. A web site in Portuguese is dedicated to this talk, and it was broadcast live on the web.

At the end of the talk, as the last question from the floor, I asked about what research had been done into applying Metcalfe's law to networks as powerful as the Semantic Web, and so how this would affect questions on the wealth of networks. Yochai seemed to think that the Semantic Web was too much about data, and not about people. Of course Beatnik, the semantic address book I am working on right now, is going to show how this dichotomy is completely illusory, and how the distributed, decentralised world of hyperdata should fit perfectly into the central thesis of the book. :-)

Friday Nov 02, 2007

Vote for Java6 on Leopard!

As mentioned previously a lot of Java developers on OSX are upset at Apple's silence as to its intentions with respect to the release of Java 6. There used to be a developer preview available, which was pulled recently with no indication as to when a replacement would be available. People like me who upgraded in the hope of having the latest and greatest - which we have been waiting for very patiently for over a year - are very disappointed. It creates all kinds of annoyances, like not being able to run Java Tutorial examples. Some who are working on Java 6 projects cannot use their computer easily, without resorting to installation of a separate OS in a virtual machine, to do their job. We all like OSX: it's a beautiful easy to use Unix that usually really helps us get our work done. I have been very happily using it since 2004.

The first solution of course is to have our voice heard. One way to do this is to file a bug with Apple. Please do this! The only problem I have with it is that as opposed to the Java bug database which is completely open, the Apple bug database is completely closed. So there's no real way of verifying how many people have posted a report. We must therefore complement that action with an equal Open action. Following the noble example given to us by Nova Spivack, when he asked for people to make their voice heard in support of the Burmese people and got some real results, let us do the same to help Apple make the right decision.
Anybody who would like to support this issue in the blogosphere should post a blog entry containing the string

13949712720901ForOSX

The first part of the string is the decimal notation for 0xCAFEBABE [1], the magic cookie for Java class files (thanks David for the number and the pointer to Fredericiana's photo). Then post similar instructions on your blog or point people here. Let's see how far this gets us! [2]

We should then be able to use any search engine, Google is a good choice, to search for this string [3], and hopefully motivate the managers at Apple to invest more time on Java and be more open about their plans with the community.

Your vote may also be an energizer to those groups that are starting to port the OpenJDK to OSX (via the mac java community).

Notes

  1. Oops I just noticed a mistake here. 13949712720901 in dec = 0xCAFEBABE405 in Hex. Even better. So that's CAFEBABE + the HTTP 405 Response, which means "Method Not Allowed". :-)
  2. If you know a foreign language then please translate the instructions and explanations so that more people can understand what is going on. Always post a link to some instructions. Language is a Virus, but it is most virulent when it is understandable and hyperlinked, of course.
  3. A search on Google Web returns more results - more than AllTheWeb or AltaVista - but Google Blog Search contains fewer duplicates. The real number of votes is somewhere between those two numbers, as some people are voting on their open source web sites, which are not always feed enabled. Simon is keeping count.
  4. Karussell is keeping a list of related articles.
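
The arithmetic in note 1 is easy to check. Here is a minimal sketch in Python (the variable names are mine, chosen only for illustration):

```python
# Check the arithmetic from note 1: what is the vote number in hexadecimal?
vote_number = 13949712720901

# hex() returns the hexadecimal notation of an integer as a string.
vote_hex = hex(vote_number)
print(vote_hex)

# 0xCAFEBABE is the magic number at the start of Java class files.
magic = 0xCAFEBABE

# Shifting left by 12 bits makes room for three more hex digits: 405.
recombined = (magic << 12) + 0x405
print(recombined == vote_number)
```

So the decimal string is not 0xCAFEBABE itself, but 0xCAFEBABE followed by the three hex digits 405.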

Update

Tuesday Nov 13: Landon Fuller has been able to get a very nice hello world GUI app running on OSX using the FreeBSD jdk1.6 port. It runs under X Windows only. Excellent work!

Nov 20th, 2007: Dave Dirbin publishes the first beta release of the open source java 6. This campaign has gathered 105 blog votes if we count the results from Google Blog Search, placing it easily among the top 10 bug reports at the Java Bug database. The Google web search returns 256 results, which will contain the blog search, many duplicate pages pointing to blogs + some extra votes people may have placed on the web. I guess that those extra votes may pop this bug report up to the top 5 position.

Wednesday Dec 19: Apple has put a developer preview of Java 6 up on Apple Developer Connection. It is nice to see things progress on that side. As a result of this conflict, Java development on OSX has become a lot richer, with an open source JDK starting to compete with the closed one from Apple. This can only be good for both, and for developer and customer confidence in the platform.

Monday Oct 08, 2007

Open Data Licences

The amount of Open Data is growing fast. The idea that data may need protection in an Open Society is bizarre enough, but in Europe at least a whole set of laws has been put in place for this purpose. For those who wish to add data to the Commons, so that it may better contribute to the value of the network as predicted by Metcalfe's law, current Open licences will not do, it seems. This is, as I understand, because copyright licenses do not cover data well, since a set of relations can be serialized in any number of ways: order does not matter, it is easy to refactor data, or combine it with other data. (I wonder then why this was not a problem for source code?)

To help resolve these issues, Talis, a leading Semantic Web company, helped fund research into this area which resulted in the Open Data Licence project, which is now seeking feedback on its proposals. From my quick reading of it this license seems to have a GNU feel to it, but I may be wrong.

Tuesday Aug 28, 2007

My Bloomin' Friends

Closed Social Networks are blossoming all over the place. They provide a semblance of protection, at a price: lock in. Locked into the social network provider you get convenience in the form of tools to make conversation easier (video, email, chat boards, ...), some form of privacy protection (if you trust the provider), introductions to 'like minded' people, and other niceties.

Some of us work in the open air: we have to set standards in public view; we stand by what we say; we accept criticism from wherever it comes; and we can't choose our friends based on their social network provider. We describe ourselves in our foaf files where we can specify what we do, how to contact us, our interests, and links to who we know by pointing to their Universal Identifiers. There is no trouble linking between people who are open in this way. We are happy to reference each other: it strengthens the exposure of our work and the quality of the web. This is how I link to Paul Gearon:

:me foaf:knows  [ = <http://web.mac.com/thegearons/people/PaulGearon/foaf.rdf#me>; 
                  a foaf:Person;
                  foaf:name "Paul Gearon" ] .
I could just point to his URL, but the little extra duplicate information can make life easier for people/robots browsing the data web. It can help people notice inconsistencies and help me correct them.

But not everyone lives in the open the same way, and not everyone wants to make the same amount of information about themselves public. There are a number of different ways to deal with this. I want to discuss a few of them here.

Content Negotiation

How much someone says about themselves is up to them, and so is how they protect their information. The same URL that identifies someone, could return more or less information depending on who is asking. I could set up my foaf file so only friends who log in via openid can see my friends. Others would just get default information about me. I could be even more clever. I could allow any friend of my friend who logs in via their openid to see my full foaf file; others would see information about me, and a select group of open friends. Closed Social Networks could open up by making it convenient to specify these policies, and providing the right infrastructure to do so.
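
The friend-of-a-friend policy described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name, the "full" and "default" labels, and the data structures are all hypothetical, and the OpenID login itself is assumed to have already happened elsewhere.

```python
def foaf_view(requester, my_friends, friends_of):
    """Decide which version of the foaf file a visitor should get.

    requester  -- the OpenID the visitor logged in with, or None if anonymous
    my_friends -- set of OpenIDs of my direct friends
    friends_of -- dict mapping a friend's OpenID to the set of OpenIDs they know
    """
    if requester is None:
        # Anonymous visitors only get the default public information.
        return "default"
    if requester in my_friends:
        return "full"
    # Friend of a friend: the requester is listed by at least one of my friends.
    if any(requester in friends_of.get(friend, set()) for friend in my_friends):
        return "full"
    return "default"


# Hypothetical identifiers, for illustration only.
friends = {"https://alice.example/"}
network = {"https://alice.example/": {"https://bob.example/"}}
print(foaf_view("https://bob.example/", friends, network))
```

The point of the sketch is that the same URL can be served by one small decision function; where the lists of friends come from is left open.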

Indirect Identification

By directly identifying someone via a URL (as I do) we can leave a lot of the policy of what they make visible up to them. But those who don't have a foaf name need to be identified indirectly. We can do that by identifying them via some property such as their blog, their home page, their email address, or their openid. I am very open about my email addresses. They are published and visible to all.
 <http://bblfish.net/people/henry/card#me>     <http://xmlns.com/foaf/0.1/mbox> <mailto:henry.story@bblfish.net> .
I value it more that people can contact me easily - living as I do in the middle of nowhere and often living nowhere in particular - than the pain of spammers. Too many people are lazy about security, using virus filled Windoze computers, obvious passwords, cracked software for me to be under any illusion that hiding my email is going to prevent the bad guys from getting it.

However I can't assume that everyone else will accept me applying this argument to their email address. For this there is a nice mathematical technique: I can hash their email address using the SHA1 hash function. This creates a close to unique string that cannot be disassembled. You cannot go from the sha1 sum of an email address back to the email. But you can always calculate the same sha1sum from an email. This is how I identify Simon Phipps, Sun's Open Source Officer:

:me foaf:knows [ a foaf:Person;
                 foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                 foaf:name "Simon Phipps";
                 rdfs:seeAlso <http://www.webmink.net/foaf.rdf>
               ].

If you know Simon's email, then you will know that I know him. "What use is that?" I can hear someone ask. It's all about Working with People on the Internet. Imagine you are reading email on a newsgroup with a foaf enabled mail tool linked to a foaf enabled Address Book (such as Beatnik). You come across an email by Simon saying something interesting about how Sun has changed its stock ticker to JAVA for example. My logo and perhaps that of a couple of other people appears on the mail reader in a way that indicates to you that we know Simon. The post is no longer anonymous for you, and so has more trust value. You feel part of a community.[1]
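
To make the hashing step concrete, here is a minimal sketch in Python. It assumes, as the FOAF vocabulary describes, that the hash is taken over the full mailto: URI; the lowercasing follows the canonicalisation suggested in note 1, and the function name and example address are hypothetical.

```python
import hashlib


def mbox_sha1sum(email):
    """Return a foaf:mbox_sha1sum style value for an email address."""
    # Lowercasing is a convention so that everyone computes the same sum;
    # it is not something SHA1 does by itself.
    uri = "mailto:" + email.strip().lower()
    # hexdigest() gives the 40 character hexadecimal form used in foaf files.
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


# A made-up address, for illustration only.
print(mbox_sha1sum("Alice@Example.org"))
```

Anyone who already knows the address can recompute the same string and match it against a foaf file, while someone who only sees the string has nothing to invert.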

So spammers cannot use that information to spam. Either they already know your email address, and so they are probably already spamming you, or they don't, and this won't help them. They can only [2] learn about social network claims: who claims to know who. They could use this, it is true, to introduce themselves as an acquaintance of a friend of yours. A bit of a risky strategy that could quickly get them on a black list. Currently being black listed may not be an expensive proposition. But in a cryptographic web of trust this will be both much easier to notice, and more damaging for the infringers.

Fuzzy Identification

I can directly and indirectly identify a lot of people in my Address Book as described above. This is perfectly acceptable for people who have an open life, like I do, and a large portion of the Open Source community, bloggers, standard setters, etc... But on last count I had over 700 people in my AddressBook. It is a lot of work to identify all of them individually, and to decide how much visibility I should give them. I may not even want people to know how many people I know this way. Also I may want deniability: there are people one may know, but one may not want to highlight that, and one may want to be able to deny that one knows them to some people. The foaf:mbox_sha1sum gives me a way to identify someone, but if some nosy person comes to me and asks me about that person's life after having identified the corresponding email address, there is no escape route other than refusing any conversation, which by itself can easily be taken to be significant. What we need is a way to fuzzily identify a group of one's acquaintances.

Bloom Filters

This is what Bloom Filters enable one to do. Originally used in times when memory was expensive, they allowed the whole vocabulary of a language to be condensed into a reasonably short string. Here we can use one to group all the email addresses of our friends together in one opaque string. I could express this as follows in RDF (bear in mind that the rdf vocabulary has not been settled on):
:me foaf:bloomMbox [ a bloom:Bloom;
                     bloom:base64String """
            IAOgQgSAAAICCAADAoQgDABAAiQKgIABgyAIBEhAAAAIUKBACCYAABAAaEkGQAGIEAHRUAgAAQUw 
            hCgwACJNQxQAAggAgCIgAAAAKgICEKAAAABCQiB0JCAAAIkgDASAYiAAAEIQAAIAABDCEAZACOpA 
            ICEEMAGAEGEAxIA=""";
                     bloom:hashNum 4;
                     bloom:length 1000 ] .

Given the above Bloom someone can query it with an email address, by running the address through the same hash functions that were used to build it, and the Bloom will answer either that I may know that person, or that this address was never put into it. The loaf project explains some of the advantages of having this in more detail.
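
For readers who want to see the mechanism, here is a minimal Bloom filter sketch in Python. It is an assumption-laden illustration, not the code behind the applet: the class and method names are mine, and the way the hash positions are derived (salting SHA1 with a counter) is just one simple choice, so it will not reproduce the base64 string shown above.

```python
import hashlib


class Bloom:
    """A tiny Bloom filter over email addresses (illustrative only)."""

    def __init__(self, length=1000, hash_num=4):
        self.length = length        # number of bits in the filter
        self.hash_num = hash_num    # number of hash positions per address
        self.bits = [False] * length

    def _positions(self, email):
        # Lowercase first, so that adding and testing agree on the key.
        key = email.strip().lower()
        for i in range(self.hash_num):
            # Assumed scheme: salt SHA1 with a counter to get several hashes.
            digest = hashlib.sha1(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.length

    def add(self, email):
        for pos in self._positions(email):
            self.bits[pos] = True

    def might_contain(self, email):
        # True means "maybe in the set"; False means "definitely not added".
        return all(self.bits[pos] for pos in self._positions(email))


bloom = Bloom(length=1000, hash_num=4)
bloom.add("alice@example.org")               # made-up address
print(bloom.might_contain("Alice@Example.org"))
```

A positive answer can always be a coincidence of bits set by other addresses, which is where the deniability comes from.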

The best way to get a feel for how it works is to try it. Here I have written a little java applet [3] that allows you to test my Bloom for people I know, and to create your own bloom [4].

Some emails you can try with positive results are tbray at textuality dot C O M, or bill at dehora dot net (suitably transformed of course). The applet lowercases all email addresses when creating and when testing the bloom.

To create your own bloom just click the "Create Bloom" tab. An easy way to extract all your email addresses from an OSX Address Book is to run the following on the command line:

hjs@bblfish:0$ osascript -e 'tell application "Address Book" to get the value of every email of every person' | perl -pe 's/,+ /\n/g' | sort | uniq | pbcopy

You should now be able to paste the list of all your contacts in the applet. To restrict the addresses to one of your groups, named "foaf" for example, replace the relevant section above with tell application "Address Book" to get the value of every email of every person in foaf.

You will need to choose the number of hashes and the maximal size of the bucket you wish to fill. The greater the number of hashes and the greater the size of the bucket, the more precision you get and the less deniability.[5]
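
To get a feel for this trade-off, one can use the standard textbook approximation for the false positive rate of a Bloom filter, (1 - e^(-kn/m))^k, where n is the number of addresses, m the size of the bucket in bits and k the number of hashes. The sketch below is mine, not part of the applet, and the function name is hypothetical.

```python
import math


def false_positive_rate(num_items, length, hash_num):
    """Approximate chance that an address never added still tests positive."""
    # Fraction of bits expected to be set after inserting num_items addresses.
    fill = 1.0 - math.exp(-hash_num * num_items / length)
    # A false positive needs all hash_num probed bits to be set.
    return fill ** hash_num


# 100 addresses, comparing bucket sizes and numbers of hashes.
for length, hash_num in [(1000, 4), (2000, 4), (1000, 7), (1000, 20)]:
    rate = false_positive_rate(100, length, hash_num)
    print(length, hash_num, round(rate, 4))
```

The false positive rate is roughly the amount of deniability one keeps. A bigger bucket always lowers it, whereas adding hashes only helps up to a point (around (m/n) ln 2 hashes); beyond that the bucket fills up and false positives rise again.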

Conclusion

None of the above tools are by themselves the complete solution for creating an Open Social Network that will satisfy everyone. But for people willing to live in the open, the correct and astute use of them should satisfy most of people's requirements. Access Control on URLs can make it possible to reveal more or less information depending on who is looking; indirect identification can allow one to name people even without direct identification; sha1sums allow one to partially hide sensitive identifying information; and Blooms allow one to make fuzzy statements of set membership. All of these can be combined in different ways. So one can make statements about sha1sum identified people on the open web, or one can do so behind an access controlled file that only friends logged in with OpenId can see. There are bound to be more fun things to be discovered here. But this should make clear just how much can be done in this space.

Notes

  1. For the link from email addresses to sha1sums to work, it helps to canonicalise the emails to all lowercase. This should probably be made more explicit in the foaf:mbox_sha1sum definition.
  2. "They can 'only' learn about social network claims", is quite a lot more than some people are willing to accept. See the article by Mark Wahl "Organizing principles for identity systems: Attacks on anonymized social networks and fudging oracles" which contains some very good pointers. For people who want to retain complete anonymity, and this is what people subscribe to when they answer public surveys, any leakage of information is too much leakage. The problem is that because of Metcalfe's Law it is nearly impossible to stop information combining itself: Information wants to be linked. So I think, when we are not tied to stringent laws, we should accept this rather than fight it, and use it to our advantage when hunting down spammers: the law holds for them too.
  3. You can get the source code for the applet on the so(m)mer repository in the misc/Bloom subdirectory. I used the pt.tumba.spell.BloomFilter class which I adapted a little for my needs. This was just the first one I found out there. It is probably not the most efficient one, as it uses an array of booleans, when it could use a byte array. If you know of other libraries please let me know.
    The code was put together really quickly and may well contain bugs. Feedback and patches and contributions are welcome.
  4. the advantage of Java Applets over server side code is really obvious here:
    • I don't need a server with a fixed port number to show you this
    • someone can't easily start a denial of service attack to bring the server down
    • Your email addresses never leave your computer, so there is no fear of loss of privacy.
    On the last point it would be nice if browser vendors made it easier to get info about the exact restrictions a Java Applet had. I would like to be able to click on an Applet and verify or set it to "no network communication whatever". This would increase trust even more in cases like this.
  5. More info on the loaf site. Apparently one needs more than 1/4 deniability if one is to preserve some measure of privacy, according to the paper "the price of privacy and the limits of LP decoding" by Cynthia Dwork, Frank McSherry and Kunal Talwar (Microsoft Research) who suggest that
    ... any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
    Thanks again to Mark Wahl for these references.
  6. Thanks a lot to Dan Brickley for working together with me on this last Friday, and pointing me to much of the important work done here. Dan also wrote a little python script to do something similar. Some of the sites I came across during our discussion: Not having studied bloom filters in detail, I am not sure how compatible the blooms of each of these libraries are. The super simple ruby bloom library does not seem to specify the number of hashes that were used to create a Bloom.
  7. Nick Lothian reminded me in a comment to this that he has written a Bloom Filter demo for facebook. I don't have a facebook account (because I am already on LinkedIn, and I can't really be bothered to move all my information, and because I don't like closed networks), so I was not able to use it. Perhaps I should get a facebook account just for this... Let me know.

Monday Aug 20, 2007

Purple Ocean Strategy

A few weeks ago I read through "Blue Ocean Strategy: How to Create Uncontested Market Space and Make the Competition Irrelevant", by W. Chan Kim and Renée Mauborgne of the Insead business school. This book has sold millions of copies since its publication. It is very easy to read, and contains a lot of clear and entertaining business cases by way of illustration, from the growth of the Cirque du Soleil via the story of the turnaround of crime in New York under the leadership of Bill Bratton, all the way to Apple's phenomenal success with the iPod.

The Blue Ocean the book refers to is opposed to the Red Ocean of competition in well established markets where optimization and distinction on well understood, standardized criteria matter. The Blue Ocean stands for the new markets created by businesses where there are no predefined standards, no predefined audience, where no industrial feet have yet been placed; in short the sought after space where there is no competition, where huge fortunes can be made. One of the very nice things about this book is how it shows just how much the blue ocean markets can be created in every walk of life, not just where one expects it the most, in technology driven industries.

The aim of the book is to show how these oceans of innovation are created. The tools it develops to make it possible to understand this are very easy to grasp, and make a lot of sense. One point it makes, and that every creator knows, is that you cannot find a Blue Ocean by asking your customers what they want, or by doing simple market studies. Of course these spaces are created by responding to something people really wanted, and feeling for your customers is an important aspect of seeing new possibilities emerge. But the business owner, the entrepreneur - as opposed to the manager ( the book does not make this distinction ) - is the creator of a new value space which cannot be comprehended by the market ahead of time. More so even since by creating something new, the entrepreneur is redefining the boundaries of the established market, and so redefining the audience. The Cirque du Soleil for example changed both the definition of what a circus was and what theater was. In doing that the Cirque du Soleil became a competitor of not just theater and circuses, but also other night time activities people might have enjoyed in their place. The Cirque du Soleil did that but seemed also to appear out of nowhere.

Blue is the symbol of Liberty. The French flag is blue, white and red: Liberty, Equality, Fraternity. Blue in Europe is also associated with conservatism. The history of color associations in the USA is more complex and currently has the reverse association in part due to the stigma attached to the color red in the battle against Communism. Just as with the colors, the book presents what are probably very complex ideas in an amazingly simple way. It separates the strands of thought the way a crystal separates light. Like a beat of electronic music it drums these distinctions into the reader's mind, so that there is probably no need to re-read the book: reading it once is to read it three times. So my following criticism or thoughts will probably be just a very facile remerging of what was separated for clarity.

Following the internet and computer industries I have noticed an element of the relation between red and blue that this book fails to bring out. As our CEO Jonathan Schwartz often mentions on his blog, it is not because one is in a commodity market that one cannot make a huge profit. The electric plug in your house, voltage, wire sizes and many other parts of the electricity industry are standardized. Those are commodity markets. Yet companies like General Electric or Siemens that produce huge generators for large dams or other electrical installations are in some very profitable markets. Without the standardization of the plugs and voltages, the electricity industry could never have grown so big. Standardization, I have noticed, can be a stepping stone to building a Blue Ocean: the blue can build on the red.

To illustrate let me take one example from the book: Apple's huge success in recent years. One of the conceptual tools put forward by Blue Ocean Strategy, is that one has to create a new value curve. Remove some aspects of cost and value from a product (no animals in the Cirque du Soleil), change other aspects of value (price), create something new (artistic dance show). One way Apple reduced cost was by adopting open standards. By building on the Unix Operating system developed and used in Universities world wide they removed the major research cost of developing an Operating System whilst gaining a huge pool of ready and highly qualified experts worldwide, and all the software that had been built in an Open Source way over time. The default compiler of OSX is Gnu CC. Think of the huge cost reductions that flow from being able to build in such a way on the works of others. By adding the one thing that had been missing from that system, an artistically coherent and beautiful end user experience, Apple gained those people's hearts and support and gave them a unique value proposition, bringing a very important community to Apple that would never have touched it before. By building on these open standards Apple also brings value to the community, if only in the existential example that it can be done, but certainly also over time in feeding back the improvements to the community. Simon Phipps explains how this works in full detail in "The Zen of Free". The same forces at work also led Sun Microsystems down a similar path to its logical conclusion: opening up all of the software stack. As a result Sun and Apple are able to cooperate in numerous ways that would otherwise have been impossible. By working on a standard base Apple can gain award winning technologies such as ZFS at very low cost, allowing it to focus on differentiating itself where its user base's value is: simple packaging, beauty and fluid end user experience.
Apple's switch last year to Intel is a similar move, building this time on an industrial de facto standard.
In all these cases reducing costs is not removing something completely from the system as proposed by Blue Ocean (removing the lions), but building on the commoditization and standardization of one layer, thereby bringing the costs down to close to zero. Building on the Red Ocean of community ownership a Blue Ocean of innovation and creativity, in a way that respects the value of the Red Ocean, is what I would like to call here, on my little blog at the end of the universe, Purple Ocean Strategy.

Having gotten this far it may be necessary to enlarge the notion of what is Red all the way to Green. If Red is what is socially established, fraternal ownership, then further along there is what is common to all living things, the biosphere, the Green. A strategy that took this into account would be looking for how to use and build in a sustainable way on that space. It is clear that not taking this into account can be extremely damaging, as the unfolding drama of the Beijing Olympics is revealing. How to take it into account effectively may get us to the Turquoise Ocean Strategy.

Wednesday Mar 21, 2007

James Gosling on Web N

James Gosling had a couple of slides on Web N during his presentation on the Java Platform. Is it "a piece of Jargon" as Tim Berners-Lee is quoted as saying? Well James seems to agree in part with that assessment. It is a lot of hype for what seems to be a very simple thing: just different User Interfaces on ways of storing data on servers. The one consistent similarity of these services, he points out in the next slide, is the way they build communities, using the input of millions to create services that no single organization could have provided.

But in that respect, how does that differ from projects such as Linux, which I was using as my desktop OS in the 90s? That was a huge piece of engineering developed on the internet, using the web and other tools, in a communal fashion. How does that differ from services such as imdb, the largest online database of films, which I was happily using ten years ago, whose whole content was updated by its users? Is it that the improvements in the web interface are making it easier and easier for people to contribute content? Partly so. If adding photos to a flickr account forced one to fetch a new page for every change, it would be a lot less appealing. But how much then do bandwidth improvements have to do with this? Services such as flickr would have been unbearable in the early web. Certainly YouTube would have gotten nowhere, not even taking into account the difficulty of editing videos on 400MHz machines. So is Web 2.0 a technical thing, or is it something else?

I'll agree that Web 2.0 is a social phenomenon, in more ways than one. It is a meme that also has a psychological dimension. People who thought that by 2000 they had understood all about the web, the .com aspect, never quite grokking the huge open source wave, those people then declared the Web bubble burst. As more and more amazing things continued happening after the .com bust, they needed a way to change their tune without feeling that they had gotten something wrong. Hence Web 2.0. The web just keeps evolving. It's always more than you thought it could be.

Another thought is that if we can trace Web 2.0 all the way back to Open Source programming, then my feeling is that this is where one should look to sow the seeds for Web 3.0. The Open Source community is full of small Island projects. True they can all exchange code between each other, but the interaction between the groups could be a lot better, just as the interaction between Web 2.0 sites could be. If one could make the interactions between these communities a lot more fluid, then one will certainly be able to unleash a whole new wave of energy. This is why I am so enthusiastic about Baetle, the bug ontology we are developing, which should be an important element in helping open source projects work together.

The next generation of the Web is not going to be obvious: how could it be? If it were obvious it would, technical issues aside, already be here. The people most apt to be able to move those technical issues aside, are of course going to be developers themselves. As they see the benefits, these will be distilled into something useful and easy to understand for everyone else.
