The Sun BabelFish Blog
Don't panic !
James Gosling has a foaf name
And so do Tim Bray, Greg Papadopoulos, Jonathan Schwartz, Sun Microsystems, and Java. All thanks to the great work of the DBPedia people, a loose network of highly skilled, distributed, self-selected avant-garde force de frappe, who are extracting all the metadata possible from Wikipedia and making it available as hyperdata, ready to be linked to. :-)
You can browse their information on the web, or with the Tabulator generic data browser, which will merge the information it finds into one large graph as you explore it. As a result I can now add Tim Bray and James Gosling to my foaf file by adding the following N3 statements:
:me foaf:knows [ = <http://dbpedia.org/resource/James_Gosling>;
a foaf:Person;
foaf:name "James Gosling" ],
[ = <http://dbpedia.org/resource/Tim_Bray>;
a foaf:Person;
foaf:name "Tim Bray" ] .
It is worth looking at how DBPedia works. http://dbpedia.org/resource/James_Gosling is now a Uniform Resource Identifier for James Gosling. You cannot fetch James because he is not an information resource, i.e. he is not a document, though he is very resourceful and full of interesting information. You can tell that James is not an information resource because you can't copy him easily. So when you request that URI (here with curl -I, which asks only for the headers) you get the following:
hjs@bblfish:0$ curl -I http://dbpedia.org/resource/James_Gosling
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 17:57:54 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Location: http://dbpedia.org/page/James_Gosling
Content-Type: text/plain
Content-Length: 90
I.e. you get a redirect to the page about James Gosling. This is because curl by default sends "Accept: */*", which the server answers with the html representation of the resource. Had you specified that you wanted the machine readable rdf/xml representation you would get a redirect to another resource:
hjs@bblfish:0$ curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/James_Gosling
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 18:01:10 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Location: http://dbpedia.openlinksw.com:8890/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FJames_Gosling%3E
Content-Type: text/plain
Content-Length: 210
Here you get a redirect to a SPARQL query to DESCRIBE James Gosling. To get the full content in N3, try:
hjs@bblfish:0$ curl -L -H "Accept: text/rdf+n3" http://dbpedia.org/resource/James_Gosling
The -L flag tells curl to follow all the redirects.
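The redirect behaviour shown above can be sketched in a few lines of Python. This is only a toy model of the dispatch, not DBPedia's actual code: the helper name redirect_for and the simple string match on the Accept header are assumptions made for illustration, as is the guess that a text/rdf+n3 request is sent to the same SPARQL endpoint. Only the two Location URLs come from the responses above.

```python
from urllib.parse import urlencode

RESOURCE = "http://dbpedia.org/resource/James_Gosling"


def redirect_for(resource: str, accept: str = "*/*") -> tuple:
    """Toy model of the 303 responses shown above (hypothetical helper).

    Returns the status code and the Location header a server might send
    for a non-information resource, depending on the Accept header.
    """
    if "application/rdf+xml" in accept or "text/rdf+n3" in accept:
        # Machine readable request: redirect to a SPARQL DESCRIBE query.
        query = urlencode({
            "default-graph-uri": "http://dbpedia.org",
            "query": f"DESCRIBE <{resource}>",
        })
        return 303, "http://dbpedia.openlinksw.com:8890/sparql?" + query
    # Anything else (including curl's default "*/*") gets the html page.
    return 303, resource.replace("/resource/", "/page/", 1)


if __name__ == "__main__":
    print(redirect_for(RESOURCE))
    print(redirect_for(RESOURCE, "application/rdf+xml"))
```

The point of the 303 is that the URI for James never returns a document itself; it only points to documents about him, and which one you are pointed to depends on what you asked for.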
Posted at 06:46PM Dec 15, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[7]
Life is Champagne
It's full of bubbles
Thanks to Matt Hempey and The Richter Scales
Posted at 12:24PM Dec 05, 2007 [permalink/trackback] by Henry Story in Silly | Comments[0]
Twine: Organising *your* information
Nova Spivack's company Radar Networks today unveiled at the Web2.0 summit in SF, the new service Twine. Nova Spivack has no trouble pronouncing the phrase "Semantic Web", and has built this whole service on those technologies. He describes this in a detailed podcast: Twine: A social network built on the semantic web. One quote I liked: "Whereas Google's mission is to organize the world's information, Twine's mission is to organize your information."
This looks very interesting. Nova Spivack and Lew Tucker presented the Semantic Web Birds of a Feather session at this year's JavaOne. This is the service they were speaking about.
More on the web:
- Wired: Radar Networks To Unveil Its Semantic Web App, Twine
- Nodalities: Web 2.0 Summit - tying it all together with Twine
- Danny Ayers: Radar Networks decloak: Twine
- O'Reilly: Web2Summit: Radar Networks Unveils twine.com
Posted at 07:11PM Oct 19, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[0]
Why Web 3.0?
As Tim O'Reilly admitted recently, the Web 2.0 meme was created to help businesses get over the dotcom crash. There was no way of getting investors to put money in the web, so it was important to rebrand. Mike Bergman - and many others - may not like it, and quite reasonably so, but this was probably a business necessity. The web of course never died and clearly never will. But since "Web" got associated with bust, Crash, 911 and what not, it was important to emphasize that everything did not end, there was a new beginning. There was a life after Pet Shop stores. This is the point of Web 2.0, and Tim O'Reilly did a great job with his article "What is Web 2.0?" in emphasizing this evolution.
The rebranding was extremely successful. But with success often comes conceit, and it became obvious that major evolutions were being left out of the Web 2.0 sphere. And, as this recent article by Tim, "Today's Web 3.0 Nonsense Blogstorm", indicates, the key proponents of 2.0 do not feel like allowing those technologies in, either because they don't understand them, or because they have enough on their plate, or because they find it difficult to speak about them to their investors, or a combination of all of those. This is difficult to explain since, as I showed in a recent article, Semantic Web technologies very nicely complement O'Reilly's Web 2.0 patterns. Whatever the reasons for this rejection, it is clear that there is an after Web 2.0 building up, and so the best way to name it is Web 3.0. For some reason this after seems to be unpleasant to the 2.0 folks, probably because it limits the capital they have access to. Competition does that. But it is a limit that they are imposing on themselves. Was it because it was easier for them to build momentum for their ideas? Starting small is a good strategy. But no one can own the whole future. It evolves, an idea that Nova Spivack defends very clearly, and for which he is rewarded by having some clever investors.
In fact we should be glad there is the Web 2.0 crowd and that Tim manages to argue so well at keeping them there, and frightening them from coming over here. Without this boundary it would be much more difficult to explain what is new, and we would end up being overwhelmed by a me too crowd intent on latching onto the latest. (There are a few of those already here, btw.) So yes. Web 3.0 is the future, but it is a risky one. On the other hand as the Web 2.0 space fills up, life will be getting more and more difficult in the red ocean of intense competition, witness the never ending new social networking startups. Inevitably the risks of going three are going to be outweighed by the difficulty of staying in me 2 land.
But if all of this still makes the hair rise up on your head, I suggest using the web n+1 shorthand. That puts you at the bleeding edge always, in a politically correct way. And for the whole thing explained with a lot more humour, see Web 3.0 I$ About Money.
Posted at 01:17PM Oct 07, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[4]
hyperdata
I just came across a recent post by Nova Spivack, "The Semantic Web, Collective Intelligence and Hyperdata", where he defines a couple of very useful words: hyperdata and folktologies. The one I'd like to look at here is the very important concept of hyperdata:
One might respond [...] by noting that there is already a lot of data on the Web, in XML and other formats -- how is the Semantic Web different from that? What is the difference between "Data on the Web" and the idea of "The Data Web?" The best answer to this question that I have heard was something that Dean Allemang said at a recent Semantic Web SIG in Palo Alto. Dean said, "Sure there is data on the Web, but it's not actually a web of data." The difference is that in the Semantic Web paradigm, the data can be linked to other data in other places, it's a web of data, not just data on the Web. I call this concept of interconnected data, "Hyperdata." It does for data what hypertext did for text. I'm probably not the originator of this term, but I think it is a very useful term and analogy for explaining the value of the Semantic Web.
Nova's Hyperdata article was written in response to Tim O'Reilly's recent post Economist Confused about the Semantic Web. Tim correctly points out that the word Semantic is often used to cover technologies that are closer to Web2.0, data silo technologies. But to get out of the data silos, we need hyperdata which is, like the web, and contra Tim, a social/folk/community enterprise.
The original Economist article was published on August 28 2007: The web: some antics
For a very good tutorial introduction to hyperdata see How to Publish Linked Data on the Web by Chris Bizer, Richard Cyganiak, and Tom Heath.
Posted at 01:37PM Sep 20, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[1]
Intel® Mash Maker: Mashups for the Masses
Intel® Mash Maker is a new service that makes it easy to create screen scrapers to extract data from normal web pages and create mashups. The tag lines on its front page are:
- "Browse don't program": the mashmaker will suggest mashups as you browse the web
- "View the internet not just a web page": it combines many web pages into one view
- "Enter the semantic Web via the back door": draws on the wisdom of the community to understand the structure and semantics of information on the web
It comes with a Firefox plugin with which one can create screen scrapers using XPath queries, to extract data which one can then save on their server - and which I think then belongs to Intel®. There are a number of videos that show how this works on their site.
This is clearly very similar to the Piggy Bank Firefox plugin from MIT. Is the novelty here that Intel is hosting the mashups, and perhaps even cleaning them up? Or that everything then belongs to them? [ not so: the licence on the contributions is very light, though without an obligation to share ]
Notes
Thanks to Josef Holy for pointing me to this, enthusiastic at the visible spread of the Semantic Web meme.
Danny Ayers has looked a little further into this: There is a Front Door
Posted at 03:36PM Sep 18, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[8]
TripleI: Web 2.0 meets Web 3.0
A week ago I was in Graz, Austria, for a conference called TripleI: iKnow, iMedia, iSemantics, bringing together researchers from the fields of cognitive science, media studies and semantic web technologies. There were some very interesting papers given there, which I shall speak of in due course. First a (very little) ego booster: I had my first paper presented here, written it is true together with Andreas Blumauer and Peter Reiser who also turned up as a Keynote speaker, with some excellent slides (available from his blog).
The core of the paper, entitled "Towards an 'Enterprise n+1' " has an interesting argument developing Tim O'Reilly's Web 2.0 design patterns. Here is the relevant extract:
The Long Tail
In “The Long Tail” Anderson pointed out that information technology is turning mass markets into a million niches. As the incremental costs of making goods available are lowered, companies can offer massive variety in their catalogue instead of the one size fits all blockbusters. This effect is even more true for knowledge, which by its essence is diverse and which benefits most by the information revolution. As more and more people participate in the knowledge process, making more information than ever available, as we move from an economy of scarcity to an economy of abundance, the problem also shifts from one of finding information at all to being completely overwhelmed by it. The key to increased information productivity is therefore to improve the match making process. Now the Long Tail in an organization is not so much the documents that encode the knowledge as the people who know. By making it easy to share information, you not only expose the hidden knowledge, you also expose the knowers themselves. The benefit comes from seeing who knows what, in being able to engage them in more effective roles.
Just as companies try to reach out to the entire web, to the edges and not just the center, to the long tail and not just the head, so must KM systems strive for more decentralization. These must work with the decentralised nature of knowledge, linking individuals to individuals, picking up data wherever it exists, and linking it together. The universal linking nature of the semantic web infrastructure, with its ability to relate globally dispersed resources, clearly points to a very central role for this technology.
Data is the Next Intel Inside
Tim O’Reilly states that “Every significant internet application to date has been backed by a specialized database”. Indeed, enterprises are not only facing an unforeseen growth of complexity of data, content and knowledge, they are also challenged by the always increasing need for integrating data. Classification and semantic annotation (by a combination of user-driven, expert-driven and automatic measures) of all the information in an enterprise is the key for a successful implementation. While the use of RDF is not usually part of the Web 2.0 story, it is clear that this plays especially well to its strengths. Being designed for data integration world wide, with the use of the well known URI to create a global information space, RDF is perfectly suited to be the new Intel Inside for the global distributed corporate database.
Users add Value
“Involve your users both implicitly and explicitly in adding value to your application.” From a KM perspective this means that the more metadata users add to a knowledge object (by tagging, rating, commenting, and even clicking on things), the more precisely important aspects of any asset can be calculated and evaluated. Amazon.com, for example, ranks products by computing the "most popular" ones based not only on sales but also on factors some call the "flow" around products. In order to describe these actions or preferences, which take place on the WWW of information resources identified by URLs, one needs to describe them as actions on, preferences for, and tags on resources identified by URLs. RDF, the Resource Description Framework, which uses Uniform Resource Identifiers as its cornerstone, clearly serves as the enabler for the above mentioned Web 2.0 design pattern.
Network Effects by Default
The demand to “set inclusive defaults for aggregating user data as a side-effect of their use of the application” is one of the most obvious options how to transfer this design pattern straight into a Knowledge Management System: For example, each user tag improves the tag recommender system of a KM system which eventually helps to get better search results. Network effects can take many shapes or forms. Tags for example can also be disambiguated by linking them to a wiki, and allowing the wiki page owners and users to vote on their precise meaning. “The service automatically gets better the more people use it”, but the service needs not be a single service. Interacting services (such as a wiki and a tagging engine) can use the distributed knowledge of the enterprise in completely unsuspected ways.
Some Rights Reserved
For the most efficient data-sharing this design pattern demands to “follow existing standards, and use licenses with as few restrictions as possible”. Looking at today’s intranet solutions, content management systems or other systems where users usually generate or distribute content, it becomes obvious that a strong culture of ownership is making it a lot more difficult to merge information than it should be. Although more flexible ways to define IPRs exist (like Creative Commons), this lesson has not been integrated in current KM systems. Data cannot flow if it is siphoned off behind legal barriers. One structural way to help align corporate interests with individual ones would be to make the cost of secrecy in an enterprise apparent.
The Perpetual Beta
The statement “One of the defining characteristics of internet era software is that it is delivered as a service, not as a product” reflects best what is meant by “perpetual beta”. When applying this design pattern, which is strongly linked to open source development practices, in a Knowledge Management System, we can support our knowledge intensive processes in a much more flexible way. Instead of storing all knowledge in a centralised database, we should provide smart services which are constantly developed on top of insights gained from monitoring user behaviour together with other users acting as co-developers.
Cooperate, Don't Control
This design pattern has been discussed for years throughout the KM community. Neither Web 2.0 nor Knowledge Management is a technological revolution: “The transformations the Web is subject to are not driven by new technologies but by a fundamental mind shift that encourages individuals to take part in developing new structures and content.” [Kolbitsch, 06] The question in the context of Knowledge Management is: How can we stimulate this mind shift? In our first use case we will consider if measuring the value of user contributions could be an answer.
Software Above the Level of a Single Device
The idea of the Web as a platform – “What applications become possible when our phones and our cars are not consuming data but reporting it?” – makes technologies which support semantic interoperability on top of metadata standards even more necessary. From a KM perspective this means that knowledge generation and annotation must happen on top of standard formats like RDF. From a technical perspective Tim Berners-Lee's proposal of an RDF bus deploying RDF mapping tools like D2R [Bizer, 03] seems to be an applicable solution which is already at hand.
Notice how Peter Reiser cleverly placed the "Web n+1" word in this paper, allowing him now to claim the Wikipedia page for it :-) The paper then goes on to develop Peter's idea of Community Equity, a way of thinking of the link between participation and visibility inside a company, and providing a framework for creating an architecture to help give people incentives to cooperate.
Posted at 02:48PM Sep 17, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[1]
My Bloomin' Friends
Closed Social Networks are blossoming all over the place. They provide a semblance of protection, at a price: lock in. Locked into the social network provider you get convenience in the form of tools to make conversation easier (video, email, chat boards, ...), some form of privacy protection (if you trust the provider), introductions to 'like minded' people, and other niceties.
Some of us work in the open air: we have to set standards in public view; we stand by what we say; we accept criticism from wherever it comes; and we can't choose our friends based on their social network provider. We describe ourselves in our foaf files where we can specify what we do, how to contact us, our interests, and links to who we know by pointing to their Universal Identifiers. There is no trouble linking between people who are open in this way. We are happy to reference each other: it strengthens the exposure of our work and the quality of the web. This is how I link to Paul Gearon:
:me foaf:knows [ = <http://web.mac.com/thegearons/people/PaulGearon/foaf.rdf#me>;
a foaf:Person;
foaf:name "Paul Gearon" ] .
I could just point to his URL, but the little extra duplicate information can make life easier for people/robots browsing the data web. It can help people notice inconsistencies and help me correct them.
But not everyone lives in the open the same way, and not everyone wants to make the same amount of information about themselves public. There are a number of different ways to deal with this. I want to discuss a few of them here.
Content Negotiation
How much someone says about themselves is up to them, and so is how they protect their information. The same URL that identifies someone, could return more or less information depending on who is asking. I could set up my foaf file so only friends who log in via openid can see my friends. Others would just get default information about me. I could be even more clever. I could allow any friend of my friend who logs in via their openid to see my full foaf file; others would see information about me, and a select group of open friends. Closed Social Networks could open up by making it convenient to specify these policies, and providing the right infrastructure to do so.
Indirect Identification
By directly identifying someone via a URL (as I do) we can leave a lot of the policy of what they make visible up to them. But those that don't have a foaf name need to be identified indirectly. We can do that by identifying them via some property such as their blog, their home page, their email address, or their openid. I am very open about my email addresses. They are published and visible to all.
<http://bblfish.net/people/henry/card#me> <http://xmlns.com/foaf/0.1/mbox> <mailto:henry.story@bblfish.net> .
I value it more that people can contact me easily - living as I do in the middle of nowhere and often living nowhere in particular - than the pain of spammers. Too many people are lazy about security, using virus filled Windoze computers, obvious passwords, cracked software for me to be under any illusion that hiding my email is going to prevent the bad guys from getting it.
However I can't assume that everyone else will accept me applying this argument to their email address. For this there is a nice mathematical technique: I can hash their email address using the SHA1 hash function. This creates a close to unique string that cannot be reversed. You cannot go from the sha1 sum of an email address back to the email. But you can always calculate the same sha1sum from an email. This is how I identify Simon Phipps, Sun's Open Source Officer:
:me foaf:knows [ a foaf:Person;
foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
foaf:name "Simon Phipps";
rdfs:seeAlso <http://www.webmink.net/foaf.rdf>
].
If you know Simon's email, then you will know that I know him. "What use is that?" I can hear someone ask. It's all about Working with People on the Internet. Imagine you are reading email on a newsgroup with a foaf enabled mail tool linked to a foaf enabled Address Book (such as Beatnik). You come across an email by Simon saying something interesting about how Sun has changed its stock ticker to JAVA for example. My logo and perhaps that of a couple of other people appear on the mail reader in a way that indicates to you that we know Simon. The post is no longer anonymous for you, and so has more trust value. You feel part of a community.[1]
So spammers cannot use that information to spam. Either they already know your email address, and so they are probably already spamming you, or they don't, and this won't help them. They can only [2] learn about social network claims: who claims to know who. They could use this, it is true, to introduce themselves as an acquaintance of a friend of yours. A bit of a risky strategy that could quickly get them on a black list. Currently being black listed may not be an expensive proposition. But in a cryptographic web of trust this will be both much easier to notice, and more damaging for the infringers.
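The sha1sum identification described above is easy to try out. Here is a minimal Python sketch, assuming (as the FOAF definition has it) that the hash is taken over the full mailto: URI, and assuming the lowercasing convention mentioned in the notes. The function names and the example.org addresses are made up for illustration.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """SHA1 hex digest of the mailto: URI for an email address.

    Lowercasing the address first is a convention assumed here; the
    hash function itself does not enforce it.
    """
    uri = "mailto:" + email.strip().lower()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def claims_to_know(email: str, published_sums: set) -> bool:
    """True if the hash of this address appears among the published sums."""
    return mbox_sha1sum(email) in published_sums


if __name__ == "__main__":
    # someone@example.org is a made-up address for illustration.
    published = {mbox_sha1sum("someone@example.org")}
    print(claims_to_know("Someone@Example.org", published))
    print(claims_to_know("other@example.org", published))
```

Anyone who already has the address can recompute the sum and match it; someone who only has the sum has nothing to send mail to.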
Fuzzy Identification
I can directly and indirectly identify a lot of people in my Address Book as described above. This is perfectly acceptable for people who have an open life, like I do, and a large portion of the Open Source community, bloggers, standard setters, etc... But on last count I had over 700 people in my Address Book. It is a lot of work to identify all of them individually, and to decide how much visibility I should give them. I may not even want people to know how many people I know this way. Also I may want deniability: there are people one may know, but one may not want to highlight that, and one may want to be able to deny that one knows them to some people. The foaf:mbox_sha1sum gives me a way to identify someone, but if some nosy person comes to me and asks me about that person's life after having identified the corresponding email address, there is no escape route other than refusing any conversation, which by itself can easily be taken to be significant. What we need is a way to fuzzily identify a group of one's acquaintances.
Bloom Filters
This is what Bloom Filters enable one to do. Originally used in times when memory was expensive, they allowed the whole vocabulary of a language to be condensed into a reasonably short string. Here we can use one to group all the email addresses of our friends together in one opaque string. I could express this as follows in RDF (bear in mind that the rdf vocabulary has not been settled on):
:me foaf:bloomMbox [ a bloom:Bloom;
bloom:base64String """
IAOgQgSAAAICCAADAoQgDABAAiQKgIABgyAIBEhAAAAIUKBACCYAABAAaEkGQAGIEAHRUAgAAQUw
hCgwACJNQxQAAggAgCIgAAAAKgICEKAAAABCQiB0JCAAAIkgDASAYiAAAEIQAAIAABDCEAZACOpA
ICEEMAGAEGEAxIA=""";
bloom:hashNum 4;
bloom:length 1000 ] .
Given the above Bloom, someone can query it with an email address by running the same hash algorithm over it, and the Bloom will answer either that I may know that person, or that the address was not added to it. A positive answer is never certain, and that is where the deniability comes from. The loaf project explains some of the advantages of having this in more detail.
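To make the mechanics concrete, here is a minimal Bloom filter sketch in Python. It is not the applet's implementation and will not reproduce the base64 string above: the way the bit positions are derived (SHA1 salted with a counter) and the bit ordering are assumptions made for illustration. Only the two parameters, 4 hashes and 1000 bits, are taken from the RDF example.

```python
import base64
import hashlib


class Bloom:
    """Minimal Bloom filter sketch (not the applet's implementation)."""

    def __init__(self, length: int = 1000, hash_num: int = 4):
        self.length = length        # number of bits, cf. bloom:length
        self.hash_num = hash_num    # hashes per entry, cf. bloom:hashNum
        self.bits = bytearray((length + 7) // 8)

    def _positions(self, email: str):
        # Derive hash_num bit positions by salting SHA1 with a counter.
        # This hashing scheme is an assumption made for illustration.
        key = email.strip().lower()
        for i in range(self.hash_num):
            digest = hashlib.sha1(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.length

    def add(self, email: str) -> None:
        for pos in self._positions(email):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, email: str) -> bool:
        # False means "definitely not added"; True only means "possibly".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(email))

    def to_base64(self) -> str:
        return base64.b64encode(bytes(self.bits)).decode("ascii")


if __name__ == "__main__":
    bloom = Bloom()
    # Made-up addresses for illustration.
    for address in ("alice@example.org", "bob@example.org"):
        bloom.add(address)
    print(bloom.might_contain("Alice@example.org"))
    print(bloom.to_base64())
```

Because several addresses can set the same bits, a match proves nothing on its own, which is exactly the property wanted here.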
The best way to get a feel for how it works is to try it. Here I have written a little Java applet [3] that allows you to test my Bloom for people I know, and to create your own bloom [4].
Some emails you can try with positive results are tbray at textuality dot C O M, or bill at dehora dot net (suitably transformed of course). The applet lowercases all email addresses when creating and when testing the bloom.
To create your own bloom just click the "Create Bloom" tab. An easy way to extract all your email addresses from an OSX Address Book is to run the following on the command line:
hjs@bblfish:0$ osascript -e 'tell application "Address Book" to get the value of every email of every person' | perl -pe 's/,+ /\n/g' | sort | uniq | pbcopy
You should now be able to paste the list of all your contacts in the applet. To restrict the addresses to one of your groups, named "foaf" for example, replace the relevant section above with tell application "Address Book" to get the value of every email of every person in foaf.
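If you would rather do the clean up step in a script than in a shell pipeline, the same idea (split, lowercase, sort, remove duplicates) can be sketched in Python. The input format assumed here, a comma separated dump with possibly empty entries, is a guess at what such an export looks like, and the function name is made up.

```python
def normalise_addresses(raw: str) -> list:
    """Split a comma separated dump into sorted, unique, lowercased addresses."""
    parts = (p.strip().lower() for p in raw.replace("\n", ",").split(","))
    return sorted({p for p in parts if p})


if __name__ == "__main__":
    # Made-up addresses for illustration.
    dump = "Bob@example.org, alice@example.org, , bob@example.org\n"
    for address in normalise_addresses(dump):
        print(address)
```

Lowercasing at this stage matches what the applet does when it creates and tests a bloom.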
You will need to choose the number of hashes and the maximal size of the bucket you wish to fill. The greater the number of hashes and the greater the size of the bucket, the more precision you get and the less deniability.[5]
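To get a rough feel for this trade-off one can use the standard textbook approximation for the false positive rate of a Bloom filter, (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hashes and n the number of entries. The sketch below assumes ideal, independent hash functions, which the applet's hashes may not be, so treat the numbers as indicative only. The function name is made up.

```python
import math


def false_positive_rate(bits: int, hashes: int, entries: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k for a Bloom filter."""
    return (1.0 - math.exp(-hashes * entries / bits)) ** hashes


if __name__ == "__main__":
    entries = 700  # roughly the size of the Address Book mentioned above
    for bits in (1000, 5000, 10000):
        for hashes in (1, 4):
            rate = false_positive_rate(bits, hashes, entries)
            print(f"bits={bits} hashes={hashes} rate={rate:.3f}")
```

A high false positive rate is what gives deniability. Note that under this approximation a bigger bucket always lowers the rate, whereas adding hashes only helps up to a point: once the bucket is crowded, each extra hash fills it further and the rate goes up again.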
Conclusion
None of the above tools are by themselves the complete solution for creating an Open Social Network that will satisfy everyone. But for people willing to live in the open, the correct and astute use of them should satisfy most of people's requirements. Access Control on URLs can make it possible to reveal more or less information depending on who is looking; indirect identification can allow one to name people even without direct identification; sha1sums allow one to partially hide sensitive identifying information; and Blooms allow one to make fuzzy statements of set membership. All of these can be combined in different ways. So one can make statements about sha1sum identified people on the open web, or one can do so behind an access controlled file that only friends logged in with OpenId can see. There are bound to be more fun things to be discovered here. But this should make clear just how much can be done in this space.
Notes
- For the link from email addresses to sha1sums to work, it helps to canonicalise the emails to all lowercase. This should probably be made more explicit in the foaf:mbox_sha1sum definition.
- "They can 'only' learn about social network claims", is quite a lot more than some people are willing to accept. See the article by Mark Wahl "Organizing principles for identity systems: Attacks on anonymized social networks and fudging oracles" which contains some very good pointers. For people who want to retain complete anonymity, and this is what people subscribe to when they answer public surveys, any leakage of information is too much leakage. The problem is that because of Metcalfe's Law it is nearly impossible to stop information combining itself: Information wants to be linked. So I think, when we are not tied to stringent laws, we should accept this rather than fight it, and use it to our advantage when hunting down spammers: the law holds for them too.
- You can get the source code for the applet on the so(m)mer repository in the misc/Bloom subdirectory. I used the pt.tumba.spell.BloomFilter class which I adapted a little for my needs. This was just the first one I found out there. It is probably not the most efficient one, as it uses an array of booleans, when it could use a byte array. If you know of other libraries please let me know.
The code was put together really quickly and may well contain bugs. Feedback, patches and contributions are welcome.
- The advantage of Java Applets over server side code is really obvious here:
  - I don't need a server with a fixed port number to show you this
  - someone can't easily start a denial of service attack to bring the server down
  - Your email addresses never leave your computer, so there is no fear of loss of privacy.
- More info on the loaf site. Apparently one needs more than 1/4 deniability if one is to preserve some measure of privacy, according to the paper "The Price of Privacy and the Limits of LP Decoding" by Cynthia Dwork, Frank McSherry and Kunal Talwar (Microsoft Research), who suggest that
... any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
Thanks again to Mark Wahl for these references.
- Thanks a lot to Dan Brickley for working together with me on this last Friday, and pointing me to much of the important work done here. Dan also wrote a little python script to do something similar. Some of the sites I came across during our discussion:
- A lisp blog on Bloom Filters
- Bloomin simple Ruby library
- loaf perl downloads
- pt.tumba.spell.BloomFilter original JavaDoc. This explains things nicely. I adapted this class a little.
- Foaf based White listing: how to use foaf to reduce junk mail. Even better, use Bloom Filters!
- Nick Lothian reminded me in a comment to this that he has written a Bloom Filter demo for facebook. I don't have a facebook account (because I am already on LinkedIn, and I can't really be bothered to move all my information, and because I don't like closed networks), so I was not able to use it. Perhaps I should get a facebook account just for this... Let me know.
Posted at 05:48PM Aug 28, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[9]
Open Data: Information wants to be linked
With over 2 billion relations from the great web community data projects such as Wikipedia, Project Gutenberg, Music Brainz, and many more... the Linking Open Data initiative is tying together a vast pool of quality machine readable information on which one can run any of the over 500 Semantic Web tools. As the value of linked information increases much faster than the networks described by Metcalfe's Law, the value of this must be tremendous.
By creating data browsing interfaces such as Tabulator, one has a very simple RESTful, Resource Oriented Architecture API to work with. With various SPARQL endpoints available or to be built, one can treat that information like a hugely powerful database.
Forget Web APIs: long live linked data!
Some of the projects listed are:
- DBPedia: wikipedia as browseable relations. Excellent!
- DBTune: extracts data from Magnatune, Jamendo, Dogmazic and Mutopia, and creates links between them and other available semantic web repositories like DBPedia
- Freshmeat exports metadata about its 43 thousand projects
- CIA factbook
- RPM Find as RDF! for linux users
- The US Census as RDF!
Posted at 11:11AM Aug 17, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[0]
A Foaf file for Sun!
Sun Microsystems has recently given all its employees an OpenId that is guaranteed to identify each person at Sun. This has allowed me to add the following to my foaf file:
:me foaf:openid <http://openid.sun.com/bblfish> .
Now it would be nice if Sun could make the statement that all of its employees have such ids in a machine readable way. This could then be used by other organisations, say the W3C of which Sun is a member, to identify all of Sun's employees, and so give them access to member only parts of the W3C web site. But with OpenId as it currently stands this is usually thought to be impossible. For at its core OpenId just allows a client service to verify that an EndUser has its identity confirmed by a certain service, which the end user points you to. There is no way to specify what the service is, who it is related to, who owns the id, etc...
Well OpenId does not provide for this out of the box, but it is not difficult to imagine how one could do this. The first thought that comes to mind is to have Sun Microsystems publish a foaf file (for Sun) that listed all its members using the new foaf:openid inverse functional property. I am imagining something like this:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://sun.com/sunw#> .
<> a foaf:PersonalProfileDocument;
foaf:primaryTopic :sunw.
:sunw a foaf:Organization, foaf:Group;
foaf:name "Sun Microsystems"@en;
foaf:homepage <http://www.sun.com/>;
foaf:member [ foaf:openid <http://openid.sun.com/bblfish> ],
[ foaf:openid <http://openid.sun.com/jag> ];
....
So Sun would just have to point the W3C to <http://sun.com/sunw> and it could find all the Sun employees' OpenIds and give them special privileges on the W3C web sites. By regularly polling that file, the W3C could keep up to date with its list.
But the problem with the above solution is that it is releasing perhaps more information than necessary. After all each of those openids could be linked to a foaf file, as I explained recently, so revealing a lot of information about the employees at Sun. It would also require regular polling to be kept up to date, and so would be leaky. That is, it might not work right after an employee has created his brand new OpenId, thereby leading to some tricky bug reports, bad feelings, etc... It may also end up being a very long file - quite long for companies the size of Sun, a lot longer for companies the size of IBM, too long for the Indian Railways (which has over a million employees) and certainly not imaginable for countries such as the USA were it to want to list all its citizens.
What is really needed is a service that can verify the belonging of an id to a group. Wait! That is what OpenId 1.1 provides! The OpenId Server URL names a resource that does two things:
- It can verify OpenId URLs as being ones that are part of the group it can identify
- It can identify User Agents as being ones that know a secret tied to that OpenId (own it).
So to take the Sun example, all that is needed is to specify that https://openid.sun.com/openid/service is an openid group identifier, and that all IDs that can be identified via that service are identifiers for members of that group.
So let us create such a relation now, and place it in some temporary openid namespace:
@prefix openid: <http://openid.org/tmp/ont#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
openid:memberIdService a owl:ObjectProperty;
rdfs:label "openid member identification service";
rdfs:comment """Any agent that can identify with an openid ID to this service is the agent who
is the subject of the foaf:openid relation to that ID, and that agent is a member of this group."""@en;
rdfs:domain foaf:Group;
rdfs:range openid:IDAuthService .
openid:IDAuthService a owl:Class;
rdfs:label "OpenID Authentication Service";
rdfs:comment "Members of this class are resources that can authenticate agents who present an OpenID."@en .
This would allow us then to write our information about Sun Microsystems like this
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://sun.com/sunw#> .
@prefix openid: <http://openid.org/tmp/ont#> .
<> a foaf:PersonalProfileDocument;
foaf:primaryTopic :sunw.
:sunw a foaf:Organization, foaf:Group;
foaf:name "Sun Microsystems"@en;
foaf:homepage <http://www.sun.com/>;
openid:memberIdService <https://openid.sun.com/openid/service>.
So now when Sun wishes to become a member of a prestigious organisation like the W3C, all we need to do is send them Sun's foaf file URL. This will give them our openid:memberIdService which they can use to identify all of our members. That way they or any other service can tell who our employees are without us ever giving them a list.
Let's look at this the other way around. A web service such as DZone asks me to identify myself and I give them my OpenId http://openid.sun.com/bblfish. That OpenId may have links to a number of different OpenId Servers. Which one should DZone use? Well it may recognise one of them, and just use that. But would it not be nice if the OpenId services could say something about themselves? One very useful thing they could say is what group they identified. This could be done in a nice RESTful way by simply asking for an RDF representation of the service for which we could get the easier to read N3 representation like this:
hjs@bblfish$ cwm https://openid.sun.com/openid/service
@prefix openid: <http://openid.org/tmp/ont#> .
<> a openid:IDAuthService;
openid:serviceFor <http://sun.com/sunw#sunw> .
So this would allow a service to follow its nose from openids to the groups they belong to, and assess the trust it has in those groups. The serviceFor relation above could simply be defined as
openid:serviceFor owl:inverseOf openid:memberIdService .
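The follow-your-nose step can be sketched in a few lines of Python. This is only a minimal sketch: the triples live in a plain set rather than a real RDF store, `infer_inverse` and `group_for_service` are hypothetical helper names, and the openid: terms are the temporary ones proposed above.

```python
# Hypothetical sketch: derive openid:serviceFor from openid:memberIdService
# using the owl:inverseOf statement, then look up the group behind a service.

OPENID = "http://openid.org/tmp/ont#"
MEMBER_ID_SERVICE = OPENID + "memberIdService"
SERVICE_FOR = OPENID + "serviceFor"

SUN = "http://sun.com/sunw#sunw"
SUN_SERVICE = "https://openid.sun.com/openid/service"

# Triples as (subject, predicate, object) tuples, as in the Sun foaf file above.
triples = {(SUN, MEMBER_ID_SERVICE, SUN_SERVICE)}

# Pairs of properties declared to be inverses of each other.
inverses = {(SERVICE_FOR, MEMBER_ID_SERVICE)}


def infer_inverse(triples, inverses):
    """Return the triples plus every triple implied by the inverse pairs."""
    result = set(triples)
    for p, q in inverses:
        for s, pred, o in triples:
            if pred == q:
                result.add((o, p, s))
            elif pred == p:
                result.add((o, q, s))
    return result


def group_for_service(triples, service):
    """Return the groups an OpenID service claims to identify members of."""
    return {o for s, p, o in triples if s == service and p == SERVICE_FOR}


closed = infer_inverse(triples, inverses)
print(group_for_service(closed, SUN_SERVICE))
```

A consumer such as DZone would still have to decide how much it trusts the group it finds this way.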
Now you may ask: How does anyone know to trust Sun's foaf file or the Sun OpenId memberIdService? Here we can work a network of trust model as described by Daniel Weitzner in "Whose name is it anyway". To illustrate this imagine the following: If the W3C's foaf file lists its member organisations, by pointing to each of their foaf files, and if the NASDAQ lists its member companies that way using the same foaf file, and Sun itself points back to both of them, then that would be a way of having a distributed reinforcement of the confidence one can have in OpenId servers. After all, if one trusts NASDAQ and the W3C's foaf file, then one should be able to trust that they point to the Sun foaf file correctly. A company listing its members or related organisations is a bit like a person linking to their friends. This is what creates a network of trust.
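One way to picture this reinforcement is as a small link graph. The sketch below is an illustration under stated assumptions: apart from http://sun.com/sunw, every URL is made up, `mutual_endorsers` is a hypothetical helper, and "trust" is reduced to counting the trusted foaf files that point to a candidate and that the candidate points back to.

```python
# Hypothetical sketch of the network of trust: which already trusted
# organisations link to a foaf file that also links back to them?

SUN_FOAF = "http://sun.com/sunw"
W3C_FOAF = "http://w3c.example/foaf"        # illustrative URL
NASDAQ_FOAF = "http://nasdaq.example/foaf"  # illustrative URL
SPAM_FOAF = "http://spam.example/foaf"      # illustrative URL

# Each key is a foaf file; each value is the set of foaf files it points to.
links = {
    W3C_FOAF: {SUN_FOAF},
    NASDAQ_FOAF: {SUN_FOAF},
    SUN_FOAF: {W3C_FOAF, NASDAQ_FOAF},
    SPAM_FOAF: {W3C_FOAF},  # claims a relation nobody confirms
}


def mutual_endorsers(links, trusted, candidate):
    """Trusted files that point to the candidate and are pointed back to."""
    points_back = links.get(candidate, set())
    return {
        t for t in trusted
        if candidate in links.get(t, set()) and t in points_back
    }


trusted = {W3C_FOAF, NASDAQ_FOAF}
print(len(mutual_endorsers(links, trusted, SUN_FOAF)))
print(len(mutual_endorsers(links, trusted, SPAM_FOAF)))
```

The more independent trusted sources confirm the link in both directions, the more confidence a service can place in the OpenId server named by that foaf file.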
Posted at 06:05PM Jul 25, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[0]
foaf and openid
My Sun OpenId is helping me use many services I would not have used before. For example I have started using DZone which is a service like DIGG in that it allows one to vote for interesting stories on the web. But unlike DIGG, I don't have to go through the rigmarole of setting up a new account, waiting for an email, replying to the email, remembering one more password which I have to look up in my keychain anyway, etc, etc...
From my short experience I have identified some simple ways one can improve the user experience. Currently for example all the server knows about me is my openId URL. That makes for an impersonal experience, as you can see from this comment I posted:

Luckily there is an obvious and easy fix to this. My openid http://openid.sun.com/bblfish should not just return a representation that contains a link to the openid server
<link rel="openid.server" href="https://openid.sun.com/openid/service" />
but also a link to a representation that contains more information about me, which would be my foaf file. This could be done very simply by growing the header of my openid html by one line, as specified by the foaf FAQ:
<link rel="openid.server" href="https://openid.sun.com/openid/service" />
<link rel="meta" type="application/rdf+xml" title="FOAF" href="http://bblfish.net/people/henry/card"/>
which is what videntity.org has been doing since 2005 [1], and openid.org has been providing since early July [2]. Now all that would be needed then is for dzone to read the foaf file pointed to, and extract the name relation, email and logo from the person described in the foaf file with the same openid. This could be done with a simple SPARQL query such as
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?mbox ?logo ?nick
WHERE {
?p foaf:openid <http://openid.sun.com/bblfish>.
OPTIONAL { ?p foaf:mbox ?mbox } .
OPTIONAL { ?p foaf:logo ?logo } .
OPTIONAL { ?p foaf:nick ?nick } .
}
If you save the above to a file - say openid.sparql - you can run it on the command line using the python cwm script like this:
hjs@bblfish:2$ cwm http://bblfish.net/people/henry/card --sparql=./openid.sparql
# Base was: http://bblfish.net/people/henry/card
@prefix : <http://www.w3.org/2000/10/swap/sparqlCwm#> .
{
"bblfish" :bound "nick" .
</pix/bfish.large.jpg> :bound "logo" .
<mailto:henry.story@bblfish.net> :bound "mbox" .
} a :Result .
{
"bblfish" :bound "nick" .
</pix/bfish.large.jpg> :bound "logo" .
<mailto:henry.story@gmail.com> :bound "mbox" .
} a :Result .
{
"bblfish" :bound "nick" .
</pix/bfish.large.jpg> :bound "logo" .
<mailto:henry.story@sun.com> :bound "mbox" .
} a :Result .
That's how simple it is! [3]
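Why three results rather than one? Each OPTIONAL clause contributes every value it finds, so three mailboxes combined with one logo and one nick give three solutions. The Python below is a rough approximation of that behaviour, not a SPARQL engine: the triples are a hand-copied fragment limited to the values in the output above, and `values` and `profile_for` are hypothetical helpers.

```python
from itertools import product

FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://bblfish.net/people/henry/card#me"

# Hand-copied fragment of the card: only the values seen in the cwm output.
triples = [
    (ME, FOAF + "openid", "http://openid.sun.com/bblfish"),
    (ME, FOAF + "nick", "bblfish"),
    (ME, FOAF + "logo", "/pix/bfish.large.jpg"),
    (ME, FOAF + "mbox", "mailto:henry.story@bblfish.net"),
    (ME, FOAF + "mbox", "mailto:henry.story@gmail.com"),
    (ME, FOAF + "mbox", "mailto:henry.story@sun.com"),
]


def values(subject, prop):
    """All objects for (subject, prop); [None] mimics an unmatched OPTIONAL."""
    found = [o for s, p, o in triples if s == subject and p == FOAF + prop]
    return found or [None]


def profile_for(openid):
    """One solution per combination of mbox, logo and nick for the owner."""
    people = [s for s, p, o in triples if p == FOAF + "openid" and o == openid]
    return [
        {"mbox": mbox, "logo": logo, "nick": nick}
        for person in people
        for mbox, logo, nick in product(
            values(person, "mbox"),
            values(person, "logo"),
            values(person, "nick"),
        )
    ]


solutions = profile_for("http://openid.sun.com/bblfish")
print(len(solutions))
```

A service like dzone would presumably pick one mailbox, or show them all, rather than treat each solution as a separate person.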
For those who are still trying to keep their info private, one could add some content negotiation mechanism to the serving of the foaf file, such that depending on the authentication level of the requestor (dzone in this case), the server would return more or less information. If dzone could somehow show on requesting my foaf file, that I had authenticated them, and that should not be difficult to do, since I just gave them some credentials, I could give them more information about me. How much information exactly could be decided in the same box that pops up when I have to enter the password for the service... A few extra checkboxes on that form could ask me if I want to allow full, partial or minimal view of my foaf relations. Power users with more time on their hands could even decide on a relation by relation basis.
Notes
- [1]
- Videntity.org works nicely, and can even import all the information nicely from an existing foaf file! I would rather they give me the option to link to my original foaf file, which I am maintaining, rather than create yet another one on their server. Their foaf creates bnode urls, which makes me a little nervous (The only bnode url that makes me smile is Benjamin Nowack's). Also there is a bug in their foaf file, in that they have given me a URL which makes me both a foaf:Person and a foaf:Document. foaf does specify that there is nothing in the intersection of those sets. Does this make me a Buddhist?
- [2]
- Sadly I have not been able to use that openid.org account to log into anything yet. There seems to be a bug in their windows service. Their foaf file returns nearly no information at present and is incomplete. But the idea is good.
- [3]
- Here cwm returns an N3 representation. SPARQL servers usually can return both a simple XML and a simple JSON representation. Those working with a programming library will skip the serialization step and end up directly with a collection of solution objects that can be iterated through.
Posted at 12:34PM Jul 20, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[7]
Restful semantic web services
Here is my first stab at an outline for what restful semantic web services would look like.
Let me start with the obvious. Imagine we have an example shopping service, at http://shop.eg/, which sells books. Clearly we would want
URLs for every book that we wish to buy, with RDF representations at the given URL. As I find RDF/XML hard to read and write, I'll show the N3 representations. So to take a concrete example, let us imagine our example shopping service selling the book "RESTful Web Services" at the URL
http://shop.eg/books/isbn/0596529260 . If we do an HTTP GET on that URL we could receive the following representation:
@prefix : <http://books.eg/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix shop: <http://shopping.eg/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix currency: <http://bank.eg/currencies#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<#theBook> a shop:Book, shop:Product;
dc:title "Restful Web Services"@en ;
dc:creator [ a foaf:Person; foaf:name "Sam Ruby"],
[ a foaf:Person; foaf:name "Leonard Richardson"] ;
dc:contributor [ a foaf:Person; foaf:name "David Heinemeier Hansson" ];
dc:publisher <http://www.oreilly.com/>;
dct:created "08-07-2007T"^^xsd:dateTime;
dc:description """This is the first book that applies the REST design philosophy to real web services. It sets
down the best practices you need to make your design a success, and the techniques you need to turn your
design into working code. You can harness the power of the Web for programmable applications: you just
have to work with the Web instead of against it. This book shows you how."""@en;
shop:price "26.39"^^currency:dollars;
dc:subject </category/computing>, </category/computing/web>, </category/computing/web/rest>,
</category/computing/architecture>,</category/computing/architecture/REST>, </category/computing/howto> .
So we can easily imagine a page like this for every product. These pages can be reached by browsing the category pages or by querying a SPARQL endpoint, among many other ways. It should be very easy to generate such representations for a web site. All it requires is to build up an ontology of products - which the shop already has available, if only for the purposes of building inventories - and tie these to the database using a tool such as D2RQ, or a combination of JSR311 and @rdf annotations (see so(m)mer).
Now what is missing is a way to let the browser know what it can do with this product. The simplest possible way of doing this would be to create a specialized relation for that web service to POST some information to a Cart resource, describing the item to be added to the cart. Perhaps something like:
<#theBook> shop:addToCart <http://shop.eg/cart/> .
This relation would just mean that one has to POST the url to the cart, to have it added there. The cart itself may then have a shop:buy relation to some resource, which by convention the user agent would need to send a credit card, expiration date, and other information to.
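As a rough illustration of what following shop:addToCart could mean for a client, here is a Python sketch that builds the POST without sending it. The `text/uri-list` media type and the helper name `add_to_cart_request` are my assumptions; nothing above says how the URL should be encoded in the request body.

```python
from urllib.request import Request


def add_to_cart_request(cart_url, item_url):
    """Build (but do not send) a POST carrying the item's URL to the cart."""
    return Request(
        cart_url,
        data=(item_url + "\r\n").encode("utf-8"),
        headers={"Content-Type": "text/uri-list"},  # assumed media type
        method="POST",
    )


req = add_to_cart_request(
    "http://shop.eg/cart/",
    "http://shop.eg/books/isbn/0596529260#theBook",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it, which is pointless here since shop.eg is only an example host.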
This means that one would have to define a number of RDF relationships, for every type of action one could do in a shop (and later on the web), and explain the types of messages to be sent to the endpoint, and what their consequences are. This is simple but it does not seem very extensible. What if one wants to buy the hard copy version of the book, or 10 items of the book? The hard copy version of course could have its own URL, and so it may be as simple as placing the buy relation on that page. But is this going to work with PCs where one can add and remove a huge number of pieces? I remember Steve Jobs being proud of the huge number of different configurations one could buy his desktop systems with, well over 100 thousand different configurations I remember. This could make it quite difficult to navigate a store, if one is not careful.
On the current web this is dealt with by using html forms, which can allow the user to choose between a large number of variables, by selecting check boxes, combo boxes, drop down menus and more, and then POST a representation to a collection, and thereby create a new action, such as adding the product to the cart, or buying it. The person browsing the site knows what the action does, because it is usually written out in a natural language, in a way that makes it quite obvious to a human being. The person then does that action because he desires to do so, because he wishes his desires to be fulfilled. Now this may seem very simple, but just consider the innumerable types of actions that we can fulfill using the very simple tool of html forms: we can add things to a virtual cart, buy things, comment about things, search for things, organise a meeting, etc, etc.... So forms can be seen both as shortcuts to navigate to a large number of different resources, and to create new resources (usually best done with POST).
If we want software agents to do such tasks for us we need both to have something like a machine understandable form, and some way of specifying what the action of POSTing the form will have on the world. So we need to find a way to do what the web does in a more clearly specified way, so that even machines, or simple user agents can understand it. Let's look at each one:
- Forms are ways of asking the user to bind results to variables.
- The variables can then be used to build something, such as a URL, or a message.
- The form then specifies the type of action to do with the constructed message, such as a GET, POST, PUT, etc...
- The human readable text explains what the result of the action is, and what the meaning of each of the fields is.
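The four points above can be made concrete with a small Python sketch. It is only an illustration: the field names, the action URL and the `fill` helper are assumptions made for the example, not part of any real shop API.

```python
from urllib.parse import urlencode

# A form reduced to the four parts listed above. Field names, action URL
# and description are illustrative assumptions.
form = {
    "fields": ["isbn", "quantity"],                       # variables to bind
    "method": "POST",                                     # type of action
    "action": "http://shop.eg/cart/",                     # where to send it
    "description": "Add this many copies to your cart.",  # human readable
}


def fill(form, bindings):
    """Bind values to the form's variables and build the message to send."""
    missing = [f for f in form["fields"] if f not in bindings]
    if missing:
        raise ValueError("unbound variables: " + ", ".join(missing))
    body = urlencode([(f, bindings[f]) for f in form["fields"]])
    return form["method"], form["action"], body


print(fill(form, {"isbn": "0596529260", "quantity": 2}))
```

Note that the description is the only part a machine cannot use: it is precisely the meaning of the action that is left to the human reader.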
Now what semantic technology binds variables to values? Which one asks questions? SPARQL comes immediately to mind. Seeing this and remembering a well known motto of sales people, "Satisfy the customer's every desire", a very general but conceptually simple solution to this problem occurred to me. It may seem a little weird at first (and perhaps it will continue to seem weird) but I thought it is elegant enough to be used as a starting point. The idea is really simple: the representation returned by the book resource will specify a collection end point to POST RDF to, and it will specify what to POST back by sending a SPARQL query in the representation. It will then be up to the software agent reading the representation to answer the query if he wishes a certain type of action to occur. If he understands the query he will be able to answer; if he does not, there should be no results. He need not do anything with the query at all.
The following is the first thing that occurred to me. The details are less important than the principle of thinking of forms as asking the client a question.
PREFIX shop: <http://shopping.eg/ns#>
PREFIX bdi: <http://intentionality.eg/ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
?mycart a shop:Cart ;
shop:contains [ a shop:LineItem;
shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
shop:quantity ?q ;
] .
} WHERE {
?mycart a shop:Cart ;
shop:for ?me ;
shop:ownedBy <http://shop.eg/>.
GRAPH ?desire {
?mycart shop:contains
[ a shop:LineItem;
shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
shop:quantity ?q ;
] .
}
?desire bdi:of ?me .
?desire bdi:fulfillby "2007-07-30T..."^^xsd:dateTime .
}
So this is saying quite simply: Find out if you want to have your shopping cart filled up with a number of this book. The user agent (the equivalent of the web browser) asks its data store the given SPARQL query. It asks itself whether it desires to add a number of books to its shopping cart, and if it wishes that desire to be fulfilled by a certain time. If the agent does not understand the relations in the query, then the CONSTRUCT clause will return an empty graph. If it does understand it, and the query returns a result, then it is because it wished the action to take place. The constructed graph may be something like:
@prefix shop: <http://shopping.eg/ns#> .
<http://shop.eg/cart/bblfish/> a shop:Cart ;
shop:contains [ a shop:LineItem;
shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
shop:quantity 2 ;
] .
This can then be POSTed to the collection end point http://shop.eg/cart/, with the result of adding two instances of the book to the cart. Presumably the cart would return a graph with the above relations in it plus another SPARQL query explaining how to buy the items in the cart.
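To show the principle without a SPARQL engine, here is a deliberately simplified Python sketch of the agent's side. Everything in it is a stand-in: the `carts` and `desires` structures replace a real RDF store with named graphs, `answer` is a hypothetical function, the blank node is written as a fixed string, and the fulfil-by date of the query is ignored.

```python
SHOP = "http://shopping.eg/ns#"
BDI = "http://intentionality.eg/ns#"
BOOK = "http://shop.eg/books/isbn/0596529260#theBook"

# The agent's own store: its cart at each shop, and what it wants to buy.
carts = {"http://shop.eg/": "http://shop.eg/cart/bblfish/"}
desires = [{"sku": BOOK, "quantity": 2}]
known_vocabularies = {SHOP, BDI}


def answer(shop_owner, sku, vocabularies):
    """Return the triples to POST, or an empty list if there is no answer."""
    if not set(vocabularies) <= known_vocabularies:
        return []  # the agent does not understand the question
    cart = carts.get(shop_owner)
    if cart is None:
        return []  # no cart at this shop
    graph = []
    for desire in desires:
        if desire["sku"] == sku:
            item = "_:item"  # stands in for the blank node of the line item
            graph += [
                (cart, "rdf:type", SHOP + "Cart"),
                (cart, SHOP + "contains", item),
                (item, "rdf:type", SHOP + "LineItem"),
                (item, SHOP + "SKU", sku),
                (item, SHOP + "quantity", desire["quantity"]),
            ]
    return graph


print(len(answer("http://shop.eg/", BOOK, [SHOP, BDI])))
```

An agent with no matching desire, or one that does not know the vocabulary, returns an empty graph and so has nothing to POST.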
So the full RDF for the book page would look something like this:
@prefix : <http://books.eg/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix shop: <http://shopping.eg/ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix currency: <http://bank.eg/currencies#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<#theBook> a shop:Book, shop:Product;
dc:title "Restful Web Services"@en ;
dc:creator [ a foaf:Person; foaf:name "Sam Ruby"],
[ a foaf:Person; foaf:name "Leonard Richardson"] ;
dc:contributor [ a foaf:Person; foaf:name "David Heinemeier Hansson" ];
dc:publisher <http://www.oreilly.com/>;
dct:created "08-07-2007T"^^xsd:dateTime;
dc:description """This is the first book that applies the REST design philosophy to real web services. It sets
down the best practices you need to make your design a success, and the techniques you need to turn your
design into working code. You can harness the power of the Web for programmable applications: you just
have to work with the Web instead of against it. This book shows you how."""@en;
shop:price "26.39"^^currency:dollars;
dc:subject </category/computing>, </category/computing/web>, </category/computing/web/rest>,
</category/computing/architecture>,</category/computing/architecture/REST>,</category/computing/howto>;
shop:addToCart [ a Post;
shop:collection <http://shop.eg/cart/>;
shop:query """
PREFIX shop: <http://shopping.eg/ns#>
PREFIX bdi: <http://intentionality.eg/ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
CONSTRUCT {
?mycart a shop:Cart ;
shop:contains [ a shop:LineItem;
shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
shop:quantity ?q ;
] .
} WHERE {
?mycart a shop:Cart ;
shop:for ?me ;
shop:ownedBy <http://shop.eg/>.
GRAPH ?desire {
?mycart shop:contains
[ a shop:LineItem;
shop:SKU <http://shop.eg/books/isbn/0596529260#theBook> ;
shop:quantity ?q ;
] .
}
?desire bdi:of ?me .
?desire bdi:fulfillby "2007-07-30T..."^^xsd:dateTime .
}"""
] .
So there are quite a few things to tie up here, but it seems we have the key elements here:
- RESTful web services: we use GET and POST the way they are meant to be used,
- Resource Oriented Architecture: each shoppable item has its resource, that can return a representation
- the well known motto "hypermedia is the engine of application state": each URL is dereferenceable to further representations. Each page containing a buyable item describes how one can proceed to the next step to buy the product. In this case the SPARQL query returns a graph to be POSTed to a given url.
- with the clarity of the Semantic framework thrown in too. Ie. We can prove certain things about the statements made, which is very helpful in bringing clarity to a vocabulary. Understanding the consequences of what is said is part and parcel of understanding itself.
Notes
From discussions around the net (on #swig for example) I was made aware of certain problems.
- SPARQL is a little powerful, and it may seem to give too much leverage to the service, who could ask all kinds of questions of the user agent, such as the SPARQL equivalent of "What is your bank account number". Possible answers may be:
- Of course a user agent that does shopping automatically on the web, is going to have to be ready for all kinds of misuses, so whatever is done, this type of problem is going to crop up. Servers also need to protect themselves from probing questions by user agents. So this is something that both sides will need to look at.
- Forms are pretty powerful too. Are forms really so different from queries? They can ask you for a credit card number, your date of birth, the name of your friends, your sexual inclinations, ... What can web forms not ask you?
- SPARQL has a couple of things going for it: it has a way of binding variables to a message, and it builds on the solid semantic web structure. But there may be other ways of doing the same thing. OWL-S also uses rdf to describe actions, creates a way to bind answers to messages, and describes the preconditions and postconditions of actions. It even uses the proposed standard Semantic Web Rule Language (SWRL). As there seems to be a strong relation between SPARQL and a rule language (one can think of a SPARQL query as a rule), it may be that part of the interest in this solution is simply the same reason SWRL emerged in OWL-S. OWL-S has a binding to SOAP and none to a RESTful web service. As I have a poor grasp of SOAP I find that difficult to understand. Perhaps a binding to a more restful web service such as the one proposed here would make it more amenable to a wider public.
Posted at 03:23PM Jul 03, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[6]
Jazoon
Roy Fielding gave his very well attended keynote presentation today (Tuesday 26) at Jazoon, the new Java developers conference taking place for the first time in Zurich this week. Coming here just to hear Roy talk was worth the whole trip in itself.
This is the first year of Jazoon, and yet the venue was able to attract over 800 developers (I am not sure of the exact number), which bodes well for its future. So to have close to 10% of the attendees (photo) come to Dean Allemang's talk "Semantic Mashups using RDF, RSS and microformats" was a very good surprise. Dean, who is working for TopQuadrant, producers of the Eclipse based TopBraid Composer, is not just a very good presenter, but also a very knowledgeable Semantic Web evangelist. He gave Harold Carr (blog) and others a demo (photo) of TopQuadrant that started up outside the conference room and moved down into the bar at the entrance (photo). It kept being interrupted by great side tracks into Philosophy, Jungian psychology (Jung of course worked in Zurich), Semantic Web company adoption, Literature, Mathematics, Religion, sexual politics, and so much more, so that the demo only came to a tentative conclusion around 1am in a bar in the center of Zurich, in a discussion of the relations between REST and RDF and how this differed from SOAP. (For Dean's impressions of Jazoon, see his "Swiss Java" blog post.)
My talk, "Web 3.0: This is the Semantic Web" will be taking place on Thursday at 11am. I will be going into more technical details, looking at the foundations of the Semantic Web step by step. As a surprise I may even be able to get a slot for Dean to present his TopBraid composer, which is not just an Ontology editor, but also a complete mashup environment.
Time for me to go to sleep!
Posted at 03:22AM Jun 27, 2007 [permalink/trackback] by Henry Story in travel | Comments[3]
RESTful Web Services: the book
RESTful Web Services is a newly published book that should be a great help in giving people an overview of how to build web services that work with the architecture of the Web. The authors of the book are I believe serious RESTafarians. They hang out (virtually) on the yahoo REST discuss newsgroup. So I know ahead of time that they will most likely never fail on the REST side of things. Such a book should therefore be a great help for people desiring to develop web services.
As an aside, I am currently reading it online via Safari Books, which is a really useful service, especially for people like me who are always traveling and don't have space to carry wads of paper around the world. As I have been intimately involved in this area for a while - I read Roy Fielding's thesis in 2004, and it immediately made sense of my intuitions - I am skipping through the book from chapter to chapter as my interests guide me, using the search tool when needed. As this is an important book, I will write up my comments here in a number of posts as I work my way through it.
What of course is missing in Roy's thesis, which is a high level abstract description of an architectural style, are practical examples, which is what this book sets out to provide. The advantage of Roy's level of abstraction is that it permitted him to make some very important points without losing himself in arbitrary implementation debates. Many implementations can fit his architectural style. That is the power of speaking at the right level of abstraction: it permits one to say something well, in such a way that it can withstand the test of time. Developers of course want to see how an abstract theory applies to their everyday work, and so a cook book such as "RESTful Web Services" is going to appeal to them. The danger is that by stepping closer to implementation details, certain choices are made that turn out to be in fact arbitrary, ill conceived, non optimal or incomplete. The risk is well worth taking if it can help people find their way around more easily in a sea of standards. This is where the rubber hits the road.
Right from the beginning the authors, Sam Ruby and Leonard Richardson coin the phrase "Resource Oriented Architecture".
Why come up with a new term, Resource-Oriented Architecture? Why not just say REST? Well, I do say REST, on the cover of this book, and I hold that everything in the Resource-Oriented Architecture is also RESTful. But REST is not an architecture: it's a set of design criteria. You can say that one architecture meets those criteria better than another, but there is no one "REST architecture."
The emphasis on Resources is, I agree with them, fundamental. Their chapter 4 does a very good job of showing why. URIs name Resources. URLs in particular name Resources that can return representations in well defined ways. REST stands for "Representational State Transfer", and the representations transferred are the representations of resources identified by URLs. The whole thing fits like a glove.
Except that where there is a glove, there are two, one for each hand. And they are missing the other glove, so to speak. And the lack is glaringly obvious. Just as important as Roy Fielding's work, just as abstract, and developed by some of the best minds on the web, even in the world, is RDF, which stands for Resource Description Framework. I emphasize the "Resource" in RDF because for someone writing a book on Resource Oriented Architecture, to have only three short mentions of the framework for describing resources standardized by no less than the World Wide Web Consortium is just ... flabbergasting. Ignoring this work is like trying to walk around on one leg. It is possible. But it is difficult. And certainly a big waste of energy, time and money. Of course since what they are proposing is so much better than what may have gone on previously, which seems akin to trying to walk around on a gloveless hand, it may not immediately be obvious what is missing. I shall try to make this clear in the series of notes.
Just as REST is very simple, so is RDF. It is easiest to describe something on the web if you have a URL for it. If you want to say something about it, that it relates to something else for example, or that it has a certain property, you need to specify which property it has. Since a property is a thing, it too is easiest to speak about if it has a URL. Once you have identified the property in the global namespace, you need to specify what its value is, which can be a string or another object. That's RDF for you. It's so simple I am able to explain it to people in bars within a minute. Here is an example, which says that my name is Henry:
<http://bblfish.net/people/henry/card#me> <http://xmlns.com/foaf/0.1/name> "Henry Story" .
Click on the URLs and you will GET their meaning. Since resources can return any number of representations, different user agents can get the representation they prefer. For the name relation you will get an html representation back if you are requesting it from a browser. With this system you can describe the world. We know this since it is simply a generalization of the system found in relational databases, where instead of identifying things with table dependent primary keys, we identify them with URIs.
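The generalization from relational databases can be shown in a few lines of Python. This is a sketch under assumptions: the table row and its column names are invented for the example, and `row_to_triples` is a hypothetical helper rather than the API of a mapping tool.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical table row, identified inside its table by a local primary key.
row = {"id": 42, "name": "Henry Story", "nick": "bblfish"}


def row_to_triples(row, subject_uri, column_properties):
    """Swap the table-local key for a URI and emit one triple per column."""
    return [
        (subject_uri, prop, row[col])
        for col, prop in column_properties.items()
        if row.get(col) is not None
    ]


triples = row_to_triples(
    row,
    "http://bblfish.net/people/henry/card#me",
    {"name": FOAF + "name", "nick": FOAF + "nick"},
)
for triple in triples:
    print(triple)
```

The key 42 means nothing outside its table, whereas the URI and the property URLs can be understood, and dereferenced, by anyone.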
So RDF, just as REST, is at its base very easy to understand and furthermore the two are complementary. Even though REST is simple, it nevertheless needs a book such as "RESTful web services" to help make it practical. There are many dispersed standards out there which this book helps bring together. It would have been a great book if it had not missed out the other half of the equation. Luckily this should be easy to fix. And I will do so in the following notes, showing how RDF can help you become even more efficient in establishing your web services. Can it really be even easier? Yes. And furthermore without contradicting what this book says.
Posted at 06:09AM Jun 07, 2007 [permalink/trackback] by Henry Story in General | Comments[9]
James Gosling on Web N
James Gosling had a couple of slides on Web N during his presentation on the Java Platform. Is it "a piece of Jargon" as Tim Berners-Lee is quoted as saying? Well James seems to agree in part with that assessment. It is a lot of hype for what seems to be a very simple thing: just different User Interfaces on ways of storing data on servers. The one consistent similarity of these services, he points out in the next slide, is the way they build communities, using the input of millions to create services that no single organization could have provided.
But in that respect, how does that differ from projects such as Linux, which I was using as my desktop OS in the 90s? That was a huge piece of engineering developed on the internet, using the web and other tools, in a communal fashion. How does that differ from services such as imdb, the largest online database of films, which I was happily using ten years ago, whose whole content was updated by its users? Is it that the improvements in the web interface are making it easier and easier for people to contribute content? Partly so. If adding photos to a flickr account forced one to fetch a new page for every change, it would be a lot less appealing. But how much then do bandwidth improvements have to do with this? Services such as flickr would have been unbearable in the early web. Certainly YouTube would have gotten nowhere, not even taking into account the difficulty of editing videos on 400MHz machines. So is Web 2.0 a technical thing, or is it something else?
I'll agree that Web 2.0 is a social phenomenon, in more ways than one. It is a meme that also has a psychological dimension. People who thought that by 2000 they had understood all about the web, the .com aspect, never quite grokking the huge open source wave, those people then declared the Web bubble burst. As more and more amazing things continued happening after the .com bust, they needed a way to change their tune without feeling that they had gotten something wrong. Hence Web 2.0. The web just keeps evolving. It's always more than you thought it could be.
Another thought is that if we can trace Web 2.0 all the way back to Open Source programming, then my feeling is that this is where one should look to sow the seeds for Web 3.0. The Open Source community is full of small island projects. True, they can all exchange code with each other, but the interaction between the groups could be a lot better, just as the interaction between Web 2.0 sites could be. If one could make the interactions between these communities a lot more fluid, then one would certainly be able to unleash a whole new wave of energy. This is why I am so enthusiastic about Baetle, the bug ontology we are developing, which should be an important element in helping open source projects work together.
The next generation of the Web is not going to be obvious: how could it be? If it were obvious it would, technical issues aside, already be here. The people most apt to move those technical issues aside are of course going to be developers themselves. As they see the benefits, these will be distilled into something useful and easy to understand for everyone else.
Posted at 01:13PM Mar 21, 2007 [permalink/trackback] by Henry Story in General | Comments[0]
Java leads in SemWeb Tools
According to Michael Bergman's excellent SemWeb Tools Survey, Java is used in 49% of the 500 tools developed for the Semantic Web. A distant second is JavaScript with 13% coverage.
As a long-time Java Developer (I went to the first JavaOne Conference in 1996), this is great confirmation of my initial choice. The energy I have invested in learning Java and its libraries has been amply rewarded. And now with Java being GPLed I feel comfortable that no one will ever be excluded from joining this great community, that these investments are safe, and that great things will be built thereupon.
Part of the reason for Java's success here is that it has delivered on its "write once run everywhere" promise whilst attaining absolutely stunning levels of efficiency. As Jonathan Rentzsch argued so well in Programmers Don't Like to Code, we just want to solve problems. The idea of having to debug something for every other platform feels like a huge waste of energy. Java is therefore a natural choice: a lot more can be done there, a lot larger audience can be reached, with a lot less work.
The Semantic Web is language and platform neutral though, and as the tools survey shows every platform in existence now has a hook into it.
One of my favorite tools is the Python-based cwm.py, which I use every day to read and query RDF files, and which contains a powerful N3-based rules engine that points to the next step in the development of the Semantic Web.
Another very interesting tool I came across recently is AllegroGraph by Franz, a Lisp-based 64-bit RDF data store. Lisp was perhaps my other favorite language, and I can never completely forget the amazing Lisp machines I saw in the early 80s, whose A4 color screens displayed rotating three-dimensional texture-mapped spheres in real time.
But to tell the truth, I think I only know a small fraction of all the tools listed by Michael. There is something for everyone there, and more than any one person can digest. (Unless your name is Michael Bergman :-)
Posted at 11:08AM Mar 12, 2007 [permalink/trackback] by Henry Story in Java | Comments[0]
Metaweb: a semantic wiki startup
O'Reilly groks the Semantic Web in the latest article "Freebase will prove addictive". From his article:
But hopefully, this narrative will give you a sense of what Metaweb is reaching for: a wikipedia like system for building the semantic web. But unlike the W3C approach to the semantic web, which starts with controlled ontologies, Metaweb adopts a folksonomy approach, in which people can add new categories (much like tags), in a messy sprawl of potentially overlapping assertions.
Now that's a very partial simplification. The Semantic Web has always been designed to be grown, though there has been a lot of misunderstanding on this issue as I reported in UFO's seen growing on the web.
Using semantic wikis to grow ontologies is an excellent idea. Seed with a few tags, nourish with plain text, add a little structure with simple ontologies; water; repeat with a little more complexity at each iteration. With love and attention and a few lullabies the Semantic Web will be born (see Search, tagging and wikis).
A little further he says:
Metaweb still has a long way to go, but it seems to me that they are pointing the way to a fascinating new chapter in the evolution of Web 2.0.
Soon O'Reilly is going to use the word Web 3.0, just you wait and see!
See also:
- New York Times article on Metaweb
- DBPedia, a Semantic Wiki aggregator.
Posted at 02:39PM Mar 09, 2007 [permalink/trackback] by Henry Story in SemWeb | Comments[4]
Weekend Reading on the future of the web
Here are some meaty articles on the development of the web I have come across recently:
- Testimony of Sir Timothy Berners-Lee Before the United States House of Representatives Committee on Energy and Commerce. Tim Berners-Lee explains what the core values of the web are, and gives some indication as to where it is going. (video, via Danny Weitzner's blog)
- Nova Spivack, Breaking the Collective IQ Barrier -- Making Groups Smarter, a nearly Wired-length article on how we are making progress towards Collective Intelligence. As we grow from web 2.0, the read-write web, into web 3.0, the data web, and then into web 4.0, the agent web, the groups that can function intelligently will get larger, until we finally reach a point where the collective intelligence brought about by making individuals work seamlessly together exceeds the sum of the individual intelligences.
- Ian Davis explains why this is going to take time, pointing out just how long it took for simple CSS style sheets to become widely used. This is an argument to be added to my Semantic web: a note on the history of technology adoption.
- Peter Reiser gives a quic






