Henry Story talk about Semantic Web
This past Tuesday I had the good fortune to attend a talk here at USP by Henry Story. Henry is a Sun engineer doing research on the Semantic Web who was here in Brazil (for his brother's wedding, if I'm not mistaken) and happened to run into professor Imre Simon at an unrelated event. And so he was co-opted into talking about semweb in general, and his own research in particular, to a group of college students and professors.
Leaving transient technical difficulties aside*, the talk went very well, with great turnout for an impromptu event organized in a single day during summer vacation. Henry's current research is on applying semantic web technology to social networks, helping to connect what today are isolated information silos. He demoed an address-book-like application that uses FOAF RDF documents as its native data model, allowing the user to tap into a distributed "social network". For instance, it is trivially easy to access data about friends of your contacts, provided that data is publicly available as FOAF files.
My second-ever blog post, way back in 2005, was about how a lot of potentially useful information is hidden inside unstructured files. So it is not surprising that I very much like the promise of semantic web technologies. Unfortunately I have not yet had the time or opportunity to dig deeper into the subject, so take the following views with a grain of salt. But maybe a newbie perspective is interesting in itself, if nothing else for the entertainment value.
Henry made a comment that has been reverberating inside my skull since that day. He said "RDF is not just a data format". This is a simple, though crucial, point. One of the examples he cited was the scenario where there is an existing predicate, identified by some URI, say http://example.org/has-color, and for "competitive reasons" some company decides to use a different URI for the same purpose, say http://evil.com/is-of-color. No problem, just use another triple to assert that http://evil.com/is-of-color is the same thing as http://example.org/has-color. So, if your app receives its input as RDF, it would just work with documents tagged with evil.com ontologies**. Well, it would just work if the input parser knew how to deal with equivalence predicates. This parser would consequently have to be more complex than our run-of-the-mill marshalling machinery.
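To make the extra parser work concrete, here is a minimal sketch in Python of what "dealing with equivalence predicates" might look like. It is not how Henry's application or any real RDF library does it. It assumes the triples have already been parsed into plain tuples, that the equivalence is stated with OWL's equivalentProperty predicate, and that the app passes in the set of predicates it understands. The names normalize and known_predicates are made up for illustration.

```python
from collections import defaultdict, deque

# OWL's predicate for "these two properties mean the same thing".
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"


def normalize(triples, known_predicates):
    """Rewrite predicates to an equivalent one the app knows, when one exists.

    Toy sketch: `triples` is a list of (subject, predicate, object) string
    tuples; `known_predicates` is the (hypothetical) set of predicate URIs
    the application's marshalling code already understands.
    """
    # Equivalence is symmetric, so record each assertion in both directions.
    neighbours = defaultdict(set)
    for s, p, o in triples:
        if p == OWL_EQUIVALENT_PROPERTY:
            neighbours[s].add(o)
            neighbours[o].add(s)

    def resolve(predicate):
        # Breadth-first search through equivalents until a known one is found.
        seen, queue = {predicate}, deque([predicate])
        while queue:
            current = queue.popleft()
            if current in known_predicates:
                return current
            for nxt in neighbours[current] - seen:
                seen.add(nxt)
                queue.append(nxt)
        return predicate  # no known equivalent: leave the predicate untouched

    # Drop the equivalence assertions themselves and rewrite everything else.
    return [
        (s, resolve(p), o)
        for s, p, o in triples
        if p != OWL_EQUIVALENT_PROPERTY
    ]
```

Fed a document that uses http://evil.com/is-of-color plus one triple declaring it equivalent to http://example.org/has-color, an app that only knows the example.org predicate would get back triples phrased in its own vocabulary. Even this toy version needs a graph search, which is already more than a typical unmarshaller does.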
This may be controversial, but I would posit that we, as an industry, have developed some sort of orthodox architecture: the famous presentation / application / domain model / persistence layers ***. What's important is the focus on locating all application knowledge in the domain model. Ok, there was a lapse into procedural thinking at the height of the EJB 1.x period, but now Domain Driven Design is all the rage, object persistence frameworks are incredibly sophisticated, JPA is a success, all is good and well in object-land. Integration is often done with a façade over the domain, exposing service endpoints or resources to the world. Remote invocations get translated into calls to those façades, which unmarshal the received XML data into a domain object graph (often helped by libraries), call the relevant operations on those objects, and re-marshal the result. This is obviously simplified, and there are many more complex cases where persistence is involved, but that is irrelevant to my point. Yes, I do have a point; what I am tortuously trying to get at is that developers are taught, with good reason, to locate all important semantics in the object model. Bringing semweb into this picture complicates things: either some important "domain logic" will reside in the data-munging code, or it will be integrated with domain model objects in some fashion. But, again with good reason, there is strong resistance among developers to having domain objects depend on anything but other domain objects. If you doubt me, just hang around any popular Java online community for five minutes and watch the permanent, confused discussion about DAOs and Repositories and transparent persistence.
I think this is a real impediment to getting the semantic web broadly accepted. The solution? Beats me. Maybe RESTful web services are a good middle ground, or a first step, between our current isolated information islands in a sea of human-only user interfaces and the Tim Berners-Lee semantic utopia. One of the overlooked aspects of the architectural style is what Sam Ruby and Leonard Richardson call connectedness, and theoreticians like to call hypermedia: every representation should contain links to other resources, which allows for painless distribution and looser coupling. Or maybe REST with specialized data formats just won't go far enough, maybe a breed of killer apps will emerge making fantastic use of reasoning capabilities, maybe we will really see autonomous agents proxying our desires online, who knows? I remember that just before Google was announced, many analysts were saying search was an unsolvable problem and that Alta Vista was the best we could get, and I believed it. So, maybe there is a Semantic Web Google looming.
Getting back to Henry's talk, he mentioned that the best thing the community could do is to build interesting applications. I agree: once the incentive is there and the tools are in place, data is going to be generated. He is doing his part very well, with Beatnik, the semantic address book, and also some cool NetBeans plugins for working with DOAP and editing Turtle (a friendly RDF syntax). Off the top of my head, I think a social bookmarks application, a la del.icio.us, would also make for a cool demo. Digging up another old post****, I wrote about how tags (those were the days when everyone was talking about folksonomies) were really just predicates, and how using some sort of logical database (I was thinking about Prolog at the time) would allow for some novel features. Well, RDF and triple stores eat predicates for lunch.
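As a rough illustration of the tags-as-predicates idea, here is a small Python sketch of the kind of query a bookmarks demo could answer once tags live in a triple store. Everything in it is an assumption made for the example: the two predicate URIs are hypothetical, the "store" is just a list of tuples, and the "broader tag" rule is only one of many novel features one could imagine.

```python
# Hypothetical predicates, invented for this sketch.
TAGGED = "http://example.org/tagged"    # bookmark --tagged--> tag
BROADER = "http://example.org/broader"  # narrow tag --broader--> wide tag


def tagged_with(triples, tag):
    """Return bookmarks carrying `tag` or any tag declared narrower than it."""
    # Collect the tag plus every narrower tag, following BROADER transitively.
    wanted, frontier = {tag}, [tag]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == BROADER and o == current and s not in wanted:
                wanted.add(s)
                frontier.append(s)
    # Any bookmark tagged with one of the collected tags is a match.
    return sorted({s for s, p, o in triples if p == TAGGED and o in wanted})
```

With one extra triple saying that "rdf" has the broader tag "semweb", a search for "semweb" would also return bookmarks tagged only "rdf". That is roughly the sort of inference I had in mind back when I was thinking about Prolog.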
For those readers who understand Portuguese, professor Ewout ter Haar also posted his comments about the talk.
* Wi-fi access in our institute is unstable and bureaucratic.
** Am I using the word correctly?
*** Common variations take presentation (aka view) and application (aka controller) not as separate layers but as cooperating subsystems, which is a little more akin to the original Smalltalk-80 MVC.
**** Again from my first month of blogging, oddly enough.
Posted by rafaeldff [Sun] (December 14, 2007 07:00 AM)

