free web site hit counter

The Sun BabelFish Blog

Don't panic !

Thursday Apr 17, 2008

KiWi: Knowledge in a Wiki

KiWi logo

Last month I attended the European Union KiWi project startup meeting in Salzburg, to which Sun Microsystems Prague is contributing some key use cases.

KiWi is a project to build an Open Source Semantic Wiki. It is based on the IkeWiki [don't follow this link if you have Safari 3.1] Java wiki, which uses the Jena Semantic Web frameworks, the Dojo toolkit for the Web 2.0 functionality, and any one of the Databases Jena can connect to, such as PostgreSQL. KiWi is in many ways similar to Freebase in its hefty use of JavaScript, and its emphasis on structured data. But instead of being a closed source platform, KiWi is open source, and builds upon the Semantic Web standards. In my opinion it currently overuses JavaScript features, to the extent that all clicks lead to dynamic page rewrites that do not change the URL of the browser page. This I feel unRESTful, and the permalink link in the socialise toolbar to the right does not completely remove my qualms. Hopefully this can be fixed in this project. It would be great also if KIWI could participate fully in the Linked Data movement.

The meeting was very well organized by Sebastian Schaffert and his team. It was 4 long days of meetings that made sure that everyone was on the same page, understood the rules of the EU game, and most of all got to know each other. (see kiwiknows tagged pictures on flickr ). Many thanks also to Peter Reiser for moving and shaking the various Sun decision makers to sign the appropriate papers, and dedicate the resources for us to be part of this project.

You can follow the evolution of the project on the Planet Kiwi page.

Anyway, here is a video that shows the resourceful kiwi mascot in action:

Thursday Mar 20, 2008

how binary relations beat tuples

Last week I was handed a puzzle by Francois Bry: "Why does RDF limit itself to binary relations? Why this deliberate lack of expressivity?".

Logical Equivalence Reply

My initial answer was that all tuples could be reduced to binary relations. So take a simple table like this:

User IDnameaddressbirthdaycoursehomepage
1234Henry Story21 rue Saint Honoré
Fontainebleau
France
29 Julyphilosophyhttp://bblfish.net/
1235Danny AyersLoc. Mozzanella, 7
Castiglione di Garfagnana
Lucca
Italy
14 Jansemwebhttp://dannyayers.com

The first row in the above column can be expressed as a set of binary relations as shown in this graph:

The same can clearly be done for the second row.

Since the two models express equivalent information I would opt aesthetically for the graph over the tuples, since it requires less primitives, which tends to make things simpler and clearer. Perhaps that can already be seen in the way the above table is screaming out for refactoring: a person may easily have more than one homepage. Adding a new homepage relation is easy, doing this in a table is a lot less so.

But this line of argument will not convince a battle worn database administrator. Both systems do the same thing. One is widely deployed, the other not. So that is the end of the conversation. Furthermore it seems clear that retrieving a row in a table is quick and easy. If you need chunks of information to be together that beats the join that seems to be required in the graph version above. Pragmatics beats aesthetics hands down it seems.

Global Distributed Open Data

The database engineer might have won the battle, but he will not win the war [1]. Wars are fought at a much higher level, on a global scale. The problem the Semantic Web is attacking is global data, not local data. On the Semantic Web, the web is the database and data is distributed and linked together. On the Semantic Web use case the data won't all be managed in one database by a few resource constrained superusers but distributed in different places and managed by the stake holder of that information. In our example we can imagine three stake holders of different pieces of information: Danny Ayers for his personal information, Me for mine, and the university for its course information. This information will then be available as resources on the web, returning different representations, which in one way or another may encode graphs such as the ones below. Note that duplication of information is a good thing in a distributed network.

By working with the most simple binary relations, it is easy to cut information up down to their most atomic unit, publish them anywhere on the web, distributing the responsibility to different owners. This atomic nature of relations also makes it easy to merge information again. Doing this with tuples would be unnecessarily complex. Binary relations are a consequence of taking the open world assumption seriously in a global space. By using Universal Resource Identifiers (URIs), it is possible for different documents to co-refer to the same entitities, and to link together entities in a global manner.

The Verbosity critique

Another line of attack similar to the first could be that rdf is just too verbose. Imagine the relation children which would relate a person to a list of their children. If one sticks just with binary relations this is going to be very awkward to write out. In a graph it would look like this.

image of a simple list as a graph

Which in Turtle would give something like this:

:Adam :children 
     [ a rdf:List;
       rdf:first :joe;
       rdf:rest [ a rdf:List;
            rdf:first :jane;
            rdf:rest rdf:nil ];
     ] .

which clearly is a bit unnecessarily verbose. But that is not really a problem. One can, and Turtle has, developed a notation for writing out lists. So that one can write much more simply:

:Adam :children ( :joe :jane ) .

This is clearly much easier to read and write than the previous way (not to speak about the equivalent in rdf/xml). RDF is a structure developed at the semantic level. Different notations can be developed to express the same content. The reason it works is because it uses URIs to name things.

Efficiency Considerations

So what about the implementation question: with tables oft accessed data is closely gathered together. This it seems to me is an implementation issue. One can easily imagine RDF databases that would optimize the layout in memory of their data at run time in a Just in Time manner, depending on the queries received. Just as the Java JIT mechanism ends up in a overwhelming number of cases to be faster than hand crafted C, because the JIT can take advantage of local factors such as the memory available on the machine, the type of cpu, and other issues, which a statically compiled C binary cannot do. So in the case of the list structure shown above there is no reason why the database could not just place the :joe and jane in an array of pointers.

In any case, if one wants distributed decentralised data, there is no other way to do it. Pragamatism does have the last word.

Notes

  1. Don't take the battle/war analogy too far please. Both DB technologies and Semantic Web ones can easily work together as demonstrated by tools such as D2RQ.

Wednesday Mar 19, 2008

Semantic Web for the Working Ontologist

I am really excited to see that Dean Allemang and Jim Hendler's book "Semantic Web for the Working Ontologist" is now available for pre-order on Amazon's web site. When I met Dean at Jazoon 2007 he let me have a peek at an early copy of this book[1]: it was exactly what I had been waiting a long time for. A very easy introduction to the Semantic Web and reasoning that does not start with the unnecessarily complex RDF/XML [2] but with the one-cannot-be-simpler triple structure of RDF, and through a series of practical examples brings the reader step by step to a full view of all of the tools in the Semantic Web stack, without a hitch, without a problem, fluidly. I was really impressed. Getting going in the Semantic Web is going to be a lot easier when this book is out. It should remove the serious problem current students are facing of having to find a way through a huge number of excellent but detailed specs, some of which are no longer relevant. One does not learn Java by reading the Java Virtual Machine specification or even the Java Language Specification. Those are excellent tools to use once one has read many of the excellent introductory books such as the unavoidable Java Tutorial or Bruce Eckel's Thinking in Java. Dean Allemang and Jim Hendler's books are going to play the same role for the Semantic Web. Help get millions of people introduced to what has to be the most revolutionary development in computer science since the development of the web itself. Go and pre-order it. I am going to do this right now.

Notes

  1. the draft I looked at 9 months ago had introductions to ntriples, turtle, OWL explained via rules, SPARQL, some simple well known ontologies such as skos and foaf, and a lot more.
  2. The W3C has recently published a new RDF Primer in Turtle in recognition of the difficulty of getting going when the first step requires understanding RDF/XML.

Wednesday Mar 05, 2008

Opening Sesame with Networked Graphs

Simon Schenk just recently gave me an update to his Networked Graphs library for the Sesame RDF Framework. Even though it is in early alpha state the jars have already worked wonders on my Beatnik Address Book. With four simple SPARQL rules I have been able to tie together most of the loose ends that appear between foaf files, as each one often uses different ways to refer to the same individual.

Why inferencing is needed

So for example in my foaf file I link to Simon Phipps- Sun's very popular Open Source Officer - with the following N3:

 :me foaf:knows   [ a foaf:Person;
                    foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                    foaf:name "Simon Phipps";
                    foaf:homepage <http://www.webmink.net/>;
                    rdfs:seeAlso <http://www.webmink.net/foaf.rdf>;
                  ] .
For those who still don't know N3 (where have you been hiding?) this says that I know a foaf:Person named "Simon Phipps" whose homepage is specified and for which more information can be found at the http://www.webmink.net/foaf.rdf rdf file. Now the problem is that the person in question is identified by a '[' which represents a blank node. Ie we don't have a name (URI) for Simon. So when the Beatnik Address Book gets Simon's foaf file, by following the rdfs:seeAlso relation, it gets among others something like
[] a foaf:Person;
   foaf:name "Simon Phipps";
   foaf:nick "webmink";
   foaf:homepage </>;
   foaf:knows [ a foaf:Person;
                foaf:homepage <http://www.buzzword-compliant.com/>;
                rdfs:seeAlso <http://www.buzzword-compliant.com/foaf.rdf>;
             ] .
This file then contains at least two people. Which one is the same person? Well a human being would guess that the person named "Simon Phipps" is the same in both cases. Networked Graphs helps Beatnik make a similar guess by noting that the foaf:homepage relation is an owl:InverseFunctionalProperty.

Some simple rules

After downloading Simon Phipps's foaf file and mine and placing the relations found in them in their own Named Graph, we can in Sesame 2.0 create a merged view of both these graphs just by creating a graph that is the union of the triples of each .

The Networked Graph layer can then do some interesting inferencing by defining a graph with the following SPARQL rules

#foaf:homepage is inverse functional
grph: ng:definedBy """
  CONSTRUCT { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .  } 
  WHERE { 
       ?a <http://xmlns.com/foaf/0.1/homepage> ?pg .
       ?b <http://xmlns.com/foaf/0.1/homepage> ?pg .
      FILTER ( ! SAMETERM (?a , ?b))   
 } """^^ng:Query .
This is simply saying that if two names for things have the same homepage, then these two names refer to the same thing. I could be more general by writing rules at the owl level, but those would be but more complicated, and I just wanted to test out the Networked Graph sail to start with. So the above will add a bunch of owl:sameAs relations to our NetworkedGraph view on the Sesame database.

The following two rules then just complete the information.

# owl:sameAs is symmetric
#if a = b then b = a 
grph: ng:definedBy """
  CONSTRUCT { ?b <http://www.w3.org/2002/07/owl#sameAs> ?a . } 
  WHERE { 
     ?a <http://www.w3.org/2002/07/owl#sameAs> ?b . 
     FILTER ( ! SAMETERM(?a , ?b) )   
  } """^^ng:Query .

# indiscernability of identicals
#two identical things have all the same properties
grph: ng:definedBy """
  CONSTRUCT { ?b ?rel ?c . } 
  WHERE { ?a <http://www.w3.org/2002/07/owl#sameAs> ?b .
          ?a ?rel ?c . 
     FILTER ( ! SAMETERM(?rel , <http://www.w3.org/2002/07/owl#sameAs>) )   
  } """^^ng:Query .
They make sure that when two things are found to be the same, they have the same properties. I think these two rules should probably be hard coded in the database itself, as they seem so fundamental to reasoning that there must be some very serious optimizations available.

Advanced rules

Anyway the above illustrates just how simple it is to write some very clear inferencing rules. Those are just the simplest that I have bothered to write at present. Networked Graphs allows one to write much more interesting rules, which should help me solve the problems I explained in "Beatnik: change your mind" where I argued that even a simple client application like an address book needs to be able to make judgements on the quality of information. Networked Graphs would allow one to write rules that would amount to "only believe consequences of statements written by people you trust a lot". Perhaps this could be expressed in SPARQL as

CONSTRUCT { ?subject  ?relation ?object . }
WHERE {
    ?g tr:trustlevel ?tl .
    GRAPH ?g { ?subject ?relation ?object . }
    FILTER ( ?tl > 0.5 )
}
Going from the above it is easy to start imagining very interesting uses of Networked Graph rules. For example we may want to classify some ontologies as trusted and only do reasoning on relations over those ontologies. The inverse functional rule could then be generalized to
  PREFIX owl: <http://www.w3.org/2002/07/owl#>
  PREFIX : <https://sommer.dev.java.net/ontologies/beatnik#>

  CONSTRUCT { ?a owl:sameAs ?b .  } 
  WHERE { 
      GRAPH ?g { ?inverseFunc a owl:InverseFunctionalProperty . }
      ?g a :TrustedOntology .

       ?a ?inverseFunc ?pg .
       ?b ?inverseFunc ?pg .
      FILTER ( ! SAMETERM (?a , ?b))   
 }

Building the So(m)mer Address Book

I will be trying these out later. But for the moment you can already see the difference inferencing brings to an application by downloading the Address Book from subversion at sommer.dev.java.net and running the following commands (leave the password to the svn checkout blank)


> svn checkout https://sommer.dev.java.net/svn/sommer/trunk sommer --username guest
> cd sommer
> ant jar
> cd misc/AddressBook/
> ant run
Then you can just drag and drop the foaf file on this page into the address book, and follow the distributed social network by pressing the space bar to get foaf files. To enable inferencing you currently need to set it in the File>Toggle Rules menu. You will see things coming together suddenly when inferencing is on.

There are still a lot of bugs in this software. But you are welcome to post bug reports, or help out in any way you can.

Where this is leading

Going further it seems to me clear that Networked Graphs is starting to realise what Guha, one of the pioneers of the semantic web, wrote about in this thesis "Contexts: A Formalization and Some Applications", which I wrote a short note on Keeping track of Context in Life and on the Web a couple of years ago. That really helped me get a better understanding of the possibilities of the semantic web.

Wednesday Feb 06, 2008

replacing ant with rdf

Tim Boudreau just recently asked "What if we built Java code with...Java?". Why not replace Ant or Maven xml build documents with Java (Groovy/Jruby/jpython/...) scripts? It could be a lot easier to program for Java programmers, and much easier to understand for them too. Why go through xml, when things could be done more simply in a universal language like Java? Good question. But I think it depends on what types of problem one wants to solve. Moving to Java makes the procedural aspect of a build easier to program for a certain category of people. But is that a big enough advantage to warrant a change? Probably not. If we are looking for an improvement, why not explore something really new, something that might resolve some as yet completely unresolved problems at a much higher level? Why not explore what a hyperdata build system could bring to us? Let me start to sketch out some ideas here, very quickly, because I am late on a few other projects I am meant to be working on.

The answer to software becoming more complicated has been to create clear interfaces between the various pieces, and have people specialise in building components to the interfaces. It's the "small is beautiful" philosophy of Unix. As a result though, as software complexity builds up, every piece of software requires more and more pieces of other software, leading us from a system of independent software pieces to networked software. Let me be clear. The software industry has been speaking a lot about software containing networked components and being deployed on the network. This is not what I am pointing to here. No I want to emphasise that the software itself is built of components on the network. Ie. we need more and more a networked build system. This should be a big clue as to why hyperdata can bring something to the table that other systems cannot. Because RDF is a language whose pointer system is build on the Universal Resource Identifier (URI) it eats networked components for lunch, breakfast and dinner. (see my Jazoon presentation).

Currently my subversion repository consists of a lot of lib subdirectories full of jar files taken from other projects. Would it not be better if I referred to these libraries by URL instead? The URL where they can be HTTP gotten from of course? Here are a few advantages:

  • it would use up less space in my SubVersion repository. A pointer just takes up less space than an executable in most cases.
  • it would use up less space on the hard drive of people downloading my code. Why? Because I am referring to the jar via a universal name, a clever IDE will be able to use the local cached version already downloaded for another tool.
  • it would make setting up IDE's a lot easier. Again because each component now has a Universal Name, it will be possible to link up jars to their source code once only.
  • the build process, describing as it does how the code relates to the source, can be used by IDEs to jump to the source (also identified via URLs) when debugging a library on the network. (see some work I started on a bug ontology called Baetle)
  • Doap files can be then used to tie all these pieces together, allowing people to just drag and drop projects from a web site onto their IDE, as I demonstrated with Netbeans
  • as IDE gain knowledge of which components are successors to which other components, from such DOAP files, it is easy to imagine them developing RSS like functionality, where it scans the web for updates to your software components, and alerts you to those updates which you can then test out quickly yourself.
  • The system can be completely decentralised, making it a WEB 3.0 system, rather than a web 2.0 system. It should be as easy as having to place your components and your RDF file on a web server served up with the correct mime types.
  • It will be easy to link up jars or source code ( referred to as usual by URLs ) to bugs (described via something like Baetle ). Making it easy to describe how bugs in one project depend on bugs in other projects.

So here are just a few of the advantages that a hyperdata based build system could bring. They seem important enough in my opinion to justify exploring this in more detail. Ok. Well, let me try something here. When compiling files one needs the following: a classpath and a number of source files.

@prefix java: <http://rdf.sun.com/java/> .

_:cp a java:ClassPath;
       java:contains ( <http://apache.multidist.com/cocoon/2.1.11> <http://openrdf.org/sesame/2.0/> ) .

_:outputJar a java:Jar;
       java:buildFrom <src>;
       java:classpath _:cp .

_:outputJar 
        :pathtemplate "dist/${date}/myprog.jar";
        :fullList <outputjars.rdf> .
If the publication mechanism is done correctly the relative URLs should work on the file system just as well as they do on the http view of the repository. Making a jar would then be a matter of some program following the URLs to download all the pieces (if needed), put them in place and use that to build the code. Clearly this is just a sketch. Perhaps someone else has already had thoughts on this?

Wednesday Dec 19, 2007

Hyperdata in Sao Paulo

In the past week I gave a couple of presentations of Hyperdata illustrating the concept with demos of the Tabulator and Beatnik, the Hyper Address Book I am just working on.

The first talk I gave at the University of Sao Paulo, which was called at the last minute by Professor Imre Simon, who had led the Yochai Benkler talk the week before. It was a nice turnout of over 20 people, and I spoke at a more theoretical level of the semantic web, how it related to Metcalf's law, as explained in more detailed in a recently published paper by Prof. James Hendler, and how an application like Beatnik could give a powerful social meaning to all of this. I also looked at some of the interesting problems related to trust and belief revision that come up in a simple application like Beatnik, which touched a chord with Renata Wassermann who has written extensively on that field of the Semantic Web.
Many thanks to Prof Simon, for allowing me to speak. For a view from the audience see Rafael Ferreira's blog (in English) and Professor Ewout's blog (in Portuguese).

Yesterday I gave a more Java oriented technical talk at GlobalCode, an evening learning center in Sao Paulo, with a J2EE project on dev.java.net. I touched on how one may be able to use OpenId and foaf to create a secure yet open social network.
About 25 people attended which must be a really good turnout for a period so close to Christmas, when everyone is looking forward to the surf board present from Santa Claus, getting into their swimming trunks and paddling off to catch the next big wave. Well the really big wave that everyone in the know will be preparing for is the hyperdata wave. And to catch it one needs to practice one's skills. And a good way to do this is to help out with a simple application like Beatnik.
Thanks to Vinicius and Yara Senger for organising this.

Saturday Dec 15, 2007

James Gosling has a foaf name

And so does Tim Bray, Greg Papadopoulos, Jonathan Schwartz, Sun Microsystems, and Java. All thanks to the great work of the DBPedia people, a loose network of highly skilled distributed self selected avant garde force de frappe, who are extracting all the metadata possible from Wikipedia and making it available as hyperdata, ready to be linked to. :-)

You can browse their information on the web, or with the Tabulator generic data browser which will merge information it finds into one large graph as you explore it. As a result of this I can now add Tim Bray and James Gosling to my foaf file (foaf icon), by adding the following N3 statements:

:me foaf:knows [ = <http://dbpedia.org/resource/James_Gosling>;
                    a foaf:Person;
                    foaf:name "James Gosling" ],
               [ = <http://dbpedia.org/resource/Tim_Bray>;
                    a foaf:Person;
                    foaf:name "Tim Bray" ] .

It is worth looking at how DBPedia works. http://dbpedia.org/resource/James_Gosling is now a Universal Resource Identifier for James Gosling. You cannot fetch James because he is not an information resource, ie, he is not a document, though he is very resourceful, and full of interesting information. You can tell that James is not an information resource because you can't copy him easily. So when you do an HTTP GET on that URI you get the following:

hjs@bblfish:0$ curl -I http://dbpedia.org/resource/James_Gosling
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 17:57:54 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Location: http://dbpedia.org/page/James_Gosling
Content-Type: text/plain
Content-Length: 90

ie you get a redirect to the page about James Gosling. This is because curl by default asks for the html representation of resources. Had you sepecified that you wanted the machine readable rdf/xml representation you would get a redirect to another resource:

hjs@bblfish:0$ curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/James_Gosling
HTTP/1.1 303 See Other
Date: Sat, 15 Dec 2007 18:01:10 GMT
Server: Apache-Coyote/1.1
Vary: Accept,User-Agent
Location: http://dbpedia.openlinksw.com:8890/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=DESCRIBE+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FJames_Gosling%3E
Content-Type: text/plain
Content-Length: 210

Here you get a redirect to a SPARQL query to DESCRIBE James Gosling. To get the full content, in N3 try:

hjs@bblfish:0$ curl -L -H "Accept: text/rdf+n3" http://dbpedia.org/resource/James_Gosling 

the -L flag follows all the redirects...

Tuesday Nov 27, 2007

Jazoon call for papers

Jazoon is a Java conference taking place in Zürich, Switzerland in the summer, June 23 - 26 2008 to be precise. The Jazoon conference committee has sent out a request for papers. My submission last year "Web 3.0, this is the semantic web" was very well attended. I also met some great folk such as Dean Allemang and Harold Carr. It is a smaller conference so you get more time to meet people than at JavaOne, but they also get some famous people to come along, such as Roy Fielding who gave a keynote presentation.

Anyway, the call for papers is up till the end of December. So hurry up to add your contribution.

I will put forward a talk on a Semantic Address Book that I submitted for this year's JavaOne. The good news there is that there have been at least three talks on the Semantic Web put forward for J1 this year that I directly know about, and indirectly I believe there are a lot more. I really hope that at least three of them will get accepted, which would make for a nice SemWeb track. Hey, there were two talks on the subject at Jazoon! There should be at least the following talks in my opinion at J1:

  • a talk to introduce the semantic web and reasoning - even if this touches only slightly on Java
  • a talk explaining how a real hyperdata application was written in Java
  • a panel on the current state and the future of semantic web tools on the Java platform.

I suppose I should also put something forward for JavaPolis... Anyone know what the deadlines are there?

Friday Nov 02, 2007

Vote for Java6 on Leopard!

As mentioned previously a lot of Java developers on OSX are upset at Apple's silence as to its intentions with respect to the release of Java 6. There used to be a developer preview available, which was pulled recently with no indication as to when a replacement would be available. People like me who upgraded in the hope of having the latest and greatest - which we have been very patiently waiting for over a year for - are very disappointed. It creates all kinds of annoyances, like not being able to run Java Tutorial examples. Some who are working on Java 6 projects cannot use their computer easily, without resorting to installation of a separate OS in a virtual machine, to do their job. We all like OSX: its a beautiful easy to use Unix that usually really helps us get our work done. I have been very happily using it since 2004.

The first solution of course is to have our voice heard. One way to do this is to file a bug with Apple. Please do this! The only problem I have with it is that as opposed to the Java bug database which is completely open, the Apple bug database is completely closed. So there's no real way of verifying how many people have posted a report. We must therefore complement that action with an equal Open action. Following the noble example given to us by Nova Spivack, when he asked for people to make their voice heard in support of the Burmese people and got some real results, let us do the same to help Apple make the right decision.
Anybody who would like to support this issue in the blogosphere, should help post a blog with the string

13949712720901ForOSX

The first part of the string is the decimal notation for 0xCAFEBABE [1], the magic cookie for JavaClass files (thanks David for the number and the pointer to Fredericiana's photo). Then post similar instructions on your blog or point people here. Let's see how far this gets us! [2]

We should then be able to use any search engine, Google is a good choice, to search for this string [3], and hopefully motivate the managers at Apple to invest more time on Java and be more open about their plans with the community.

Your vote may also be an energizer to those groups that are starting to port the OpenJDK to OSX (via the mac java community).

Notes

  1. Oops I just noticed a mistake here. 13949712720901 in dec = 0xCAFEBABE405 in Hex. Even better. So that's CAFEBABE + the HTTP 405 Response, which means "Method not available". :-)
  2. If you know a foreign language then please translate the instructions and explanations so that more people can understand what is going on. Always post a link to some instructions. Language is a Virus, but it is most virulent when it is understandable and hyperlinked, of course.
  3. A search on Google Web returns more results - more than AllTheWeb or AltaVista - but Google Blog Search contains less duplicates. The real number of votes is somewhere between those two numbers, as some people are voting on their open source web sites, which are not always feed enabled. Simon is keeping count.
  4. Karussell is keeping a list of related articles.

Update

Tuesday Nov 13: Landon Fuller has been able to get a very nice hello world GUI app running on OSX using the FreeBSD jdk1.6 port. It runs under X Windows only. Excellent work!

Nov 20th, 2007: Dave Dirbin publishes the first beta release of the open source java 6. This campaign has gathered 105 blog votes if we count the results from Google Blog Search, placing it easily among the top 10 bug reports at the Java Bug database. The Google web search returns 256 results, which will contain the blog search, many duplicate pages pointing to blogs + some extra votes people may have placed on the web. I guess that those extra votes may pop this bug report up to the top 5 position.

Wednesday Dec 19: Apple has put a developer preview of Java 6 up on Apple Developer Connection. It is nice to see things progress on that side. As a result of this conflict, Java development on OSX has become a lot richer, with an open source JDK starting to compete with the closed one from Apple. This can only be good for both, and for developer and customer confidence in the platform.

Saturday Oct 27, 2007

Echo2: building Web 2.0 in Java

During some heated discussion on the Apple Java Dev mailing list as to why there still is not Java 6 available in Leopard, with some serious calls to port OpenJDK to OSX, Will Gilbert pointed to the Echo2 demo by Nextapp, which is written with a Java™ library. Here is what Will had to say:

I eventually found little known framework which blew my socks off with performance and seduced me with the source code I would be writing. It was Echo2 from NextApp.com. Fully open source and free. There is nice demo at:

http://demo.nextapp.com/Demo/app

Use the accordion pane to select the "Technology" panel, the click on the "Java Development" button you can see the source code, then run the app which is defined by the source. You will see an immediate similarity to AWT with a flavor of Swing.

I don't know if you have already committed to a development platform, if not, check out Echo2. More specifically take a look at the Echo2 fork which was done by some guys in Australia called Cooee at Karora.org. They took the Echo2 source, added a nice bug reporting system (JIRA), scheduled releases, and Maven repository support.

I you have Leopard, you now have Maven installed. You can run a archetype which I've been developing to create a runnable demo application. This archetype is underdevelopment and is likely to change.

From terminal run the following. Maven will install what it need and then create the application

% mvn archetype:create \
-DarchetypeGroupId=org.karora.cooee.sandbox.informagen \
-DarchetypeArtifactId=webapp \
-DarchetypeVersion=1.0.0 \
-DremoteRepositories=http://informagen.com/maven2 \
-DgroupId=com.foo.bar \
-DartifactId=xxxxx
   .
   . [INFO] BUILD SUCCESSFUL
   .
% cd xxxxx
% more readme.txt

Anyway, that is a really nice discovery that got my mind off my disappointment with the unavailability of even a preview of Java 6 on Leopard. This is an OSX release that otherwise has many very nice features. Here are some of my initial impressions:

  • The user interface is snappier
  • Spotlight finally works (It used to take forever to find things, and the user interface was terrible).
  • The Finder is very much improved. The nice big graphics previews helped me find a few videos I did not know I had. With the click of a space you can see the first page of a NeoOffice document. (NeoOffice is a free and excellent Office suite with the User Interface written in Java btw)
  • Spaces, the multi windowing environment, is dead beautiful, but broken. You can drag one web browser window from one virtual desktop to another with one simple gesture, but then when you want to switch between a number of web browser in different windows using alt-tab you usually end up in the original window. Using Expose keys does not help that much either. So you have to have all windows from one application on the same virtual window. You also don't get much choice for selecting your short cut keys, which means that you may have trouble with it interfering with other applications.
  • Netbeans 6 beta 2 works very well on Leopard. There have been some improvements to the graphics library in their new Java release which has broken IntelliJ and Eclipse though. NetBeans 6 by the way is getting to be really really good.
  • Matt Neuburg post some well illustrated criticisms on the new features in Leopard, which I agree with. The last point about the help windows being impossible get out of the foreground is, it is true, quite bizarre.
  • Some minor and not so minor bugs in The lame Leopard blog

But the final and best review is as usual John Siracusa's Ars Technica review. No marketing hype. Real facts, and good criticism.

Wednesday Oct 17, 2007

Java for iPhone by Feb 2008?

I just received this email as a response to a bug I had filed at Apple concerning Java's non presence on the iPhone.
(Hmm... Reading the message closely it does not say "Java" though, it just mentions an SDK... I wonder what kind of Software Development Kit they mean?)

Subject: Re: Bug ID 4918928: java on iphone

Dear Henry,

This is a follow-up to Bug ID# 4918928 . Apple has just announced via Apple HotNews an iPhone SDK will be made available to developers in February 2008.

------------------- http://www.apple.com/hotnews/ [no permalink it seems]

Third Party Applications on the iPhone Let me just say it: We want native third party applications on the iPhone, and we plan to have an SDK in developers’ hands in February. We are excited about creating a vibrant third party developer community around the iPhone and enabling hundreds of new applications for our users. With our revolutionary multi-touch interface, powerful hardware and advanced software architecture, we believe we have created the best mobile platform ever for developers.

It will take until February to release an SDK because we’re trying to do two diametrically opposed things at once—provide an advanced and open platform to developers while at the same time protect iPhone users from viruses, malware, privacy attacks, etc. This is no easy task. Some claim that viruses and malware are not a problem on mobile phones—this is simply not true. There have been serious viruses on other mobile phones already, including some that silently spread from phone to phone over the cell network. As our phones become more powerful, these malicious programs will become more dangerous. And since the iPhone is the most advanced phone ever, it will be a highly visible target.

Some companies are already taking action. Nokia, for example, is not allowing any applications to be loaded onto some of their newest phones unless they have a digital signature that can be traced back to a known developer. While this makes such a phone less than “totally open,” we believe it is a step in the right direction. We are working on an advanced system which will offer developers broad access to natively program the iPhone’s amazing software platform while at the same time protecting users from malicious programs.

We think a few months of patience now will be rewarded by many years of great third party applications running on safe and reliable iPhones.

Steve

P.S.: The SDK will also allow developers to create applications for iPod touch.

Monday Oct 15, 2007

Java Picture Editor

Over the past year I have found myself again and again using the free Java Image Editor from JH Labs, a company that seems to be doing a lot of interesting work in the java graphics area.

ImageEditor, starts much quicker than the Gimp, does not require X11, and has a modern look, as you can see from the snapshot I made, which integrates beautifully in OSX. It comes with a lot of filters, which are available in source code under an Apache licence. ( I have not really used them, to tell the truth)
Today I have been using it for the very simple task of editing a foaf icon. I had tried for hours to get X11 going after it got clobbered by an installation I made using fink, the open source distribution platform for OSX. What I now have keeps crashing after a few minutes of use. This is where Image Editor came to the rescue.

ImageEditor has quite a few limitations, other than it not being fully open source. I don't think it has more than one step undo for example. And some of the menus are a little obscure.

But what I really like about it is that it shows that you can build applications in Java that look as good as native OSX apps.

Friday Oct 05, 2007

Doap Bean available

I have just made the NetBeans Doap Bean available on the plugin portal. Just download onto your desktop and install in a version of NetBeans 6 (check Tools < Plugins in the menu)

This is the module I demonstrated at James Gosling's 'fun things' presentation on NetBeans day in San Francisco. I have updated the code to make it easy to understand for people who would wish to emulate and enhance it. It is easy to do that. Install the plugin, and go to the https://sommer.dev.java.net/ project. Then drag the blue button next to the URL

from your browser (I have checked that it works with Safari and Firefox on OSX) onto the DOAP button on the toolbar. This will fetch the information from the web page and pop up a window with a human readable representation of the RDF. This window should look like this:

window describing the so(m)mer project

Clicking on the other tabs will show you the original RDF/XML or an easier to read Turtle representation of the data. It is really important to show these tabs so that you can distinguish good from bad doap. Of course one can also go to the W3C Validator for an independent opinion.
In any case if the source code is available via a CVS or Subversion repository, you should be able to download it with just the click on the "download" button. (Make sure that NetBeans knows where your svn command line tool is though, by going to the menu Versioning &gr; Subversion > Checkout... )

If you want to try dropping other projects onto the button go to DoapSpace, they have put together a large collection of doap files for all the projects on SourceForge, Freshmeat and PyPi.

As I mention this is really only version 0.1 of the doap integration of Netbeans. Clearly one could do a lot more, such as:

  • Having it produce Doap for a project automatically
  • Tying it into NetBeans's Project panel
  • describing the relationships a project and others it depends on
  • Linking bug reports to information gleaned from the doap:bugdatabase relation
  • Perhaps see if one can set things up so that one can immediately find the javadoc online for a doap project one has information of
  • find a way to view source on a jar, by relating jars to source code repositories... (more difficult this one)
  • and a lot more...

Now you may wonder: How is one going to know that there is a doap link on some project's source page? Searching for the doap link seems a lot of work, right? Well to get an idea of how things will integrate you can install the Firefox Semantic Radar plugin, and go to the So(m)mer project again. You will then see displayed at the bottom of your browser an icon of square smiley faces, as shown on the following screenshot

semantic radar icon in Firefox

I should probably add this icon to the Doap button come to think of it...
The Doap button is in the So(m)mer repository, which is all published under the very generous BSD licence, so you are welcome to help out and add your own features... I may be having to work on a few other things next, so I won't be getting in your way :-)

Tuesday Sep 18, 2007

M2N: building Swing apps with N3

At the Triple-I conference in Graz, I came across one very interesting demo by the M2N Intelligence Management company, where they showed a Development environment powered by N3, the powerful, easy to read notation for the Semantic Web. Using a Visual Editor that mapped UML diagrams to N3, instead of the usual limited and difficult to understand OMG MOF family of standards, we could see how one could build a complete User Interface application, including logic in a visual way. The same could be done manually in vi by editing the N3 directly, for those proficient enough. I think they describe parts of this very generally on their solutions page.

It is a pity that M2N does not open source this library, as that would allow one to get a better idea as to the advantages of doing things this way. Sebastian Schaffert - who works at a research company in Salzburg, and was looking at their demo with me - was quite enthusiastic about the idea. There was a lot one could do with such a tool, he thought, such as being able to SPARQL query one's user interface, test it for constraints, etc...

It would be nice to have some feedback from people who had used this on the pros and the cons of their implementation, or of the general idea.

Monday Sep 10, 2007

The Church of NetBeans

If there is anyone else who is close to being as homeless as me at Sun, it is certainly Tim Boudreau, who is now on a World Tour in a truck he bought for $1000. As he is the ultimate NetBeans evangelist, he painted up his car with the NetBeans logo, and will evangelise to whomever wants to hear the word :-) Read up on his story on his blog.

Tim is also a creative guitar player and song composer so don't hesitate to ask him to play you a song.
This reminds me of Timbuk 3's song "Reverend Jack and his Roamin Cadillac Church" (iTunes).

Come hell or high water
A soul's got to find some release
Some find it in power
And some in heavenly peace
Some look to the preacher
As he speaks from his holy perch
Me, I back Rev. Jack & his Roamin Cadillac Church
So if you're stuck at the station
On the road to the Glory on High
If you need some inspiration
He's got more than your money can buy
If you're lookin for salvation
Well my friend it's the end of your search
Here comes Rev. Jack & his Roamin Cadillac Church
Ain't no use watchin the road, son
When you ride in his automobile
Cause we're all back seat drivers,
& there's nobody at the wheel
Now for the well-to-do doctor
There's a home & a summer retreat
And for the jet-settin banker
There's a place in the social elite
But for the poor & the hungry
All the lost souls left in the lurch
There's just Rev. Jack & his Roamin Cadillac Church

Tuesday Aug 28, 2007

My Bloomin' Friends

Closed Social Networks are blossoming all over the place. They provide a semblance of protection, at a price: lock in. Locked into the social network provider you get convenience in the form of tools to make conversation easier (video, email, chat boards, ...), some form of privacy protection (if you trust the provider), introductions to 'like minded' people, and other niceties.

Some of us work in the open air: we have to set standards in public view; we stand by what we say; we accept criticism from wherever it comes; and we can't choose our friends based on their social network provider. We describe ourselves in our foaf files where we can specify what we do, how to contact us, our interests, and links to who we know by pointing to their Universal Identifiers. There is no trouble linking between people who are open in this way. We are happy to reference each other: it strenghtens the exposure of our work and the quality of the web. This is how I link to Paul Gearon:

:me foaf:knows  [ = <http://web.mac.com/thegearons/people/PaulGearon/foaf.rdf#me>; 
                  a foaf:Person;
                  foaf:name "Paul Gearon" ] .
I could just point to his URL, but the little extra duplicate information can make life easier for people/robots browsing the data web. It can help people notice inconcistencies and help me correct them.

But not everyone lives in the open the same way, and not everyone wants to make the same amount of information about themselves public. There are a number of different ways to deal with this. I want to discuss a few of them here.

Content Negotition

How much someone says about themselves is up to them, and so is how they protect their information. The same URL that identifies someone, could return more or less information depending on who is asking. I could set up my foaf file so only friends who log in via openid can see my friends. Others would just get default information about me. I could be even more clever. I could allow any friend of my friend who logs in via their openid to see my full foaf file; others would see information about me, and a select group of open friends. Closed Social Networks could open up by making it convenient to specify these policies, and providing the right infrastructure to do so.

Indirect Identification

By directly identifying someone via a URL (as I do) we can leave a lot of the policy of what they make visible up to them. But those that don't have a foaf name, need to be identified indirectly. We can do that by identifying them via some property such as their blog, their home page, their email address, or their openid. I am very open about my email addresses. They are published and visible to all.
 <http://bblfish.net/people/henry/card#me>     <http://xmlns.com/foaf/0.1/mbox> <mailto:henry.story@bblfish.net> .
I value it more that people can contact me easily - living as I do in the middle of nowhere and often living nowhere in particular - than the pain of spammers. Too many people are lazy about security, using virus filled Windoze computers, obvious passwords, cracked software for me to be under any illusion that hiding my email is going to prevent the bad guys from getting it.

However I can't assume that everyone else will accept me applying this argument to their email address. For this there is a nice mathematical technique: I can encrypt their email address using the SHA1 hash function. This create a close to unique string that cannot be dissasembled. You cannot go from the sha1 sum of an email address back to the email. But you can always calculate the same sha1sum from an email. This is how I identify Simon Phipps, Sun's Open source Officer:

:me foaf:knows [ a foaf:Person;
                 foaf:mbox_sha1sum "4e377376e6977b765c1e78b2d0157a933ba11167";
                 foaf:name "Simon Phipps";
                 rdfs:seeAlso <http://www.webmink.net/foaf.rdf>
               ].

If you know Simon's email, then you will know that I know him. "What use is that?" I can hear someone ask. It's all about Working with People on the Internet. Imagine you are reading email on a newsgroup with a foaf enabled mail tool linked to a foaf enabled Address Book (such as Beatnik). You come on an email by Simon saying something interesting about how Sun has changed its stock ticker to JAVA for example. My logo and perhaps that of a couple of other people appears on the mail reader in a way that indicates to you that we know Simon. The post is no longer anonymous for you, and so has more trust value. You feel part of a community.[1]

So spammers can not use that information to spam. Either they already know your email address, and so they are probably already spamming you, or they don't, and this won't help them. They can only [2] learn about social network claims: who claims to know who. They could use this, it is true to introduce themselves as an aquaintance of a friend of yours. A bit of a risky strategy that could quickly get them on a black list. Currently being black listed may not be an expensive proposition. But in a cryptographic web of trust this will be both much easier to notice, and more damaging for the infringers.

Fuzzy Identification

I can directly and indirectly identify a lot of people in my Address Book as described above. This is perfectly acceptable for people who have an open life, like I do, and a large portion of the Open Source community, bloggers, standard setters, etc... But on last count I had over 700 people in my AddressBook. It is a lot of work to identify all of them individuall, and to decide how much visibility I should give them. I may not even want people to know how many people I know this way. Also I may want deniability: there are people one may know, but one may not want to highlight that, and one may want to be able to deny that one knows them to some people. The foaf:sha1sum gives me a way to identify someone, but if some nozy person comes to me and asks me about that person's life after having identified the corresponding email address, there is no escape route other than refusing any conversation, which by itself can easily be taken to be significant. What we need is a way to fuzily identify a group of one's aquaintance.

Bloom Filters

This is what Bloom Filters enable one to do. Originally used in times when memory was expensive, they allowed the whole vocabulary of a language to be condensed into a reasonably short string. Here we can use it to group all the email addresses of our friends together in one opaque string. I could express as follows in RDF (bear in mind that the rdf vocabulary has not been settled on):
:me foaf:bloomMbox [ a bloom:Bloom;
                     bloom:base64String """"
            IAOgQgSAAAICCAADAoQgDABAAiQKgIABgyAIBEhAAAAIUKBACCYAABAAaEkGQAGIEAHRUAgAAQUw 
            hCgwACJNQxQAAggAgCIgAAAAKgICEKAAAABCQiB0JCAAAIkgDASAYiAAAEIQAAIAABDCEAZACOpA 
            ICEEMAGAEGEAxIA=""";
                     bloom:hashNum 4;
                     bloom:length 1000 ] .

Given the above Bloom someone can query it with an email address using the inverse algorithm and the Bloom will answer either that I may know that person, or that it can't tell. The loaf project explains some of the advantages of having this in more detail.

The best way to get a feel for how it works is to try it. Here I have written a little java applet [3] that allows you to test my Bloom for people I know, and to create your own bloom [4].

Your browser is completely ignoring the <APPLET> tag! Go to java.com to download the latest.

Some emails you can try with positive results are tbray attextuality dot C O M, or bill at dehora dot net (suitably transformed of course). The applet lowercases all email addresses when creating and when testing the bloom.

To create your own bloom just click the "Create Bloom" tab. An easy way to extract all your email addresses from an OSX Address Book is to run the following on the command line:

hjs@bblfish:0$ osascript -e 'tell application "Address Book" to get the value of every email of every person' | perl -pe 's/,+ /\n/g' | sort | uniq | pbcopy

You should now be able to paste the list of all your contacts in the applet. To restrict the Addresses to on of your groups named "foaf" for example replace the relevant section above with tell application "Address Book" to get the value of every email of every person in foaf.

You will need to choose the number of hashes and the maximal size of the bucket you wish to fill. The greater the number of hashes and the greater the size of the bucket, the more precision you get and the less deniability.[5]

Conclusion

None of the above tools are by themselves the complete solution for creating an Open Social Network that will satisfy everyone. But for people willing to live in the open, the correct and astute use of them should satisfy most of people's requirements. Access Control on URLs can make it possible to reveal more or less information depending on who is looking; indirect identification can allow one to name people even without direct identification; sha1sums allows one to partially hide sensitve identifying information; and Blooms allow one to make fuzzy statements of set membership. All of these can be combined in different ways. So one can make statements about sha1sum identified people on the open web, or one can do so behind an access controlled file that only friends logged in with OpenId can see. There are bound to be more fun things to be discovered here. But this should make clear just how much can be done in this space.

Notes

  1. For the link from email addresses to sha1sums to work, it helps to canonicalise the emails to all lowercase. This should probably be made more explict in the foaf:mbox_sha1sum definition.
  2. "They can 'only' learn about social network claims", is quite a lot more than some people are willing to accept. See the article by Mark Wahl "Organizing principles for identity systems: Attacks on anonymized social networks and fudging oracles" which contains some very good pointers. For people who want to retain complete anonymity, and this is what people subscribe to when they answer public surveys, any leakage of information is too much leakage. The problem is that because of Metcalf's Law it is nearly impossible to stop information combining itself: Information wants to be linked. So I think, when we are not tied to stringent laws, we should accept this rather than fight it, and use it to our advantage when hunting down spammers: the law holds for them too.
  3. You can get the source code for the applet on the so(m)mer repository in the misc/Bloom subdirectory. I used the pt.tumba.spell.BloomFilter class which I adapted a little for my needs. This was just the first one I found out there. It is probably not the most efficient one, as it uses an array of booleans, when it could use an byte array. If you know of other libraries please let me know.
    The code was put together really quickly and may well contain bugs. Feedback and patches and contributions are welcome.
  4. the advantage of Java Applets over server side code is really obvious here:
    • I don't need a server with a fixed port number to show you this
    • someone can't easily start a denial of service attack to bring the server down
    • You email addresses never leave your computer, so there is no fear of loss of privacy.
    On the last point it would be nice if browser vendors made it easier to get info about the exact restrictions a Java Applet had. I would like to be able to click on an Applet and verify or set it to "no network communication whatever". This would increase trust even more in cases like this.
  5. More info on the load site. Apparently one needs more than 1/4 deniability if one is to preserve some measure of privacy, according to the paper "the price of privacy and the limits of LP decoding" by Cynthia Dwork, Frank McSherry and Kunal Talwar (Microsoft Research) who suggests that
    ... any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
    Thanks again to Mark Wahl for these references.
  6. Thanks a lot to Dan Brickley for working together with me on this last Friday, and pointing me to many of the important work done here. Dan also wrote a little python script to do something similar. Some of the sites I came across during our discussion: Not having studied bloom filters in detail, I am not sure how compatible the blooms of each of these libraries are. The super simple ruby bloom library does not seem to specify the number of hashes that were used to create a Bloom.
  7. Nick Lothian reminded me in a comment to this that he has written a Bloom Filter demo for facebook. I don't have a facebook account (because I am already on LinkedIn, and I can't really be bothered to move all my information, and because I don't like closed networks), so I was not able to use it. Perhaps I should get a facebook account just for this... Let me know.

Thursday Jul 12, 2007

java on the iPhone

According to Ed Burnette' misleadingly entitled post "Apple sneaks Java support onto the iPhone", a java virtual machine named Jazelle runs natively on the CPU that the iPhone is made from, and this feature is enabled on the processor. Apparently it is very small and very efficient, blatantly contradicting Steve Jobs' comments:

Jobs: “Java’s not worth building in. Nobody uses Java anymore. It’s this big heavyweight ball and chain.”
Java is available on every cell phone except his pretty much, on nearly every computer shipped, on robots, and credit cards... Presumably because nobody uses it. And now we find he would not even have to build it into the iPhone, as it is already written for that cpu - well perhaps Apple would have to do some work on the graphics libraries.
Perhaps it's not surprising that he would think this, given that he is surrounded by ObjectiveC programmers. On the other hand I have heard an interesting argument that this may be a way to entice various providers to start creating video streams in h.264 format...
Myself, I won't see the point of having such a phone if I can't have a good version of Java on it that is usable. I can wait.

Monday Jul 02, 2007

refactoring xml

Refactoring is defined as "Improving a computer program by reorganising its internal structure without altering its external behaviour". This is incredibly useful in OO programming, and is what has led to the growth of IDEs such as Netbeans, IntelliJ and Eclipse, and is behind very powerful software development movements such as Agile and Xtreeme programming. It is what helps every OO programmer get over the insidious writers block. Don't worry too much about the model or field names now, it will be easy to refactor those later!

If maintaining behavior is what defines refactoring of OO programs - change the code, but maintain the behavior - what would the equivalent be for XML? If XML is considered a syntax for declarative languages, then refactoring XML would be changing the XML whilst maintaining its meaning. So this brings us right to the question of meaning. Meaning in a procedural language is easy to define. It is closely related to behavior, and behavior is what programming languages do their best to specify very precisely. Java pushes that very far, creating very complex and detailed tests for every aspect of the language. Nothing can be called Java if it does not pass the JCP, if it does not act the way specified.
So again what is meaning of an XML document? XML does not define behavior. It does not even define an abstract semantics, how the symbols refer to the world. XML is purely specified at the syntactic level: how can one combine strings to form valid XML documents, or valid subsets of XML documents. If there is no general mapping of XML to one thing, then there is nothing that can be maintained to retain its meaning. There is nothing in general that can be said to be preserved by transformation one XML document into another.
So it is not really possible to define the meaning of an XML document in the abstract. One has to look at subsets of it, such at the Atom syndication format. These subset are given more or less formal semantics. The atom syndication format is given an english readable one for example. Other XML formats in the wild may have none at all, other than what an english reader will be able to deduce by looking at it. Now it is not always necessary to formally describe the semantics of a language for it to gain one. Natural languages for example do not have formal semantics, they evolved one. The problem with artificial languages that don't have a formal semantics is that in order to reconstruct it one has to look at how they are used, and so one has to make very subtle distinction between appropriate and inappropriate uses. This inevitably ends up being time consuming and controversial. Nothing that is going to make it easy to build automatic refactoring tools.

This is where Frameworks such as RDF come in very handy. The semantics of RDF, are very well defined using model theory. This defines clearly what every element of an RDF document means, what it refers to. To refactor RDF is then simply any change that preserves the meaning of the document. If two RDF names refer to the same resource, then one can replace one name with the other, the meaning will remain the same, or at least the facts described by the one will be the same as the one described by the other, which may be exactly what the person doing the refactoring wishes to preserve.

In conclusion: to refactor a document is to change it at the syntactic level whilst preserving its meaning. One cannot refactor XML in general, and in particular instances it will be much easier to build refactoring tools for documents with clear semantics. XML documents that have clear RDF interpretations will be very very easy to refactor mechanically. So if you are ever asking yourself what XML format you want to use: think how useful it is to be able to refactor your Java programs. And consider that by using a format with clear semantics you will be able to make use of similar tools for your data.

Thursday Jun 28, 2007

Jazoon: Web 3.0

Well over 100 people attended my Web 3.0 talk at the Jazoon conference in Zurich today, which covered the same topics as my JavaOne BOF (slides here). You can count them here in the picture which I took at the end of the talk.

The Jazoon conference is 1/17th of the size of JavaOne, so the attendance numbers were tremendous. As a comparison I had 250 people attended the JavaOne BOF. Had the same percentage of the JavaOne conference attended my talk, I would have had an audience of 110*17=1770!

I had just about time to cover the slides in the 40 minutes allocated to me, so it was really great to have a follow on question and answer session which at least a third of the people remained for. Dean Allemang (blog) shared the space with me on the Q&A, and was able to bring his vast experience to bear. The attendees were thus able to get a quick overview of TopBraid Composer which Dean presented quickly in response to a question on tooling. Questions on security popped up, which allowed me to speak a little more about RDF graphs and quads, essential pieces of the Semantic Web story.

Wednesday Jun 27, 2007

Jazoon

Roy Fielding gave his very well attended keynote presentation today (Tuesday 26) at Jazoon, the new Java developers conference taking place for the first time in Zurich this week. Coming here just to hear Roy talk was worth the whole trip in itself.

This is the first year of Jazoon, and yet the venue was able to attract over 800 developers (I am not sure of the exact number), which bodes well for its future. So to have close to 10% of the attendees (photo) come to Dean Allemang's talk "Semantic Mashups using RDF, RSS and microformats" was a very good surprise. Dean, who is working for TopQuadrant producers of the Eclipse based TopBraid Composer, is not just a very good presenter, but also a very knowledgeable Semantic Web evangelist. He gave Harold Carr (blog) and others a demo (photo) of TopQuadrant, that started up outside the conference room, moved down into the bar at the entrance (photo), and as it kept being interrupted by great side tracks into Philosophy, Jungian psychology (Jung of course worked in Zurich), Semantic Web company adoption, Literature, Mathematics, Religion, sexual politics, and so much more, that the demo only came to a tentative conclusion around 1am in a bar in the center of Zurich discussing the relations between REST and RDF and how this differed from SOAP. (For Dean's impressions of Jazoon, see his "Swiss Java" blog post.)

My talk, "Web 3.0: This is the Semantic Web" will be taking place on Thursday at 11am. I will be going into more technical details, looking at the foundations of the Semantic Web step by step. As a surprise I may even be able to get a slot for Dean to present his TopBraid composer, which is not just a Ontology editor, but also a complete mashup environment.

Time for me to go to sleep!

Search

Flickr Diary