P@ Sunglasses

Monday, January 10, 2005

Java Syndication Babel: let's paint the picture together!

(Image: Babel by Brueghel)

When we started the ROME (Rss and atOM utilitiEs) project last May, we had done a preliminary review of the existing syndication libraries in Java, and none of them suited us. We try to avoid the Not Invented Here syndrome as much as we can :-)

The other day Kevin Burton commented on our XTech 2005 submission about ROME, saying that he would submit a paper about Jakarta FeedParser as well. We were not aware of this project when we started ROME: it's hidden in the bowels of the Jakarta Commons Sandbox (I think I did a quick search on "apache rss" at the time and found nothing meaningful: Google makes you lazy!). Too bad, because it looks pretty good. That is not surprising: Kevin is the guy who created NewsMonster, a very innovative aggregator, a few years ago (here's what I thought about it in my very old weblog, and in my old weblog), and a few months ago he started Rojo, a more promising online aggregator implementing the same ideas (collaborative filtering, reputation metrics, tagging). According to the project's mailing list, FeedParser is based on NewsMonster's libraries and is what they use at Rojo.

A cursory look at FeedParser makes me think that our architectures are quite different. FeedParser seems to take an event-based approach, defining SAX-like events at a higher semantic level (the equivalent of SyndFeed and SyndEntry in ROME, but delivered as events at parse time), whereas ROME hands you Java objects once parsing is over (closer to a DOM-based approach). Even though the architectures differ, FeedParser has a few goodies that could be useful for us, such as nice autodiscovery classes that guess where the feed associated with a page lives, based on the system used to generate the feed (for when people omit the RSS or Atom autodiscovery link from their template). Conversely, FeedParser could benefit from a few things in ROME, such as the Fetcher and Modules.
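
The standard autodiscovery link mentioned above is an HTML `<link rel="alternate">` element whose `type` is `application/rss+xml` or `application/atom+xml`. Below is a minimal sketch of that basic lookup, assuming a regex scan of the page is good enough for illustration (it is not a real HTML parser). The class `FeedAutodiscovery` is hypothetical, not FeedParser's or ROME's API, and it only covers the easy case: FeedParser's heuristics for pages that omit the link go further.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FeedAutodiscoveryDemo {
    public static void main(String[] args) {
        String html = "<html><head>"
            + "<link rel=\"stylesheet\" href=\"/style.css\">"
            + "<link rel=\"alternate\" type=\"application/rss+xml\" href=\"/rss/index.xml\">"
            + "<link rel='alternate' type='application/atom+xml' href='http://example.org/atom.xml'>"
            + "</head><body></body></html>";
        for (URI feed : FeedAutodiscovery.discover(html, URI.create("http://example.org/blog/"))) {
            System.out.println(feed);
        }
    }
}

/** Hypothetical helper: finds RSS/Atom autodiscovery links with a regex scan (not a full HTML parser). */
final class FeedAutodiscovery {
    private static final Pattern LINK_TAG = Pattern.compile("<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Matches name="value" or name='value'
    private static final Pattern ATTR = Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");
    private static final List<String> FEED_TYPES =
        Arrays.asList("application/rss+xml", "application/atom+xml");

    static List<URI> discover(String html, URI pageUri) {
        List<URI> feeds = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            // Collect the attributes of this <link> tag, lower-casing the names
            Map<String, String> attrs = new HashMap<>();
            Matcher attr = ATTR.matcher(tag.group(1));
            while (attr.find()) {
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                attrs.put(attr.group(1).toLowerCase(Locale.ROOT), value);
            }
            String rel = attrs.getOrDefault("rel", "").toLowerCase(Locale.ROOT);
            String type = attrs.getOrDefault("type", "").toLowerCase(Locale.ROOT);
            String href = attrs.get("href");
            // rel may hold several space-separated tokens, e.g. "alternate feed"
            boolean alternate = Arrays.asList(rel.trim().split("\\s+")).contains("alternate");
            if (alternate && FEED_TYPES.contains(type) && href != null) {
                feeds.add(pageUri.resolve(href.trim())); // relative hrefs resolve against the page URL
            }
        }
        return feeds;
    }
}
```

Run as is, it prints the two feed URLs, with the relative `href` resolved against the page address.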

Also, of the few libraries we studied before deciding to create ROME, only Informa still seems alive and kicking.

Joseph Ottinger, who created RSSLibJ, switched to ROME when we shipped 0.1. He is now one of the ROME developers and contributes very actively to the design debates on the mailing list. And Jason Bell recently decided to stop development on RSSLibJ and use ROME.

If you're doing syndication in Java today, you can choose among these three libraries: ROME, FeedParser and Informa. Kevin's comment arrived at a good time: Alejandro and I are preparing a submission for a technical talk at JavaOne this year. We're also beginning to think that for all future work on syndication utilities (extensions, applications, RDBMS mapping, etc.) it would be really cool to be able to rely on a common Java representation of feeds and entries, instead of having your development tied to the library you have chosen... You see where I'm going, don't you?
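
To make the "common Java representation" idea concrete, here is a minimal sketch assuming a hypothetical pair of interfaces, `CommonFeed` and `CommonEntry`, that each library would implement through a thin adapter (none of these names exist in ROME, FeedParser or Informa). The point is that a utility such as `latestTitle` is written once against the interfaces and works whichever parser produced the data, much as SyndFeed already hides the differences between RSS and Atom inside ROME.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

class CommonFeedDemo {
    public static void main(String[] args) {
        InMemoryFeed feed = new InMemoryFeed("Example feed");
        feed.add(new InMemoryEntry("Older post", "http://example.org/1", new Date(1000L)));
        feed.add(new InMemoryEntry("Newer post", "http://example.org/2", new Date(2000L)));
        System.out.println(FeedUtils.latestTitle(feed)); // prints "Newer post"
    }
}

/** Hypothetical common view of a feed, independent of any parsing library. */
interface CommonFeed {
    String getTitle();
    List<CommonEntry> getEntries();
}

/** Hypothetical common view of a single entry or item. */
interface CommonEntry {
    String getTitle();
    String getLink();
    Date getPublished(); // may be null when the feed gives no date
}

/** Written only against the hypothetical interfaces, so it works with any adapted library. */
final class FeedUtils {
    static String latestTitle(CommonFeed feed) {
        CommonEntry latest = null;
        for (CommonEntry e : feed.getEntries()) {
            if (e.getPublished() == null) continue; // undated entries are skipped
            if (latest == null || e.getPublished().after(latest.getPublished())) latest = e;
        }
        return latest == null ? null : latest.getTitle();
    }
}

// Stand-ins for what a ROME, FeedParser or Informa adapter would provide.
final class InMemoryFeed implements CommonFeed {
    private final String title;
    private final List<CommonEntry> entries = new ArrayList<>();
    InMemoryFeed(String title) { this.title = title; }
    void add(CommonEntry e) { entries.add(e); }
    public String getTitle() { return title; }
    public List<CommonEntry> getEntries() { return Collections.unmodifiableList(entries); }
}

final class InMemoryEntry implements CommonEntry {
    private final String title;
    private final String link;
    private final Date published;
    InMemoryEntry(String title, String link, Date published) {
        this.title = title;
        this.link = link;
        this.published = published;
    }
    public String getTitle() { return title; }
    public String getLink() { return link; }
    public Date getPublished() { return published; }
}
```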

What I propose is to submit a joint paper to JavaOne this year, where the three teams would present their projects, and to work together to figure out where our libraries differ and where they are similar. I think a clear picture of the various libraries' strengths, weaknesses and designs would be much more worthwhile for attendees than a talk about a single one (and I would be the first one interested in such a presentation).

Then we could take advantage of JavaOne to host an open meeting to discuss the potential convergence points of our projects, and whether it is worth standardizing them in a JSR, something like a Java Syndication API. One reason I think such a JSR could be useful is that, with the new Atom publishing API coming soon, it would be good to have a common Java API to get and set feeds. Dave Johnson plans to add a new ROME subproject for Atom publishing, which will define yet a few more interfaces, for a blog, a blog site and a BlogEntry. It would be nice if all of our libraries agreed on common Java interfaces for this kind of data so that they are interoperable.
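
The same reasoning applies on the publishing side. A minimal sketch, assuming hypothetical `Blog` and `BlogEntry` types (they are not the interfaces planned for the ROME subproject, whose design is not described here): a tool like `copyAll` is written once, and any backend, whether an Atom publishing client or a Blogger or MetaWeblog client, could plug in by implementing `Blog`. The in-memory class stands in for a real remote backend.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BlogPublishingDemo {
    public static void main(String[] args) {
        Blog source = new InMemoryBlog();
        source.post(new BlogEntry("Hello", "First post"));
        Blog target = new InMemoryBlog();
        System.out.println("entries copied: " + BlogTools.copyAll(source, target));
    }
}

/** Hypothetical value object for an entry (title and content only, to keep the sketch short). */
final class BlogEntry {
    final String title;
    final String content;
    BlogEntry(String title, String content) { this.title = title; this.content = content; }
}

/** Hypothetical common publishing interface: get and set entries, whatever the wire protocol. */
interface Blog {
    String post(BlogEntry entry);            // returns the new entry's id
    BlogEntry get(String id);
    void update(String id, BlogEntry entry);
    List<String> listIds();
}

/** Written once against the interface, so it works with any backend that implements it. */
final class BlogTools {
    static int copyAll(Blog from, Blog to) {
        int copied = 0;
        for (String id : from.listIds()) {
            to.post(from.get(id));
            copied++;
        }
        return copied;
    }
}

/** In-memory stand-in for a real remote backend. */
final class InMemoryBlog implements Blog {
    private final Map<String, BlogEntry> entries = new LinkedHashMap<>();
    private int nextId = 1;

    public String post(BlogEntry entry) {
        String id = "entry-" + nextId++;
        entries.put(id, entry);
        return id;
    }
    public BlogEntry get(String id) { return entries.get(id); }
    public void update(String id, BlogEntry entry) {
        if (!entries.containsKey(id)) throw new IllegalArgumentException("unknown id: " + id);
        entries.put(id, entry);
    }
    public List<String> listIds() { return new ArrayList<>(entries.keySet()); } // copy: safe to iterate while posting
}
```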

Kevin, Niko, Pito, Joseph, DaveJ, Alejandro: what do you guys think? If you all agree, the deadline for the JavaOne call for papers is January 31st, so we have 2 weeks to produce a common outline.

(Jan 10 2005, 06:24:05 AM PST)

Comments:

It sounds like you're suggesting the creation of a JSR!

I think a common interface would be interesting, but trying to meet every need would keep it from being very usable: witness the issues with Serializable, which other APIs would address in entirely different ways.

That would be entirely acceptable, but the JSR (or common interface, if you think a JSR is too overblown) would have to compensate for it somehow, which would make it less suitable for use.

Posted by Joseph Ottinger on January 10, 2005 at 08:23 AM PST

I agree: what I'm proposing here is mainly a joint presentation, to get to know each other's libraries a bit better and make a more useful session for JavaOne attendees.

And then take advantage of the conference to start discussing whether it is useful, or even possible, to standardize certain things between us, which could lead to a JSR or just to a published set of interfaces we all agree on.

You are right: the differences in our architectures may well make a common interface unlikely, or so reduced as to be useless. Nonetheless, a discussion could inform each project's design, and Serializable is a good case in point. I'm curious to hear what the others have to say about this. (For people not on the dev-rome mailing list: we're currently discussing whether to make our interfaces Serializable. Joseph and Lance are for it, Alejandro and I against it.)

Posted by Patrick Chanezon on January 10, 2005 at 08:47 AM PST
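
Why Serializable is contentious for a shared interface can be shown in a few lines. In this minimal sketch (all type names are hypothetical), the common interface extends `java.io.Serializable`, so every implementation inherits the contract; an implementation holding a non-serializable field, standing in for something like a live connection, still compiles but fails at runtime with `java.io.NotSerializableException`. Leaving `Serializable` out of the interface and letting each implementation decide is the alternative position.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializableDemo {
    public static void main(String[] args) {
        System.out.println(canSerialize(new PlainEntry("Hello")));   // true
        System.out.println(canSerialize(new CachingEntry("Hello"))); // false
    }

    /** Attempts standard Java serialization and reports whether it succeeded. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

/** Hypothetical shared interface that makes Serializable part of the contract. */
interface FeedEntry extends Serializable {
    String getTitle();
}

/** Holds only serializable fields, so serialization works. */
final class PlainEntry implements FeedEntry {
    private final String title;
    PlainEntry(String title) { this.title = title; }
    public String getTitle() { return title; }
}

/** Hypothetical stand-in for a resource such as an open connection or parser handle. */
final class LiveResource { }

/** Compiles because it "is" Serializable, but fails at runtime on the non-serializable field. */
final class CachingEntry implements FeedEntry {
    private final String title;
    private final LiveResource resource = new LiveResource(); // not Serializable
    CachingEntry(String title) { this.title = title; }
    public String getTitle() { return title; }
}
```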

I'm working on yet another RSS/Atom aggregator (called Aggrevator, if you're interested) and I use Informa. I tend to think it tries to do too much and would be better off as a smaller, liberal parsing library like Mark Pilgrim's, but it does the job. I tried Rome for a while, but I couldn't guarantee that, given a particular test feed, both parsers would retrieve similar enough information to be worth offering them to end users as pluggable choices.

The last thing we need is a JSR to standardise on a particular interface when there isn't a single parser out there that fully supports all the existing syndication standards. Now, I've sent in a few patches to Informa, but I don't consider myself part of the core development team. Even so, I think we (as in the aggregator writers and parser users) would all be better served by all three groups banding together to build a single good parser (ideally a liberal one).

At the very least, could we work on setting up a shared set of test feeds? If we can get that working, then we may be in a position to start making ambitious plans about JSRs. Thinking about JSRs when Atom hasn't even stabilised yet is likely to do nothing more than divert energy from providing useful functionality to people today in favour of high-flown dreams of tomorrow.

Posted by ade on January 10, 2005 at 10:55 AM PST
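
A shared set of test feeds could start with a harness like this minimal sketch. It assumes a hypothetical `FeedReader` interface that each library would be wrapped in; the two readers below are toy stand-ins (a regex reader and a reader built on the JDK's DOM parser), not ROME, Informa or FeedParser. The harness runs every shared sample through every reader and reports the samples on which they disagree. The `&amp;` sample shows a typical disagreement, since the regex reader does not decode XML entities.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class SharedFeedTestsDemo {
    public static void main(String[] args) {
        Map<String, String> samples = new LinkedHashMap<>();
        samples.put("plain-title.xml", "<rss version=\"2.0\"><channel><title>Plain</title></channel></rss>");
        samples.put("entity-title.xml", "<rss version=\"2.0\"><channel><title>Tom &amp; Jerry</title></channel></rss>");

        Map<String, FeedReader> readers = new LinkedHashMap<>();
        readers.put("regex", new RegexReader());
        readers.put("dom", new DomReader());

        for (String problem : compare(samples, readers)) System.out.println(problem);
    }

    /** Runs every sample through every reader; a sample is reported when the readers disagree. */
    static List<String> compare(Map<String, String> samples, Map<String, FeedReader> readers) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> sample : samples.entrySet()) {
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, FeedReader> reader : readers.entrySet()) {
                String result;
                try {
                    result = reader.getValue().title(sample.getValue());
                } catch (Exception e) {
                    result = "ERROR: " + e.getClass().getSimpleName(); // a failure also counts as a disagreement
                }
                results.put(reader.getKey(), result);
            }
            if (new HashSet<>(results.values()).size() > 1) {
                problems.add(sample.getKey() + ": title differs " + results);
            }
        }
        return problems;
    }
}

/** Hypothetical wrapper each library would implement; returns the feed-level title or null. */
interface FeedReader {
    String title(String xml) throws Exception;
}

/** Toy reader: first <title> by regex, entities left undecoded. */
final class RegexReader implements FeedReader {
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>", Pattern.DOTALL);
    public String title(String xml) {
        Matcher m = TITLE.matcher(xml);
        return m.find() ? m.group(1).trim() : null;
    }
}

/** Toy reader: first <title> element via the JDK DOM parser, entities decoded. */
final class DomReader implements FeedReader {
    public String title(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList titles = doc.getElementsByTagName("title");
        return titles.getLength() > 0 ? titles.item(0).getTextContent().trim() : null;
    }
}
```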

I really admire your enthusiasm, Patrick. Are all Sun engineers like you?

Posted by Bo on January 10, 2005 at 11:14 AM PST

Hi Patrick. I work at Rojo and spend a lot of my time working on the Jakarta Feed Parser. APIs for processing RSS feeds seem to fall into two camps: SAX-style, event-based APIs and DOM-style, tree- or object-based APIs. Both have their strengths and weaknesses. SAX APIs scale much better for writing systems that need to work with millions of RSS feeds, such as Rojo or Bloglines, but they are much more difficult to work with. DOM APIs are usually much easier to deal with, since you get back domain objects such as a Feed or Weblog object, but they do not scale for applications that work with millions of feeds. We can either accept that there will always be a division between these two kinds of APIs, or perhaps someone knows of a kind of API that combines the ease of use of a DOM-like API with the scalability of a SAX-like API. Does anyone have any ideas?

Posted by Brad Neuberg on January 10, 2005 at 11:39 AM PST
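
The event-based camp Brad describes can be illustrated with the JDK's own SAX parser: the handler below turns low-level XML events into higher-level feed events delivered through a hypothetical `FeedListener` interface. This is a minimal sketch for RSS 2.0 only (channel title, item title and link), not the Jakarta Feed Parser API. Memory stays flat because each item is handed to the listener and forgotten as soon as its closing tag is seen; the price is that the caller has to assemble any state it needs itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class EventFeedDemo {
    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel><title>Example</title>"
            + "<item><title>First</title><link>http://example.org/1</link></item>"
            + "<item><title>Second</title><link>http://example.org/2</link></item>"
            + "</channel></rss>";
        final int[] count = {0};
        RssEventParser.parse(rss, new FeedListener() {
            public void onChannelTitle(String title) { System.out.println("feed: " + title); }
            public void onItem(String title, String link) {
                count[0]++;
                System.out.println("item: " + title + " -> " + link);
            }
        });
        System.out.println(count[0] + " items");
    }
}

/** Hypothetical higher-level callbacks: one call per feed-level event, nothing retained. */
interface FeedListener {
    void onChannelTitle(String title);
    void onItem(String title, String link);
}

/** Translates low-level SAX events for RSS 2.0 into FeedListener calls. */
final class RssEventParser {
    static void parse(String xml, final FeedListener listener) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>(); // open elements, innermost first
            private final StringBuilder text = new StringBuilder();
            private String itemTitle;
            private String itemLink;

            @Override public void startElement(String uri, String local, String qName, Attributes atts) {
                path.push(qName);
                text.setLength(0);
                if ("item".equals(qName)) { itemTitle = null; itemLink = null; }
            }

            @Override public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override public void endElement(String uri, String local, String qName) {
                path.pop();
                String parent = path.peek();
                String value = text.toString().trim();
                if ("item".equals(qName)) {
                    listener.onItem(itemTitle, itemLink); // item complete: emit it and forget it
                } else if ("title".equals(qName) && "item".equals(parent)) {
                    itemTitle = value;
                } else if ("title".equals(qName) && "channel".equals(parent)) {
                    listener.onChannelTitle(value); // ignores nested titles such as <image><title>
                } else if ("link".equals(qName) && "item".equals(parent)) {
                    itemLink = value;
                }
                text.setLength(0);
            }
        });
    }
}
```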

Actually, one idea would be to have a Jakarta Feed Parser-like API as an event-based inner core. A ROME-like API could then wrap this event parser, receiving its callbacks and using them to build domain objects. Programmers could then choose the inner event-based API for scalability or the outer object/tree-based API for ease of use.

By the way, the Jakarta Feed Parser handles more than just reading RSS feeds:

* It provides a framework for reading FOAF and OPML.
* It has advanced feed discovery, using a variety of heuristics to discover the list of feeds, FOAF and OPML for a given URI.
* It has the beginnings of a universal API for writing to blogs, not just reading them (i.e. hiding the different blog-posting APIs behind a simple interface, which could be backed by the Blogger API, the Atom API, the MetaWeblog API, etc.).

If we were to have a universal feed API, these might either be included in it or broken out as separate concerns.

Posted by Brad Neuberg on January 10, 2005 at 11:45 AM PST
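
Brad's layering idea can be sketched in a few lines: an event-based inner core, and an outer object-based API that is just one more listener, collecting callbacks into domain objects. In this minimal sketch, `FeedEvents`, `EventSource`, `Feed` and `Item` are hypothetical names, and to keep it short the "core" is a hard-coded event source standing in for a real streaming parser.

```java
import java.util.ArrayList;
import java.util.List;

class LayeredApiDemo {
    public static void main(String[] args) {
        // Outer, object-based API: convenient, but the whole feed is held in memory.
        Feed feed = ObjectApi.read(EventCore::emitSample);
        System.out.println(feed.title + ": " + feed.items.size() + " items");

        // Inner, event-based API: the caller handles each item and keeps nothing.
        final int[] count = {0};
        EventCore.emitSample(new FeedEvents() {
            public void onFeedTitle(String title) { }
            public void onItem(String title, String link) { count[0]++; }
        });
        System.out.println(count[0] + " items streamed");
    }
}

/** Hypothetical event interface exposed by the inner core. */
interface FeedEvents {
    void onFeedTitle(String title);
    void onItem(String title, String link);
}

/** Anything that can push feed events into a sink; in a real library, the streaming parser. */
interface EventSource {
    void run(FeedEvents sink);
}

final class EventCore {
    /** Stand-in for a streaming parser: fires the events a small feed would produce. */
    static void emitSample(FeedEvents sink) {
        sink.onFeedTitle("Example");
        sink.onItem("First", "http://example.org/1");
        sink.onItem("Second", "http://example.org/2");
    }
}

final class Item {
    final String title;
    final String link;
    Item(String title, String link) { this.title = title; this.link = link; }
}

final class Feed {
    String title;
    final List<Item> items = new ArrayList<>();
}

/** Outer layer: a listener that turns callbacks into domain objects. */
final class ObjectApi {
    static Feed read(EventSource source) {
        final Feed feed = new Feed();
        source.run(new FeedEvents() {
            public void onFeedTitle(String title) { feed.title = title; }
            public void onItem(String title, String link) { feed.items.add(new Item(title, link)); }
        });
        return feed;
    }
}
```

It prints `Example: 2 items` and then `2 items streamed`: the same events serve both the convenient object API and the streaming path.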

Finally had a chance to blog about this: http://www.peerfear.org/rss/permalink/2005/01/10/JakartaFeedParserAndRome/

A few notes... If any of you guys want an invite to Rojo, let me know. I think it makes sense to create a java-rss list somewhere to continue this thread. Thoughts? Maybe just a Yahoo! group or something to get going. Then we can figure out the best way to share code, unit tests, APIs, etc.

Posted by Kevin Burton on January 10, 2005 at 02:25 PM PST

Hm... Trying to make this an anchor so I get the PageRank ;) Jakarta FeedParser and Rome

Posted by Kevin Burton on January 10, 2005 at 02:26 PM PST

This is a personal weblog; I do not speak for my employer.