Wednesday, December 21, 2005

Wrapping up Roller 2.1    \\ roller

Well, the Roller 2.1 development cycle has gone by rather quickly, and we are now just wrapping things up and preparing for release.

You can see what Roller 2.1 has to offer on the Roller 2.1 proposal page.

The new caching code was my big contribution to this release, and so far I'm pretty happy with it. blogs.sun.com has grown very rapidly over the past few months, so naturally performance has quickly become a big concern. Probably the nicest thing about the new caching code is that it's easier to plug in new cache implementations, and it's also a little easier to separate content into different caches.
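To make the "pluggable" idea a bit more concrete, here is a minimal sketch of what a cache abstraction with interchangeable implementations might look like. This is not Roller's actual caching API: the `Cache` interface, the `LruCache` and `NonExpiringCache` classes, and the `create` factory are hypothetical names chosen for illustration. The point is that rendering code only talks to the interface, so switching implementations becomes a configuration choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PluggableCacheSketch {

    /** Hypothetical minimal cache contract (not Roller's real interface). */
    interface Cache<K, V> {
        V get(K key);
        void put(K key, V value);
        void remove(K key);
        void clear();
        int size();
    }

    /** Size-bounded cache that evicts the least recently used entry. */
    static class LruCache<K, V> implements Cache<K, V> {
        private final Map<K, V> map;

        LruCache(final int maxEntries) {
            // accessOrder = true: iteration order runs from least to most recently accessed
            this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        public synchronized V get(K key) { return map.get(key); }
        public synchronized void put(K key, V value) { map.put(key, value); }
        public synchronized void remove(K key) { map.remove(key); }
        public synchronized void clear() { map.clear(); }
        public synchronized int size() { return map.size(); }
    }

    /** No size limit and no expiry: entries leave only when explicitly removed. */
    static class NonExpiringCache<K, V> implements Cache<K, V> {
        private final Map<K, V> map = new ConcurrentHashMap<>();

        public V get(K key) { return map.get(key); }
        public void put(K key, V value) { map.put(key, value); }
        public void remove(K key) { map.remove(key); }
        public void clear() { map.clear(); }
        public int size() { return map.size(); }
    }

    /** Hypothetical factory: picks an implementation by name, e.g. from a config file. */
    static <K, V> Cache<K, V> create(String type, int maxEntries) {
        switch (type) {
            case "lru":
                return new LruCache<>(maxEntries);
            case "non-expiring":
                return new NonExpiringCache<>();
            default:
                throw new IllegalArgumentException("unknown cache type: " + type);
        }
    }
}
```

An LRU bound suits caches that must stay small, while the non-expiring variant keeps everything until something explicitly removes it, for example when the underlying content changes.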

For Roller 2.1 I added a non-expiring cache implementation, which is ideal for things like XML feeds that don't change often. I've also split some of the content into more caches, namely by taking the old page cache and dividing it into a main page cache and a weblog page cache. This helps because under the old implementation the main page could get pushed out of the cache if the cache wasn't big enough, and that sucks because the main page is probably the most time-consuming page to render. We've also moved all the planet content into its own cache, which guarantees that the planet pages get some caching priority as well.
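The benefit of splitting the page cache can be shown with a small experiment. This is an illustrative sketch, not Roller code: `SplitCacheDemo`, the capacities, and the URL-style keys are all made up. With one shared LRU cache, enough weblog page renders between two hits on the main page will push it out; with a dedicated main page cache, weblog traffic can only displace other weblog pages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitCacheDemo {

    /** Minimal LRU cache built on LinkedHashMap's access-order mode. */
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // true = order entries by most recent access
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    /** One shared cache: the main page competes with weblog pages for the same slots. */
    static boolean mainPageSurvivesShared(int capacity, int weblogPagesRendered) {
        LruCache<String, String> pages = new LruCache<>(capacity);
        pages.put("/main", "<html>main</html>");
        for (int i = 0; i < weblogPagesRendered; i++) {
            pages.put("/weblog/" + i, "<html>page " + i + "</html>");
        }
        return pages.containsKey("/main");
    }

    /** Split caches: one slot reserved for the main page, the rest for weblog pages. */
    static boolean mainPageSurvivesSplit(int capacity, int weblogPagesRendered) {
        LruCache<String, String> mainPageCache = new LruCache<>(1);
        LruCache<String, String> weblogPageCache = new LruCache<>(capacity - 1);
        mainPageCache.put("/main", "<html>main</html>");
        for (int i = 0; i < weblogPagesRendered; i++) {
            weblogPageCache.put("/weblog/" + i, "<html>page " + i + "</html>");
        }
        return mainPageCache.containsKey("/main");
    }
}
```

With the same total capacity of 100 and 100 weblog pages rendered after the main page, `mainPageSurvivesShared(100, 100)` returns false while `mainPageSurvivesSplit(100, 100)` returns true.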

There is still some work left to be done on the weblog page cache, which caches all the page content for each individual weblog. Unfortunately, on a large site like blogs.sun.com with 1800 bloggers, the number of possible unique pages is very high. Assuming 1800 bloggers with 100 entries each, the possible page count would be well over 180,000 pages, which is far more than a simple webapp can cache in its own memory. So it's time to start looking into how we could cache content at that scale.
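To put a rough number on that, here is a back-of-the-envelope estimate. The 20 kB average size of a rendered page is an assumption for illustration, not a measured blogs.sun.com figure, and the page count is a lower bound because it counts only one page per entry.

```latex
\begin{aligned}
N_{\text{pages}} &\ge 1800 \times 100 = 180{,}000 \\
M_{\text{cache}} &\approx 180{,}000 \times 20\ \text{kB} = 3{,}600{,}000\ \text{kB} \approx 3.6\ \text{GB}
\end{aligned}
```

Even with that modest per-page size, the weblog page cache alone would need gigabytes of memory.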

Posted by gconf at Dec 21 2005, 11:09:14 AM PST
Comments:

I wrote most of the caching code for aceshardware.com, including the code for the forums, which are fully threaded. Because of all the caching, the site runs fine on a tiny little 500MHz UltraSPARC IIe, including the database. Even the "Slashdot Effect" barely affects it. So I thought I'd share some ideas...

The basic idea is to cache the data, not the HTML. The main reason for this is that the HTML is fully dynamic: different users can have completely different preferences. It is also memory efficient, since each piece of data only needs to be cached once. On the other hand, it is more complex. But it also means you don't have to worry about clearing caches when data is updated, because the cached data is updated directly in memory. On start-up, the code caches the metadata for the last 30 days and the full data for the last day, so performance is pretty good even right after start-up. In addition, the caching code tries to track how much memory the cached data takes up and keep it within set limits. When memory usage starts to approach those limits, a background thread looks for rarely accessed cached data and evicts it.
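A minimal sketch of the data-caching approach described above might look like the following. Everything here is an assumption for illustration, not the aceshardware.com code: the `DataCache` and `Post` names, the crude size estimate (a fixed overhead plus 2 bytes per character), and the high/low water marks are all hypothetical. It shows three pieces of the idea: posts are cached as data and updated in place on edit, the cache keeps a running estimate of its memory use, and a background thread evicts the least recently accessed posts once usage passes a limit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class DataCache {

    /** Hypothetical cached record: the raw post data, not rendered HTML. */
    public static final class Post {
        public final long id;
        public final String title;
        public final String body;

        public Post(long id, String title, String body) {
            this.id = id;
            this.title = title;
            this.body = body;
        }

        /** Crude size estimate (assumption): fixed overhead plus 2 bytes per char. */
        long estimatedBytes() {
            return 64L + 2L * (title.length() + body.length());
        }
    }

    /** A cached post plus the logical time it was last read or written. */
    private static final class Slot {
        final Post post;
        volatile long lastAccess;

        Slot(Post post, long now) {
            this.post = post;
            this.lastAccess = now;
        }
    }

    private final Map<Long, Slot> entries = new ConcurrentHashMap<>();
    private final AtomicLong usedBytes = new AtomicLong();
    private final AtomicLong clock = new AtomicLong(); // logical clock, ticks on every access
    private final long highWaterBytes; // start evicting above this
    private final long lowWaterBytes;  // stop evicting at or below this

    public DataCache(long highWaterBytes, long lowWaterBytes) {
        this.highWaterBytes = highWaterBytes;
        this.lowWaterBytes = lowWaterBytes;
    }

    public Post get(long id) {
        Slot s = entries.get(id);
        if (s == null) return null;
        s.lastAccess = clock.incrementAndGet();
        return s.post;
    }

    /** Insert, or update in place when a post is edited: no separate invalidation step. */
    public void put(Post post) {
        Slot old = entries.put(post.id, new Slot(post, clock.incrementAndGet()));
        long delta = post.estimatedBytes() - (old == null ? 0 : old.post.estimatedBytes());
        usedBytes.addAndGet(delta);
    }

    public long usedBytes() {
        return usedBytes.get();
    }

    /** Above the high-water mark, evict least recently accessed posts down to the low-water mark. */
    public synchronized void sweep() {
        if (usedBytes.get() <= highWaterBytes) return;
        // Snapshot {lastAccess, id} pairs so concurrent reads cannot disturb the sort.
        List<long[]> byAge = new ArrayList<>();
        for (Slot s : entries.values()) byAge.add(new long[] {s.lastAccess, s.post.id});
        byAge.sort(Comparator.comparingLong(a -> a[0]));
        for (long[] a : byAge) {
            if (usedBytes.get() <= lowWaterBytes) break;
            Slot s = entries.get(a[1]);
            if (s != null && entries.remove(a[1], s)) {
                usedBytes.addAndGet(-s.post.estimatedBytes());
            }
        }
    }

    /** Runs sweep() periodically on a daemon background thread. */
    public ScheduledExecutorService startSweeper(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "data-cache-sweeper");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(this::sweep, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

The start-up warming Chris mentions (metadata for the last 30 days, full data for the last day) would amount to calling `put()` for those records before serving traffic; that step is left out because it depends entirely on the data layer. Per-user HTML would then be rendered from the cached `Post` objects on each request.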

PS: I have one small suggestion. For the dates shown on blog postings, it might be a good idea to say how recent the post is (e.g. "24 minutes ago") rather than showing the absolute time. Since people access the site from all over the world, they otherwise have to work out the difference from their own time zone to make sense of the dates. I do this for the forums, though only for posts from the last 24 hours. Users can also select the time zone and format the dates are displayed in.
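Here is a minimal sketch of that date display rule using `java.time`: relative wording for posts from the last 24 hours, otherwise an absolute time in the reader's chosen zone and format. The class and method names and the "just now" cutoff under one minute are hypothetical choices, not taken from the forums code.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class PostTimeFormatter {

    /**
     * Relative text ("24 minutes ago") for posts less than 24 hours old,
     * otherwise the absolute time in the reader's zone and preferred format.
     */
    public static String format(Instant posted, Instant now,
                                ZoneId readerZone, DateTimeFormatter readerFormat) {
        Duration age = Duration.between(posted, now);
        if (!age.isNegative() && age.compareTo(Duration.ofHours(24)) < 0) {
            long minutes = age.toMinutes();
            if (minutes < 1) return "just now";
            if (minutes < 60) return minutes + (minutes == 1 ? " minute ago" : " minutes ago");
            long hours = age.toHours();
            return hours + (hours == 1 ? " hour ago" : " hours ago");
        }
        // Older (or future-dated) posts: absolute time, rendered in the reader's time zone
        return readerFormat.withZone(readerZone).format(posted);
    }
}
```

For example, a post from 11:09 AM PST on December 21, viewed two days later by a reader who chose `America/Los_Angeles` and the pattern `yyyy-MM-dd HH:mm`, would be shown as `2005-12-21 11:09`.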

Posted by Chris Rijk on December 22, 2005 at 02:00 AM PST #

Chris, I've posted a follow-up to your caching ideas in a new entry titled "caching the web". Let's continue the discussion there.

Posted by Allen on December 22, 2005 at 10:44 AM PST #

The Grabbag

Powered by Roller Version 4.0.1.1 (BSC)
© copyright gconf ... don't copy me!