Collected thoughts and musings George's Sun Blog

Monday Apr 13, 2009

If you've read the technology section of the newspaper recently, been to any major research conferences, or looked at press releases from the past few months, you may be thinking that the entire field of Computing has suddenly decided to study Meteorology.  This is understandable, given the huge interest in Cloud Computing.  Cloud Computing is "hot", and for good reason.  We're finally beginning to see real adoption of rent-by-the-unit computation and storage, the democratization of software distribution, and a need among software developers for scale and elastic resource utilization.

There is obviously a scramble in the field to define the right levels of abstraction for end users, application developers, and network operators.  Speaking about cloud computing, Greg Papadopoulos stressed the need for interoperability on his blog:

We should tear out a page from the internet playbook and work towards an open set of interoperable standards and all contribute to a software commons of open implementations.  Particularly important are standards for virtual machine representation (e.g. OVF), data in/exgest (e.g. WebDAV), code ingest and provisioning (e.g. Eucalyptus), distributed/parallel data access (pNFS, MogileFS, HDFS), orchestration and messaging (OpenESB, ActiveMQ), accounting, and identity/security (SAML2, OpenID, OpenSSO).

I would add to this the need for an open set of interoperable standards for tracing and monitoring.  The ability to build new applications in the Cloud by dynamically invoking other local and remote services with interoperable protocols is very powerful.  However, that same dynamism, coupled with the multiple layers of abstraction that characterize datacenters, leads to systems that are very difficult to reason about in terms of performance and reliability.  After all, if your new Cloud application isn't scaling linearly, or if you're seeing a "long tail" of users that see bad performance, where in the myriad of subsystems, virtualization layers, and distributed storage systems does the problem reside?  Worse, does the problem emerge from the combination of underlying components, rather than any single malfunctioning piece?

Just as open standards and protocols make the Internet possible, open standards and protocols will make the "Inter-cloud" possible as well.  Adding introspection capabilities and tracing support to those standards will ensure that we will be able to reason about the resulting systems.  This in turn will lead to more reliable and dependable software.
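To make the idea of tracing across layers a little more concrete, here is a minimal sketch in Python.  It is not any existing standard or library: the Trace class, its method names, and the layer names ("frontend", "storage") are all hypothetical, chosen only for illustration.  The point is simply that if every layer reports its timing against a shared trace ID, you can ask which layer dominated a slow request.

```python
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager


class Trace:
    """Hypothetical per-request trace: one ID shared by every layer."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex  # would be passed along with the request
        self.spans = []                   # list of (layer_name, seconds)

    def record(self, layer, seconds):
        """Store a timing that a layer measured on its own."""
        self.spans.append((layer, seconds))

    @contextmanager
    def span(self, layer):
        """Time a block of code and record it under the given layer name."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(layer, time.perf_counter() - start)


def time_per_layer(trace):
    """Sum the recorded time for each layer in one trace."""
    totals = defaultdict(float)
    for layer, seconds in trace.spans:
        totals[layer] += seconds
    return dict(totals)


def slowest_layer(trace):
    """Return the layer that accounts for the most recorded time."""
    totals = time_per_layer(trace)
    return max(totals, key=totals.get)
```

A layer can either wrap its work in "with trace.span('storage'):" or call trace.record() with a timing it measured itself.  The sketch assumes spans are not nested (a nested span would be counted twice), and it says nothing about how the trace ID crosses machine boundaries, which is exactly the part that needs an interoperable standard.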

Monday Oct 27, 2008

I'm very excited to be attending OSDI 2008 this year in San Diego, CA (Dec 8-10).

There are some really interesting papers on tap--hope to see you there!


I'm also going to attend the First USENIX Workshop on the Analysis of System Logs (WASL '08) the day before, on Sunday.  A huge number of surprisingly interesting issues emerge around generating, storing, analyzing, and processing log files (not the least of which is that you can generate terabytes per day without even trying.  Now where did that needle go in this haystack again?)

Tuesday Sep 02, 2008

We are announcing the availability of an open-source "Live CD" that gives users new to Hadoop a fully functional, pre-configured Hadoop cluster.  It is easy to start up and use, and it offers a quick look at Hadoop's power and ease of use.  By lowering the barrier to getting Hadoop up and running, we hope more people will try it out and explore its features.

The CD image gives users an environment emulating a fully distributed, three-node virtual Hadoop cluster.  One of the reasons we used OpenSolaris is its ability to emulate a multinode cluster environment in a very small memory footprint.  A three-node Map/Reduce cluster can be brought up on a machine with as little as 800 MB of memory.  Each additional virtual cluster node requires only about 40 MB more, on top of the memory used by Hadoop itself.  This means that people can take Hadoop for a spin, even on their laptop.
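As a rough back-of-the-envelope illustration of those numbers, the Python sketch below estimates a lower bound on the memory needed for a larger virtual cluster.  The 800 MB and 40 MB figures come from the paragraph above; the function name and the hadoop_mb_per_node parameter are my own, and the per-node Hadoop memory is an assumption you would have to measure for your own workload (it defaults to 0, which counts only the virtual-node overhead).

```python
BASE_NODES = 3         # the three-node cluster described above
BASE_MEMORY_MB = 800   # "as little as 800 MB" for those three nodes
NODE_OVERHEAD_MB = 40  # approximate cost of each additional virtual node


def estimate_memory_mb(nodes, hadoop_mb_per_node=0):
    """Rough lower bound on memory (MB) for a virtual cluster of `nodes` nodes.

    hadoop_mb_per_node is a hypothetical figure for the memory Hadoop itself
    uses on each extra node; it is not stated in the post.
    """
    if nodes < BASE_NODES:
        raise ValueError("estimate starts from the three-node baseline")
    extra_nodes = nodes - BASE_NODES
    return BASE_MEMORY_MB + extra_nodes * (NODE_OVERHEAD_MB + hadoop_mb_per_node)
```

For example, estimate_memory_mb(5) counts two extra nodes at 40 MB each, for 880 MB before Hadoop's own per-node usage is added.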

The CD is also "live", meaning that it does not modify the contents of the user's computer. This makes it ideal for those wishing to try out Hadoop without having to install any software.  For example, students wishing to use Hadoop in a classroom lab environment can work entirely off of the CD.

Included in this release is Hadoop 0.17.1 running on OpenSolaris.  You can join the OpenSolaris Hadoop community online, and download the CD image, documentation, and other resources, at http://opensolaris.org/os/project/livehadoop/ .  If you have any requests or suggestions for improvements to this distribution of Hadoop, please let us know through the community site, or join the discussion by sending an email to edu-discuss-subscribe@opensolaris.org .