If you've read the technology section of the newspaper recently, been to any major research conference, or looked at press releases from the past few months, you may be thinking that the entire field of Computing has suddenly decided to study Meteorology. Cloud Computing is "hot", and understandably so: we're finally beginning to see real adoption of rent-by-the-unit computation and storage, the democratization of software distribution, and a growing need among software developers for scale and elastic resource utilization.
There is clearly a scramble in the field to define the right levels of abstraction for end users, application developers, and network operators. Speaking about cloud computing, Greg Papadopoulos stressed the need for interoperability on his blog:
We should tear out a page from the internet playbook and work towards an open set of interoperable standards and all contribute to a software commons of open implementations. Particularly important are standards for virtual machine representation (e.g. OVF), data in/exgest (e.g. WebDAV), code ingest and provisioning (e.g. Eucalyptus), distributed/parallel data access (pNFS, MogileFS, HDFS), orchestration and messaging (OpenESB, ActiveMQ), accounting, and identity/security (SAML2, OpenID, OpenSSO).
I would add to this the need for an open set of interoperable standards for tracing and monitoring. The ability to build new applications in the Cloud by dynamically invoking other local and remote services with interoperable protocols is very powerful. However, that same dynamism, coupled with the multiple layers of abstraction that characterize datacenters, leads to systems that are very difficult to reason about in terms of performance and reliability. After all, if your new Cloud application isn't scaling linearly, or if you're seeing a "long tail" of users who see poor performance, where in the myriad of subsystems, virtualization layers, and distributed storage systems does the problem reside? Worse, does the problem emerge from the combination of underlying components, rather than from any single malfunctioning piece?
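To make the question of "where does the problem reside" concrete, here is a minimal Python sketch of the idea behind request tracing. It is an illustration only, not any existing standard or library: the names Span, Tracer, FakeClock, and handle_request, the three layer names, and the latencies are all hypothetical. Every layer that handles a request records a span tagged with the same trace ID, and comparing the time each span spent in its own layer points to the slow one.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str            # shared by every span belonging to one request
    name: str                # layer or service that did the work (hypothetical)
    parent: Optional[str]    # name of the enclosing span, if any
    start: float
    end: float = 0.0
    child_time: float = 0.0  # time spent inside directly nested spans

    @property
    def self_time(self) -> float:
        # Time attributable to this layer alone, excluding nested layers.
        return (self.end - self.start) - self.child_time


class Tracer:
    def __init__(self, clock):
        self.clock = clock                # callable returning the time in seconds
        self.trace_id = uuid.uuid4().hex  # one ID for the whole request
        self.stack = []                   # spans currently open
        self.finished = []                # completed spans

    @contextmanager
    def span(self, name):
        parent = self.stack[-1] if self.stack else None
        s = Span(self.trace_id, name, parent.name if parent else None, self.clock())
        self.stack.append(s)
        try:
            yield s
        finally:
            s.end = self.clock()
            self.stack.pop()
            if parent is not None:
                parent.child_time += s.end - s.start
            self.finished.append(s)

    def slowest(self):
        # The span with the largest self time is the first place to look.
        return max(self.finished, key=lambda s: s.self_time)


class FakeClock:
    """Deterministic clock so the example does not depend on real timing."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def handle_request(tracer, clock):
    # Assumed layering: front end -> virtualization layer -> storage.
    with tracer.span("web-frontend"):
        clock.advance(0.005)
        with tracer.span("vm-hypervisor"):
            clock.advance(0.002)
            with tracer.span("distributed-storage"):
                clock.advance(0.250)  # made-up latency for the slow layer
        clock.advance(0.003)
    return tracer.slowest()


if __name__ == "__main__":
    clock = FakeClock()
    tracer = Tracer(clock)
    print(handle_request(tracer, clock).name)
```

The sketch only works because all three layers live in one process and share one Tracer. Across independently operated services, the trace ID and span data would have to travel in an agreed-upon format, which is exactly where an open standard comes in. It also covers only the simpler case of a single slow layer; problems that emerge from the interaction of components would need richer data than per-layer timings.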
Just as open standards and protocols make the Internet possible, open standards and protocols will make the "Inter-cloud" possible as well. Adding introspection capabilities and tracing support to those standards will let us reason about the resulting systems, which in turn will lead to more reliable and dependable software.