Hello from New York. That's right. I
am attending the Cloud Computing Conference and Expo in NYC this week. I am
looking to have a one-on-one with Amazon's CTO Dr. Vogels, IBM's Cloud CTO
Kloeckner and, of course, Sun's Cloud SVP Dave Douglas. I will hang out at Sun's
Cloud Booth and have conversations with all of these and other Cloud experts
regarding the future of enterprise storage with Cloud Computing. I will blog
about the entire event over the next three days!
Day 1
Amazon CTO Dr. Vogels:
Main Points of the presentation:
- Infrastructure as a service –
Already a Reality
- Why Amazon needed to develop these
services for itself
He went on to discuss a few
companies using Amazon. An example:
- Animoto is a startup that creates videos but owns no infrastructure.
It runs 100% on Amazon's services: EC2, S3 and all the others.
What is Amazon’s Storage as a
Service:
Key Value Storage Service: UPLOAD
=> ANALYSIS => RENDERING => DISTRIBUTION
All of this is done with Amazon SQS
to EC2 to S3 and back. Other points from this part of the talk:
- Maintaining the SLA is the key objective of the infrastructure.
- Scaling is required, sometimes overnight.
- Close to 500K developers are using the Amazon Cloud.
- Infrastructure becomes a variable cost as opposed to a capital expense.
- SmugMug stores 600TB of pictures on Amazon S3.
- "Cloud Computing: Style of computing where massively scalable
IT-related capabilities are provided as a service across the internet to
multiple external customers"
- Servers should be available on demand and pay as you go.
- Eli Lilly's HPC was moved to Amazon Web Services.
- Provisioning and retiring become easy, taking as little as a minute at times.
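To make the UPLOAD => ANALYSIS => RENDERING => DISTRIBUTION flow above a little more concrete, here is a minimal Python sketch of a queue-driven pipeline. It is only an illustration and not Animoto's or Amazon's code: the in-memory queues stand in for SQS, the loop stands in for EC2 workers, the dictionary stands in for S3, and the names run_pipeline and STAGES are made up for this example.

```python
import queue

# Stage names come from the talk; everything else here is a hypothetical stand-in.
STAGES = ["upload", "analysis", "rendering", "distribution"]


def run_pipeline(job_ids):
    """Push each job through every stage using one queue per stage."""
    queues = {stage: queue.Queue() for stage in STAGES}  # stand-in for SQS
    store = {}  # stand-in for S3: job id -> list of completed stages

    # New jobs enter at the first stage.
    for job_id in job_ids:
        queues[STAGES[0]].put(job_id)

    for index, stage in enumerate(STAGES):
        # A worker (an EC2 instance in the real setup) drains this stage's queue.
        while not queues[stage].empty():
            job_id = queues[stage].get()
            store.setdefault(job_id, []).append(stage)  # record the result
            if index + 1 < len(STAGES):
                # Hand the job over to the next stage's queue.
                queues[STAGES[index + 1]].put(job_id)
    return store


if __name__ == "__main__":
    for job, done in run_pipeline(["video-1", "video-2"]).items():
        print(job, "=>", " => ".join(done))
```

The point of the design is that each stage only talks to a queue, so workers for a busy stage can be added or retired without touching the others.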
Then he went on to discuss the
history of Amazon’s cloud:
Amazon went from App Server & Database
Architecture => Service Orientation => Massively
Scalable Services
Amazon virtualized three different
pieces:
- Compute => Amazon EC2,
deployment of servers and infrastructure and retirement within minutes
- Messaging => Amazon SQS
(Simple Queue Service)
- Storage => Amazon S3, SimpleDB,
EBS (Elastic Block Store, a hard disk in the sky: a virtual hard
disk between 1GB and 1TB which can be mounted to EC2, highly available and
replicated)
Infrastructure Services: Scalable (increase
or decrease in minutes), Cost-Effective, Reliable, Secure
Hey, he actually mentioned
OpenSolaris and MySQL as being supported on the Amazon Cloud. Coool.
Really, none of the discussion was
groundbreaking. Everyone out in the industry struggles with these issues day in and
day out. Amazon seems to have understood this beforehand and actually created
services usable by everyone out there on the WWW.
Since I was particularly interested
in their S3 offering (Simple Storage Service), I went and looked up
whatever details I could find on their storage infrastructure for S3.
And here it is:
http://aws.amazon.com/s3/#requirements
Two points I noticed in the S3 requirements
they have put in place for themselves on this website:
- Inexpensive: Amazon S3 is built from
inexpensive commodity hardware components. As a result, frequent node failure
is the norm and must not affect the overall system. It must be
hardware-agnostic, so that savings can be captured as Amazon continues to drive
down infrastructure costs.
- Simple: Building highly scalable,
reliable, fast, and inexpensive storage is difficult. Doing so in a way that
makes it easy to use for any application anywhere is more difficult. Amazon S3
must do both.
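The "node failure is the norm" requirement is easier to picture with a toy example. The Python sketch below is not how S3 is actually built (Amazon does not describe that on the page); it only assumes the common idea of writing every object to several nodes so that losing one of them does not lose the data. The class name ReplicatedStore, the placement rule and the replica count of 3 are all made up for illustration.

```python
class ReplicatedStore:
    """Toy key-value store that keeps every object on several nodes."""

    def __init__(self, node_count=5, replicas=3):
        if replicas > node_count:
            raise ValueError("replicas cannot exceed node_count")
        self.nodes = [dict() for _ in range(node_count)]  # one dict per node
        self.alive = [True] * node_count
        self.replicas = replicas

    def _placement(self, key):
        # Deterministic placement (an assumption for this sketch): derive a
        # starting node from the key bytes, then take the next nodes in order.
        start = sum(key.encode("utf-8")) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        # Write the object to every live node in its placement list.
        for index in self._placement(key):
            if self.alive[index]:
                self.nodes[index][key] = value

    def get(self, key):
        # Read from the first live replica that still holds the object.
        for index in self._placement(key):
            if self.alive[index] and key in self.nodes[index]:
                return self.nodes[index][key]
        raise KeyError(key)

    def fail_node(self, index):
        # Simulate a commodity node dying and taking its disk with it.
        self.alive[index] = False
        self.nodes[index].clear()


if __name__ == "__main__":
    store = ReplicatedStore()
    store.put("photo.jpg", b"pixels")
    store.fail_node(store._placement("photo.jpg")[0])
    print(store.get("photo.jpg"))  # still readable from another replica
```

A real system would also have to re-create lost replicas and keep copies consistent, which is exactly the management software effort I wonder about below.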
Building storage from inexpensive
commodity hardware is just that I guess, inexpensive. They do understand that
they have to ensure complete redundancy and availability. I wonder just how
much of this commodity hardware they had to use to create the S3 offering. How much
time was spent on creating the application which manages, administers and
controls the storage? I guess the real questions are these: is this model a good
idea, or even replicable, for companies looking to build internal clouds? Is it
really scalable enough as cloud computing becomes bigger and bigger? Is
commodity storage sustainable? I am currently not sure. I am sure that as the days
pass and I get a chance to actually talk to some of these folks, I will have a much
better understanding.
NEXT UP - IBM:
The talk was primarily a high-level,
general discussion on why virtualization is key to the future of IT and how
to attack management cost, human cost and so on. It covered investment in Solid
State Disk and the other innovations going on in the IT world, and how keeping IT
efficient is going to continue to get more difficult. I agree with the gentleman
that the appetite for compute, more apps and more data is not going to subside
but will actually increase. And of course, STORAGE is going to be front and center
in all of this. Why? Because everyone is going to need somewhere to store this
increasing amount of data, so the DEMAND for storage is going to continue
to increase exponentially and, in my opinion, much faster than the demand for
compute. The key thing I still need to understand is how enterprise storage
fits into this. Or maybe I come up with that idea and differentiator myself as
opposed to waiting. Let's see. Once I
talk to these folks during the next couple of days, I will have a better
understanding of all this and will be in a better position to make a sound
technical judgment on the future of Enterprise storage in the Cloud.
Day 2
Well, Day 1 was a little uneventful, quite
honestly. I didn't get a chance to pin down some of the speakers to talk to them
about Enterprise Storage etc. I will have a chance to
have a one-on-one with them today, Day 2, at the EXPO floor (hopefully). I
attended several other sessions but, to my surprise, most were marketing speeches
on one product or another. I was disappointed with EMC's session, which
looked like it was going to discuss how to manage and design Clouds and storage
for clouds but was primarily a talk about one of their virtualization
management tools.
Anyways, I am spending all day today
at the Cloud Boot Camp and three other STORAGE specific sessions for cloud, plus
a couple of hours at the expo floor talking to folks.
The Cloud Computing Boot Camp
I have to say, for a conference,
this was the best session I have ever attended. Completely vendor neutral, and
full of deep technical details, without the spin, on what the CLOUD really is.
So what did I learn at the daylong
Boot Camp?
The presenter described a CLOUD as having 6 layers:
1. The Infrastructure Layer
2. The Storage Layer
3. Platform Layer
4. Application Layer
5. Services Layer
6. Client Layer
The focus was primarily on the Infrastructure
and Storage layers, which made me happy since I wanted to get more information on how
storage fits into the cloud.
The main point driven home here is that Cloud
is not for everyone. I completely agree, especially for those who want to use
the Cloud for storage, such as Amazon S3 or any provider's Storage as a Service.
There are many providers in the market today, but three are the main players:
Nirvanix, Mosso and Amazon. All supposedly use commodity components to put
together their public clouds. The key point I took from Day 2 is that one would
not put just anything on the cloud. And one would definitely not want to put
everything on one cloud, say several TBs of data, as that would really LOCK one
in: it would be a huge logistical undertaking to move that data over to a new
provider, especially if the bandwidth is as slow as seen.
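Just how big that undertaking is can be estimated with simple arithmetic. The Python sketch below is my own back-of-the-envelope helper, not something shown at the Boot Camp. The function name transfer_days, the 5 TB figure and the 100 Mbit/s link are assumptions picked for illustration, and it ignores protocol overhead and throttling, so real transfers would be slower.

```python
def transfer_days(terabytes, megabits_per_second):
    """Days needed to move `terabytes` of data at a sustained link speed."""
    bits = terabytes * 10**12 * 8                    # decimal terabytes to bits
    seconds = bits / (megabits_per_second * 10**6)   # link speed in bits per second
    return seconds / 86_400                          # 86,400 seconds in a day


if __name__ == "__main__":
    # Hypothetical case: 5 TB over a sustained 100 Mbit/s link.
    print(f"{transfer_days(5, 100):.1f} days")
```

Even under these optimistic assumptions the move takes several days of saturating the link, and at a tenth of that bandwidth it becomes weeks.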
Is Cloud Storage or Cloud overall for everyone? Probably not. For transactional
databases or heavy I/O,
cloud is a very bad idea. For those who want to run 24/7/365 operations, it
could actually be more expensive to run the operation on the cloud, but that is
relative, as some offer as low as 10 cents per hour (Or was it a month???) for
compute with a nice size disk drive (not cloud storage, mind you). Again, that is
still not high performance, high capacity or high I/O capable, especially over the
cloud (S3 that is).
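Here is a quick way to sanity check the "more expensive when running around the clock" point. This Python sketch assumes the 10 cents figure really is per hour; the yearly cost of an owned server is a number I made up purely for illustration, and the function names are mine.

```python
HOURS_PER_YEAR = 24 * 365  # an always-on, 24/7/365 operation


def yearly_cloud_cost(rate_per_hour, hours_used=HOURS_PER_YEAR):
    """Pay-as-you-go cost for one instance over a year."""
    return rate_per_hour * hours_used


def break_even_hours(rate_per_hour, yearly_owned_cost):
    """Hours per year above which the owned server becomes the cheaper option."""
    return yearly_owned_cost / rate_per_hour


if __name__ == "__main__":
    rate = 0.10         # dollars per hour, assuming the quoted figure is hourly
    owned_cost = 500.0  # hypothetical all-in yearly cost of an owned server
    print(f"Always-on cloud cost per year: ${yearly_cloud_cost(rate):.2f}")
    print(f"Break-even usage: {break_even_hours(rate, owned_cost):.0f} hours per year")
```

With these made-up inputs the cloud wins for anything that runs only part of the year and loses for an always-on workload, which is the "it is relative" point above.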
Do clouds experience outages? Do clouds lose data? Of course they do. Go to some
of the forums and you will
read horror stories all over the place. We all remember what happened to
Amazon's S3 back in the summer of last year. Why is that? Does that have
something to do with "Commodity" components for disks? I can't really
say. Mind you, some of the issues I raise here are in some cases extreme, but we
all need to plan for the worst and hope for the best, especially when it comes
to our data. There is something to be said for thoroughly tested code and
cross-platform interoperability testing performed by industry leading engineers.
But that is just my view. That is one reason why Sun's Open strategy works very
well: it's not only the few who get to test our code, it's the entire
community. The community gets to write device drivers and everyone else gets to
test them, comment on them and even make them better.
So what about "INTERNAL CLOUDS"?
Well, my personal feeling is that medium and large organizations which already
have well-established datacenter practices and processes will probably jump on
the CLOUD concept, but more Private than Public. New startups and very small
businesses will probably jump on the public cloud to save the upfront cost of
starting and running the business. Eventually, once the business grows and they
suddenly need scalability beyond a basic web front end, I think they
will look into a traditional private datacenter or a co-located private cloud. WHY?
Because public clouds are "SHARED" without much in terms of resource
management among different virtual machines. Most guarantees (SLAs) are
primarily geared towards uptime and not much else. If someone wants to
be guaranteed specific CPU, Memory and I/O at ALL times, they are
going to have to pay more than they would for an Amazon EC2 type cloud.
This is where I think virtualization technologies with distributed resource
management built in come into play: VMware VI3 and the soon to be released VI4
(with very cool new features which I can't comment on right now due to NDA),
Sun xVM Server of course, and others such as MS Hyper-V.
All offer resource management tools such as VMware DRS, by which pools of CPU and
memory resources along with I/O can be created and managed automagically by
the application itself. One can specify how much compute and I/O one wants
guaranteed, and the resource manager sets policies which then
manage these resource needs for the particular virtual machine. A win-win
situation for the Cloud user and the Cloud provider. But this level of control,
today, is possible primarily in an internal Cloud, with all the cool things which
go along with virtualization such as dynamic disaster recovery.
Obviously, Storage is in the middle of it all. SHARED storage allows the CLOUD to
even exist. Even if it is commodity storage, it still has to be shared somehow. But
for medium to large companies, a private cloud becomes more viable because they
can move over existing enterprise class storage resources (such as the Sun Storage
6000 series or the Sun Storage 9900 series) and take advantage of built-in
technologies such as thin provisioning (think: provision ONLY what is used and
manage the scalability as needed) and Remote Replication (RVM for the Sun Storage
6000 series, UVR / TrueCopy for the Sun Storage 9900 series) for disaster recovery
with integrated tools such as VMware Site Recovery Manager. ZFS can also be
used to spread the data across several arrays and several types of arrays.
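The idea of guaranteed CPU and memory can be sketched as a simple admission check: a virtual machine gets its guarantee only if the pool still has room for it. The Python below is my own simplified illustration and is not how VMware DRS, Sun xVM Server or Hyper-V actually implement resource management. The ResourcePool class, its fields and the numbers in the usage example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Pool of CPU and memory from which guarantees are carved out."""
    cpu_mhz: int
    memory_mb: int
    reservations: dict = field(default_factory=dict)  # vm name -> (cpu, memory)

    def reserved(self):
        """Total CPU and memory already promised to admitted VMs."""
        cpu = sum(r[0] for r in self.reservations.values())
        mem = sum(r[1] for r in self.reservations.values())
        return cpu, mem

    def admit(self, vm_name, cpu_mhz, memory_mb):
        """Accept the VM only if its guarantee still fits in the pool."""
        used_cpu, used_mem = self.reserved()
        if used_cpu + cpu_mhz > self.cpu_mhz or used_mem + memory_mb > self.memory_mb:
            return False  # admitting it would break someone's guarantee
        self.reservations[vm_name] = (cpu_mhz, memory_mb)
        return True


if __name__ == "__main__":
    pool = ResourcePool(cpu_mhz=8000, memory_mb=16384)
    print(pool.admit("db", 4000, 8192))     # fits
    print(pool.admit("web", 3000, 4096))    # fits
    print(pool.admit("batch", 2000, 1024))  # rejected: CPU would be overcommitted
```

A shared public cloud that does not do this kind of accounting per customer can only promise uptime, which is the gap I am describing above.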
Later on in the day, I had a
one-on-one conversation with EMC's Senior Product Manager for Cloud Infrastructure
Group (EMC ATMOS). I had made a point of looking at EMC's ATMOS product a
couple of days ahead of the conference to understand what EMC's cloud storage
offering was. We had a very good discussion on the future of block storage
(specifically enterprise storage) in terms of Cloud Computing. I suggested that
enterprise storage is here to stay for a while, until cloud storage such as S3
matures enough to provide the bandwidth, availability and reliability required
for high performance, high bandwidth and high availability applications such as
Oracle OLTP, SAP etc. This is obviously my personal view and has nothing to
do with Sun's view on this.
I attended a session by IBM's Chief Technical
Strategist (sounds like my counterpart position at IBM but at a higher level).
A very knowledgeable gentleman in terms of storage and the future of storage, I
think. This was the second best session (in my opinion) I attended, after the
Cloud Computing Boot Camp. He actually pointed out and confirmed several of my
personal thoughts on how storage fits into the Cloud story: Commodity
Components vs. Homogeneous Components. Not surprisingly, IBM's private cloud
storage offering currently includes IBM 3200 and 4700 storage arrays along with
others. More reliable, industry tested, scalable storage arrays with built-in DR
and thin provisioning capabilities are ideal for Internal
Storage and Compute Clouds built on virtualization applications such as Sun xVM
Server, VMware and others.
My Conclusions on the Future of Block
Storage in terms of Cloud:
Block storage is extremely good at block
access response times and transaction processing and RAS (reliability,
availability and serviceability). I do not see enterprise storage going away anytime
soon just because we can store TBs of data in the cloud. What matters is what
we can and cannot do with the data. Enterprise block storage is here to stay
and will continue to do what it does best. But why the push towards
"Commodity Storage Components" (many, many blade servers with large commodity
disks)? Well, COST of course. Huge clouds such as Amazon S3 and others sit on
commodity storage for a reason. They do not have to worry
about offering high bandwidth and high I/O to the users, because the users seem to
understand what they can and can't do with Cloud Storage as a Service offerings
and use them accordingly: archiving, long term retention and other similar types
of usage.
P.S. The views in ALL of my blogs
are mine and in no way those of Sun Microsystems Inc.