Monday Nov 19, 2007
I had an interview with a journalist last week about automated tiered storage; here is the transcript. He asked some interesting questions, which made me think about how people view tiered storage.
Q: For what is automated tiered storage good?
A: Many problems can be solved by automated tiered storage, but there are three main areas where it helps: saving money on running costs, reducing the storage management workload, and meeting legal requirements. Briefly, from a power, cooling and reliability perspective it is not economical to store archive or long-term data on a single tier of primary storage such as disk. So by moving inactive data from disk to a lower cost tier or medium you can save money. On the management workload, implementing tiered storage means you start to manage your data more efficiently. If your data is stored on storage with the appropriate cost and performance, you will not run out of space as often as you would with all data on one tier. From a legal perspective you may also need to store data for very long periods of time. Legal or medical records, for example, cannot simply be kept on spinning disk, as this requires power all the time. Also, the lifespan of a disk is about 5 years, CD/DVD about 10 years and tape about 15 years. So you can imagine that if you only have one tier such as disk and you have a large amount of data, your staff will be doing nothing productive, just migrating data and replacing disks, CDs or DVDs. Migrating and replacing disks is not a business that a company wants to get into, as it does not generate any revenue; it just costs money. To summarise, if you have automated tiered storage your staff can spend more time on projects that generate revenue. You still have to migrate your data from old to new disks and replace them, but it may not be such a difficult task.
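To put rough numbers on the media lifespans mentioned above, here is a minimal Python sketch that counts how many times data would have to be copied onto fresh media during a retention period. The 30-year retention period and the function name `migrations_needed` are assumptions made purely for illustration; the lifespans are the approximate figures quoted in the answer.

```python
import math

# Approximate media lifespans in years, as quoted in the answer above.
MEDIA_LIFESPAN_YEARS = {"disk": 5, "cd_dvd": 10, "tape": 15}


def migrations_needed(retention_years, lifespan_years):
    """Return how many times data must be copied to fresh media.

    The first set of media covers the first `lifespan_years`; every
    further (possibly partial) lifespan needs one more migration.
    """
    if retention_years <= 0:
        return 0
    return max(0, math.ceil(retention_years / lifespan_years) - 1)


# Hypothetical retention period, chosen only for illustration.
RETENTION_YEARS = 30

for medium, lifespan in MEDIA_LIFESPAN_YEARS.items():
    count = migrations_needed(RETENTION_YEARS, lifespan)
    print(f"{medium}: {count} migrations over {RETENTION_YEARS} years")
```

Under these assumptions disk needs five migrations, CD/DVD two and tape one, which is the sense in which keeping everything on a single disk tier turns into a permanent migration job.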
Q: For whom is it a good approach?
A: There is no such thing as a normal or standard computer environment, company or business model, so it is a difficult question. But really it is good for people or organisations that want to save money, have large amounts of data, or cannot control their data. So it is good for established businesses such as banks, and it is also very appropriate for the new Internet companies, which have lots of data that becomes inactive but that they have to keep: data that they may need to read one day and so cannot delete. For many companies nowadays data is the "lifeblood" of the company; the data itself has value. For example, companies may want to keep old marketing data or customer transaction history so that they can run it through data warehouses if new data mining techniques become available in the future. So it is good for whoever wants to extract the maximum value from their data. You never know the value of something until you no longer have it.
Q: For whom is it not?
A: In reality, no one. I would say that it is not good for people who have lots of money and do not understand computer systems, possibly home users and small and medium businesses, but if they want to save costs and reduce complexity, then it is good. Many people at home copy old photos to CD or DVD. Even though that is not automated, they are still using a tiered storage approach. Home and SMB users often buy lower cost, slower storage devices to attach to their PCs, and then move old data such as photos to these devices.
Q: What should companies consider before starting a tiered (automated) storage project? (How to get the most out of tiered storage)
A: Companies should understand their data and how it relates to their applications. They should consider that some data increases in value over time, such as medical, legal and insurance data. Other types of data decrease in value over time: temporary files and SMS messages have lifespans of minutes, and the output of test program runs may be kept for hours or days until you get the correct result or report. Sun has a very simple questionnaire that any company can use to see how mature it is in managing its data compared to others; this is called the Information Maturity Model. It is documented here: http://www.sun.com/software/solaris/optimization_report.pdf
Q: What are some of the pitfalls?
A: The greatest pitfall is not understanding who uses the data and how, and then just moving it to another tier. Who uses the data and how is what we call the access pattern. There is no point in moving data from primary storage (e.g. disk) to secondary storage (e.g. tape) every night if people want to use it again every morning. You are just wasting computing resources moving the data between tiers. This is what we call thrashing. Not understanding your data usage and access patterns is the greatest pitfall.
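The thrashing described above can be illustrated with a small Python sketch. It assumes a very simple, hypothetical policy in which a file is moved to the lower tier as soon as it has been idle for longer than a threshold, and it counts how often the file would then have to be recalled to primary storage. The function name `count_recalls`, the dates and the thresholds are all illustrative assumptions, not part of any real HSM product.

```python
from datetime import datetime, timedelta


def count_recalls(access_times, idle_threshold):
    """Count recalls from the lower tier under a simple idle-time policy.

    Assumed policy: once a file has been idle for longer than
    `idle_threshold` it is moved to the lower tier, so the next access
    after such a gap forces a recall back to primary storage.
    """
    times = sorted(access_times)
    recalls = 0
    for earlier, later in zip(times, times[1:]):
        if later - earlier > idle_threshold:
            recalls += 1
    return recalls


# Hypothetical access pattern: the file is read every morning for a week.
week_of_mornings = [
    datetime(2007, 11, 12, 9) + timedelta(days=d) for d in range(7)
]

# Moving idle data every night (12-hour threshold) versus once a week.
nightly = count_recalls(week_of_mornings, timedelta(hours=12))
weekly = count_recalls(week_of_mornings, timedelta(days=7))
print(f"nightly policy: {nightly} recalls, weekly policy: {weekly} recalls")
```

With this access pattern the nightly policy forces a recall on six of the seven mornings, while the weekly policy never moves the file at all, which is why the idle threshold has to be chosen from the observed access pattern rather than guessed.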
Q: Is automated tiered storage The Future? Or are there other technologies/approaches that might be better?
A: Automated tiered storage has been available for at least 20 years. It was called HSM in the early days, and we had solid state disk, spinning disk and tape then. However, interest in automated tiered storage often tracks economic cycles, going up and down with them. This includes the recent peak of interest in eco-computing, where eco means both economic and ecological; these factors go hand in hand and are strongly interrelated. So while we are concerned with power, cooling and eco-computing in general, it remains important. If a new storage medium becomes available, the dynamics of the tiers and their costs could change, or even stop the trend towards automated tiered storage. However, as people will always be creating more data, and there will always be storage tiers, devices and media with different cost and performance characteristics, I believe that automated tiered storage is the future. Another approach that may be better is to delete the data you do not need, but how many people or organisations do this?
Thanks for now.