Paul's Blog 2.0 (Beta)

Sun Storage 7000 as an Administrator Development Platform

Tuesday Aug 04, 2009

The Sun Storage 7000 Family of Appliances breaks ground in manageability and transparency through an amazing amount of analytics information provided to administrators as well as a highly customizable and extensible management environment that resides on the system. The "Workflow", delivered in the latest release of appliance software, is of particular interest to those of us responsible for "integrating" the Sun Storage 7000 into a management ecosystem, bundling pieces of management logic for use by our peers and reproducing management logic (such as configuration and environmental setup) on several systems at a time.

A workflow is a parameterized piece of logic that is uploaded to a Sun Storage 7000 where it remains resident and is then run via the BUI, CLI or remotely via a shell. The logic within the workflow is programmed in JavaScript (resident on the Sun Storage 7000) and interacts with the system's management shell via "run" commands or built-ins that interact with the current administrative context.

A workflow can do anything that an administrator could do via the CLI, but in a nicely bundled and parameterized way. Here are a few things I've done with workflows:


  • gather information about the appliance and reformat it to make it digestible by a higher-level tool
  • retrieve sets of analytics data and regroup them into different sized chunks (instead of a 1-second interval, give me a 60-second interval with the average as well as the min and max during the interval) and reformat them to make them easy to digest
  • manage the lifecycle of shares (create, manage settings and delete) that are common across appliances
  • manage network settings
  • create a set of worksheets on every appliance in the network

The opportunities for automation are endless, only bounded by the needs of the administrator in their efforts to integrate the appliances within the management ecosystem.
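The analytics regrouping in the second item above is mostly simple arithmetic. Here is a minimal sketch of it, assuming the per-second samples have already been pulled into a plain array of numbers (oldest first); `resample` is a hypothetical helper name, not part of the appliance's analytics API. It sticks to `var`-style JavaScript like the workflow below, and the `console.log` lines are only for trying it outside the appliance.

```javascript
// Hypothetical helper: collapse 1-second samples into fixed-size intervals,
// reporting the average, minimum and maximum of each interval.
// Assumption: samples is an array of numbers, one per second, oldest first.
function resample(samples, intervalSeconds) {
  var rows = [];
  for (var start = 0; start < samples.length; start += intervalSeconds) {
    // the last chunk may be shorter than intervalSeconds
    var chunk = samples.slice(start, start + intervalSeconds);
    var sum = 0;
    var min = chunk[0];
    var max = chunk[0];
    for (var k = 0; k < chunk.length; k++) {
      sum += chunk[k];
      if (chunk[k] < min) min = chunk[k];
      if (chunk[k] > max) max = chunk[k];
    }
    rows.push({ start: start, avg: sum / chunk.length, min: min, max: max });
  }
  return rows;
}

// Example with fake data: 120 one-second samples cycling 0..9.
var perSecond = [];
for (var s = 0; s < 120; s++) perSecond.push(s % 10);
var perMinute = resample(perSecond, 60);
for (var r = 0; r < perMinute.length; r++) {
  var row = perMinute[r];
  console.log([row.start, row.avg, row.min, row.max].join(","));
}
// prints "0,4.5,0,9" then "60,4.5,0,9"
```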

There is substantial documentation on the appliance's Help Wiki, but for clarity, here is a very simple workflow that lists, for every filesystem on the appliance, the value of the attribute given as input to the workflow:


  • Input: attribute name (same as the attribute in the CLI)
  • Output: CSV format: project,sharename,attribute (one line for each share)
  • Behavior Notes: an attribute that is not valid will return NA in the column (this check could be moved to parameter verification, but here it serves to illustrate exception handling). Also, some properties return empty values because the value is actually inherited from the project context.

Since this is a relatively "short" example, I will simply put the code here with comments and then add additional information afterwards. Note the use of ordinary JavaScript functions (such as printToString) as well as the most important element, the definition of the variable "workflow".

/* The printed headers; a third column is added with the property name */
var headerList = new Array(
  "Project",
  "Share"
);

/* A function to print the array of rows into a CSV string for display */
function printToString(csvToPrint) {
  var csvAsString = "";
  for (var i = 0; i < csvToPrint.length; i++) {
    // each row is an array; string concatenation joins its cells with commas
    csvAsString = csvAsString + csvToPrint[i];
    // do not finish with an end of line marker
    if (i != csvToPrint.length - 1) csvAsString = csvAsString + "\n";
  }
  return csvAsString;
}

/* This is a required structure for the workflow; it identifies the name,
   parameters and the function to execute when it is run */
var workflow = {
  name: 'Get Filesystem Attribute',
  origin: 'Sun Microsystems, Inc.',
  description: 'Prints a Property for all Shares',
  parameters: {
    property: {
      label: 'Filesystem Property',
      type: 'String'
    }
  },
  execute: function (params) {
    // prepare the output arrays
    var csvContents = new Array();
    var currentRow = 0;
    headerList[2] = params.property;
    csvContents[0] = headerList;
    currentRow++;

    // go to the root context to start navigation
    run('cd /');
    run('shares');

    // get a list of all of the projects on the system
    var projects = list();

    // navigate through each project
    for (var i = 0; i < projects.length; i++) {
      run('select ' + projects[i]);

      // get a list of all shares in this project
      var shares = list();

      // go into the context of each share
      for (var j = 0; j < shares.length; j++) {
        run('select ' + shares[j]);
        var filesystem = true;
        var mountPoint = "";
        try {
          mountPoint = get('mountpoint');
        } catch (err) {
          // will end up here if "mountpoint" does not exist, not a filesystem
          filesystem = false;
        }
        if (filesystem) {
          var currentRowContents = new Array();
          currentRowContents[0] = projects[i];
          currentRowContents[1] = shares[j];
          try {
            var propertyValue = get(params.property);
            currentRowContents[2] = "" + propertyValue;
          } catch (err) {
            // the requested property is not valid for this share
            currentRowContents[2] = "NA";
          }
          csvContents[currentRow] = currentRowContents;
          currentRow++;
        }
        // back up to the project context
        run('cd ..');
      }

      // back up to the shares context
      run('cd ..');
    }

    var newCsvAsString = printToString(csvContents);

    return (newCsvAsString);
  }
};

While the bulk of the example is standard JavaScript, the workflow object must adhere to a required structure. Here are the important properties:


  • name - The name that the workflow will be identified by within the BUI or CLI
  • origin - The author of the workflow, can also be used to minimize name collisions
  • description - A description of the contents of the workflow, displayed in the BUI or CLI
  • parameters - A list of parameters with types (the types supported are listed in the documentation)
  • execute - The function that gets executed when the workflow is run (there are more advanced ways of identifying the execution code than are shown here)

The code itself interacts with the system to get a list of the projects on the system, then a list of the shares within each project. The mountpoint property is ONLY available on filesystems, so if reading it raises an error we know the share is not a filesystem (it is most likely an iSCSI LUN) and skip it.
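The workflow itself only runs on an appliance, but the navigation pattern (select into each project, select into each share, probe for mountpoint, back out with 'cd ..') can be exercised off-box with a stand-in for the built-ins. The sketch below is a hypothetical in-memory mock of `run`, `list` and `get`: it models only the four commands this example uses, and the project names, share names and values are invented. It is meant for reasoning about the traversal, not as a description of the appliance shell.

```javascript
// Hypothetical in-memory stand-in for the appliance shell. NOT the real
// workflow environment: it models only the commands used by the example.
var tree = {
  shares: {
    "default": {
      fs1: { mountpoint: "/export/fs1", space_total: 448116 },
      lun0: { volsize: 1073741824 }                  // no mountpoint: a LUN
    },
    "archive": {
      fs2: { mountpoint: "/export/fs2" }             // no space_total property
    }
  }
};
var contextPath = [];                                // e.g. ["shares", "default"]

function currentNode() {
  var n = tree;
  for (var i = 0; i < contextPath.length; i++) n = n[contextPath[i]];
  return n;
}
function run(cmd) {
  if (cmd === "cd /") contextPath = [];
  else if (cmd === "shares") contextPath.push("shares");
  else if (cmd === "cd ..") contextPath.pop();
  else if (cmd.indexOf("select ") === 0) contextPath.push(cmd.substring(7));
  else throw new Error("unsupported command: " + cmd);
}
function list() {
  return Object.keys(currentNode());
}
function get(prop) {
  var n = currentNode();
  if (!Object.prototype.hasOwnProperty.call(n, prop)) {
    throw new Error("invalid property: " + prop);       // like an unknown property
  }
  return n[prop];
}

// Same traversal as the workflow's execute(), condensed.
function propertyRows(property) {
  var rows = [["Project", "Share", property]];
  run("cd /");
  run("shares");
  var projects = list();
  for (var i = 0; i < projects.length; i++) {
    run("select " + projects[i]);
    var shares = list();
    for (var j = 0; j < shares.length; j++) {
      run("select " + shares[j]);
      var isFilesystem = true;
      try { get("mountpoint"); } catch (err) { isFilesystem = false; }
      if (isFilesystem) {
        var value;
        try { value = "" + get(property); } catch (err) { value = "NA"; }
        rows.push([projects[i], shares[j], value]);
      }
      run("cd ..");
    }
    run("cd ..");
  }
  return rows.join("\n");                              // each row joins with commas
}

console.log(propertyRows("space_total"));
// Project,Share,space_total
// default,fs1,448116
// archive,fs2,NA
```

The LUN is skipped by the mountpoint probe, and the missing property on fs2 comes back as NA, which matches the behavior notes for the workflow.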

To upload the workflow, cut/paste the text above and put it in a file. Log into a Sun Storage 7000 Appliance with the latest software and go to Maintenance / Workflows. Click the "+" sign to add a workflow and identify the location of the file. The syntax is error checked on upload, then you will see it listed. Workflows can also be uploaded from the CLI.

Here is what a run of the workflow from the CLI looks like:


isv-7110h:maintenance workflows> ls
Properties:
showhidden = false

Workflows:

WORKFLOW     NAME                     OWNER SETID ORIGIN
workflow-004 Get Filesystem Attribute root  false Sun Microsystems, Inc.

isv-7110h:maintenance workflows> select workflow-004
isv-7110h:maintenance workflow-004> ls
Properties:
name = Get Filesystem Attribute
description = Prints a Property for all Shares
owner = root
origin = Sun Microsystems, Inc.
setid = false

isv-7110h:maintenance workflow-004> execute
isv-7110h:maintenance workflow-004 execute (uncommitted)> ls
Properties:
property = (unset)

isv-7110h:maintenance workflow-004 execute (uncommitted)> set property=space_total
property = space_total
isv-7110h:maintenance workflow-004 execute (uncommitted)> commit

Project,Share,space_total
AnotherProject,NoCacheFileSystem,53928
AnotherProject,simpleFilesystem,53928
OracleWork,simpleExport,53928
TestVarious,filesystem1,53928
default,test,448116
default,test2,5368709120
isv-7110h:maintenance workflow-004>

While the example is simple, hopefully it illustrates that this is the start of workflow capabilities, not the entirety of them. The workflow can create management structures (like new shares and worksheets), delete them, modify them, and even enable and disable services.

Workflows make the Sun Storage 7000 an Administrator Development Platform. Try it out in the Sun Unified Storage Simulator if you don't have an appliance at your fingertips!


National Archives, PASIG, a little Vacation

Thursday Mar 27, 2008

My family and I took a brief vacation this weekend and made our way to Washington D.C. for a little R & R. We enjoyed 2 and a half days of sights, tours, history and we even squeezed in a little time for the pool. For those of you that have been to D.C. (or live there), you know that 2 and a half days only allowed us to scratch the surface of the United States cultural base that is alive as well as preserved in the city (and often within a few blocks of the National Mall).

There are so many thought-provoking and emotional moments as you move around that after two and a half days I found myself almost completely wrung out. We saved Constitution Gardens with the Vietnam Memorial, the World War II Memorial, the Lincoln Memorial, the Korean War Memorial, and the others for the last day. The artistry and the thought that went into these memorials is astounding, and the emotions that they pull out of you tie you in knots.

I won't list everything we did on the whole journey over the weekend. For my youngest son, going up the Washington Monument (and our need to start standing in line for tickets at 6:30am) will probably be the most memorable moment. For Shaun, hopefully the Vietnam Memorial and the Petersen House. For me, who knows: the Bill of Rights, the Constitution, the Declaration of Independence, the Magna Carta...simply amazing, but overall I can't name a single moment that wasn't worth its weight in gold.

Professionally though, the National Archives had to be one of the most thought provoking of our stops.

Here is this large building, with all of these physical manifestations of our history on display and in vaults around the building. The Constitution, the Declaration of Independence, the Bill of Rights, the Emancipation Proclamation and more than I could ever list here. It was "Magna Carta Days" at the National Archives as well and one of the four remaining copies of the 1297 Magna Carta from King Edward I was on display.

Here are a few thoughts that went through my head:


  • With infinite, perfect copies of digital content, what makes a digital entity "unique" and "awe inspiring"?
  • How is our country going to preserve and revere a digital creation 710 years from now?
  • How do you know what digital creations are worth preserving since you can hit a button and destroy them so easily?

The first of these seems entirely out of place with storage technology, but when you stand in front of the Declaration of Independence it makes you wonder what digital content could actually have this impact on a person and how you would embody that digital content. The record companies are struggling with it as well. In addition to digital download content, the companies are trying out releases on USB thumb drives as well as larger packages and "Deluxe" sets. Books are going to struggle through the same revolution (as magazines already have) of being bundled as bits with little or no "branding" or "artistry" about the packaging. How does one recreate that sense of uniqueness when the content is merely a bunch of bits that gets flattened into 10 songs amongst 8,000 on an iPod? Really, what "value" do those songs and books have anymore when they can be passed around at will and are part of a great "torrent" of traffic into and out of our computers? Something really has to "stick" to remain on "top of our stereo" these days.

And as for the Declaration of Independence...what a clean and simple document. The document itself hung in windows and is incredibly faded and worn down. Only after time passed did our country seek to formally preserve it for posterity. Perhaps we caught it in time to save it from deteriorating any more. But one still has to ask: is it the single original document that retains the significance, or is it the content? If it is the content, we wouldn't store the original in a huge underground vault and protect it as well as Vice President Cheney, would we?

Having seen the original, I would have to argue that there is something incredibly unique about it, it actually holds more reverence (for lack of a better word) than one of the many copies of it. So how does one reproduce that "reverence" in a digital world?

If that is not enough to think about, we have to think about digital preservation. The Magna Carta of 1297 has withstood time for 710 years and is in wonderful shape. What digital storage technology do we have today that can withstand decay for that length of time? (Of course, one could argue that some rock etchings have withstood time for thousands of years.) Let's put this in perspective: today's disk drives and SSDs are generally spec'd for 5 years. If I want to preserve my family's pictures for 710 years, I would have to ensure the data was migrated 142 times. Hmm, I'm not sure if my kids and their kids and their kids are up for that.

It appears that CDs and DVDs may have a lifespan of around 50-200 years if you preserve them properly. That is getting pretty reasonable...of course, they haven't been around for 50 to 200 years so they are certainly not battle tested like carving on a good rock. The National Institute of Standards and Technology appears to be looking heavily into the longevity of optical recording media. DLT appears to have a shelf-life of around 30 years if preserved properly.
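The 142 figure above is simply 710 years divided by a 5-year media life. A small sketch makes it easy to plug in the other lifespans mentioned; `generationsNeeded` is an illustrative name, and it assumes every copy is made exactly at the end of the previous medium's life and never fails, which is optimistic.

```javascript
// Back-of-the-envelope: how many generations of media are needed to cover
// spanYears if each generation lasts mediaLifeYears? Assumes perfect,
// on-time migrations (no failures, no early replacement).
function generationsNeeded(spanYears, mediaLifeYears) {
  return Math.ceil(spanYears / mediaLifeYears);
}

console.log(generationsNeeded(710, 5));    // 142: disks/SSDs spec'd for ~5 years
console.log(generationsNeeded(710, 30));   // 24:  DLT at ~30 years
console.log(generationsNeeded(710, 50));   // 15:  optical media, low estimate
console.log(generationsNeeded(710, 200));  // 4:   optical media, high estimate
```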

Let's say, hypothetically, that you solve the problem of the storage media (perhaps a self migrating technology in a box that guarantees infinite lifespan and that, itself, produces the new disks and technology to ensure fresh DVDs are always built). Now you have two additional challenges (at least):


  • Maintaining the integrity of the data (how do I ensure that the data that is NOW on the DVD is the original data?)
  • Maintaining the ability for outsiders to inspect and recall the stored information

The first of these seems obvious, but is actually quite difficult. Checksums can be overcome with time (imagine compute power in 700 years!) and we can't guarantee that the keepers of the information will not have a vested interest in changing the contents of the information. We see governments attempting to re-write history all the time, don't we?

Let's take a simpler example of what happens when "a" byte disappears. Recall Neil Armstrong's famous quote: "One small step for man ... ". Well, after a lot of CPU cycles and speculation and conspiracy theories, it turns out that we now believe that Neil Armstrong said: "One small step for a man...". It is a fundamentally different statement (though there is no less historical impact). This data is only 40 years old, but consider the angst in trying to prove whether or not "a" was a part of the quote. What happens when a government deliberately alters, say, the digital equivalent of an "original" 2nd Bill of Rights written in 2030?
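The usual tool for the integrity question is a fixity check: record a cryptographic digest when content is archived and recompute it later. The sketch below uses SHA-256 from Node's built-in `crypto` module (an assumption; nothing in the post prescribes an algorithm) and shows that dropping the single word "a" is immediately visible. It only helps, of course, if the recorded digest itself is trustworthy; a keeper who can rewrite both the content and the digest defeats it, which is exactly the concern above.

```javascript
// Fixity-check sketch: store a SHA-256 digest at archive time, recompute on
// retrieval, and compare. Uses Node's built-in crypto module.
const crypto = require("crypto");

function sha256Hex(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

const archived = "One small step for a man";
const recordedDigest = sha256Hex(archived);   // kept alongside the content

// Later: the same quote with the single word "a" missing.
const retrieved = "One small step for man";
const intact = sha256Hex(retrieved) === recordedDigest;
console.log(intact ? "fixity check passed" : "fixity check FAILED");
```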

One more thought for the day, since I really do have to work and if you have made it this far, it is my duty to you to free you of my ramblings.

We know for a fact that the English (dare I say...United States dialect) language is evolving. Even after 200 years there are phrases and semantics and constructs in the Declaration of Independence that require quite a bit of research for the common US citizen. Take the following paragraph:

He is at this time transporting large Armies of foreign Mercenaries to compleat the works of death, desolation and tyranny, already begun with circumstances of Cruelty & perfidy scarcely paralleled in the most barbarous ages, and totally unworthy the Head of a civilized nation.

There is the obvious use of the word perfidy, a word that has since all but disappeared from common speech in the United States.

Looking deeper at the paragraph we see evolution in spelling (compleat). There is also a fascinating use of capitalization throughout the Declaration of Independence. The study and usage of capitalization alone could be worth the creation of long research papers.

What does this tell us? The content and meaning of a work often lie in the context and times in which the work was created. How does one retain this context, language, and ability to read the content over 700 years? This is not a small problem at all. There are entire cultures lost or in the process of being lost as their language and context are lost; consider the United States' own Anasazi culture as an example.

Computer dialects (protocols, standards, information models, etc...) are themselves subject to evolution and are even more fragile than spoken language itself. A change in capitalization in an XML model may break the ability of pre-existing programs to read and migrate information, resulting in lost information. Once a program from 200 years prior breaks, how much expertise will still exist to maintain and fix it?

Crazy things to think about. Personally, I believe we are in a fragile place in our history where we could lose decades of historical information as we transition between written works and digital works. As part of my night job I'm trying to get more involved in the Sun Preservation and Archiving Special Interest Group (PASIG) to learn more about what our customers are doing in this area. I'm also trying to reorganize my own home "infrastructure" to be more resilient for the long run to ensure that my family's history does not disappear with my computers.

There are significant challenges in the computer industry all over, but preservation of history is one that our children and our children's children will judge us with. USB thumb drives will come and go, but hopefully our generation's digital treasures will not go to the grave with us.


Computational Photography and Storage

Monday Mar 03, 2008

There is a great article on CNet's news.com about computational photography, "Photo industry braces for another revolution". It is basically about Photography 2.0. The first wave of digital photography seeks to reproduce film-based photography as well as it can. Photography 2.0 advances the hardware and uses the higher processing power within the camera to exploit that new hardware, replace hardware functionality with software functionality, or bring image detection and manipulation capabilities that are not possible in hardware alone.

There are a few developments worthy of note, and all of them involve bringing more CPU capabilities into the camera:


  • Panoramic photography - I enjoy these types of scenes, though I don't think they are the future of photography at all
  • Depth of field and 3-D photography - There is an excellent example of this in the CNet article. Personally, I think depth of field is one of the most difficult techniques to master, since with our current lenses it means juggling several variables at once (decreasing the aperture size brings more depth into focus but increases the exposure time, etc...)

There are many other ideas in the article...detecting smiles (an extension of this is closed or open eyes), better light detection, self-correcting for stabilization (this is done with high priced hardware today in Image Stabilized lenses), etc... Clearly a Photography 2.0 revolution is in the works.

Photography 2.0 is really the same trend we see in the storage business...Storage 2.0. There are simple changes in the industry, like the incredible increase in CPU power driving software RAID back into storage stacks. A huge benefit of software RAID is the decrease in hardware costs that it drives. This is very similar to the Photography 2.0 concept of moving image stabilization out of the hardware (the lenses) and into the software.
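The work that cheap CPU cycles take over from a RAID controller is, at its core, parity arithmetic. Here is a minimal sketch of RAID-5-style parity: parity is the byte-wise XOR of the data blocks, so any one lost block can be rebuilt from the parity and the survivors. It is deliberately simplified (one fixed stripe, equal block sizes, no parity rotation) and is not how any particular product lays out data.

```javascript
// Parity is the byte-wise XOR of all blocks; XOR-ing parity with the
// surviving blocks reproduces the missing one. Blocks must be equal length.
function xorBlocks(blocks) {
  const out = new Uint8Array(blocks[0].length);
  for (const block of blocks) {
    for (let i = 0; i < out.length; i++) out[i] ^= block[i];
  }
  return out;
}

const d0 = Uint8Array.from([0x10, 0x20, 0x30]);
const d1 = Uint8Array.from([0x0f, 0x0e, 0x0d]);
const d2 = Uint8Array.from([0xaa, 0xbb, 0xcc]);
const parity = xorBlocks([d0, d1, d2]);

// Lose d1, then rebuild it from the parity plus the surviving blocks.
const rebuilt = xorBlocks([d0, d2, parity]);
console.log(Array.from(rebuilt).join(",") === Array.from(d1).join(",")); // true
```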

Storage 2.0 also brings us projects like this one: Project Royal Jelly. Project Royal Jelly encompasses two important pieces: the first is the implementation of a standard access model for fixed content information, and the second is the insertion of execution code between the storage API and the spinning rust. The ability to "extend" a storage appliance (or device) via a standard API will allow us to leverage the proliferation of these inexpensive and high-powered CPUs. A common use-case for an execution environment embedded in a storage device would be an image repository or a video repository. Every image submitted goes through a series of conversions: different image formats, different image sizes (thumbnail, Small, Medium, Large), and often a series of color adjustments. Documents go through similar transformations: a PDF may have different formats created (HTML primarily), the document will be indexed, larger chunks will be extracted into a variety of metadata databases for quick views, etc...
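To make "execution code between the storage API and the spinning rust" concrete, here is a hypothetical sketch of a store that runs registered handlers on every write and keeps the derived objects next to the original. Every name here (`FixedContentStore`, `onPut`, `put`, the `#thumbnail` key suffix) is invented for illustration; this is not the Project Royal Jelly or XAM API, and the handlers only stand in for real resizing or indexing.

```javascript
// Hypothetical storage-side execution hook: handlers run on each put and
// return derived objects, which are stored directly (derived objects do not
// re-trigger handlers, which avoids runaway recursion).
class FixedContentStore {
  constructor() {
    this.objects = new Map();   // key -> { data, metadata }
    this.handlers = [];         // run in registration order on each put
  }
  onPut(handler) {
    this.handlers.push(handler);
  }
  put(key, data, metadata = {}) {
    this.objects.set(key, { data, metadata });
    for (const handler of this.handlers) {
      for (const derived of handler(key, data, metadata)) {
        this.objects.set(derived.key, { data: derived.data, metadata: derived.metadata });
      }
    }
  }
  keys() {
    return Array.from(this.objects.keys());
  }
}

const store = new FixedContentStore();
// Stand-in for image conversion: one derived object per size, no real resizing.
store.onPut((key, data, meta) =>
  meta.type === "image"
    ? ["thumbnail", "small", "medium", "large"].map(size => ({
        key: `${key}#${size}`,
        data: data,
        metadata: { type: "image", derivedFrom: key, size },
      }))
    : []
);
// Stand-in for indexing: a naive word list for documents.
store.onPut((key, data, meta) =>
  meta.type === "document"
    ? [{ key: `${key}#index`, data: data.split(/\s+/).filter(Boolean), metadata: { derivedFrom: key } }]
    : []
);

store.put("photos/moon.jpg", "...jpeg bytes...", { type: "image" });
store.put("docs/report.pdf", "storage is more than spinning rust", { type: "document" });
console.log(store.keys());
```

The design point is the one made in the post: applications only call `put`, and the transformations live with the storage rather than in every application that writes to it.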

These transformations can arguably be the responsibility of the storage operation, not the application operations, especially when the operations can be considered part of an archiving operation. While indexing and manipulation could be considered a higher tier, storage tiering and taking advantage of storage utilities could also benefit from a standard storage execution platform. Vendors could easily insert logic onto storage platforms to "move" data and evolve a storage platform in place rather than authoring applications that have to operate outside of the storage platform.

Just some Monday morning musings...have a great week.


Why one bit matters.

Friday Feb 15, 2008

Sometimes I wonder why I'm in the field of storage. It's not glamorous. It's JBODs, RAID arrays, HBAs, expanders, spinning rust, and all of those things wrapped into enclosures with lots of fans humming. My background is varied: I wrote a file system for my Master's, I worked on one of the biggest Java Business Frameworks ever (the SanFrancisco Project at IBM), and I've danced between the application and infrastructure space more than once.

I often think about my "ideal" job; I've even pondered it here on my blog...and take note, the new Jack Johnson CD is very good and I am ripping it to 8-track real soon now. Personally, I love the field of digital preservation, XAM is heading in the right direction, and long-term digital archives are important to people-kind.

But still, this storage business, there is something to it.

I watched my friend get their eyes lasered to correct their vision this week. While I was watching, I was able to sit with one of the assistants and pepper her with questions; it is an astounding process. Basically, as I understand it, the doctors use the scanners and computers to:


  • map the surface of each eye
  • analyze the surface to understand why the vision is incorrect
  • create several corrective treatments
  • the doctor looks at the corrective treatments and adds their wisdom to make the right decision (a lot goes into this, like the health of the patient, the age, their profession, whatever...)
  • the doctor may tweak the map of places that need adjustments
  • the updated map is loaded into the "laser"
  • the patient comes in, gets prepped, the doctor aims the laser and sets the program loose
  • the "laser" jumps around the eye zapping away
  • the doctor reassembles the eye
  • the patient goes home

Coolness. But then the geek in me took over: I asked what I could about the machine, backup generators, power, moving the data, mapping the eye, etc... But my head kept coming back to the storage and computer software.

What if a bit is wrong? What if the bits are stored away but due to some battery backup cache being down, it doesn't really get stored and the out of date map is actually in place? What if one tiny point "ages" and becomes rust and there is no checksumming to see it "rotted"? These are people's eyes, you know? Would you want to be the storage vendor that supplied storage that messed up someone's eye because you didn't get the signal / noise ratio on the cabling right?
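The "no checksumming to see it rotted" question has a well-known answer in end-to-end checksumming file systems such as ZFS, which keep each block's checksum apart from the block itself (in the parent block pointer) and, on a mirrored pool, repair a bad copy from a good one. The sketch below is only a simplified illustration of that idea using SHA-256 from Node's `crypto` module; `writeMirrored` and `readVerified` are hypothetical names, and none of this is ZFS's actual code or on-disk format.

```javascript
// Simplified checksum-on-read with mirror repair ("self-healing").
const crypto = require("crypto");

const digest = buf => crypto.createHash("sha256").update(buf).digest("hex");

// Write the same block to two "disks"; keep the checksum separately.
function writeMirrored(block) {
  return {
    copies: [Buffer.from(block), Buffer.from(block)],
    checksum: digest(block),
  };
}

// Read: serve the first copy whose checksum matches, and overwrite any copy
// that does not match with the good data.
function readVerified(entry) {
  const good = entry.copies.find(c => digest(c) === entry.checksum);
  if (!good) throw new Error("unrecoverable: all copies failed checksum");
  entry.copies.forEach((c, i) => {
    if (digest(c) !== entry.checksum) entry.copies[i] = Buffer.from(good);
  });
  return good;
}

const entry = writeMirrored(Buffer.from("eye map v2"));
entry.copies[0][3] ^= 0x01;          // flip one bit on the first "disk"
const data = readVerified(entry);
console.log(data.toString());        // "eye map v2", served from the second copy
console.log(digest(entry.copies[0]) === entry.checksum); // true: first copy repaired
```

Without the separate checksum, the read would have happily returned the corrupted first copy; with it, one flipped bit is both detected and repaired.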

I've been thinking a lot about digital photography lately as well. While it's not people's eyes, it is still an incredibly fragile process. In fact, many of the world's best photographers still do not use digital, and for very good reason. Even when you purchase photographs, you pay a premium price for pictures that have not gone through the digitization process.

Think about this, if a person takes a picture, the CCD (or whatever they are these days) takes the light and transfers it to a memory card. The memory card gets transferred to a laptop hard drive (in my case), a variety of backups are made and I move many of the pictures to SmugMug.

That's a lot of storage along the way. Now, let's say (God forbid), my house burns down. I get my pictures back from SmugMug and one of my pictures has a bit that rotted away.

Now, that is one tiny bit of imperfection to some people. To a professional, that picture is no longer an original. At that point, you have to decide to toss away your artistic integrity and photoshop the point to be like the ones near to it, or just toss the picture from your portfolio. Either way, the picture is never the same.

How would you like to be the one that sold the storage unit that allowed the bit to rot, or to be stored or archived incorrectly, and destroyed that person's memory, that one perfect picture that was meant to be a keepsake forever?

Well, when you think about it, building storage units and management for those storage units is probably not as glamorous as owning the software or companies that specialize in photo archiving, or "lasering" people's eyes, or storing original recordings for artists, or archives of space travel. But those folks have to pick storage units from a company...and if you are the company they pick and you fulfill your moral responsibility to supply checksumming in your file systems, and well-tested storage that may occasionally be late to market to ensure that a memory is not lost or an eye doesn't get fried...you know, that's pretty rewarding.

Cheers to all of my co-workers at Sun who believe storage is more than a spinning drive or a paycheck.


Storage Remote Monitoring...got that...

Friday May 25, 2007

One of my many projects is to tackle the product-side architecture for Remote Monitoring of our storage systems. Remote Monitoring is a fascinating problem to solve for many, many reasons:


  • There are different ways to break the problem up, each being pursued with almost religious fanaticism, but each having its place depending on the customer's needs
  • It is a cross-organizational solution (at least within Sun)
  • It has a classic separation of responsibilities in its architecture
  • It solves real problems for customers and for our own company
  • It is conceptually simple, yet extremely difficult to get right

The problem at hand was to create a built-in remote monitoring solution for our midrange storage systems. Our NAS Product Family and anything being managed by our Common Array Manager (CAM) were a good start. Our CAM software alone covers our Sun StorageTek 6130, 6140, 6540, 2530, and 2540 Arrays. Our high-end storage already has a level of remote monitoring and we already have a solution to do remote monitoring of "groups" of systems via a service appliance, so our solution was targeted directly at monitoring individual systems with a built-in solution.

This remote monitoring solution is focused on providing you with a valuable service: "Auto Service Request", ASR. The Remote Monitoring Web Site has a great definition of ASR: Uses fault telemetry to automatically initiate a service request and begin the problem resolution process as soon as a problem occurs. This focus lets us trim the information being sent to Sun down to faults, and it also gives you a particular value...it tightens up the service pipeline to get you what you need in a timely manner.

For example, if a serious fault occurs in your system (one that would typically involve Sun Services), we will have a case generated for you within a few minutes...typically less than 15.

The information flow with the "built in" Remote Monitoring is only towards Sun Microsystems (we heard you on security!). If you, the customer, want to work with us remotely to resolve the problem, a second solution known as Shared Shell is in place. With this solution, we work cooperatively with you so that you can collaborate with us to resolve problems.

Remember though, I'm an engineer, so let's get back to the problem...building Remote Monitoring.

The solution is a classic separation of concerns. Here are the major architectural components:


  • REST-XML API
  • HTTPS protocol for connectivity
  • Security (user-based and non-repudiation) via Authentication and Public / Private Key Pairs
  • Information Producer (the product installed at the customer site)
  • Information Consumer (the service information processor that turns events into cases)
  • Routing Infrastructure
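The public / private key piece is what makes non-repudiation possible: the producer signs each telemetry document with its private key, and the consumer verifies the signature with the public key registered for that system before processing it. The sketch below uses Node's built-in `crypto.generateKeyPairSync`, `crypto.sign` and `crypto.verify`; the RSA-2048 / SHA-256 choice, the on-the-fly key generation and the base64 encoding are all assumptions for illustration, not a description of the actual scheme.

```javascript
// Sign-and-verify sketch for telemetry documents (Node built-in crypto).
const crypto = require("crypto");

// Assumption: in practice the key pair would be created at registration
// time and the public key stored by the consumer; here it is generated inline.
const { publicKey, privateKey } = crypto.generateKeyPairSync("rsa", { modulusLength: 2048 });

// Producer side: sign the exact bytes being sent.
function signMessage(xml) {
  return crypto.sign("sha256", Buffer.from(xml, "utf8"), privateKey).toString("base64");
}

// Consumer side: verify before doing anything else with the message.
function verifyMessage(xml, signatureB64) {
  return crypto.verify("sha256", Buffer.from(xml, "utf8"), publicKey,
                       Buffer.from(signatureB64, "base64"));
}

const xml = "<message><severity>Critical</severity></message>";
const signature = signMessage(xml);      // transmitted alongside the XML
console.log(verifyMessage(xml, signature));                               // true
console.log(verifyMessage(xml.replace("Critical", "Minor"), signature));  // false
```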

The REST-XML API gives us a common information model that abstracts away implementation details yet gives all of the organizations involved in information production and consumption a common language. The relatively tight XML Schema also gives an easily testable output for the product without having to actually deliver telemetry in the early stages of implementation. Further, the backend can easily mock up messages to test their implementation without a product being involved. Early in the implementation we cranked out a set of messages that were common to some of the arrays and sent them to the programmers on the back end, the teams then worked independently on their implementations. When we brought the teams back together, things went off without much of a hiccup, though we did find places where the XML Schema was too tight or too loose for one of the parties, so you do still have to talk. The format also helps us bring teams on board quickly...give them an XSD and tell them to come back later.

Here is an example of a message (real data removed...). Keep in mind there are multiple layers of security to protect this information from prying eyes. We've kept the data to a minimum, just the data we need to help us determine if a case needs to be created and what parts we probably need to ship out:


<?xml version="1.0" encoding="UTF-8"?>
<message xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="message.xsd">
<site-id>paul</site-id>
<message-uuid>uuid:xxxxx</message-uuid>
<message-time timezone="America/Denver">2005-11-22T12:10:11</message-time>
<system-id>SERIAL</system-id>
<asset-id>UNIQUENUMBER</asset-id>
<product-id>uniqueproductnumber</product-id>
<product-name>Sun StorageTek 6130</product-name>
<event>
<primary-event-information>
<message-id>STK-8000-5H</message-id>
<event-uuid>uuid:00001111</event-uuid>
<event-time timezone="America/Denver">2005-11-22T12:10:11</event-time>
<severity>Critical</severity>
<component>
<hardware-component>
<name>RAID</name>
</hardware-component>
</component>
<summary>Critical: Controller 0 write-cache is disabled</summary>
<description>Ctlr 0 Battery Pack Low Power</description>
</primary-event-information>
</event>
</message>

Use of XML gives us the ability to be very tight with use of tags and to enforce particular values, like severity, across the product lines.
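A product can apply the same discipline before anything leaves the box. The sketch below is a hypothetical producer-side helper that rejects severities outside an agreed list and builds a subset of the message shown above, escaping text content. Only "Critical" appears in the example; the other severity values, the helper names and the reduced field set are assumptions, and the schema-location attributes are omitted.

```javascript
// Hypothetical producer helper: validate severity, then build a message
// shaped like the example (a subset of its fields).
const ALLOWED_SEVERITIES = ["Critical", "Major", "Minor", "Informational"]; // assumed list

function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function el(name, value, attrs = "") {
  return `<${name}${attrs}>${escapeXml(value)}</${name}>`;
}

function buildMessage(sys, ev) {
  // Exact, case-sensitive match, just as an XSD enumeration would require.
  if (!ALLOWED_SEVERITIES.includes(ev.severity)) {
    throw new Error(`severity "${ev.severity}" is not in the agreed list`);
  }
  const tz = ` timezone="${escapeXml(ev.timezone)}"`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<message>",
    el("site-id", sys.siteId),
    el("message-uuid", ev.messageUuid),
    el("message-time", ev.time, tz),
    el("system-id", sys.systemId),
    el("product-name", sys.productName),
    "<event><primary-event-information>",
    el("message-id", ev.messageId),
    el("event-time", ev.time, tz),
    el("severity", ev.severity),
    el("summary", ev.summary),
    "</primary-event-information></event>",
    "</message>",
  ].join("\n");
}

const msg = buildMessage(
  { siteId: "paul", systemId: "SERIAL", productName: "Sun StorageTek 6130" },
  { messageUuid: "uuid:xxxxx", messageId: "STK-8000-5H", time: "2005-11-22T12:10:11",
    timezone: "America/Denver", severity: "Critical",
    summary: "Critical: Controller 0 write-cache is disabled" });
console.log(msg);
```

Note that a lowercase "critical" is rejected: element names and enumerated values in XML are case-sensitive, which is the same fragility discussed in an earlier post.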

The format above is heavily influenced by our Fault Management Architecture, though an FMA implementation is not required.

What we've found is that good diagnostics on a device (and FMA helps with this) yields a quick assembly of the information we need and fewer events that are not directly translated into cases. FMA and "self healing" provide an exceptional foundation for remote monitoring with a heavy reduction in "noise".

The rest of the architecture (the services that produce, consume, secure, and transport the information) is handed off to the implementors! The product figures out how to do diagnostics and output the XML via HTTPS to services at Sun Microsystems. Another team deploys services in the data center for security and registration (there are additional XML formats, authentication capabilities and POST headers for this part of the workflow). Another team deploys a service to receive the telemetry, check the signature on the telemetry for non-repudiation purposes, process it, filter it, and create a case.

There are additional steps that each product needs to go through, such as communicating to the other organizations the actual message-ids a device can send and what should happen when each message-id is received.
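Here is a hypothetical sketch of what that cross-organization agreement could look like once it reaches the receiving service: a catalog keyed by message-id that records what each id means and what to do with it. Only STK-8000-5H and its description come from the example message above; the action names and the triage behavior for unknown ids are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """What a product team says a message-id means and how to handle it."""
    product: str
    description: str
    action: str  # HYPOTHETICAL action name agreed between organizations


# HYPOTHETICAL catalog; the "open-case" action for STK-8000-5H is invented.
CATALOG = {
    "STK-8000-5H": CatalogEntry(
        product="Sun StorageTek 6130",
        description="Controller 0 write-cache disabled (battery pack low power)",
        action="open-case",
    ),
}


def route(message_id: str) -> str:
    """Look up the agreed action; unknown ids go to a hypothetical triage queue."""
    entry = CATALOG.get(message_id)
    if entry is None:
        # A device sent an id nobody told the service team about.
        return "triage-unknown-id"
    return entry.action


print(route("STK-8000-5H"))  # open-case
print(route("STK-9999-XX"))  # triage-unknown-id
```

Routing unknown ids to a separate queue makes a gap in the agreement visible instead of silently dropping the event.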

In the end, the centerpiece of the architecture is the information and the language that all teams communicate with. Isn't this the case with any good architecture? Choose the interfaces and the implementations will follow.

Keep in mind, this remote monitoring solution is secure end to end. Further, remote monitoring is only one piece of the broader services portfolio... I'm just particularly excited about this one because I was privileged to work with a great, cross-organizational team to get it done! The team included Mike Monahan (who KICKS BUTT), Wayne Seltzer, Bill Masters, Todd Sherman, Mark Vetter, Jim Kremer, Pat Ryan and many others (I hope I didn't forget anyone). There are also lots of folks who were pivotal in getting this done whom we lost along the way (Kathy MacDougall and Mike Harding, I hope you are both doing well!).

This post has been a long time in coming! Enjoy!


Wired: One Giant Screwup for Mankind

Friday Jan 12, 2007

Several weeks ago I blogged about data loss and taking the long view when it comes to data retention. This month's Wired magazine has an article entitled "One Giant Screwup for Mankind" that illustrates why data retention policies need to take the long view. It also raises an interesting point about our current trend toward digitizing content and chopping it up into lossy compression formats (like 128 kbps MP3s).

Apparently, the grainy images of the original moon landing that we see on TV ("one small step for [a] man...") are not the original images and sound! The engineers were forced to create a smaller format for transmission from the moon to earth: 320 scan lines at 10 frames per second, transmitted with 500 kHz of bandwidth. This stream was received at 3 tracking stations, pushed to a central location, recorded on media, and converted to the broadcast rate of 525 scan lines at 30 frames per second, transmitted with 4.5 MHz of bandwidth. That is essentially 3 transmissions (camera to tracking station, tracking station to central site, central site to TV) and 2 conversions (camera to the moon/earth broadcast format, moon/earth broadcast format to TV). Between the reception of the data and its conversion to the TV format, the quality was greatly reduced! The engineers noticed that the broadcast images were not as crisp as the original format SHOULD have been, and they could verify this with pictures of the monitors in the conversion room. So the engineers tried to find the tape that the original data was recorded on so they could recover the full-quality images.

Gone, lost, disappeared.

Just as I mentioned previously, the engineers had TWO problems to work on:
- Getting and retaining the equipment they could use to recover the original data (remember, we have gone through multiple media formats since the 60s)
- Locating the original tapes used for recording the data stream prior to conversion to the television signal format

I won't tell you how it's going; you'll have to read Wired to find out. But this is an excellent example of:
- Why a company with record retention requirements of over 7 years must put in place a comprehensive policy to not only record and store the information, but also retain the equipment that can read that data and write it to a new format. Instead of storing the components to read and write the data, some companies enact a policy to migrate the data to the current media format every 7 years or less.
- Why a company should consider the effect on history of losing its data if its retention policy is less than 7 years or not explicitly stated. For example, do our record companies have a retention policy for all of the garage band tape recordings they've received? If not, how are we going to retain this valuable piece of American history and culture? The record companies have a historical responsibility to record and preserve them.

More interesting to me for this blog post are the problems with the data conversion process itself. Recall that I'm a big vinyl fan. Vinyl and analog recordings provide a warm, continuous signal, whereas digital recording chops that signal into many discrete slices. Further, when we compress audio into MP3s we actually lose data. Depending on the bitrate (in kbps) you use, the loss can be very noticeable in certain types of music.
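To put some numbers on "depending on the kbps you use", here is a small Python calculation. It relies only on the CD audio format (44,100 samples per second, 16 bits per sample, 2 channels); the ratio says how much smaller the MP3 stream is than the raw stream, not how much of the difference you can hear, since perceptual encoders try to throw away what is least audible first.

```python
# CD audio: 44,100 samples per second, 16 bits per sample, 2 channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_kbps(rate=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE, channels=CHANNELS):
    """Raw (uncompressed) PCM bitrate in kilobits per second (1 kbps = 1000 bit/s)."""
    return rate * bits * channels / 1000


def mp3_ratio(mp3_kbps):
    """How many times smaller an MP3 stream is than the raw CD stream."""
    return pcm_kbps() / mp3_kbps


print(pcm_kbps())  # 1411.2
for kbps in (128, 192, 320):
    print(f"{kbps} kbps MP3: {mp3_ratio(kbps):.1f}x smaller, "
          f"{100 * kbps / pcm_kbps():.1f}% of the CD bitrate")
```

At 128 kbps the encoder keeps roughly one bit for every eleven in the original stream, which is why the choice of bitrate matters so much.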

Many download services also do not provide lossless downloads.

In the coming year we will see 1 terabyte desktop drives, and I am convinced that we will start seeing more pervasive use of lossless compression. Still, it raises the question: will our original data remain intact? Are we losing important historical data and content quality by converting to digital and then applying lossy compression because we feel the quality is "good enough"? I have every reason to believe that as we start merging technology with our bodies and brains, our senses will become more and more aware of the lossy compression techniques used in the late '90s and early 2000s. Even without computer enhancement, our brains are adapting to the saturation of media and information in ways that would astound previous generations.
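A back-of-the-envelope Python calculation shows why 1 terabyte drives make lossless audio practical. It treats a terabyte as 10^12 bytes (the drive makers' convention) and uses the CD audio rate of 176,400 bytes per second; the 60% size for lossless compression is an assumption for illustration only, since real ratios vary widely with the music.

```python
TB = 10**12  # drive makers count a terabyte as 10^12 bytes
CD_BYTES_PER_SEC = 44_100 * 2 * 2  # 44.1 kHz x 16-bit (2 bytes) x 2 channels
MP3_128_BYTES_PER_SEC = 128_000 / 8  # 128 kbps expressed in bytes per second

# ASSUMPTION: lossless compression (e.g. FLAC) shrinks audio to about 60% of
# the raw PCM size; the real figure depends heavily on the material.
LOSSLESS_FRACTION = 0.6


def hours_on_drive(bytes_per_sec, capacity=TB):
    """Hours of audio at the given data rate that fit in `capacity` bytes."""
    return capacity / bytes_per_sec / 3600


uncompressed = hours_on_drive(CD_BYTES_PER_SEC)
lossless = hours_on_drive(CD_BYTES_PER_SEC * LOSSLESS_FRACTION)
mp3_128 = hours_on_drive(MP3_128_BYTES_PER_SEC)

print(f"uncompressed CD audio: {uncompressed:,.0f} hours")
print(f"lossless (assumed {LOSSLESS_FRACTION:.0%}): {lossless:,.0f} hours")
print(f"128 kbps MP3: {mp3_128:,.0f} hours")
```

Even uncompressed, a single drive holds on the order of 1,500 hours of CD-quality audio, so storage capacity is becoming a weak argument for throwing data away.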

The only question for our kids, who will have heightened senses through the merging of technology with human anatomy, will be "How much quality did my parents compromise and lose for the sake of their convenience... and how much of it will we be able to recover to enjoy their creativity to its fullest potential?" So be sure to save those original recordings, especially if you are the owner of the Beatles recordings.

btw, does anyone REALLY agree with releasing a Beatles album that does not adhere to the group's original music scores but is instead a mashup? Should content created by a team of people in a specific way be rebuilt to fulfill someone else's vision? What if future generations actually think these songs were originally mashed up? Are we changing history? I'm fine with mashups, especially of content that is INTENDED to be mashed up, but I believe we should be very careful about taking original content and mashing it up into something the author never intended (though I do like the version of the Elvis tune at the beginning of the NBC show Las Vegas :-)

- Gotta run!
