Sunday Jun 13, 2004

Kristen's recent posting on web site data models digs into a problem I think a lot of us everywhere have on our web properties: There's a lot of cruft. 

Even if most of what you've got on your Web site is state-of-the-art, data-driven pages, chances are that if your site has been around a while (Sun.com, for instance, has been around 10 years and our intranet is older than that) you're going to have an embarrassing amount of manually maintained chunks of HTML that aren't really well suited for interchange between different systems. This state of disarray leads to what the industry humorously calls "manual repurposing" or "swivel chair interfaces" -- that is, having people mindlessly copy and paste chunks of text from one place to another in order to keep the engines of the ship running, rather like shoveling coal.

It's a bit of a hopeless exercise, because once you've decided that it's easier to copy and paste than to invest a week or two thinking about a data model, you've trapped yourself into a commitment to manually feed a potentially ever-expanding set of sites and systems, all of which might need the same elements of information and each of which might want to do something different with them. There are some reasonable ways to automate the coal shoveling, such as Web-based syndication services that snarf up swatches of HTML content and make them magically appear on your content partners' sites, but these only go so far because blobs of HTML inherently contain very little knowledge about the content. And so you might have a nifty product spec description that includes weight or voltage requirements, but if it's imprisoned in an HTML chunk without any special markup, you can't use that weight information in, say, a shipping description because the information just isn't accessible that way. So somebody retypes it, and heaven help them if it changes later.
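To make the product spec example concrete, here's a minimal Python sketch of the alternative. The record type, its field names (`name`, `weight_kg`, `voltage`) and the sample values are all invented for illustration; they aren't from any real product catalog. The point is only that once the weight lives in a structured record rather than inside an HTML blob, both the spec page and the shipping description can draw on the same value.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ProductSpec:
    # Hypothetical fields, invented for this illustration only.
    name: str
    weight_kg: float
    voltage: str


def render_spec_html(spec: ProductSpec) -> str:
    """Render the spec as an HTML fragment for a product page."""
    return (
        f"<p>{escape(spec.name)}: weighs {spec.weight_kg} kg, "
        f"requires {escape(spec.voltage)}.</p>"
    )


def render_shipping_line(spec: ProductSpec) -> str:
    """Reuse the same weight value in a plain-text shipping description."""
    return f"{spec.name}, shipping weight {spec.weight_kg} kg"


# Sample data (made up). Change weight_kg here and both outputs follow.
spec = ProductSpec(name="Example Server", weight_kg=17.5, voltage="100-240 V AC")
print(render_spec_html(spec))
print(render_shipping_line(spec))
```

Nobody retypes anything here: if the weight changes, it changes in one place.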

As a Web "user experience" guy, I care about this problem a lot more than you might think. I care about it because when there's a lack of order in the data, I can't create systematically navigable interfaces around content, and I can't make sure that rendered pages contain all the content they're supposed to, and sometimes I can't even know that the content everywhere is up to date because there's no data model or system of record to consult for each data element.

A good case study in this topic is the new Business Solutions area which just launched on sun.com. Originally, much of the content was scattered around the site in literally thousands of HTML files and PDFs, organized into a byzantine set of directory structures that had evolved over the years. The navigation roughly tracked those byzantine directories and sometimes didn't work quite right, because it was all hand-maintained through the heroic efforts of various web folks. End result from a user experience standpoint: Lots of great case studies and other information that few people could find.

The new system puts important content in an XML repository and then presents it to the site visitor as if it's all organized in one place on the web site. In the new system, content is tagged with metadata... whose taxonomy drives a navigational system... that in turn allows a user to zip quickly to business case studies and articles by industry type, technology solution type, or business goal. The rendered destination pages are fed underneath by data in a content model that is defined in an XML schema, which also helps define template-driven authoring that ultimately makes it easier for writers to know what content needs to exist. The standard elements of the sun.com design here -- such as the navigational "breadcrumbing" toward the top of the page -- are all driven by the metadata taxonomy.
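For a rough feel of how metadata can drive navigation, here's a small Python sketch. It is not the actual sun.com system: the XML element and attribute names (`item`, `meta`, `facet`, `value`), the sample titles, and the breadcrumb format are all assumptions made up for the example. It groups content by its metadata tags and derives a breadcrumb trail from the taxonomy rather than from where a file happens to sit in a directory.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical repository content; element and attribute names are invented.
REPOSITORY_XML = """
<repository>
  <item type="case-study" title="Retailer consolidates servers">
    <meta facet="industry" value="Retail"/>
    <meta facet="goal" value="Reduce cost"/>
  </item>
  <item type="article" title="Securing patient records">
    <meta facet="industry" value="Healthcare"/>
    <meta facet="goal" value="Improve security"/>
  </item>
  <item type="case-study" title="Hospital network upgrade">
    <meta facet="industry" value="Healthcare"/>
    <meta facet="goal" value="Reduce cost"/>
  </item>
</repository>
"""


def build_navigation(xml_text):
    """Group item titles by (facet, value), using only their metadata."""
    nav = defaultdict(list)
    root = ET.fromstring(xml_text.strip())
    for item in root.iter("item"):
        for meta in item.iter("meta"):
            key = (meta.get("facet"), meta.get("value"))
            nav[key].append(item.get("title"))
    return nav


def breadcrumb(facet, value, title):
    """Derive a breadcrumb trail from the taxonomy, not from directory paths."""
    return " > ".join(["Business Solutions", facet.capitalize(), value, title])


nav = build_navigation(REPOSITORY_XML)
print(nav[("industry", "Healthcare")])
print(breadcrumb("industry", "Healthcare", "Hospital network upgrade"))
```

Adding a new way to browse (by business goal, say) means reading a different facet, not reorganizing or rewriting the content itself.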

It probably sounds complicated (and actually I've oversimplified quite a bit), but what it means to me as a UE guy is that I can later change around the UI... or make content available for subscription... or use the same stuff on another subsite... or add new features... all without touching any of the original content. And that's all I want from my data models.

This blog copyright 2008 by MartinHardee