The first element I shall cover of the six I have dubbed “SOA Essentials” is the Common Data Model. I'm aware that I could spark some disagreement here, but I'm stepping forward willingly...
One of the key benefits of SOA in my view is flexibility, so hold that thought...
In a SOA implementation there's a possibility, if not a racing certainty, that there will be a collection of disparate systems on heterogeneous platforms. Left to themselves, these systems would most likely not even acknowledge one another's existence, let alone converse. But converse is exactly what they are required to do, so some common ground for the transfer of information between these systems is needed – and this is where the Common Data Model fits.
The purpose of this model is simply to have a consistent, platform-neutral way to represent key information so that it can be passed between participating systems throughout the SOA. By having the model in a platform-neutral form, you reap a number of benefits, including:
* for each new system/service added into the architecture, there is only one set of data translations/transformations that needs to be developed – from the “native” form of the new system into the Common Model and back.
* you avoid acquiring “spaghetti by stealth”, whereby unintended hard dependencies between services are created through the use of underlying “native” data representations from one or more of those systems/services. This helps yield that oh-so-important flexibility.
* if any of these services are eventually exposed to external organizations (e.g. partners, consortium members), you aren't also needlessly exposing information about the underlying platforms that might put them at greater risk of malicious attack.
* you don't need domain-expertise for each of the individual underlying platforms beyond creating (and extending) the original service capabilities and their “private” data translation pieces: once those components are created, you can combine and choreograph their interactions free from any requirement for specialist expertise of data types, formats, value limits, etc.
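To make the first benefit concrete, here is a minimal sketch in Python. Everything named in it is hypothetical and invented for illustration: the `Customer` record, the fixed-width "mainframe" layout, the "CRM" dictionary keys and the adapter functions. The point is only that each system needs one pair of translations (native to common and back), so with n systems you write 2n translators rather than the n(n-1) you would need if every system spoke directly to every other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical common-model record: plain data, no platform types."""
    customer_id: str
    full_name: str


# Hypothetical native form 1: a fixed-width record (8 chars id, 30 chars name).
def mainframe_to_common(record: str) -> Customer:
    return Customer(customer_id=record[:8].strip(), full_name=record[8:38].strip())


def common_to_mainframe(c: Customer) -> str:
    # Pad or truncate each field to its assumed fixed width.
    return f"{c.customer_id:<8.8}{c.full_name:<30.30}"


# Hypothetical native form 2: a dictionary with the other system's own keys.
def crm_to_common(row: dict) -> Customer:
    return Customer(customer_id=str(row["CustNo"]),
                    full_name=f'{row["First"]} {row["Last"]}')


def common_to_crm(c: Customer) -> dict:
    # Naive split on the first space; a real model would agree name parts.
    first, _, last = c.full_name.partition(" ")
    return {"CustNo": c.customer_id, "First": first, "Last": last}


def translators_point_to_point(n: int) -> int:
    """Directed translators needed if every system talks to every other."""
    return n * (n - 1)


def translators_with_common_model(n: int) -> int:
    """One translator in each direction per system."""
    return 2 * n
```

Moving a customer from the "mainframe" to the "CRM" is then `common_to_crm(mainframe_to_common(record))`: neither system needs to know the other's native form, which is what keeps the spaghetti out.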
So where do you start?
Firstly, don't try to boil the ocean: you don't need to create a data model for the whole of your organization before you can do anything else. Be pragmatic and bear in mind Pareto's 80:20 rule. It's also unlikely that you'll be passing round every type of data entity within your organization – and this model is for passing stuff around.
Also, the data model isn't necessarily where you'd start your design efforts, but it'll be pretty early in your implementation task list. That's not unusual: most buildings are not designed from the foundations up. You start by creating the overall appearance of the building, given its intended location and purpose, and then go into the details: what rooms will go on what floor, where the boundaries of those rooms are, how things can pass between rooms and between floors, and what common facilities are required where. From there you can focus on the construction materials for the various parts and then ensure that the foundations are designed to actually support the new edifice. Building order is different: typically (due to that thing called “gravity”) from the ground up. I have seen one building where the roof was put in place before the walls (it was held up with temporary scaffolding), but this was because of a) some peculiar attributes of the building site and access to it and b) some peculiar attributes of the builder!
The reason why I'm using the data model as my starting point here is because it pervades all the other essential elements, so it's best to describe it first for later clarity.
When starting to build such a model there is a spectrum of approaches, from the highly formal “let's build a full ontology”, to the highly pragmatic “let's examine the minimum that we need”. My observation here is that few organizations have either made or saved vast sums of money by having a fully complete, wall-to-wall data model or ontology in place. Such models are typically hideously expensive and contentious to set up, are unwieldy, and can quickly fall out of date, and thence into disrepute and out of use. However, having a corporate-wide agreement on things like the representation of an address (number of lines, what goes on what line, line length, formatting, etc.) can yield very significant benefits in terms of data cleanliness, single-view-of-customer and ease of integration. So I prefer a mixture of minimalism and anticipation: do what you need now, but be aware of what you can reasonably expect to need in the future.
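As an illustration of the kind of agreement I mean, here is a small Python sketch of an agreed address shape with a conformance check. The specific limits (four lines, 35 characters per line), the field names and the two-letter country code are assumptions made up for the example, not recommendations; the real values are exactly what your organization needs to argue about and settle.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed limits for illustration only; agree the real values corporately.
MAX_LINES = 4
MAX_LINE_LENGTH = 35


@dataclass(frozen=True)
class Address:
    """Hypothetical agreed address shape: data only, no platform types."""
    lines: Tuple[str, ...]   # street lines, most specific first
    postal_code: str
    country_code: str        # assumed: two upper-case letters, e.g. "GB"


def validate_address(addr: Address) -> List[str]:
    """Return a list of problems; an empty list means the address conforms."""
    problems = []
    if not 1 <= len(addr.lines) <= MAX_LINES:
        problems.append(f"expected 1 to {MAX_LINES} lines, got {len(addr.lines)}")
    for number, line in enumerate(addr.lines, start=1):
        if not line.strip():
            problems.append(f"line {number} is blank")
        elif len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number} exceeds {MAX_LINE_LENGTH} characters")
    if not addr.postal_code.strip():
        problems.append("postal code is missing")
    code = addr.country_code
    if not (len(code) == 2 and code.isalpha() and code.isupper()):
        problems.append("country code must be two upper-case letters")
    return problems
```

Each system's private translation piece would run a check like this before handing an address to anyone else, so that a non-conforming address is caught at the boundary of the system that produced it.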
A thing to note here: as I indicated near the beginning of this piece, there is likely to be a mix of disparate systems in use within the complete architecture. They will be on different platforms and are unlikely all to be object-aware. So I'm not saying build an object model. What I'm suggesting is what Martin Fowler calls an Anemic Domain Model, which he describes as an anti-pattern: something to be avoided at all costs. But this view (I feel) is from a purely OO standpoint, and the whole world is not OO. So trying to attach 'behaviour' to the data extracted by a screen-scraper accessing a mainframe system (so making it non-anemic) is, to my mind, an exercise in futility driven by a wish to ignore the realities of inter-system integration.
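In code terms, a deliberately anemic common-model entity might look something like the sketch below. The `OrderLine` entity, its fields and the choice of JSON as the neutral wire format are all assumptions for illustration. The record carries fields and nothing else; the only functions are the ones that move it on and off the wire, and those live outside the record.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderLine:
    """Hypothetical common-model entity: fields only, deliberately no methods."""
    sku: str
    quantity: int
    unit_price_pence: int  # integer minor units avoid float rounding differences


def to_wire(line: OrderLine) -> str:
    """Serialise to JSON text that any platform can parse."""
    return json.dumps(asdict(line), sort_keys=True)


def from_wire(text: str) -> OrderLine:
    """Rebuild the record from JSON; unknown or missing keys raise TypeError."""
    return OrderLine(**json.loads(text))
```

Because nothing about the record depends on objects or methods, a non-OO participant can produce or consume the same text with whatever string handling it has available.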
A word of warning: don't think that you will be able to finish your data model yet – even if you have a good grasp of your business activities and processes. It will need to be extended to cover operational (non-functional) aspects of the system, to which we will return at a later time.
There's also an argument that you need to develop (or at least use) two data models: one internally and one where services and capabilities are exposed externally. At the moment most people are simply focusing on trying to 'join the dots' and actually get a working SOA implementation rather than considering from the outset all the grown-up aspects like security and deliberate obfuscation. If you favour the two-model approach (and I do by preference), then trying to settle on some industry-wide data model (if one exists) for your external view is probably going to be the best option in the long run. That way, if you need to work in some kind of partnership, consortium or virtual organization, there should be fewer politics over who's 'right' about the representation of a 'customer' when the various IT infrastructures need to be joined up.
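A minimal Python sketch of the two-model idea follows. Both record types and all of their field names are hypothetical; in practice the external shape would follow whatever industry-wide model you settle on. The internal record can carry details about where the data lives, while the external view is built by a single mapping function that simply never copies those details across.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalCustomer:
    """Hypothetical internal common-model record."""
    customer_id: str
    full_name: str
    source_system: str   # which platform holds the master record
    native_key: str      # that platform's own identifier


@dataclass(frozen=True)
class ExternalCustomer:
    """Hypothetical external view: nothing that hints at the platforms behind it."""
    customer_ref: str
    full_name: str


def to_external(c: InternalCustomer) -> ExternalCustomer:
    # Only the fields named here cross the boundary; everything else stays inside.
    return ExternalCustomer(customer_ref=c.customer_id, full_name=c.full_name)
```

Keeping the mapping in one place means that a new internal field stays private by default: it is exposed only if someone deliberately adds it to `ExternalCustomer` and to `to_external`.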


