Introduction

Interoperating services in many ways resemble distributed OO systems, so many of the lessons learned from more than twenty years of evolution of object technologies might be directly applicable to the SOA domain. In the beginning, when OO was rapidly gaining momentum, many systems and languages that did not meet the full set of criteria declared themselves to be object-based by implementing some of the required traits. None of those solutions gained significant traction, and they quickly sank into well-deserved obscurity. Although they differed in the set of OO features their creators chose to implement, they all left out the one that was difficult to implement and retrofit: organization of objects into hierarchies of semantic classes.

Similarly, the success or failure of services as the next paradigm of enterprise and global computing depends on the ways services are specified, described, advertised, classified, discovered, selected and consumed. Thus, according to Steve Burbeck, the term service-oriented should be reserved for architectures that focus on how services are described and organized in a way that supports the dynamic discovery of appropriate services at runtime. Architectural schemes that focus instead on service-to-service message protocols, or on the details of how the various servers communicate rather than what they say to each other, should be characterized as service-based. The latter, although offering a significant improvement over monolithic applications, lack the capacity to scale beyond the sub-enterprise level, where service consumers and providers are under the control of a single architectural authority.

The ultimate goal of service-oriented architecture is to achieve robust service interoperability through dynamic discovery and binding at run-time. For this to work, service consumers and producers must have both syntactic and semantic compatibility. The former can be achieved and validated through the use of XML technologies. The latter requires human intervention. However, relying on humans to ensure semantic compatibility is slow and error-prone, and since it can only happen at design time, it cannot guarantee future compatibility. Machines, on the other hand, are not capable of tackling this task alone. The only workable solution is to combine both approaches: humans should define semantically significant categorization schemes and assert service semantics through categorization. Service consumers will then be able to identify relevant categories at design time, allowing computers to use them at run-time as target-rich environments for semantically compatible services. Since these categorization schemes will be constantly traversed, they have to be organized into hierarchical directories, or taxonomies, allowing coarse-grained browsing and fine-grained category examination.
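
The division of labor described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the category paths, the service names and the `find_services` helper are all hypothetical and do not come from any real registry API.

```python
# Hypothetical registry: humans assign each service to a category path.
REGISTRY = {
    "finance/payments/card-processing": ["AcmeCardService", "FastPay"],
    "finance/payments/invoicing": ["InvoiceNow"],
    "logistics/shipping": ["ShipIt"],
}


def find_services(category):
    """Return services filed under `category` or any of its subcategories."""
    exact = category.rstrip("/")
    prefix = exact + "/"
    matches = []
    for path, services in REGISTRY.items():
        if path == exact or path.startswith(prefix):
            matches.extend(services)
    return sorted(matches)


# The consumer picks the category at design time; the lookup happens at run time.
candidates = find_services("finance/payments")
```

Services registered under that category later would be picked up by the same call without any change to the consumer, which is the point of targeting a category rather than a specific service.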

Definitions

A cursory search on Google finds multiple definitions for taxonomy:

  • “A classification of ideas in an orderly hierarchy that indicates a natural or organizational relationship.” [Wordnet]
  • “Taxonomy (from Greek taxis meaning arrangement or division and nomos meaning law) is the science of classification according to a pre-determined system, with the resulting catalog used to provide a conceptual framework for discussion, analysis, or information retrieval.” [Whatis.com]
  • “A taxonomy is a hierarchical, structured presentation of information by categories.” [Burbeck and Graham]

All of these definitions highlight different facets and characteristics of taxonomies but emphasize their structured nature.

There are two closely related concepts, classification and ontology, which are often used interchangeably with taxonomy. However, according to Reinout van Rees, there are some important differences. The difference between a classification and a taxonomy is that a taxonomy is always hierarchical and classifies entities in a structure according to some relationships between them, while a classification may be flat and use more arbitrary (or external) grounds. A great example of a taxonomy based on internal criteria is the Library of Congress Classification for books. An example of external grounds would be censorship, which classifies books as politically incorrect, subversive or otherwise objectionable. Taxonomy differs from ontology in that the former only organizes knowledge, while the latter also serves as a repository for it.

Common Traits of successful taxonomies

Because humans comprehend information by categorizing it, there are multitudes of taxonomies being used in almost every human activity. They serve different purposes for different audiences and are presented in different formats; however, all successful ones have some common traits.

Taxonomies are hierarchical

All of the definitions in the previous section describe taxonomies as hierarchies. This is important to reflect commonalities larger than individual species. Such a hierarchy can be represented as a tree, where every node except a single root has exactly one parent; as a forest, a collection of one or more trees; or as a polyhierarchy, which has multiple roots and allows nodes to have multiple parents.

As is the case with biological taxonomies, there is usually more than one way to categorize information, so for any domain there can be more than one possible taxonomy. None of these taxonomies would be sufficient by itself, yet together they provide a powerful map for organizing and navigating the information. Consequently, a tree-based structure is usually too limiting for representing real-life taxonomies. On the other hand, a polyhierarchy is often more difficult to understand and navigate: for example, allowing nodes to have multiple parents makes it impossible to use familiar breadcrumb controls to show users where in the hierarchy they are now. As a result, a forest is the most practical way to represent a taxonomy, where each tree represents an alternative way of classifying the information.
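
To make the forest argument concrete, here is a minimal sketch, assuming a simple parent-link representation. The category names and the `breadcrumbs` and `roots` helpers are hypothetical; the only point is that a single parent per node keeps the path to the root unambiguous.

```python
# Each tree in the forest is one alternative classification; every node has
# at most one parent, so the path back to its root is unambiguous.
# All category names are hypothetical.
PARENT = {
    # tree 1: by business function
    "by-function": None,
    "billing": "by-function",
    "invoicing": "billing",
    # tree 2: by industry
    "by-industry": None,
    "retail": "by-industry",
}


def breadcrumbs(node):
    """Walk parent links up to the root and return the path root-first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return " > ".join(reversed(path))


def roots():
    """The roots of the forest, one per alternative classification."""
    return sorted(n for n, p in PARENT.items() if p is None)
```

In a polyhierarchy the `PARENT` value would have to be a list, and `breadcrumbs` would no longer have a single answer to return.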

The hierarchies also have to be meaningful: each category should be related to the category above it. But the semantics of these relationships may differ from each other, and can include specialization, aggregation and other domain-specific relationship types.

Taxonomies must be created and maintained by humans

Taxonomic categories must be human engineered, like the Yahoo Directory or the Open Directory Project, to reflect the semantics of the categorized entities. Using automatic categorization tools might provide a good starting point for information architects, but would never create a stable, clear and useful taxonomy.

Taxonomies are meaningful

In order to be useful, taxonomies must be meaningful in the context of the problem domain where they are applied. For example, the highly developed and proven taxonomy of species from the zoological sciences would be of little use in the livestock industry, where the majority of categories would be left unoccupied and the few overpopulated categories would lack the granularity to classify animals by breed or purpose.

Taxonomies are stable

The structure of a taxonomy should change much less frequently than the information it classifies. This allows users and programs to target specific nodes of the hierarchy as target-rich environments for semantically compatible entities.

Changes to taxonomies should only occur as a result of the evolution of the structure of knowledge, not the knowledge itself. For example, a business taxonomy should remain the same as new companies are formed, or existing companies merge, split, diversify or go out of business, and should change only when new kinds of industries emerge. This is another reason why taxonomy structure should be engineered rather than derived from the body of entities it classifies.

Taxonomies are controlled

As a key information asset, taxonomies must be owned, protected, and tightly controlled. The owners are called information brokers. Their responsibility is to define the structure, access mechanisms and policies governing access, browsing, publishing, versioning, etc.

When changes to the structure and semantics of classification categories occur, both publishers and users must be notified. It will be the broker's responsibility to notify interested parties of these changes and provide appropriate version management of categories. Techniques for change and version management will evolve. One approach is to mark categories obsolete as a sign that users of those categories should change to the new categories in a timely manner.
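
The "mark categories obsolete" approach could look something like the sketch below. It assumes, purely for illustration, that the broker records a replacement for each obsolete category; the `Category` fields, the category names and the `resolve` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    name: str
    obsolete: bool = False
    replaced_by: Optional[str] = None  # hypothetical pointer to the successor


CATEGORIES = {
    "dial-up-access": Category("dial-up-access", obsolete=True,
                               replaced_by="internet-access"),
    "internet-access": Category("internet-access"),
}


def resolve(name):
    """Follow replacement pointers until a current category is reached."""
    seen = set()
    while CATEGORIES[name].obsolete and CATEGORIES[name].replaced_by:
        if name in seen:  # guard against a cycle of replacements
            raise ValueError("cyclic replacement chain at " + name)
        seen.add(name)
        name = CATEGORIES[name].replaced_by
    return name
```

Keeping the obsolete entry in place, rather than deleting it, lets existing publishers and consumers keep working while they migrate.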

Taxonomies are self-describing

It should be possible to navigate a taxonomy without an external guide. To make this possible, each taxonomy should have a detailed description. If a taxonomy is intended to be programmatically navigable, it should include machine-parsable information in addition to the human-readable description. Such taxonomies should also use formal mechanisms for describing relationships between categories.
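
As a rough sketch of what "self-describing" might mean in practice, the taxonomy below carries both human-readable descriptions and a typed relationship for every non-root category. The field names and category identifiers are illustrative assumptions, not a standard schema.

```python
import json

# A hypothetical self-describing taxonomy document. The field names are
# illustrative assumptions, not a standard schema.
TAXONOMY = {
    "name": "example-service-taxonomy",
    "description": "Human-readable overview of what this taxonomy classifies.",
    "categories": [
        {"id": "payments", "parent": None, "relation": None,
         "description": "Services that move money."},
        {"id": "card-processing", "parent": "payments",
         "relation": "specialization",
         "description": "Payments made with credit or debit cards."},
    ],
}


def describe(taxonomy, category_id):
    """Build a one-line explanation of a category from the taxonomy itself."""
    by_id = {c["id"]: c for c in taxonomy["categories"]}
    cat = by_id[category_id]
    if cat["parent"] is None:
        return f"{cat['id']}: root category. {cat['description']}"
    return f"{cat['id']}: {cat['relation']} of {cat['parent']}. {cat['description']}"


# Round-trip through JSON to show that programs can read the same document.
parsed = json.loads(json.dumps(TAXONOMY))
```

Here `describe(parsed, "card-processing")` explains the category using nothing but the taxonomy document, which is the property this section asks for.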

Taxonomies relate to other taxonomies

No single usable taxonomy can be defined for classifying all possible services. Instead, the most useful taxonomies emerge within various communities of interest and represent their cumulative understanding of specific knowledge domains. Sometimes these taxonomies gain broad acceptance and even become standards. When such domains intersect, several fundamentally similar yet distinct classification systems end up being used to classify the same set of entities. When communities of interest that use different taxonomies over the same domain have to interoperate, they need to define a mapping between their taxonomies. For example, the US Census Bureau maintains a mapping between the North American Industry Classification System (NAICS) and Standard Industrial Classification (SIC) taxonomies.
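
A mapping between two taxonomies can be as simple as a lookup table. The sketch below is an assumption-laden illustration: the `OLD-` and `NEW-` codes are placeholders rather than real NAICS or SIC codes, and the `translate` helper is hypothetical.

```python
# Hypothetical crosswalk between two taxonomies. The codes are placeholders,
# not real NAICS or SIC codes. One source category may map to several targets.
CROSSWALK = {
    "OLD-100": ["NEW-11", "NEW-12"],
    "OLD-200": ["NEW-20"],
}


def translate(old_codes):
    """Map categories from the old taxonomy to the new one.

    Returns (translated, unmapped) so that gaps in the mapping are visible
    instead of being silently dropped.
    """
    translated, unmapped = set(), []
    for code in old_codes:
        if code in CROSSWALK:
            translated.update(CROSSWALK[code])
        else:
            unmapped.append(code)
    return sorted(translated), unmapped
```

Returning the unmapped codes separately reflects the fact that such mappings are rarely complete, and a human has to decide what to do with the leftovers.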

As the number of interoperating communities of interest and the taxonomies they use increases, the need for a taxonomy federation mechanism becomes apparent. Pioneering work in this area is being done by the US Defense Information Systems Agency, which is developing a Core Taxonomy to federate all taxonomies used within the US Department of Defense.

[Figure: A DoD taxonomy federation example]

Taxonomies have to be accepted

Regardless of how well designed and maintained, relevant and precise a taxonomy is, it will never be broadly used if it is not structured the same way its target users think: “People categorize the world not on the inherent qualities of things, but on how they interact with those things.” [George Lakoff]

Existing Approaches to Service Classification

The SOA paradigm is still new, even by technological standards, and service classification is one of the least developed aspects of SOA implementations. Our research showed that service taxonomies in use today fall into the two major categories described below.

Based on existing standards

There are many standardized taxonomies maintained by industry, national and international bodies, and some service registries use them for classifying services. For example, the last time I checked, the HP Systinet registry had built-in support for the following taxonomies:

  1. North American Industry Classification System (NAICS)
  2. Standard Industrial Classification (SIC)
  3. Universal Standard Products and Services Codes (UNSPSC)
  4. ISO 3166 Geographic Taxonomy
  5. UDDI Type Taxonomy
  6. UDDI Keyword Taxonomy

These taxonomies are carefully engineered, widely used and well understood. In other words, with regard to classifying services they meet all the criteria outlined above, with one important exception: in most cases they are not applicable to services. For example, the 2002 NAICS classification has 15 subcategories under code 1111 Oilseed and Grain Farming, and just 7 subcategories (only 3 of which are meaningfully distinct) under code 518 Internet Service Providers, Web Search Portals, and Data Processing Services, which for the foreseeable future will be much more relevant to SOA players than Oilseed Farming.

Created by SOA Pundits

The need for service classification has long been accepted by technology experts. Many of them have come up with their own service taxonomies. Below is a small selection of proposed taxonomies for SOA services that appeared in recent publications:

Scholarly Service Taxonomies

JP Morgenthal:

  • Data
  • Orchestration
  • Image (Document)
  • Business Services
  • Management
  • Security

Bill Roth:

  • Component
  • Data
  • Business
  • Workflow

Randy Heffner:

  • Business
    • Transactional
    • Query & content
    • Analytical
  • Application
    • Functional
    • Data
    • Common
  • Infrastructure

Despite coming from people of very diverse backgrounds (an architect, an executive and an analyst), these taxonomies are remarkably similar: they are layered, well structured and based on a deep understanding of SOA principles. They are also absolutely useless to a service provider who wants to offer a Customer Invoicing Service, or to a service consumer in need of a Credit Card Processing Service. These taxonomies (which, according to the above definitions, are actually classifications rather than taxonomies) are too small to narrow services down sufficiently to create a target-rich environment. Most importantly, they will never be accepted by the business community, because business users do not relate to services in these terms.

In my next post I will introduce the concept of a Service Taxonomy Utility, which is intended to facilitate the development of relevant and useful service taxonomies.

Comments:

So, does SGF come with nice prolog classification engine?
Can it find for me nice photo sharing service which can print overseas, cost no more than 10% market and give 20% better quality?

Posted by 71.131.196.217 on December 19, 2007 at 10:02 PM CST #

Is http://base.google.com/ enough?
or
http://del.icio.us/search/?fr=del_icio_us&p=soa+governance&type=all

Posted by 71.131.196.217 on December 19, 2007 at 10:04 PM CST #

I'll try to answer both comments at once. The first one is "no", but the point of this post was that only humans not computers (even if they are programmed in Prolog) can create good quality taxonomies that would consistently deliver target-rich environments. Also since I was talking about defining taxonomies, not using them to classify services, I did not mention that to classify a service into one or more categories is also a job for a human. However, once it is done, SGF can help you to implement this use case using the mechanisms described in http://blogs.sun.com/RealSOA/entry/find_bind_execute

The short answer to the second one is also "no" and the long one will be contained in the second installment of the service taxonomy series.

Posted by Alex Maclinovsky on December 20, 2007 at 10:48 AM CST #

I have lost control of this blog, so I cannot update it any more. If you are interested in following my professional endeavours, the best place will be my profile page http://www.randomfour.com/alex/profile.html at my company site.

Posted by Alex Maclinovsky on October 08, 2009 at 02:11 PM CDT #

Thank you for the text

Posted by Haeri on November 18, 2009 at 01:25 AM CST #


This blog copyright 2009 by Alex Maclinovsky