Modular Docs Part 2: DITA vs. DocBook
This is the second in a two-part series. Part 1 describes the motivations for modular documentation. Part 2 zeros in on the reasons for choosing DITA.
When IBM decided to focus on topic-oriented documentation,
it created the Darwin Information Typing Architecture (DITA), even though there was already a huge investment
in DocBook. Moving to a new architecture was a
decidedly non-trivial undertaking--both technically and politically--so it is worth asking why the move was made.
Perhaps, one day, we'll be treated to an insider's history of the decision-making process. In the meantime, here are the factors that (I imagine) played a prominent role in the decision:
- Simplicity
- Extensibility
- Editable Components
- Validatable References
Simplicity
DocBook had 800 elements. The typical installation had to remove 600 of them to get down to something practical. DITA, in contrast, has 120 elements, making it much easier to use "out of the box".
Simplicity is a major driver for adoption, and adoption is the key to community growth. To succeed, a standard needs a large and vibrant community, so DITA's relative simplicity was key to creating a community in which vendors and open source projects compete with each other to provide "best of breed" solutions.
Extensibility
A myriad of special cases had combined to create the
monolithic, 800-element standard that was DocBook.
Reducing the element set to the bare essentials
covered 80% of the use cases, but that still left
the other 20% to be addressed. DITA's
designers chose to enable solutions for that set
(rather than building them in) by designing DITA to
be extended.
You extend DITA by specializing existing types: the new
type gets a new name, but retains a reference
to the type it was derived from. Production systems and
editors can then default to the behaviors associated with
the original types, unless special instructions are provided
for customized processing.
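As a rough illustration of that fallback, here is a minimal Python sketch. It assumes the DITA convention that each element carries a class attribute listing its ancestry from most general to most specialized (for example, "- topic/li task/step "). The HANDLERS table and the render function are hypothetical names invented for this example, and the step element is simplified to hold its text directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical output handlers, keyed by "module/element" type name.
# Only the base list-item type is known to this processor.
HANDLERS = {
    "topic/li": lambda el: f"* {el.text}",
}

def render(el):
    """Render an element with the most specialized handler available."""
    # The class attribute runs from most general to most specialized;
    # the first token ("-" or "+") is a marker, not a type name.
    ancestry = el.get("class", "").split()[1:]
    for type_name in reversed(ancestry):  # try the most specialized type first
        handler = HANDLERS.get(type_name)
        if handler:
            return handler(el)
    raise LookupError(f"no handler for <{el.tag}>")

# A specialized element: a task step is a kind of list item.
step = ET.fromstring('<step class="- topic/li task/step ">Open the file</step>')

# No handler exists for task/step, so processing falls back to topic/li.
print(render(step))
```

Registering a handler under "task/step" would override the default for steps only, which is the "special instructions for customized processing" case.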
Editable Components
As important as simplicity and extensibility were, however, the strongest motivation for the move to DITA probably came from the need for topic-oriented authoring--and the difficulty of doing that with DocBook. Those difficulties stem from the nature
of the mechanism available for component reuse in DocBook--entity references.
To be reused, a component first had to be taken
out of its DocBook setting (like removing a wing
from a model airplane). An entity reference could then
be employed to pull it into a document.
But once the component was removed, it had to be placed into
a Document Type Definition (DTD), a control file for
XML documents that is not itself written in XML. An XML
editor couldn't operate on it, which meant that components,
once extracted, could no longer be edited using normal
authoring tools.
DITA, in contrast, creates discrete components at the
outset, all of which are editable using standard XML
editors.
Validatable References
But perhaps even worse than the inability to edit
a component was the inability to validate either the component or the document that referenced it. In the first place,
a DTD wasn't an XML document, so individual components
couldn't be validated. In the second place, an entity
reference could occur anywhere, making it impossible to
prevent a component from being inserted at an illegal
location.
DITA solved the first problem by using element IDs. A
reference points to the ID of an element in a normal topic, which means that components are stored in standard XML files. So components can be
edited and validated using standard authoring and
production tools. (As an additional benefit, a topic can contain multiple
components.)
DITA solved the second problem by using an attribute on an element to create a reference. An element can only be
inserted where it is legal. (Otherwise, it won't pass
validation.) The reference is only valid if it refers to
the same element, or to an extension of that element--a
restriction that is easily validated, or enforced
by the editor. The referenced component,
meanwhile, can be validated independently.
That implementation created what is perhaps the most
significant and at the same time the most underrated feature
of the DITA standard:
If all of the components of a DITA document are separately valid,
then the combined document is guaranteed to be valid, as well.
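The reference rule described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the library topic is built inline rather than loaded from a file, the reference uses the conref attribute in its file#topic/element form, and LIBRARY, most_specific_type, and resolve are hypothetical names made up for the example.

```python
import xml.etree.ElementTree as ET

# A reusable-content topic, stored as ordinary XML (built inline here).
LIBRARY = ET.fromstring(
    '<topic id="lib" class="- topic/topic ">'
    '<note id="backup" class="- topic/note ">Back up your data first.</note>'
    '<p id="intro" class="- topic/p ">Welcome.</p>'
    '</topic>'
)

def most_specific_type(el):
    """Return the last type name in the class attribute, e.g. 'topic/note'."""
    return el.get("class").split()[-1]

def resolve(ref_el, library):
    """Return the element ref_el refers to, or raise ValueError."""
    # "lib.dita#lib/backup" -> "backup"
    target_id = ref_el.get("conref").split("/")[-1]
    target = library.find(f".//*[@id='{target_id}']")
    if target is None:
        raise ValueError(f"no element with id {target_id!r}")
    # The target must be the same type as the referencing element, or an
    # extension of it (its class ancestry includes the referencing type).
    if most_specific_type(ref_el) not in target.get("class").split():
        raise ValueError(f"<{target.tag}> cannot replace <{ref_el.tag}>")
    return target

# A note that refers to a note: accepted.
good = ET.fromstring('<note class="- topic/note " conref="lib.dita#lib/backup"/>')
print(resolve(good, LIBRARY).text)

# A note that refers to a paragraph: rejected.
bad = ET.fromstring('<note class="- topic/note " conref="lib.dita#lib/intro"/>')
try:
    resolve(bad, LIBRARY)
except ValueError as err:
    print(err)
```

Because the referencing element already sits at a legal location and the target is guaranteed to be the same kind of element, swapping one for the other cannot make the combined document invalid.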
Conclusion
DITA was designed from the ground up for modular,
component-based documentation. The process was
informed by decades of experience with the DocBook
standard. The issues and restrictions were
carefully identified, and brilliant solutions were
devised.
Because DITA is extensible, "DITA is to documentation
what Object-Oriented is to programming," as co-worker
Sowmya Kannan likes to say. But because DITA is designed
around discrete components that can be edited and
validated, it is equally fair to say that "DITA is to
documentation what integrated circuits were to electronics".
In short, DITA is a standard that lets you assemble new deliverables from existing components--adding only a minimum of additional circuitry--with
full confidence that the base components "just work".
Previous: Modular Docs Part 1: Why You Want Modular, Topic-Oriented Documentation
Thanks for the good two-part writeup. I'm interested in learning more about DITA; it seems promising.
For my thesis however, I used DocBook and was quite happy with it.
As for editable components, that limitation no longer applies to DocBook 5, where you can use XInclude to reference external parts of a document.
I have individual chapters (and sometimes sections) that I can edit with no problems in, e.g., the free DocBook-capable Java editor XXE from Xmlmind (which also supports DITA, btw).
References are also checked, since DocBook supports IDs on each element, and you can reference them with the "linkend" attribute, which is supported by all elements where this makes sense. Any XML processor, like Xalan, will report missing or wrong references, and I can be sure that all references are valid when it does not complain.
The only problem with DocBook (don't know about DITA) is the complexity of the stylesheets necessary to produce any output. FOP has its own set of issues and does not make things easier. In general, things work very well, though; only special cases like landscape tables/figures, SVG, or complex index references caused some problems.
Posted by Gregor B. Rosenauer on November 04, 2008 at 02:57 AM PST