The Java Tutorials' Weblog

Wednesday Jun 27, 2007

JDOM and dom4j vs DOM

In "RAD 1: Why JRuby?", Tim Bray writes:
"I’ve always been a lean, mean, stream-parser guy; I’d never been near the standard DOM. Well, blecch. Yes, I understand why it is the way it is, but it sure isn’t fun. If I were doing this again I’d look hard at JDOM or XOM."
<rant>
I'm not familiar with XOM, but I sure share his reactions to DOM. "Object model", indeed. It's a structured data model, and nothing more. Nothing object-oriented about it, and therefore very ugly for a Java coder.
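To make that complaint concrete, here is a minimal sketch of pulling a few values out of a document with the standard org.w3c.dom API. It is not taken from the tutorial: the class name DomTitles, the titles helper, and the sample <library> document are all hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of any tutorial or library.
public class DomTitles {

    // Returns the text of the first <title> inside each <book> element.
    public static List<String> titles(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        // NodeList is not a java.util.List and is not Iterable,
        // so it takes an index loop plus a cast to reach each Element.
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            NodeList titleNodes = book.getElementsByTagName("title");
            if (titleNodes.getLength() > 0) {
                result.add(titleNodes.item(0).getTextContent());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book><title>First</title></book>"
                + "<book><title>Second</title></book>"
                + "</library>";
        System.out.println(titles(xml)); // prints [First, Second]
    }
}
```

The factory, the builder, the index loop, and the cast are all there just to read two strings. Tree APIs such as JDOM hand back children as ordinary Java collections instead, which is the kind of difference at issue here.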

So why did I write Sun's XML tutorial using DOM? Because that's what we had. And do you know what? For six years it was impossible to write about some important alternatives:

  • JDOM: because the JSR was never finalized
  • dom4j: because even though it is a much better implementation, it's not our JSR, so we couldn't talk about it.
I lived with that every day for six years. I've moved on to other things now, so another author would have to be found to write about one of the other technologies. But I want to go on record (once again), as saying that we do an extraordinary disservice to Java developers if we don't show them how to use the cleanest and smallest parsing APIs, as well as the huge, bloated APIs that our enterprise customers favor.
</rant>

And the same goes for RelaxNG vs. XML Schema...

-- Eric Armstrong

Comments:

Hi, <uselessComment> I had to manipulate quite a lot of XML lately and discovered JDOM. I must say that it's a really great library to work with. Of course I don't write huge applications so I can't really judge its performance, but for my little needs, it is perfect. The API is really simple and it is very easy to manipulate a Document with it. </uselessComment> I agree with the idea that the Java tutorial should encourage using something other than the default DOM. I gave it a try, but really didn't appreciate it.

Posted by Sébastien on June 28, 2007 at 11:31 AM PDT #

XOM started as a fork of JDOM but wound up with all-new code. It's a really superb tree API.

From the design document:

Design Goals:

  • Absolutely correct
  • Easy to use
  • Easy to learn
  • Fast enough
  • Small enough
  • No gotchas
Design Principles:
  • Principle of Least Surprise
  • As simple as it can be and no simpler!
  • Use Java idioms where they fit
  • There's exactly one way to do it
  • Start small and grow as necessary

Posted by John Cowan on June 28, 2007 at 07:07 PM PDT #

As wiser minds have pointed out, the HTML protocol is actually more of a "tag soup". There's little structure there, and even in the face of gross violations of what there is, most browsers will make a reasonable fist of displaying your content. "Reasonable" is the keyword: it is one of the frustrating excuses for the huge variations you see as each different browser attempts to render your page, even a page that's been checked against the W3C validator.

Posted by bloodnok on July 15, 2007 at 08:20 AM PDT #

You wrote the Java tutorial on XML? The one that contains statements like this:
"But unlike HTML, XML tags identify the data, rather than specifying how to display it."?

I have been constantly cringing while trying to read that tutorial because it keeps saying that HTML is designed for appearance, not structure.

Where the heck did that idea come from? It sure doesn't match the actual intent of the HTML standard.

Sadly, it matches the reality of how browsers implement the standard, but that's not what the tutorial keeps saying.

Reading such a precisely wrong statement so many times throughout the tutorial makes it very difficult for me to pay attention to the useful content. Any chance someone can correct that?

Posted by Naruki on October 09, 2007 at 11:22 PM PDT #

Naruki wrote:
> the tutorial keeps saying that HTML is designed for appearance, not structure.
>
Of course, intuiting someone else's *intention* is always tricky business. So it's difficult to say for certain that HTML was designed with that intention in mind. But how else to explain the later appearance of supposedly "semantic" tags like <em> and <strong>? Frankly, those tags are no more or less semantically meaningful than <i> and <b>; they're just longer and that much more work for a coder to deal with.

Similarly, how else does one explain the lack of structure? After all, an <h2> section head doesn't actually contain anything. It just tells the rendering engine what font to use for that particular phrase. You can intermix headings at will, with no structural implications at all.

Then there are the thousands of variations, all of which are legal. A <dd> item can be terminated by a </dd>, another <dd>, a <dt>, or a </dl>. That's not a problem when the markup is automatically generated. In that case, the rendering engine can make simplifying assumptions about what it will see. But when the markup is used for authoring, every possible condition has to be accounted for.

So if HTML was designed as a structuring language, I'd have to say it failed. But as a markup language for display, it has obviously succeeded rather well.

Posted by Eric Armstrong on October 23, 2007 at 11:23 AM PDT #

I agree with the above poster – the HTML protocol is very much like “Tag Soup.” When we talk about W3C though, I'm appalled by how many people claim their site is “W3C compliant” but in actual fact it's not even close to W3C compliant. In the most basic form, it only works on Internet Explorer and you'll find it works very badly on any other browser. As we should have expected, people throw W3C around as a buzzword but it's questionable how many people actually stick to what it says.

Posted by shopping on July 03, 2008 at 02:32 AM PDT #
