
Monday, August 23, 2004

XHTML is not XML!!

I was always under the impression that XHTML documents are XML documents, just with HTML semantics for the tags. You know: all tags must be closed, tags must be properly nested, etc. But I recently discovered that that's not the case!

I ran into a weird bug in Creator: if you did "Preview In Browser" on the following document (surrounding html, body, etc. omitted), the browser showed the text that comes AFTER the text area inside the text area!

     <textarea/>Hello World

In XML you can use a minimized form, so that <foo></foo> can be written as <foo/> instead. But as it turns out, that's not always true in XHTML. Some tags must always be minimized (such as br), and other tags can never be minimized, such as p, div, textarea, and friends. This is all spelled out in Appendix C ("HTML Compatibility Guidelines") of the XHTML 1.0 spec.
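To see the XML side of this, you can feed both spellings to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely for illustration (it is not what Creator uses, and the body wrapper and the describe helper are made up for the example):

```python
import xml.etree.ElementTree as ET


def describe(markup):
    """Parse markup as XML; return (text inside textarea, text following it)."""
    root = ET.fromstring(markup)
    textarea = root.find("textarea")
    return textarea.text, textarea.tail


# The minimized and the expanded spelling of an empty textarea.
minimized = describe("<body><textarea/>Hello World</body>")
expanded = describe("<body><textarea></textarea>Hello World</body>")

print(minimized)  # (None, 'Hello World')
print(expanded)   # (None, 'Hello World')
```

To an XML parser the two forms are the same document: the textarea is empty and "Hello World" follows it. The surprise only shows up once something other than an XML parser reads the markup.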

In Creator we were using standard Xerces to parse and serialize the markup, but because of the above "feature" I can't do that anymore, since the Xerces serializer does not choose between the minimized and the non-minimized form based on the tag name. The bug was fixed in patch 1.
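Roughly, the serializer has to pick the form per tag name. Here is a minimal sketch of that idea, written in Python rather than taken from the actual Creator/Xerces code. The EMPTY_ELEMENTS set lists the elements declared EMPTY in the XHTML 1.0 DTDs; the serialize helper is hypothetical, and namespaces, comments, and processing instructions are ignored:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Elements declared EMPTY in the XHTML 1.0 DTDs (Strict, Transitional, Frameset).
EMPTY_ELEMENTS = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def serialize(elem):
    """Serialize an element tree, choosing the tag form by element name."""
    attrs = "".join(
        f" {name}={quoteattr(value)}" for name, value in elem.attrib.items()
    )
    if elem.tag in EMPTY_ELEMENTS and len(elem) == 0 and not elem.text:
        # Always minimized, with a space before the slash for old HTML parsers.
        out = f"<{elem.tag}{attrs} />"
    else:
        # Never minimized, even when the element has no content.
        inner = escape(elem.text or "") + "".join(serialize(c) for c in elem)
        out = f"<{elem.tag}{attrs}>{inner}</{elem.tag}>"
    # Text that follows the element belongs to the parent's content.
    return out + escape(elem.tail or "")


root = ET.fromstring("<div><textarea/>Hello World<br></br></div>")
print(serialize(root))
# <div><textarea></textarea>Hello World<br /></div>
```

Whatever spelling the input used, textarea comes out with an explicit end tag and br comes out minimized, which is the form the compatibility guidelines ask for.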

Of course, I'm still puzzled as to why Mozilla and other browsers treat the fragment above such that the text shows up inside the textarea. Perhaps this is some sort of quirks-mode handling that makes old HTML documents with errors show up correctly?

(2004-08-23 11:11:44.0)

Comments:

XHTML does allow minimization of all elements that are empty. The spec appendix you refer to is "HTML Compatibility Guidelines", which is informative, not normative.

Browser behavior depends a lot on whether you serve the content as XML or HTML. If you serve as HTML, the browser uses an HTML parser, and HTML parsers don't know minimized elements - they usually just ignore the "/" before the ">". See http://www.hixie.ch/advocacy/xhtml for an interesting discussion.

Posted by Norbert on August 23, 2004 at 12:04 PM PDT #

Thanks - good point about this being a compatibility requirement versus actually part of the spec.

My surprise was in setting up an XHTML doctype for the document, and finding Mozilla doing the "wrong" thing since I thought of it as very standards compliant. For an HTML doctype I would have been more sympathetic :-)

Of course, in our product we produce XHTML documents and have to make them render in real-world browsers, so I'm stuck following the compatibility guidelines.

Posted by Tor Norbye on August 23, 2004 at 12:10 PM PDT #

The doctype by itself won't kick Mozilla into XML mode. For backwards compatibility, it uses the HTML parser if you serve pages with a text/html Content-Type; serve it as application/xhtml+xml and you'll get the strict, correct XML behavior.

Posted by Brion Vibber on August 23, 2004 at 05:30 PM PDT #
