I recently had a debate on the open-document mailing list about the handling of whitespace. Dave Pawson blogged a few things about our exchange, on which I would like to comment. Maybe I can also write down a few more useful bits about whitespace handling in ODF. Dave writes:
The proposition seems to be (for ODF 1.1) that applications (XML applications ala xml 1.1 definitions) can screw with an authors whitespace to their hearts content. If it's at the start of an element, dispose of it. If it's mid element replace it with markup. To me it seems weird. To implementors (IBM Sun and so on) it apparently seems the thing to do... the only rationale I've heard is that it makes XML presenation easier.
Now, this is unfortunately not entirely true. An ODF application should by no means screw around with an author's whitespace. Whitespace in an ODF document just isn't always represented by literal whitespace in the XML representation of the ODF document.
It is useful to think of literal whitespace in the XML as being a word delimiter in ODF text content (e.g. in a paragraph). A sequence of blanks is represented by markup such as <text:s text:c="10"/>, which would represent 10 blanks in the actual document. This is described in the ODF specification (p. 35, p. 85).
The rationale behind this is that the literal whitespace in the physical XML representation of the ODF document can be used to format (pretty-print) the markup without changing the actual ODF document represented by the markup. Personally, I find this very practical. An ODF implementation is in no way encouraged to "screw around" with spaces represented in that way.
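To make this a bit more concrete, here is a rough sketch of how a consumer might recover the text of a paragraph under these rules. It is a simplification and not the normative algorithm from the specification: it only handles text:s (not text:tab or text:line-break), and the function name paragraph_text and the private-use placeholder character are my own inventions for illustration. Runs of literal whitespace are collapsed to a single word delimiter, while spaces encoded as text:s are kept exactly.

```python
import re
import xml.etree.ElementTree as ET

# ODF text namespace, in ElementTree's "{uri}local" notation.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
S_TAG = f"{{{TEXT_NS}}}s"
C_ATTR = f"{{{TEXT_NS}}}c"

# Assumption: this private-use character never occurs in the document text.
# It stands in for spaces that came from <text:s/> so they survive collapsing.
HARD_SPACE = "\ue000"


def paragraph_text(p):
    """Return the text of a text:p element (hypothetical helper, simplified)."""
    parts = []

    def walk(elem):
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            if child.tag == S_TAG:
                # text:c defaults to a single space when absent.
                parts.append(HARD_SPACE * int(child.get(C_ATTR, "1")))
            else:
                walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(p)
    raw = "".join(parts)
    # Literal whitespace is only a word delimiter: collapse runs, trim the ends.
    collapsed = re.sub(r"[ \t\r\n]+", " ", raw).strip(" ")
    # Now turn the protected <text:s/> spaces back into real blanks.
    return collapsed.replace(HARD_SPACE, " ")


COMPACT = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    'one two<text:s text:c="3"/>three'
    "</text:p>"
)

PRETTY = f"""<text:p xmlns:text="{TEXT_NS}">
    one
    two<text:s text:c="3"/>three
</text:p>"""

compact_text = paragraph_text(ET.fromstring(COMPACT))
pretty_text = paragraph_text(ET.fromstring(PRETTY))
print(repr(compact_text))
print(compact_text == pretty_text)
```

Both the compact and the pretty-printed markup yield the same paragraph text here, which is exactly the point: the indentation and line breaks belong to the XML file, not to the document.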