Friday, 18 Sep 2009
OpenOffice.org 3.2 introduces a new set of document icons (a.k.a. mime type icons). The new set gives ODF documents a clean and unique visual identity, and removes any product or vendor specific brand.
In addition to reasons given in the ESC meeting on March 09, 2009, I would like to underline why it is important to strengthen the ODF brand at all.
End users hardly ever know that they use ODF, and if they do, they usually don't care too much. Is this something we should worry about? Yes. But what is the issue? In one word: Awareness.
Take a look at popular file formats, like JPEG, MP3 or PDF. In terms of awareness, they are well known. This makes users drive demand for their support. Imagine a digital photo frame that wouldn't support JPEG, a digital audio player without MP3 support, or a PC that cannot by any means display PDFs. Who would want to have this stuff?
How can we get end users to demand ODF support in a similar fashion? What can be done to raise the awareness of ODF? "Marketing" seems to be part of the answer, but what exactly do we want to market? Which particular visuals do you pick to represent ODF so that end users recognize it in their daily life? How does the official ODF community logo translate to the large variety of different visuals used for ODF files on desktops?
A core strength of ODF - being supported by many applications - is at the same time a disservice to the ODF brand. Each and every ODF application uses its own set of ODF icons. That's more to the benefit of each of the applications, or the application vendors, than to ODF's. It gives users a fragmented impression of the ODF brand.
That's why we need to unify the ODF brand experience across all ODF applications and their vendors, starting where most users meet ODF: on their desktops. We want users to perceive ODF as the primary attribute of their documents. By doing so, we unify the ODF experience across applications - we continue to compete on application features, distributions, services, etc. But it becomes apparent to the user that we all speak one language: ODF.
Last but not least, the unified icon set also takes into account that users collaborate more and more, where information (document) sharing is essential. A wide adoption of the new icon set conveys a strong sense of interoperability between users of different applications.
The new icon set deliberately puts "ODF" on top of the document icon. The distinction between different file types (text document, spreadsheet, presentation, ...) still exists, but has been toned down. Most important, we removed any product specific brand elements (the seagulls), and we eliminated any vendor related design (the "S" curve and specific colors).
The result is a very clean and modern icon set:
On one hand it gives ODF a unique and easy to recognize identity, while on the other hand it competes well in today's attention-grabbing icon-o-mania on users' desktops. BTW "desktop" also refers to typical folders, like "My Documents" or a user's home directory on Linux.
After (see update) OpenOffice.org 3.2 the new icons will be available for the first time. Though we consider the design pretty mature, we know that evolving and refining these icons will be an ongoing process. But instead of elaborating forever on the design, we want to let users guide this process by giving feedback based on real-life use of OpenOffice.org.
A few words about the next steps. We are going to host the icons on the ODF Toolkit site, because they relate more to ODF than to OpenOffice.org. Of course OpenOffice.org will use them, as will the next release of StarOffice. We would like all other ODF applications to go the same way. This was the original intent when we introduced the subject to the ESC, where stakeholders of different applications and platforms are represented. We plan to release the icons as free and open as possible. By all means we want to avoid any sense of vendor or copyright lock-in. This will work only if it really comes free. Also, we are working with OASIS to get their support for the new design. Most of these steps are currently work in progress, so please bear with us while we work on them.
After receiving a lot of really good feedback - in the comments section and especially on the mailing list discuss@ux - we decided to postpone the integration of the icon set until after OpenOffice.org 3.2. If you would like to work with us on this subject, please watch for more information on the mailing list.
tags: icon odf opendocument openoffice.org
Thursday, 17 Sep 2009
Because some people asked me if we really addressed the issues mentioned here, I just wrote up what we did in my blog.
I don't want to replicate the full article here, so if you are interested, you can read the details here.
tags: odf ooo opendocument openoffice.org privacy security sun
Wednesday, 22 Jul 2009
A new version of ODFDOM - the OpenDocument Java API - has been released!
ODFDOM is an Apache 2 licensed Java library to easily create, access and manipulate ODF documents.
If you have never heard of ODFDOM, you might want to get a quick overview first.
With this release we made an enormous step forward!
It took us several months of continuous refactoring until we finally agreed on shipping this new release. 'We' are in this case Benson Margulies, David Eisenberg and a group of IBM and Sun developers.
Version 0.7 more than deserves to be called a release. It brings several major new features, the most important of which are listed below:
To ease the DOM handling, every element now has methods to create its child elements, requiring mandatory information as parameters. With this, users no longer have to look into the RelaxNG schema to verify if an element is allowed as a child.
Furthermore there are methods to access all possible attributes of the element, as well as enumerations for their possible values.
Besides elements and attributes, there are now also classes for all data types listed in ODF 1.2 (mostly W3C schema types, which now come with validation and helper functions).
Finally, the convenience layer of ODFDOM gives examples of how easy a final API might look.
For instance, it is a simple three-liner to create a new text document, add some text and save it:
OdfTextDocument odt = OdfTextDocument.newTextDocument();
odt.addText("My important text");
odt.save("MyExample.odt");
David Eisenberg has contributed new tutorials.
Our JavaDoc has been improved. Now every single ODFDOM element/attribute class links to an XHTML version of the spec, where the functionality of the XML node is described in detail. This spec is bundled with the JavaDoc (soon the spec will be split into smaller pieces to speed up loading times).
Last but not least, with the support of Benson Margulies, we were able to switch our build environment from ANT to Maven. By using Maven we made our dependencies declarative. We no longer add dependent JARs to our sources (e.g. the parser XercesImpl.jar), but are able to download them via Maven on demand. Following this idea of modularization, we were able to simplify the source structure of ODFDOM by moving our code generation to its own new project called relaxng2template.
Interesting challenges still lie in front of us, and we hope this release sparks interest among like-minded developers to join our efforts!
I hope you all enjoy the new release!
Svante
tags: api java odf odfdom opendocument
Monday, 20 Jul 2009
Goal 1 : improve overall filter performance
Goal 2 : make use of multiple cores/cpus through threading (wherever threads are not constantly blocked by calls into the OOo core, which currently does not support multiple threads without blocking)
Goal 3 : support the implementation of load on demand (for example to display the first slides to the user and load the rest of the document in the background)
XPP is a streaming pull XML parser. Unlike sax where the sax parser calls the filter, the filter itself makes calls to the XPP parser to parse the next xml element.
+ In contrast to sax, the filter can interrupt parsing after any element and continue parsing later.
+ Pull instead of Push leads to a cleaner filter implementation (cleaner code is usually easier to service and improve).
- Performance is equal to a sax parser
- No random access
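The pull model can be sketched in Java with StAX (`javax.xml.stream`), the pull parser that ships with the JDK. This is a stand-in chosen for illustration, not the XPP parser discussed above, and the class `PullDemo` and its methods are hypothetical names. The point it shows is that the caller decides when the next element is read, so it can stop after any element and resume later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Pull-parsing sketch (hypothetical class): the caller asks for elements, no callbacks. */
final class PullDemo {
    private final XMLStreamReader reader;

    PullDemo(String xml) {
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not open XML stream", e);
        }
    }

    /** Advances to the next start tag and returns its local name, or null at end of stream. */
    String nextElement() {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML is not well-formed", e);
        }
    }

    /** Pulls at most 'limit' elements and returns; a later call continues where this one stopped. */
    List<String> pull(int limit) {
        List<String> names = new ArrayList<>();
        String name;
        while (names.size() < limit && (name = nextElement()) != null) {
            names.add(name);
        }
        return names;
    }
}
```

Calling `pull(2)` and later `pull(10)` on the same instance reads the stream in two stages, which is the kind of interruption a push-style sax handler cannot offer without extra machinery.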
A DOM is a parsed in-memory representation of an xml stream. This technology is used by modern browsers and the odftoolkit.org project.
+ Filter has random access to all xml elements
+ Random access leads to a cleaner filter implementation (cleaner code is usually easier to service and improve).
- A DOM has to store a complete copy of the xml stream + management in memory during the filter process.
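A minimal sketch of the random-access property, using the standard W3C DOM API from the JDK. The class `DomDemo` and the sample element names are made up for illustration. Once the tree is built, elements can be visited in any order and revisited, at the price of holding the whole document in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** DOM sketch (hypothetical class): parse once, then access elements in any order. */
final class DomDemo {
    /** Parses the complete stream into an in-memory tree. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Random access: reads an attribute of the n-th element (0-based) with the given tag name. */
    static String attributeOfNth(Document doc, String tag, int n, String attribute) {
        NodeList nodes = doc.getElementsByTagName(tag);
        Element element = (Element) nodes.item(n);
        return element.getAttribute(attribute);
    }
}
```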
I developed the fast sax parser during the initial implementation of the Office 12 XML filters. It is basically a sax parser but uses integer tokens to represent known namespaces, element names and attribute names. Tools like gnu gperf can be used to create perfect hash functions that transform the xml names to integer tokens. Scripts can parse the dtd or relax ng of an xml format to automatically extract all the xml names that need to be convertible to integers. With utf-8 xml streams, the tokens can be created without the need to transform the strings to another encoding first. Namespaces can be combined with xml names so each element or attribute name with a namespace can be identified with just one integer compare.
+ Reduces string handling (encoding, comparing, storing)
+ Leads to cleaner filter implementation (e.g. switch statements can be used to identify child elements instead of if .. else .. blocks which use the string compare functions)
+ Usage of perfect hash algorithms which are automatically created during compile time
- No random access
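The token idea can be sketched in Java as follows. This is not the fast sax parser itself: it wraps the JDK's ordinary sax parser, and a `HashMap` stands in for the gperf-generated perfect hash. The class `TokenSaxDemo` and the token values are hypothetical; only the two ODF namespace URIs are real. The namespace token sits in the upper 16 bits and the name token in the lower 16 bits, so one `switch` on a single integer identifies a namespaced element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Token sketch (hypothetical class and token values). */
final class TokenSaxDemo {
    // Namespace tokens use the upper 16 bits, name tokens the lower 16 bits.
    static final int NS_DRAW = 1 << 16;
    static final int NS_TEXT = 2 << 16;
    static final int XML_PAGE = 1;
    static final int XML_P = 2;
    static final int UNKNOWN = -1;

    // Stand-in for a generated perfect hash: plain hash maps from strings to tokens.
    private static final Map<String, Integer> NAMESPACES = new HashMap<>();
    private static final Map<String, Integer> NAMES = new HashMap<>();
    static {
        NAMESPACES.put("urn:oasis:names:tc:opendocument:xmlns:drawing:1.0", NS_DRAW);
        NAMESPACES.put("urn:oasis:names:tc:opendocument:xmlns:text:1.0", NS_TEXT);
        NAMES.put("page", XML_PAGE);
        NAMES.put("p", XML_P);
    }

    /** Combines namespace and local name into one integer, or UNKNOWN if either is not known. */
    static int token(String uri, String localName) {
        Integer ns = NAMESPACES.get(uri);
        Integer name = NAMES.get(localName);
        return (ns == null || name == null) ? UNKNOWN : (ns | name);
    }

    /** Parses the xml and records which known elements were seen, dispatching on integer tokens. */
    static List<String> parse(String xml) {
        final List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                switch (token(uri, localName)) {
                    case NS_DRAW | XML_PAGE: seen.add("slide"); break;
                    case NS_TEXT | XML_P: seen.add("paragraph"); break;
                    default: break; // unknown elements are ignored in this sketch
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return seen;
    }
}
```

In this sketch the strings still pass through the sax API before they become tokens; the parser described above avoids that step by tokenizing directly on the utf-8 stream.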
Since random access looks like the key to a filter that supports painless load on demand, I decided to go with a DOM solution. To minimize the memory footprint of the DOM tree, I used the fast sax parser to build the DOM tree and used the integer tokens for xml names instead of the strings from the stream. Parsing the xml stream itself does not need any interaction with the application core, so this can be done in a separate thread. Converting the xml representation of attribute values to an UNO representation is also something that can be done in almost all cases without the application core, so this should be done in the same thread.
The problem here is that a classic DOM is a generic and typeless representation of the xml data. The solution is to use a technique we introduced in the odftoolkit.org project. Initially an xslt transformation was used to create DOM node implementations for each element of the ODF format, with type-safe access to its attributes, using only the relax ng schema. Maintaining xslt templates proved costly for such complex operations, so this was replaced with a code generator that I implemented in java, which uses simple templates and configuration files to transform a relax ng schema into code files.
This code generator also made it possible to create DOM tree element implementations for languages other than java, which is the language used in the odftoolkit.org project. I therefore used the generator to create c++ source files for all ODF elements, and I adapted the configuration files to create types for the attributes that are equal to the UNO types that the filter needs to pass to the application core. (The key difference between the ODFDOM from odftoolkit.org and the DOM tree for the prototype is that the former uses ODF based types for the attributes and the latter uses UNO types.)
So in conclusion, a DOM tree builder is started in a worker thread and parses the xml stream by using the fast sax parser. (The prototype actually starts two worker threads, one to build the tree for the styles.xml stream and one for the content.xml stream.) It uses the sax events to create a tree where each element is the instance of a class that was specifically generated for that element from the relax ng schema. All attribute values are parsed and stored into an UNO Any, preferably with the same type as used in the UNO API of the OOo application. So for example if the attribute is a length value then something like "12cm" is converted into an UNO Any containing an integer value of 12000 (12cm converted to 1/100th mm).
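The length conversion can be sketched as a small Java function. This is an illustration, not the prototype's c++ code: the class `LengthConverter` is hypothetical, it returns a plain `int` instead of an UNO Any, and it handles only the units cm, mm, in, pt and pc. Since 1 cm is 10 mm and the target unit is 1/100 mm, 12 cm is 120 mm, which comes to 12000 units.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Length sketch (hypothetical class): converts ODF-style lengths to 1/100 mm. */
final class LengthConverter {
    // A number followed directly by one of the supported units.
    private static final Pattern LENGTH = Pattern.compile("(-?[0-9]*\\.?[0-9]+)(cm|mm|in|pt|pc)");

    /** Converts a value such as "12cm" to hundredths of a millimetre, rounded to the nearest integer. */
    static int toHundredthMm(String value) {
        Matcher m = LENGTH.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a supported length: " + value);
        }
        double number = Double.parseDouble(m.group(1));
        double factor;
        switch (m.group(2)) {
            case "cm": factor = 1000.0; break;        // 1 cm = 10 mm
            case "mm": factor = 100.0; break;
            case "in": factor = 2540.0; break;        // 1 in = 25.4 mm
            case "pt": factor = 2540.0 / 72.0; break; // 1 pt = 1/72 in
            default:   factor = 2540.0 / 6.0; break;  // "pc": 1 pica = 12 pt = 1/6 in
        }
        return (int) Math.round(number * factor);
    }
}
```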
These worker threads would never block or wait for the OOo thread. But this would not make sense if at the same time the office thread idles and waits for the tree builder to finish. So the tree builder notifies the filter as soon as an imported element has been fully parsed. For example, if the office:styles element is completely parsed, the filter thread can start and import the styles. If the filter gets notified that a slide has been completely parsed, it can check if the needed styles and master pages are already imported and then import this slide. If the filter is also executed in a separate thread, the office thread can paint the already imported slides to enhance the responsiveness of the application to the user, which results in a 'subjective' performance gain.
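The notification scheme can be sketched in Java with a queue between the two threads. This is a simplified model, not the prototype's code: the class `PipelineDemo` is hypothetical, strings stand in for parsed DOM elements, and an unbounded `LinkedBlockingQueue` plays the role of the notification channel. Because the queue is unbounded, the builder thread never waits for the consumer, while the filter side can start importing as soon as the first element is complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Pipeline sketch (hypothetical class): a builder thread hands finished elements to the filter. */
final class PipelineDemo {
    // Sentinel that marks the end of the stream; assumed never to occur as an element name.
    private static final String END = "\u0000END";

    /** Simulates parsing 'elements' in a worker thread and importing them as they arrive. */
    static List<String> run(final List<String> elements) {
        final BlockingQueue<String> parsed = new LinkedBlockingQueue<>();
        Thread builder = new Thread(() -> {
            for (String element : elements) {
                parsed.add(element); // "notify": never blocks on an unbounded queue
            }
            parsed.add(END);
        });
        builder.start();

        List<String> imported = new ArrayList<>();
        try {
            // The filter side blocks only while nothing is ready to import.
            for (String e = parsed.take(); !e.equals(END); e = parsed.take()) {
                imported.add("imported " + e);
            }
            builder.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while importing", ex);
        }
        return imported;
    }
}
```

A fuller version would also have to check dependencies before importing, for example that the styles a slide refers to have already arrived.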
To implement the filter I borrowed code from the existing ODF filter and transformed it to use the DOM tree instead of sax events. This mostly resulted in much less and cleaner code, as expected. For a reliable comparison with the existing filter I had to implement a minimum set of functionality so that for selected real world documents the prototype imports all the functionality available in that document.
I ended up implementing
First I used an average real world document with 47 slides and lots of graphics and some chart ole. It showed that the prototype filter was around 2% faster than the original filter. This was less than expected.
Next I created an artificial document: a presentation with 188 slides and only formatted text, no graphics, no ole. Such a pure xml document would be uncommon for a presentation, but it is a good approximation of what the gain could be for a writer or calc document, where the xml to graphic/ole ratio is much higher. This led to a performance gain of 10%, which is not bad since the prototype is not yet profiled and optimized itself.
I tested this on a dual core 3 GHz system. Experiments with the original filter showed that we are cpu bound, and since we use only one core, the processor usage is only 50%. So I expected better results with the threaded prototype by making use of the second core's 3 GHz. A quick look at a cpu monitor showed that this didn't happen; processor usage was still capped at 50%.
This made me suspect that the actual parsing, tree building and xml to UNO conversion accounted for only an insignificant amount of the time the filter needs to import this document. After removing the threading I measured the time it took to parse the xml streams and to actually import the document. It turns out that building the DOM tree for the styles.xml and content.xml accounted for less than 2% of the overall time. So when opening a document that takes 10 seconds to load, the second core is used for only 0.2 seconds. Remember that this does not only include the actual xml parsing but also creating the DOM tree in memory and converting most of the attribute values to UNO types.
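The arithmetic behind this can be written down as a tiny Java sketch. The class `OverlapEstimate` is hypothetical, and the second function rests on an idealized assumption: the offloaded parsing overlaps completely with the rest of the import and threading itself costs nothing. Under that assumption, moving a fraction p of the work to another core leaves 1 - p on the main thread, so the best possible speedup is 1 / (1 - p).

```java
/** Estimate sketch (hypothetical class), assuming perfect overlap and no threading overhead. */
final class OverlapEstimate {
    /** Seconds of work that move to the second core. */
    static double offloadedSeconds(double totalSeconds, double parseFraction) {
        checkFraction(parseFraction);
        return totalSeconds * parseFraction;
    }

    /** Upper bound on the speedup when a fraction of the work runs fully in parallel. */
    static double bestCaseSpeedup(double parseFraction) {
        checkFraction(parseFraction);
        return 1.0 / (1.0 - parseFraction);
    }

    private static void checkFraction(double fraction) {
        if (fraction < 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1): " + fraction);
        }
    }
}
```

With a parse share of 0.02 and a 10 second load, `offloadedSeconds(10, 0.02)` gives 0.2 seconds, and `bestCaseSpeedup(0.02)` gives roughly 1.02, so threading the parser alone cannot buy much more than 2%.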
The interesting finding here is that the overhead from the xml parsing is much less than the typical 'developer gut feeling' about xml. So any performance work in the filter or the application core would be much more efficient than trying to speed up the xml parsing.
Since the usage of DOM is often criticized for its memory consumption, I also took a look at this. Since I replaced most strings with 32 bit integer tokens, I figured that the memory consumption of a DOM tree should not be greater than the xml stream it was parsed from. My expectation was that for most cases it should be even smaller.
The first measurement showed that it was actually 10 times more than its xml stream. After further investigation I found the source of the problem, which is the memory manager's overhead for each allocated instance. After converting attributes from individual instances to a single vector, this dropped to 2x the size of the xml stream. For an impress document this is not a problem, as the xml stream size for average documents is seldom more than 1 MB. For a writer document this may also be no problem, as for example the huge OpenDocument specification has less than 10MB of xml streams. For calc this may be a problem, as calc documents with many cells can have xml streams of 100MB and more. For current office workstations this may not be a problem at all, but if OOo runs on a server for multiple users or if OOo were ported to small devices then this could become an issue.
While the prototype showed that this method of implementing an ODF import filter does result in a performance gain and the option to support load on demand, for an impress application the performance gain of only around 2% for real world documents is outweighed by the actual cost to implement this filter. I currently estimate an effort of at least 6 man-months for the implementation of only the impress import filter (this is without time for testing, which is also crucial to find regressions). For a calc and writer filter these figures may differ.
Since writer and calc documents are more xml centric than impress documents, a prototype for these applications may show that the overall gain still outweighs the costs.
tags: api import impress odf openoffice.org performance uno xml
Friday, 10 Jul 2009
As promised in a comment in my "Comments on the Black Hat 2009 OOo Security Briefing", I have created the OpenOffice.org Security Project.
I already did this some weeks ago, but it took me some time to transfer all the content of my currently existing documents into the Security project's Wiki pages. (This was also a good opportunity to consolidate and clean up some stuff.)
There are different pages for topics like digital signatures and encryption. On the first page you can find a list of all the different items, the ones we are working on now, as well as items that might be addressed some time later in the future.
Currently it's all about digital signatures, encryption and document integrity. I hope I will find some time to also work on stuff for avoiding security vulnerabilities, which includes using certain compiler features as well as guidelines for developers.
Every kind of help is welcome :)
Friday, 29 May 2009
Quite frequently people ask/search for information about OOo's document encryption.
Some answers can now be found here.
tags: encryption odf ooo openoffice.org security
Wednesday, 13 May 2009
I normally don't post my ODF Plugin news and information on GullFOSS, but so many people complain (everywhere, including in OOo mailing lists) about the bad ODF support in Microsoft Office 2007 SP2, that I thought it might be a good idea to post some information about the ODF Plugin here...
The Sun ODF Plugin for Microsoft Office, which is based on OpenOffice.org, adds support for ODF to Microsoft Office 2000 and newer versions. So you don't have to use the very latest Microsoft Office 2007 SP2 version (in case you really need Microsoft Office for some reason), where ODF support is insufficient anyway.
The ODF plugin still works in MS Office 2007 SP2, allowing document exchange between OpenOffice.org users and Microsoft Office users - which doesn't work in many cases when using the new built-in ODF support in SP2.
For more information about the ODF Plugin, news and additional information, feel free to read my blog regularly - I don't plan to start posting the same information here.
Monday, 11 May 2009
A few days ago I wrote some comments on the Black Hat 2009 OOo Security Briefing in my blog.
Some people asked me why it's not on GullFOSS, and they are probably right that I should have (cross-) posted it here.
The article is quite long, so I won't duplicate it here but invite you to read it in my blog:
http://blogs.sun.com/malte/entry/comments_on_the_black_hat
Translations are available in French and in Hungarian.
Friday, 08 May 2009
This appeared to be a good opportunity to update the ODF Validator at odftoolkit.org (which we are using at Sun's OpenOffice.org development team to check ODF documents) to better support ODF 1.2. The update applies to the command line version of the tool, but also to the online version.
The changes I have implemented are:
In regard to foreign elements and attributes, the “conformance test” mode (option “-c” in the command line version) now corresponds to the ODF 1.2 document conformance class (see ODF 1.2 section 1.4.2.1). This means that foreign elements and attributes within ODF 1.2 documents are not permitted in this mode. This change does not affect ODF 1.0/1.1 documents, where they are still permitted.
The “conformance test” mode is the default for ODF 1.2 documents.
There is a new option “-e” (extended conformance), which corresponds to the ODF 1.2 extended document conformance class (see ODF 1.2 section 1.4.2.2) in regard to foreign elements and attributes. This means that foreign elements and attributes within ODF 1.2 documents are permitted in this mode. For ODF 1.0/1.1 documents this mode equals the conformance test mode.
It is now checked whether a document has at least a “content.xml” or a “styles.xml” sub stream. This corresponds to clause (D1.1.2) of ODF 1.2 section 1.4.2.2.
It is now checked whether the “content.xml”, “styles.xml”, “settings.xml” and “meta.xml” streams contain the correct root elements. This corresponds to clause (D1.2.2) of ODF 1.2 section 1.4.2.2.
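The last two checks can be approximated with a few lines of Java on top of the JDK's zip and StAX classes. This is a rough sketch, not the ODF Validator's implementation: the class `PackageCheck` is hypothetical, it covers only these two checks, and it does no schema validation. The root element names used are the standard ODF ones in the office namespace (document-content, document-styles, document-settings, document-meta).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Package check sketch (hypothetical class): sub stream presence and root elements only. */
final class PackageCheck {
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    private static final Map<String, String> ROOTS = new HashMap<>();
    static {
        ROOTS.put("content.xml", "document-content");
        ROOTS.put("styles.xml", "document-styles");
        ROOTS.put("settings.xml", "document-settings");
        ROOTS.put("meta.xml", "document-meta");
    }

    /** Returns a list of problems; an empty list means both checks passed. */
    static List<String> check(byte[] zipBytes) {
        List<String> problems = new ArrayList<>();
        boolean hasContentOrStyles = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                String name = entry.getName();
                String expected = ROOTS.get(name);
                if (expected == null) {
                    continue; // other sub streams are not looked at in this sketch
                }
                if (name.equals("content.xml") || name.equals("styles.xml")) {
                    hasContentOrStyles = true;
                }
                // Copy the current entry into memory before handing it to the XML parser.
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                for (int n = zip.read(buffer); n != -1; n = zip.read(buffer)) {
                    bytes.write(buffer, 0, n);
                }
                if (!hasRoot(bytes.toByteArray(), expected)) {
                    problems.add(name + ": root element is not office:" + expected);
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException("could not read package", e);
        }
        if (!hasContentOrStyles) {
            problems.add("package has neither content.xml nor styles.xml");
        }
        return problems;
    }

    /** True if the first element of the stream is the expected element in the office namespace. */
    private static boolean hasRoot(byte[] xml, String expectedLocalName) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new ByteArrayInputStream(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return OFFICE_NS.equals(reader.getNamespaceURI())
                            && expectedLocalName.equals(reader.getLocalName());
                }
            }
            return false;
        } catch (XMLStreamException e) {
            return false; // a stream that is not well-formed cannot have the right root
        }
    }
}
```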
tags: conformance odf odftoolkit.org opendocument openoffice.org relax-ng rng validator
Wednesday, 25 Mar 2009
tags: events odf opendocument