GullFOSS
OpenOffice.org Engineering at Sun
 
 
 
 
Friday, 18 Sep 2009
Unified ODF Icons
Lutz Hoeger

What

OpenOffice.org 3.2 introduces a new set of document icons (a.k.a. MIME type icons). The new set gives ODF documents a clean and unique visual identity and removes any product- or vendor-specific branding.

Why

In addition to reasons given in the ESC meeting on March 09, 2009, I would like to underline why it is important to strengthen the ODF brand at all.

End users hardly ever know that they use ODF, and if they do, they usually don't care too much. Is this something we should worry about? Yes. But what is the issue? In one word: Awareness.

Take a look at popular file formats like JPEG, MP3 or PDF. In terms of awareness, they are well known. This makes users drive demand for their support. Imagine a digital photo frame that wouldn't support JPEG, a digital audio player without MP3 support, or a PC that cannot (by any means) display PDFs. Who would want to have this stuff?

How can we get end users to demand ODF support in a similar fashion? What can be done to raise the awareness of ODF? "Marketing" seems to be part of the answer, but what exactly do we want to market? Which particular visuals do you pick representing ODF so that end users recognize it in their daily life? How does the official ODF community logo translate to the large variety of different visuals used for ODF files on desktops?

A core strength of ODF - being supported by many applications - is at the same time a disservice to the ODF brand. Each and every ODF application uses its own set of ODF icons. That's more to the benefit of each of the applications, or the application vendors, than to ODF's. It gives users a fragmented impression of the ODF brand.

That's why we need to unify the ODF brand experience across all ODF applications and their vendors, starting where most users meet ODF: on their desktops. We want users to perceive ODF as the primary attribute of their documents. By doing so, we unify the ODF experience across applications - we continue to compete on application features, distributions, services, etc. But it becomes apparent to the user that we all speak one language: ODF.

Last but not least, the unified icon set also takes into account that users collaborate more and more, where information (document) sharing is essential. A wide adoption of the new icon set conveys a strong sense of interoperability between users of different applications.

How

The new icon set deliberately puts "ODF" on top of the document icon. The distinction between different file types (text document, spreadsheet, presentation, ...) still exists, but has been toned down. Most important, we removed any product specific brand elements (the seagulls), and we eliminated any vendor related design (the "S" curve and specific colors).

The result is a very clean and modern icon set:


On one hand it gives ODF a unique and easy to recognize identity, while on the other hand it competes well in today's attention-grabbing icon-o-mania on users' desktops. BTW "desktop" also refers to typical folders, like "My Documents" or a user's home directory on Linux.

When

After OpenOffice.org 3.2 (see update), the new icons will be available for the first time. Though we consider the design pretty mature, we know that evolving and refining these icons will be an ongoing process. But instead of elaborating forever on the design, we want to let users guide this process by giving feedback based on real-life use of OpenOffice.org.

A few words about the next steps. We are going to host the icons on the ODF Toolkit site, because they relate more to ODF than to OpenOffice.org. Of course OpenOffice.org will use them, as will the next release of StarOffice. We would like all other ODF applications to go the same way. This was the original intent when we introduced the subject to the ESC, where stakeholders of different applications and platforms are represented. We plan to release the icons as free and open as possible. By all means we want to avoid any sense of vendor or copyright lock-in. This will work only if the icons are truly free. Also, we are working with OASIS to get their support for the new design. Most of these steps are currently work in progress, so please bear with us while we work on them.

UPDATE

After receiving a lot of really good feedback - in the comments section and especially on the mailing list discuss@ux - we decided to postpone the integration of the icon set until after OpenOffice.org 3.2. If you would like to work with us on this subject, please watch for more information on the mailing list.


Posted by Lutz Hoeger on 18 Sep 2009

Thursday, 17 Sep 2009
Security and Privacy Feature Improvements in upcoming OpenOffice.org 3.2
Malte Timmermann

Because some people asked me if we really addressed the issues mentioned here, I just wrote up what we did in my blog.

I don't want to replicate the full article here, so if you are interested, you can read the details here.

Posted by Malte Timmermann on 17 Sep 2009

Wednesday, 22 Jul 2009
ODFDOM 0.7 - the new Release of the OpenDocument Java Library
Svante Schubert

A new version of ODFDOM - the OpenDocument Java API - has been released!
ODFDOM is an Apache 2 licensed Java library to easily create, access and manipulate ODF documents.

If you have never heard of ODFDOM, you might want to get a quick overview first.

With this release we made an enormous step forward!
It took us several months of continuous refactoring until we finally agreed on shipping this new release. 'We' are in this case Benson Margulies, David Eisenberg and a group of IBM and Sun developers.

Version 0.7 more than deserves to be called a release. It brings several major new features, the most important of which are listed below:

ODF 1.2 support

With 0.7 we support the latest revision of the ODF 1.2 community draft, which was published last week.
Support also means that there is a Java class for every XML element and attribute provided by ODF 1.2.
As we generate the classes directly from the RelaxNG schema, we are able to easily update on ODF changes.
Currently we create the 599 elements and 1301 attributes of ODF 1.2 in less than 20 seconds.

ODF usability

To ease the DOM handling, every element now has methods to create its child elements, requiring mandatory information as parameters. With this, users no longer have to look into the RelaxNG schema to verify whether an element is allowed as a child.
Furthermore, there are methods to access all possible attributes of the element, as well as enumerations for their possible values.
Apart from elements and attributes, there are now also classes for all data types listed in ODF 1.2 (mostly W3C schema types, which now carry validation and helper functions).

Finally, the convenience layer of ODFDOM gives examples of how easy a final API might look.
For instance, it takes a simple three-liner to

  1. create a new ODF document,
  2. add text and
  3. save the new document.

OdfTextDocument odt = OdfTextDocument.newTextDocument();
odt.addText("My important text");
odt.save("MyExample.odt");

Documentation

David Eisenberg has contributed new tutorials.
Our JavaDoc has been improved. Now every single ODFDOM element/attribute class links to an XHTML version of the spec, where the functionality of the XML node is described in detail. This spec is bundled with the JavaDoc (soon the spec will be split into smaller pieces to speed up loading times).

Build environment

Last but not least, with the support of Benson Margulies, we were able to switch our build environment from Ant to Maven. By using Maven we made our dependencies declarative. We no longer add dependent JARs to our sources (e.g. the parser XercesImpl.jar), but are able to download them via Maven on demand. Following this idea of modularization, we were able to simplify the source structure of ODFDOM by moving our code generation into its own new project called relaxng2template.

Interesting challenges still lie ahead of us, and we hope this release will encourage like-minded developers to join our efforts!

I hope you all enjoy the new release!
Svante

Posted by Svante Schubert on 22 Jul 2009

Monday, 20 Jul 2009
XML Performance, and now for something completely different...
Christian Lippka
While Armin Le Grand did a great job of improving the load/save performance for presentation documents by tweaking the application core itself, I took a step back and thought about improving performance by using different technologies. The first step was to look at other techniques for dealing with XML documents that would have one or more advantages over the current implementation. Since, according to Helmuth von Moltke, "No battle plan survives contact with the enemy", I had to test my assumptions, and so my theoretical work resulted in a prototype import filter implementation for Impress. This is a short summary of which techniques I looked at, why and how I used them in the prototype, and the interesting results I got from it.

Mission Statement

The mission of this prototype was to gather data on how the use of new technology could enhance the performance of the native OpenDocument Format (ODF) filters for OpenOffice.org (OOo). The focus of this prototype was to first look at the import of Impress documents and achieve the following three goals:

Goal 1 : improve overall filter performance

Goal 2 : make use of multiple cores/CPUs through threading (where threading is not constantly blocked by calls to the OOo core, which currently does not support multiple threads without blocking)

Goal 3 : support the implementation of load on demand (for example to display the first slides to the user and load the rest of the document in the background)

Current state

Currently ODF is imported with a SAX-based filter that is accessed over the UNO API. The SAX parser therefore pushes notifications about the XML elements to the ODF filter implementation in the order in which they appear in the XML stream. The filter itself has no control over which elements to parse first, and it cannot postpone the parsing of the current element until later. This tight coupling between the SAX parser and the ODF filter also makes it nearly impossible to measure the time spent on the XML parsing alone.

XML streams from ODF documents are usually encoded as UTF-8. The UNO API uses strings in UTF-16 encoding. Therefore, the SAX parser converts all strings from the XML stream to UTF-16, which is used by the UNO string implementation. To identify XML element and attribute names, expensive string comparisons are performed.

Between the ODF filter and the current OOo application core is another UNO API layer. The filter has to convert the XML events to something that can be sent over this layer to the OOo application core. In most cases the implementation of the API layer must also convert the given data to the format used in the application's core.

The detailed flow of data is currently as follows:
  1. SAX parser reads XML data from UTF-8 streams inside zip storages
  2. SAX parser UNO implementation handles namespaces and converts element names, attribute names, attribute values and text content to UTF-16 strings and feeds the ODF filter implementation
  3. ODF filter implementation transforms UTF-16 XML data to UNO representations for the OOo UNO API
  4. The application's UNO API implementation transforms the UNO data to a core data representation

Assumptions

Alternative technology

XML Pull Parser (XPP)

XPP is a streaming pull XML parser. Unlike SAX, where the parser calls the filter, the filter itself makes calls to the XPP parser to parse the next XML element.

+ In contrast to SAX, the filter can interrupt parsing after any element and continue parsing later.
+ Pull instead of Push leads to a cleaner filter implementation (cleaner code is usually easier to service and improve).

- Performance is equal to a sax parser
- No random access
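The pull model can be illustrated with a minimal sketch. It does not use the XPP library itself but the JDK's StAX API (`javax.xml.stream`), which follows the same pull principle; the class name, method name and the sample XML are assumptions made up for this illustration. The caller pulls events in a first batch, stops, and later resumes on the same reader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Pull parsing sketch (StAX as a stand-in for XPP): the caller asks for each event. */
final class PullParsingSketch {

    /** Pulls start-element names until 'limit' names were collected or the stream ends. */
    private static List<String> pullElements(XMLStreamReader reader, int limit)
            throws XMLStreamException {
        List<String> names = new ArrayList<>();
        while (names.size() < limit && reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        return names;
    }

    /** Reads a first batch, pauses, then resumes parsing on the same reader. */
    static List<List<String>> readInTwoBatches(String xml, int firstBatchSize) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<List<String>> batches = new ArrayList<>();
            batches.add(pullElements(reader, firstBatchSize));
            // The filter could do other work here; the parser simply waits.
            batches.add(pullElements(reader, Integer.MAX_VALUE));
            reader.close();
            return batches;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Prints [[doc, page], [page, page]]
        System.out.println(readInTwoBatches("<doc><page/><page/><page/></doc>", 2));
    }
}
```

With a push parser the same pause would require buffering events or blocking the parser's callback, which is the difference the first advantage above refers to.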

Document Object Model (DOM)

A DOM is a parsed in-memory representation of an XML stream. This technology is used by modern browsers and the odftoolkit.org project.

+ Filter has random access to all XML elements
+ Random access leads to a cleaner filter implementation (cleaner code is usually easier to service and improve).

- A DOM has to store a complete copy of the xml stream + management in memory during the filter process.
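Random access can be sketched with the JDK's generic W3C DOM API. This is only an illustration, not the prototype's typed DOM; the class name, the `page` element and its `name` attribute are invented for the example. Once the tree is built, elements can be visited in any order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** DOM sketch: the whole stream is parsed once, then accessed in any order. */
final class DomRandomAccessSketch {

    /** Parses the complete XML text into an in-memory tree. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    /** Returns the name attribute of the page at the given position. */
    static String pageName(Document doc, int index) {
        NodeList pages = doc.getElementsByTagName("page");
        return ((Element) pages.item(index)).getAttribute("name");
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><page name='a'/><page name='b'/><page name='c'/></doc>");
        // Visit the last page first, then the first one: no stream order is imposed.
        System.out.println(pageName(doc, 2));
        System.out.println(pageName(doc, 0));
    }
}
```

The price is visible in `parse`: the complete tree exists in memory before the first element is used.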

Fast sax parser

I developed the fast SAX parser during the initial implementation of the Office 12 XML filters. It is basically a SAX parser, but it uses integer tokens to represent known namespaces, element names and attribute names. Tools like GNU gperf can be used to create perfect hash code that transforms the XML names to integer tokens. Scripts can parse the DTD or RELAX NG schema of an XML format to automatically extract all the XML names that need to be convertible to integers. With UTF-8 XML streams, the tokens can be created without the need to transform the strings to another encoding first. Namespaces can be combined with XML names so that each element or attribute name with a namespace can be identified with just one integer compare.

+ Reduces string handling (encoding, comparing, storing)
+ Leads to cleaner filter implementation (for example, switch statements can be used to identify child elements instead of if .. else .. blocks which use the string compare functions)
+ Usage of perfect hash algorithms which are automatically created during compile time

- No random access
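The token idea can be sketched in Java on top of the JDK's ordinary SAX parser. This is not the fast SAX parser itself: the `HashMap` lookups stand in for a gperf-generated perfect hash, the token values are invented, and because Java strings are already UTF-16 the encoding saving is not shown. What it does show is combining a namespace token and a name token into one integer so that a `switch` can dispatch with a single integer compare.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Token sketch: namespace and name are merged into one int and dispatched via switch. */
final class TokenSaxSketch {
    // Hypothetical token values: namespace in the upper 16 bits, name in the lower 16 bits.
    static final int NS_DRAW = 1 << 16;
    static final int NS_OFFICE = 2 << 16;
    static final int PAGE = 1;
    static final int FRAME = 2;
    static final int PRESENTATION = 3;
    static final int UNKNOWN = -1;

    // Stand-ins for a generated perfect hash over all names of the schema.
    private static final Map<String, Integer> NAMESPACES = new HashMap<>();
    private static final Map<String, Integer> NAMES = new HashMap<>();
    static {
        NAMESPACES.put("urn:oasis:names:tc:opendocument:xmlns:drawing:1.0", NS_DRAW);
        NAMESPACES.put("urn:oasis:names:tc:opendocument:xmlns:office:1.0", NS_OFFICE);
        NAMES.put("page", PAGE);
        NAMES.put("frame", FRAME);
        NAMES.put("presentation", PRESENTATION);
    }

    /** Maps a namespace URI and local name to a single combined token. */
    static int token(String uri, String localName) {
        Integer ns = NAMESPACES.get(uri);
        Integer name = NAMES.get(localName);
        return (ns == null || name == null) ? UNKNOWN : (ns | name);
    }

    /** Returns {number of draw:page elements, number of draw:frame elements}. */
    static int[] countPagesAndFrames(String xml) {
        final int[] counts = new int[2];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                switch (token(uri, localName)) {
                    case NS_DRAW | PAGE:
                        counts[0]++;
                        break;
                    case NS_DRAW | FRAME:
                        counts[1]++;
                        break;
                    default:
                        break; // element not relevant for this sketch
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
        return counts;
    }
}
```

A real implementation would compute the token directly from the UTF-8 bytes of the stream, which is where the saving in string handling comes from.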

The Prototype

Since random access looks like the key to a filter that supports painless load on demand, I decided to go with a DOM solution. To minimize the memory footprint of the DOM tree, I used the fast SAX parser to build the DOM tree and used the integer tokens for XML names instead of the strings from the stream. Parsing the XML stream itself does not need any interaction with the application core, so this can be done in a separate thread. Converting the XML representation of attribute values to a UNO representation can also be done without the application core in almost all cases, so this should be done in the same thread.

The problem here is that a classic DOM is a generic and typeless representation of the XML data. The solution is to use a technique we introduced in the odftoolkit.org project. Initially an XSLT transformation was used to create DOM node implementations for each element of the ODF format, with type-safe access to its attributes, using only the RELAX NG schema. Maintaining XSLT templates proved costly for such complex operations, so this was replaced with a code generator that I implemented in Java, which uses simple templates and configuration files to transform a RELAX NG schema to code files.

This code generator also made it possible to create DOM tree element implementations for languages other than Java, which is used in the odftoolkit.org project. I therefore used the generator to create C++ source files for all ODF elements, and I adapted the configuration files to create types for the attributes that are equal to the UNO types that the filter needs to pass to the application core. (The key difference between the ODFDOM from odftoolkit.org and the DOM tree for the prototype is that the former uses ODF-based types for the attributes and the latter uses UNO types.)

So in conclusion, a DOM tree builder is started in a worker thread and parses the XML stream by using the fast SAX parser. (The prototype actually starts two worker threads, one to build the tree for the styles.xml stream and one for the content.xml stream.) It uses the SAX events to create a tree where each element is an instance of a class that was specifically generated for that element from the RELAX NG schema. All attribute values are parsed and stored in a UNO Any, preferably with the same type as used in the UNO API of the OOo application. So for example, if the attribute is a length value, then something like "12cm" is converted into a UNO Any containing an integer value of 12000 (12 cm converted to 1/100th mm).
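The length conversion can be sketched as follows. This is a minimal illustration in Java rather than the prototype's C++ code, it returns a plain `int` instead of a UNO Any, and the class and method names are invented. Since 1 mm is 100 units of 1/100th mm, 12 cm (120 mm) becomes 12000.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: converts a length string such as "12cm" to an integer in 1/100th mm. */
final class LengthSketch {
    // Number followed by one of a few common length units; other units are not handled here.
    private static final Pattern LENGTH =
            Pattern.compile("(-?\\d+(?:\\.\\d+)?)(mm|cm|in|pt|pc)");

    static int toHundredthMm(String value) {
        Matcher m = LENGTH.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a supported length: " + value);
        }
        double number = Double.parseDouble(m.group(1));
        double factor; // how many 1/100th mm one unit contains
        switch (m.group(2)) {
            case "mm": factor = 100.0; break;
            case "cm": factor = 1000.0; break;
            case "in": factor = 2540.0; break;        // 1 in = 25.4 mm
            case "pt": factor = 2540.0 / 72.0; break; // 1 pt = 1/72 in
            default:   factor = 2540.0 / 6.0; break;  // pc: 1 pc = 1/6 in
        }
        return (int) Math.round(number * factor);
    }

    public static void main(String[] args) {
        System.out.println(toHundredthMm("12cm")); // 12000
        System.out.println(toHundredthMm("1in"));  // 2540
    }
}
```

Doing this conversion in the tree builder means the filter later reads a ready-made integer instead of parsing the string again.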

These worker threads never block or wait for the OOo thread. But this would not make sense if at the same time the office thread idled and waited for the tree builder to finish. So the tree builder notifies the filter as soon as an important element has been fully parsed. For example, once the office:styles element is completely parsed, the filter thread can start and import the styles. If the filter gets notified that a slide has been completely parsed, it can check whether the needed styles and master pages are already imported and then import this slide. If the filter is also executed in a separate thread, the office thread can paint the already imported slides, which makes the application more responsive to the user and results in a 'subjective' performance gain.
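The notification scheme can be sketched with a queue between a builder thread and a consuming filter. This is a simplified Java illustration, not the prototype's C++ implementation; the class name, the end marker and the strings standing in for parsed elements are assumptions. The builder only appends to an unbounded queue, so it never waits for the consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch: a builder thread reports finished parts, the filter imports them as they arrive. */
final class NotifySketch {
    private static final String END = "<end of stream>"; // hypothetical end marker

    static List<String> importAsPartsFinish(List<String> partsInStreamOrder) {
        // Unbounded queue: add() never blocks, so the builder never waits for the filter.
        BlockingQueue<String> finished = new LinkedBlockingQueue<>();

        Thread builder = new Thread(() -> {
            for (String part : partsInStreamOrder) {
                // A real builder would parse the XML of this part here.
                finished.add(part);
            }
            finished.add(END);
        });
        builder.start();

        List<String> imported = new ArrayList<>();
        try {
            // The filter side: wake up for each finished part and import it.
            for (String part = finished.take(); !part.equals(END); part = finished.take()) {
                imported.add("imported " + part);
            }
            builder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while importing", e);
        }
        return imported;
    }

    public static void main(String[] args) {
        System.out.println(importAsPartsFinish(List.of("office:styles", "slide 1", "slide 2")));
    }
}
```

In this sketch the consumer imports every part immediately; the dependency check described above (styles and master pages before slides) would sit inside the consuming loop.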

To implement the filter I borrowed code from the existing ODF filter and transformed it to use the DOM tree instead of sax events. This mostly resulted in much less and cleaner code, as expected. For a reliable comparison with the existing filter I had to implement a minimum set of functionality so that for selected real world documents the prototype imports all the functionality available in that document.

I ended up implementing

Results

First I used an average real world document with 47 slides and lots of graphics and some chart ole. It showed that the prototype filter was around 2% faster than the original filter. This was less than expected.

Next I created an artificial document: a presentation with 188 slides and only formatted text, no graphics, no OLE. Such a pure XML document is uncommon for a presentation, but it is a good approximation of what the gain could be for a Writer or Calc document, where the ratio of XML to graphics/OLE is much higher. This led to a performance gain of 10%, which is not bad given that the prototype itself has not yet been profiled and optimized.

I tested this on a dual-core 3 GHz system. Experiments with the original filter showed that we are CPU bound, and since we use only one core, processor usage is only 50%. So I expected better results with the threaded prototype by making use of the 3 GHz from the second core. A quick look at a CPU monitor showed that this didn't happen: processor usage was still capped at 50%.

This made me suspect that the actual parsing, tree building and XML-to-UNO conversion accounted for only an insignificant part of the time the filter needs to import this document. After removing the threading, I measured the time it took to parse the XML streams and to actually import the document. It turns out that building the DOM tree for styles.xml and content.xml accounted for less than 2% of the overall time. So when opening a document that takes 10 seconds to load, the second core is used for only 0.2 seconds. Remember that this includes not only the actual XML parsing but also creating the DOM tree in memory and converting most of the attribute values to UNO types.
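The arithmetic can be written out explicitly. The bound on the achievable speed-up is Amdahl's law, which is a framing added here and not part of the original measurements; p denotes the fraction of the load time T spent on parsing and tree building, taken at its measured upper limit of 2%.

```latex
t_{\text{parse}} = p \cdot T \le 0.02 \times 10\,\text{s} = 0.2\,\text{s}
\qquad
S_{\max} = \frac{1}{1 - p} \le \frac{1}{1 - 0.02} \approx 1.02
```

Even if the second core made parsing completely free, the load could become at most about 2% faster, which is why the CPU monitor barely moves above 50%.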

The interesting finding here is that the overhead from the XML parsing is much smaller than the typical 'developer gut feeling' about XML suggests. So any performance work in the filter or the application core would be much more effective than trying to speed up the XML parsing.

Since the use of DOM is often criticized for its memory consumption, I also took a look at this. Since I replaced most strings with 32-bit integer tokens, I figured that the memory consumption of a DOM tree should not be greater than that of the XML stream it was parsed from. My expectation was that in most cases it should be even smaller.

The first measurement showed that the DOM tree actually took 10 times more memory than its XML stream. After further investigation I found the source of the problem: the memory manager's overhead for each allocated instance. After converting attributes from individual instances to a single vector, this dropped to 2x the size of the XML stream. For an Impress document this is not a problem, as the XML stream size for average documents is seldom more than 1 MB. For a Writer document this may also be no problem; for example, the huge OpenDocument specification has less than 10 MB of XML streams. For Calc this may be a problem, as Calc documents with many cells can have XML streams of 100 MB and more. For current office workstations this may not be a problem at all, but if OOo runs on a server for multiple users, or if OOo were ported to small devices, then this could become an issue.

Conclusion

While the prototype showed that this method of implementing an ODF import filter does result in a performance gain and the option to support load on demand, for the Impress application the performance gain of only around 2% for real-world documents is outweighed by the actual cost of implementing this filter. I currently estimate an effort of at least 6 man-months for the implementation of the Impress import filter alone (this excludes time for testing, which is also crucial to find regressions). For a Calc and Writer filter these figures may differ.

Since Writer and Calc documents are more XML-centric than Impress documents, a prototype for these applications may show that the overall gain does outweigh the costs.




Posted by Christian Lippka on 20 Jul 2009

Friday, 10 Jul 2009
OpenOffice.org Security Project
Malte Timmermann

As promised in a comment in my "Comments on the Black Hat 2009 OOo Security Briefing", I have created the OpenOffice.org Security Project.

I already did this some weeks ago, but it took me some time to transfer all the content of my currently existing documents into the Security project's Wiki pages. (This was also a good opportunity to consolidate and clean up some stuff.)

There are different pages for topics like digital signatures and encryption. On the first page you can find a list of all the different items, the ones we are working on now, as well as items that might be addressed some time later in the future.

Currently it's all about digital signatures, encryption and document integrity. I hope I will find some time to also work on stuff for avoiding security vulnerabilities, which includes using certain compiler features as well as guidelines for developers.

Every kind of help is welcome :)


Posted by Malte Timmermann on 10 Jul 2009

Friday, 29 May 2009
ODF / OpenOffice.org Document Encryption
Malte Timmermann

Quite frequently people ask/search for information about OOo's document encryption.

Some answers can now be found here

Posted by Malte Timmermann on 29 May 2009

Wednesday, 13 May 2009
Better ODF support in Microsoft Office via Sun's ODF Plugin
Malte Timmermann

I normally don't post my ODF Plugin news and information on GullFOSS, but so many people complain (everywhere, including in OOo mailing lists) about the bad ODF support in Microsoft Office 2007 SP2, that I thought it might be a good idea to post some information about the ODF Plugin here...

The Sun ODF Plugin for Microsoft Office, which is based on OpenOffice.org, adds support for ODF to Microsoft Office 2000 and newer versions. So you don't have to use the very latest Microsoft Office 2007 SP2 version (in case you really need Microsoft Office for some reason), where ODF support is insufficient anyway.

The ODF plugin still works in MS Office 2007 SP2, allowing document exchange between OpenOffice.org users and Microsoft Office users - which doesn't work in many cases when using the new built-in ODF support in SP2.

For more information about the ODF Plugin, news and additional information, feel free to read my blog regularly - I don't plan to start posting the same information here.


Posted by Malte Timmermann on 13 May 2009

Monday, 11 May 2009
Comments on the Black Hat 2009 OOo Security Briefing
Malte Timmermann

A few days ago I wrote some comments on the Black Hat 2009 OOo Security Briefing in my blog.

Some people asked me why it's not on GullFOSS, and they are probably right that I should have (cross-) posted it here.

The article is quite long, so I won't duplicate it here but invite you to read it in my blog:

http://blogs.sun.com/malte/entry/comments_on_the_black_hat

Translations are available in French and in Hungarian.

Posted by Malte Timmermann on 11 May 2009

Friday, 08 May 2009
ODF Validator Update
Michael Brauer
Last week, the OASIS OpenDocument TC approved the most recent draft for part 1 of the ODF 1.2 specification as a Committee Draft 02. This was another large step toward finalizing ODF 1.2.

This appeared to be a good opportunity to update the ODF Validator at odftoolkit.org (which we use in Sun's OpenOffice.org development team to check ODF documents) to better support ODF 1.2. The update applies to both the command line version of the tool and the online version.

The changes I have implemented are:

Posted by Michael Brauer on 08 May 2009

Wednesday, 25 Mar 2009
Back from Document Freedom Day event in Hamburg
Joost Andrae
Today is not only Document Freedom Day, we also celebrated it. I attended an event in Hamburg where we had three places where volunteers (mostly from exis-unlimited.org, FSF Europe and Sun) talked about the advantages of open and standardized document formats and about the ODF document format in detail. The weather (snow, wind and again snow) forced us to give up two of the booths after some hours and to concentrate all activities at one place. In my opinion this event was quite successful, and I'm looking forward to attending similar events in the future. Maybe you'd like to plan something similar?

Posted by Joost Andrae on 25 Mar 2009
