Tuesday, 21 Jul 2009
Tuesday, 21 Jul 2009
Some months ago, I did an experiment with creating XML elements only for changed cells. That approach has two drawbacks:
So now I tried a similar approach, based on sheets instead of cells. It consists of the following:
So far, I have done the "automatic styles" part only for cell styles. With this version, I did some performance measurements using the old George Ou example, which contains text, numbers and dates on 16 sheets. The time for saving depends on how many sheets were modified:
The red line is the measured time in the CWS, based on milestone m49. Later milestones contain separate optimizations for saving (see here, here and here in the wiki), m52 is shown in yellow. These optimizations will still help for the sheets that aren't copied, so after rebasing the expected result should be something like the green line. Compared to m52, that's half the time if only one sheet was modified.
Handling of other style types (especially text and shape) is still missing, but shouldn't affect the time for simple-cell-content-only documents. Results for other types of documents will follow. Implementation is ongoing in CWS "calcsheetdata", and there's also a wiki page. Stay tuned.
Friday, 08 May 2009
One of the possible approaches to improving Calc load performance is to separate the processing of individual sheets. The basic idea is that often a file with several sheets is loaded, but only some of the sheets are actually used for editing. A part of processing that can well be separated is the updating of automatic row heights, at least if no shapes have to be adjusted. With the implementation in CWS dr70, row height updates after loading are now limited to sheets with shapes, and the active sheet from the view settings. Other sheets are updated when the row heights are needed (activating the sheet, or printing). There is a progress bar in this case, so the user sees the reason for the delay.

The resulting improvement (in time before the active sheet is shown and ready for editing) depends a lot on the nature of the file. The chart below shows relative CPU time for a single-sheet file, the old George Ou example, a file with only date cells on many sheets, and the file from issue 85178.

There is also a wiki page on the subject. Possible next steps are:
Wednesday, 29 Apr 2009
Recently, as task of the OpenOffice.org Performance Project, I was working on refactoring Calc spreadsheet area broadcasters that are used to notify formula cells listening to ranges such as A1:A256 whenever a cell's value is changed in that range. The results are overwhelming, I didn't even expect such a big improvement.
When loading test case documents (admittedly heavily constructed for this purpose and rarely to be encountered in the wild) with lots of formulas referring different areas in a certain constellation, load times went down to 55%-63% of the original implementation, accompanied by memory consumption down to 75%. Recalculating a change that involves 65536 respectively 131072 different ranges formula cells are listening to went down to 49% respectively 29%. Additionally, time to close the larger document went down from ~10s to ~1s. Savings may be less in real life, because almost no document stresses the implementation as the test cases do, the improvement is there though.
The change is in CWS calcperf04, targeted to OOo3.2 and currently ready for QA.
For details, numbers and links to test case documents please visit the Refactoring Area Broadcasters wiki page.
Monday, 23 Mar 2009
Just released (well, actually already last Friday) has been an extension for Calc which adds two new solver engines.
These engines are based on Evolutionary Algorithms and allow solving nonlinear programming models (like curve fitting as shown in the example below). To use them (instead of the OpenOffice.org Linear Solver that already ships with OOo) you simply have to select them in the options page of the Solver Dialog. For linear problems it is still advised to use the OOo Linear Solver since it will be considerably faster for these problems. Also if you want to know which of the solvers to use, you should consult the documentation of this new extension which gives a lot insight and also explains all the options and parameters that are available. Finally it also includes an example on how to use these solvers from within a macro.
So for everyone who always wanted to solve some nonlinear problem: the time has come to use OOo for it. ;-)
Friday, 27 Feb 2009
Just to have some numbers about what is possible, I did an experiment with incremental saving – generating the XML elements only for changed cells.

You can read about the details in the wiki.
Friday, 28 Nov 2008
Users sometimes rightly complain about the time it takes to load and calculate complex spreadsheet documents. Recently I worked on profiling documents to identify performance bottlenecks. The first outcome is summarized on the specific bottlenecks wiki page. You will like to hear that the Zaske case how I call it, where Excel needed 1.2s and Calc 24s to update results, is solved and Calc now, in a CWS, is on a level with Excel.
The next weeks, except for getting some CWSs ready for QA and pending work done for the ODF formula subcommittee, I'll mainly focus on changing implementation and do further profiling. I'm quite sure there are several opportunities for improvement left..
Monday, 17 Nov 2008
It was a great pleasure to meet RedFlag 2000's Calc developers at OOoCon in Beijing. Of course the successful collaboration continues, resulting in some cool new features:



Friday, 25 Jul 2008
[back-blogging, the act of blogging about things that happened in the past]
Most of my time last year and this year went into the OASIS OpenDocument Format Formula specification (aka ODFF aka OpenFormula) and its implementation for the OOoCalc spreadsheet application.
For OOo3.0 the highlights are:
Function names stored in an OpenDocument file are defined by the OASIS OpenDocument formula subcommittee, and especially names of functions that are implemented as Add-Ins and part of the basic function set differ from those that are stored and expected by earlier releases of OOo that wrote the programmatic name, such as com.sun.star.sheet.addin.Analysis.getDuration that now is simply DURATION.
As that increased the set of formula languages to be supported to 3 (native UI visible, old file format and new ODFF names) plus different cell reference conventions (A1 vs. R1C1 vs. ODFF's bracketed [.A1] schema), parts of the formula compiler's tokenizer and stringizer were rewritten to use different grammar sets.
It now is even possible to use English formulas in a non-English UI, but since that is an undocumented, experimental and unsupported feature, the advanced curious adventurous user could go to Environment Variables (wiki) and look for OOO_CALC_USE_ENGLISH_FORMULAS ;-)
You're already used to the range operator in cell range expressions such as A1:B2, specifying a fixed cell range reference. New is that now the colon is a binary operator taking one reference argument on each side, forming a range encompassing the area of its arguments. Sounds complicated? The area of A1:B2 is of course top left cell A1 to bottom right cell B2. Here are some examples:
If a defined name NAMEDCELL is defined to cell reference C3, A1:NAMEDCELL returns the range reference A1:C3, similar if OTHERCELL is defined to cell reference D4, OTHERCELL:NAMEDCELL returns the range reference C3:D4. Note that the order of arguments does not matter, the range is always justified from top left to bottom right.
This works also for ranges as arguments, e.g. A1:B2:C3 returns a range reference of A1:C3, respectively NAMEDCELL:NAMEDRANGE returns A1:C3 if NAMEDCELL is defined to C3 and NAMEDRANGE is defined to A1:B2.
Functions that return a cell reference can be used as an argument to the range operator, for example if A1 contains the string "C3", B2:INDIRECT(A1) returns the range B2:C3.
See also the feature announcement mail.
The reference concatenation operator or range list operator concatenates cell references. This is also called a union operator, but it is not a union set in the mathematical sense, duplicates are not eliminated. ODFF defines that to be the tilde character, but also the somewhat awkward syntax of another spreadsheet application is supported, where a range list is identified by an extra pair of surrounding parentheses and ranges are separated by the parameter separator. Hence the expressions A1:B2~C3:D4 and (A1:B2;C3:D4) are identical, though the latter is transformed to read (A1:B2~C3:D4) after input.
The AREAS() and INDEX() functions evaluate the number of ranges if a range list was passed.
Functions that expect a range reference parameter and where the order of elements does not matter in evaluation now accept a range list as argument as well. For details see the feature announcement mail.
Sheet names now can consist of almost all characters, except those that one other spreadsheet application does not allow, for interoperability reasons. However, reading a sheet name containing those characters from an OpenDocument file is possible, just using them when giving a sheet a new name is not. For details and when you have to enclose the name in single quotes for cell reference expressions see the feature announcement mail.
Kudos to Caolan and Kohei for providing the patches.
As specified in ODFF for interoperability, the ADDRESS() and INDIRECT() functions now support the parameter that switches between A1 and R1C1 address style. For details see the feature announcement mail.
The ISERROR() function now can be used to check for undefined names, missing Add-In functions and the like. See feature announcement mail.
Many implementation details changed for compliance with ODFF and interoperability. Places to check are the schedule of finished implementations in the wiki and a list of related issues and the list of feature announcement mails.
Kudos to Kohei and Lvyue for patch contribution.
Of course there is more to come for OOo3.1 and OOo3.2, stay tuned.
tags: calc oasis odf odff opendocument openformula openoffice.org spreadsheet
Wednesday, 02 Apr 2008
1. Convert text to columns
With this feature CSV data inside cells can be transformed into columns directly. It uses the already existing Text import dialog and transforms CSV data inside cells into multiple columns. Example: A1 holds '1,2,3,4". After using this feature, 1, 2, 3 and 4 are split over cells.
Start this feature with the menu Data>Text to Columns.
2. Insertion Mode for Cells
Cells
in spreadsheet can be moved, copied, or linked Select one or more
columns or rows, or a range of cells, and hold down Alt while moving
the selection with the mouse. When the mouse button is released, other
cells/columns/rows are moved left or down, to insert the selection.
Before this feature, target cells always were overwritten.
3. Enhanced Data Input
Enter key returns to the column where the input started, one row below Entering data row wise is made easier. The Enter key works like a carriage return-line feed on a type writer and allows to return to the column where the input started, but just a row below the current row.
4. Enhanced Formula Input
Formula input: "+" and "-" can also be used to start Most professional spreadsheet users do use the number pad for inputting data to save time. So it makes sense to allow to enter a formula not just by entering a "=" or hitting the Function button in the Formula tool bar. Therefore "+" and "-" can also be used to start entering a formula. This improves the usability and the speed when creating formulas in a spreadsheet. Furthermore this can be used as a little calculator.
5. Individual zoom level per sheet
Each sheet in a spreadsheet document can now have its own zoom level.
6. Improved AutoFilter Behavior
Choices
are clearer grouped and based on result of filtering in other columns
The AutoFilter function allows to filter data in a spreadsheet by one
criteria per column. This feature allows to run a quick analysis of
data. Two things are new: there is a clearer grouping of static filter
choices. And if in one column a filter is set, the filters in the other
columns only show the options that are available in the visible rows.
7. Improved DataPilot
Manual Sorting It's now possible to rearrange items in a DataPilot field via Drag&Drop or Copy&Paste. The default sorting for new tables is "Ascending", it is changed to "Manual" when items are moved.
Double-click in Datapilot cell provides calculation data of that cell When a cell within the data field is double-clicked, it inserts a new sheet containing a subset of rows from the original data source that constitutes the result data displayed in that cell. For instance, when the data field function is selected to be SUM, then the number that is shown in the data field cell must be identical to the sum of all the data field values in the constituent rows displayed in the inserted sheet.
8. Improved Print Dialog
The Print dialog in Calc makes it easier to choose what part of the spreadsheet has to be printed. The option 'print only selected sheets' is now on that dialog.
Furthermore the 'Sort' option is defaulted to on. This makes it easier to print multiple copies of a document.
9. Improved Print Preview
PageUp and PageDown keys scroll continuously through the print preview. Ctrl + PgUp/PgDn scroll to top of next/previous page.
tags: calc openoffice.org spreadsheet usability user-experience video
Tuesday, 04 Mar 2008
With this year's Google Summer of Code on the horizon, it's time to mention an extension that started as a Summer of Code project last year: R4Calc, the integration of R into Calc, done by Wojciech Gryc, now also linked from the extension repository. It can put results into cells, or return a graph image. It contains a set of pre-defined tools, and it can easily be extended using simple text files. For example, this definition
__GUI: |
is all you need to show this dialog

and create the output shown below.
tags: calc gsoc openoffice.org spreadsheet