Friday, 09 May 2008
One week later, on the 21st of April, the CWS was ready for QA again. With the integration of CWS xsltfilter09 into the DEV300 master workspace, OpenOffice.org will get some really great enhancements for the XHTML export and for the other XSLT-based filters (e.g. DocBook XML):
The default extension for the XHTML export changes from .xhtml to .html
With that change, Internet Explorer is able to load the XHTML documents exported by OOo [issue 85268]
Footnotes are now also supported in the XHTML export [issue 34424]
Field values are now supported in the XHTML export [issue 75125]
Headings beyond level six are now exported according to the XHTML specification [issue 80679]
Great work Mathias and Svante!
Wednesday, 12 Sep 2007
Some months ago we started to look for a replacement for CVS, our current SCM (Software Configuration Management) tool. Progress has been slow on this matter, for a number of reasons. In short, we have not yet decided which tool will be the new one. I'll present the current state of the discussion in more detail next week at OOoCon 2007 in Barcelona.
Meanwhile we continue to use CVS, with its perceived problems, mainly the lack of performance. To recapitulate, CVS is slow because it stores changes on a file-by-file basis. To find the difference between one OOo milestone and another, CVS has to check all files involved. Given the number of files in the OOo source, this is a time-consuming task. The simple act of tagging a milestone changes every living archive inside the OOo source code repository. Tag or branch operations over the whole OOo source code tree literally move gigabytes of data on the CVS server.
The CWS system copes with the CVS performance characteristics by restricting the tag operation to just the modules the developer needs in the CWS. Depending on the modules and the load on the server, this takes from a few seconds to a few minutes. Not nice, but bearable.
A CWS which is open for more than just a few days eventually needs to be rebased to a newer milestone. And this is the point where it starts to hurt. Our rebase tool - cwsresync - retrieves the changes between the current and the new milestone and then applies a number of CVS operations to every file which has changed between the milestones. For a long-running CWS with a number of modules added, this can be many thousands of files. Since cwsresync up to now relied on the rather inflexible CVS command line client for doing the job, it had to do the CVS operations file by file. If, say, 1000 files needed to be touched during a rebase, cwsresync would start the cvs client 1000-2000 times during the "cwsresync -m" step and typically about 3000 times for the "cwsresync -c" step. Each time, the client has to open a connection to the CVS server, authenticate, bear the network lag etc. Timings showed that every run of the command line client takes about 10s on average, summing up to more than 8 hours for the "cwsresync -c" step alone. Since 1000 files are not even a particularly huge number of files for a rebase, OOo developers experienced cwsresync runs which took days.
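The 8 hour figure is plain arithmetic. As a rough sketch in Python (the function name and its parameters are made up for illustration; only the figures of about 3 client starts per file and about 10s per start come from the text above):

```python
def estimated_rebase_seconds(files, starts_per_file, seconds_per_start=10):
    """Rough cost model: every client start pays the full connection overhead."""
    return files * starts_per_file * seconds_per_start


# "cwsresync -c" with 1000 changed files, about 3 client starts per file
seconds = estimated_rebase_seconds(1000, 3)
hours = seconds / 3600
print(f"{seconds} s = {hours:.1f} h")  # 30000 s = 8.3 h
```

The model ignores the actual work done per file, which is the point: the fixed cost per client start alone already accounts for the hours.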
It's very easy to see that the cost of starting the CVS client and opening a connection to the server totally dominates the time needed for a rebase. Why not do some bookkeeping about which file falls into which category (binary, added, removed etc.) and then batch the CVS operations? Here the inflexibility of the command line client comes into play: the error handling in particular was very hard to get right. I feared the irrecoverable mangling of a CWS if someone used newer CVS clients or servers where small details of error reporting had changed, so I dropped this approach for the initial release of cwsresync.
But with SRC680 m227 I was (finally!) able to get a long-promised and much improved cwsresync up and running which does just this. The new cwsresync is implemented around an old pet project of mine called PCVSLib, a native Perl implementation of a CVS client library right on top of the CVS protocol. PCVSLib took a number of ideas from the NetBeans CVS client library, which I would like to gratefully acknowledge here.
PCVSLib allows very fine-grained control over CVS operations, so that cwsresync can now work with batches of operations, and only one connection to the CVS server is opened per module. And this certainly shows in the benchmarks!
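To illustrate the idea of bookkeeping plus batching, here is a minimal sketch in Python. It is not the PCVSLib API and not the actual cwsresync code: FakeConnection, send_batch and rebase_module are hypothetical names, and the connection only records what it would send.

```python
from collections import defaultdict


class FakeConnection:
    """Hypothetical stand-in for a server connection; it only records batches."""
    opened = 0  # counts how many connections were created

    def __init__(self):
        FakeConnection.opened += 1
        self.batches = []

    def send_batch(self, category, paths):
        self.batches.append((category, list(paths)))


def rebase_module(changed_files):
    """changed_files: iterable of (path, category) pairs for one module."""
    # Bookkeeping: sort every changed file into its category.
    by_category = defaultdict(list)
    for path, category in changed_files:
        by_category[category].append(path)

    # Batching: one connection for the whole module, one request per category.
    conn = FakeConnection()
    for category in sorted(by_category):
        conn.send_batch(category, by_category[category])
    return conn


changes = [
    ("a.cxx", "added"),
    ("b.cxx", "removed"),
    ("c.png", "binary"),
    ("d.cxx", "added"),
]
conn = rebase_module(changes)
print(FakeConnection.opened, len(conn.batches))
```

With this layout the number of connections depends on the number of modules and the number of requests on the number of categories, not on the number of files.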
Example: Quite old CWS, based on SRC680 m203:
Module vcl with 229 new files, 200 removed files, 2 merged files, 339 moved tags.
Time needed for "cwsresync -m m228 vcl":
| cwsresync with PCVSLib | 1 min 14s |
| cwsresync with command line client | 26 min 43s |
Now, "cwsresync -m" was always the fast part. "cwsresync -c" for the above example takes only a minute or so with the new cwsresync and an estimated (229+200+2+339)*3*10s = 23100s for the old cwsresync, that is more than 6h for just one module.
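For those who like to check the numbers, a small Python sketch that derives the speedup of the "cwsresync -m" step from the measured times and redoes the estimate for the old "cwsresync -c" step. The helper to_seconds is made up for illustration; all figures are taken from the example above.

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


new_m = to_seconds(1, 14)    # cwsresync with PCVSLib
old_m = to_seconds(26, 43)   # cwsresync with command line client
speedup = old_m / new_m

# new + removed + merged files + moved tags, about 3 client starts each, 10 s per start
touched = 229 + 200 + 2 + 339
old_c_estimate = touched * 3 * 10

print(f"-m speedup: about {speedup:.0f}x")
print(f"-c estimate: {old_c_estimate} s = {old_c_estimate / 3600:.1f} h")
```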
Other changes to cwsresync include better detection for certain "alert" conditions and a hopefully more readable output.
tags: cws openoffice.org