Wednesday, 28 Oct 2009
Some weeks ago I announced that our current build system is on trial. Starting with this post, members of the team will report on some of our findings over the coming days and weeks.
I want to start with the topic “split build”. Why should we think about it?
Currently there are only three ways to build OpenOffice.org:
build single modules one by one
build a single module and all of its prerequisites
build everything (OOo and ODK)
While the first two options are usually what a developer wants in his/her daily work, everybody else has to choose option three. What is currently missing is a way to build only related parts, especially everything that makes up a particular package.
So the obvious motivation for a split build is the desire to not always build everything. But there is more.
Currently we have roughly 200 “modules” in our build and it's hard to see how they work together in the final product. A meaningful arrangement of modules into groups that make sense can help to see the forest for the trees and understand the basic code architecture better.
By extending these thoughts a bit we get the following requirements for a split build:
the split parts should be buildable in an acceptable time
the split should group modules together that have something in common
the boundaries between the groups should be along stable APIs where they exist
code and non-code modules should be separated
there should be no cross build dependencies between groups (module A in group 1 depends on module B in group 2 and module C in group 2 depends on module D in group 1)
there should be no cross runtime dependencies between groups (module A in group 1 depends on module B in group 2 and module C in group 2 depends on module D in group 1)
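The last two requirements can be checked mechanically. The following is only a minimal sketch, not part of our build tooling: the module names A to D and the group names are the placeholders from the example above, and the two dictionaries are assumed inputs. It lifts module dependencies to group level and lets Python's standard `graphlib` report whether the groups depend on each other. The same check works for build and for runtime dependencies, depending on which dependency map is passed in.

```python
from graphlib import TopologicalSorter, CycleError


def group_dependencies(module_group, module_deps):
    """Lift module-level dependencies to group level, dropping edges inside a group."""
    group_deps = {group: set() for group in module_group.values()}
    for module, prerequisites in module_deps.items():
        for prerequisite in prerequisites:
            source, target = module_group[module], module_group[prerequisite]
            if source != target:
                group_deps[source].add(target)
    return group_deps


def group_build_order(module_group, module_deps):
    """Return a valid group build order, or None if groups depend on each other."""
    sorter = TopologicalSorter(group_dependencies(module_group, module_deps))
    try:
        return list(sorter.static_order())
    except CycleError:
        return None


# Hypothetical data: the placeholder modules and groups from the text.
MODULE_GROUP = {"A": "group1", "D": "group1", "B": "group2", "C": "group2"}

# A needs B (group1 -> group2) and C needs D (group2 -> group1): a cross dependency.
BAD_DEPS = {"A": ["B"], "C": ["D"]}
# Both edges point from group1 to group2: no cross dependency.
GOOD_DEPS = {"A": ["B"], "D": ["C"]}

print(group_build_order(MODULE_GROUP, BAD_DEPS))   # None
print(group_build_order(MODULE_GROUP, GOOD_DEPS))  # ['group2', 'group1']
```

If the groups are free of cross dependencies, the returned list is also an order in which they can be built one after another.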
Past attempts at a split build failed at least with regard to the last two points, and IMHO cross build dependencies in particular are a no-go. So I tried to improve on that.
As described in the wiki, it was comparatively easy to identify some obvious candidates for separated builds (though it took me some time to fix the remaining “bad” dependencies between some of their modules, as also described in the wiki):
modules containing external code
URE (either with ODK modules included or with ODK as a separate group)
“application” modules (sw, sc, sd, starmath and some related modules)
other “top level” modules like chart2, desktop, basctl etc.
extensions (swext, sdext, reportbuilder)
non-code modules (l10n, helpcontent2, extras, dictionaries)
deployment and installation related code (scp2, javainstaller2, setup_native etc.)
But most of the remaining modules don't have an obvious separation; they are just “office code”. A deep analysis of their content showed me that a separation into three categories creates a structure that performs quite well regarding the criteria listed above:
modules without any GUI and graphics code (“common”)
GUI and graphics related code modules (“GUI”)
high level modules on top of the other two groups as building blocks for the top level modules mentioned above (“framework”)
As I already outlined in a presentation at the OOoCon in Barcelona, any larger modularity improvements in OOo can't be achieved without many code changes. The changes that were necessary to get a first draft for a split build that matches my mentioned criteria are currently available in a CWS that will now be tested thoroughly. Especially the dependencies of many “low level” functionality modules (like connectivity, linguistic, xmloff) on vcl, directly or indirectly (mainly through svtools), were a problem, and solving that required a lot of code rework. Details about that and a list of some more identified To-Dos can be found in the wiki.
Finally here's the status quo, shown as a block diagram.
The arrows between the blocks denote build dependencies. Each block can be built separately, if its prerequisites (the blocks it depends on) have been built before. I made tests on two platforms (Ubuntu Jaunty, Windows with PCH enabled). Here are the results (all data has been measured on my Toshiba Tecra M5 notebook with 2 GHz Core 2 Duo and 4 GB RAM):
| Group | Linux (Ubuntu Jaunty) | Windows (PCH enabled) |
|---|---|---|
| extern | 00:17:29 | 00:23:07 |
| ure | 00:07:26 | 00:14:30 |
| odk | 00:03:13 | 00:16:34 |
| common | 00:19:05 | 00:25:45 |
| content | 00:01:18 | 00:04:07 |
| deployment | 00:01:23 | 00:03:43 |
| gui | 00:11:04 | 00:15:33 |
| framework | 00:25:51 | 00:27:55 |
| binfilter | 00:17:37 | 00:28:51 |
| toplevel | 00:10:43 | 00:14:46 |
| base | 00:07:32 | 00:13:38 |
| draw | 00:11:46 | 00:07:32 |
| calc | 00:14:40 | 00:23:20 |
| writer | 00:19:37 | 00:18:37 |
| extensions | 00:01:20 | 00:03:20 |
| packimages | 00:00:53 | 00:03:08 |
| postprocess | 00:00:02 | 00:00:43 |
| instset | 00:05:21 | 00:10:10 |
| total | 02:56:20 | 04:15:19 |
| “nothing to do” | 00:02:31 | 00:21:12 (00:06:24 without PCH) |
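The “total” row can be cross-checked by summing the per-group times. A small sketch; the dictionary simply repeats the measured values from the table, with the Linux time first and the Windows (PCH) time second, and the helper names are my own:

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert seconds back to an 'hh:mm:ss' string."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Measured build times per group: (Linux, Windows with PCH).
BUILD_TIMES = {
    "extern": ("00:17:29", "00:23:07"),
    "ure": ("00:07:26", "00:14:30"),
    "odk": ("00:03:13", "00:16:34"),
    "common": ("00:19:05", "00:25:45"),
    "content": ("00:01:18", "00:04:07"),
    "deployment": ("00:01:23", "00:03:43"),
    "gui": ("00:11:04", "00:15:33"),
    "framework": ("00:25:51", "00:27:55"),
    "binfilter": ("00:17:37", "00:28:51"),
    "toplevel": ("00:10:43", "00:14:46"),
    "base": ("00:07:32", "00:13:38"),
    "draw": ("00:11:46", "00:07:32"),
    "calc": ("00:14:40", "00:23:20"),
    "writer": ("00:19:37", "00:18:37"),
    "extensions": ("00:01:20", "00:03:20"),
    "packimages": ("00:00:53", "00:03:08"),
    "postprocess": ("00:00:02", "00:00:43"),
    "instset": ("00:05:21", "00:10:10"),
}

linux_total = to_hms(sum(to_seconds(linux) for linux, _ in BUILD_TIMES.values()))
windows_total = to_hms(sum(to_seconds(windows) for _, windows in BUILD_TIMES.values()))

print(linux_total)    # 02:56:20
print(windows_total)  # 04:15:19
```

Both sums match the totals given in the table.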
This is not the final result. Currently the “common” group needs some fine structure and/or library redesign. On the higher levels there are still candidates for library splitting and rearrangement (svtools, framework, sfx2, svx, extensions, desktop, goodies). Another option to consider is putting all filter modules into a group of their own; currently that would at least require moving the msfilterlib in svx into a separate module or into the filter or oox module. So this is only the first step, which will be integrated on the DEV300 code line soon; everything else can be done in further steps. Once we have some larger building blocks that can be treated separately, we will have a better overview and can improve the structure step by step.
I haven't thought a lot about how packages should be built with this split build; this would be one of the next steps. Currently all packages are built inside a single module, instsetoo_native. Once a complete build is available, instsetoo_native is able to detect which packages need to be rebuilt in case e.g. some new libraries have been delivered (“package pooling”), but there's no way to build single packages from scratch. Basically it should be possible to change instsetoo_native accordingly as a short-term solution.
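To illustrate the idea behind detecting stale packages, here is a deliberately simplified sketch. It is not how instsetoo_native implements package pooling; the package names, file names and the function are hypothetical and only show the principle: a package needs a rebuild if at least one of its files has been newly delivered.

```python
def packages_to_rebuild(package_contents, delivered_files):
    """Return the names of all packages containing at least one newly delivered file."""
    delivered = set(delivered_files)
    return sorted(
        name for name, files in package_contents.items() if delivered & set(files)
    )


# Hypothetical package layout: package name -> files it contains.
PACKAGE_CONTENTS = {
    "package-a": ["lib1.so", "lib2.so"],
    "package-b": ["lib3.so"],
    "package-c": ["lib2.so", "lib4.so"],
}

# Only lib2.so has been delivered again, so package-b can be reused as is.
print(packages_to_rebuild(PACKAGE_CONTENTS, ["lib2.so"]))  # ['package-a', 'package-c']
```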
A split build automatically brings up the question: “And what about the repository?” You can do a split build with the full code repository, but a repository split would go a step further; whether it's a good or a bad move depends on how it is done and how you look at it.
A package builder who wants to build a particular package might want to download only the source code that is needed to build that package; everything else should reside in development packages. OTOH as a frequent developer (who usually doesn't use dev packages as they are usable only for released versions, not for every milestone a developer happens to work with) you will see some drawbacks:
working on several libraries for one bug fix or feature requires checking out several repositories, keeping them in sync etc.
the build system must be more complex as it has to work for arbitrary configurations of source directories
bug fixes touching code in several repositories won't have an identity as a unique change set
splitting repositories without stable (or at least slowly changing) APIs between them is asking for trouble
splitting up the repository will lose the complete history unless we invest a huge amount of time to split the change sets
already existing CWS need to be transplanted to the new repositories, which again will take some effort (otherwise CWS have to be converted to patches, losing the individual change sets and their history on the CWS)
The last two points are unavoidable if we ever want to split the repository, and as far as I see it, we will do it somehow, sometime. What I would like to see is a split that minimizes the effects of the other drawbacks and perhaps also reduces the effort for splitting change sets and transplanting CWS to a level that allows us to invest that effort and so avoid the loss of history and CWS history. Another option would be to postpone a possible split to a point in time where history loss is bearable: when a new repository is created. This would be the OOO330 code branch-off that is currently planned to happen in roughly six months.
A compromise between the packaging requirements (split repository along the package structure) and the developer requirements (split repository to separate the working areas, but keep code together that probably might get touched in a common work space) should be possible. If we e.g. just removed the binary content from our repository, its size should be already small enough to get fast source code downloads and bearable disk space consumption (compared to the current situation). I believe that nobody will have strong arguments against splitting off extras, helpcontent2, l10n, testautomation, dictionaries, the external modules and perhaps the images. Everything else should be thought through carefully. As the OOO330 code branch-off looks like a good point in time to make the split we should have enough time to do that.
Comments
I overlooked that my build times table misses the column titles.
The first column contains the time for the Linux builds, the second for the Windows builds with PCH.
Posted by Mathias Bauer on November 01, 2009 at 03:05 PM CET #