Monday, 02 Nov 2009
The Build Environment Effort has done a lot of analysis of how our current build process works to find out if and how we can improve the experience of building OpenOffice.org.
One of the things we looked at is scalability. Two-way and four-way machines are currently standard developer hardware, but this will likely change as machines with more cores become common and hardware becomes cheaper.
There are two ways to use concurrent processes in the current build process:
build --all -P4 -- -P4
The first -P4 lets build.pl work on up to four directories at once; the second -P4, after the --, allows up to four targets to be built in parallel within each directory. Without the first -P4, only one directory is built at a time, so the build uses fewer than four cores whenever that directory has fewer than four targets that can be built in parallel. Without the second -P4, the build uses fewer than four cores whenever dependencies leave fewer than four directories buildable at the same time.
However, when enough work is available for both kinds of parallelization, up to 4 x 4 = 16 processes will be running. On Linux, this "overload" alone does not severely slow down the build.
For a current four-way system, however, parallelization is not too bad:
But with 20 cores (using distcc, or in the not too distant future) the same scheme would allow a maximum of 20 x 20 = 400 processes to run at once, and that would slow down the build. The build system also has no control over the priorities of those 400 jobs and thus cannot put the ones with the most dependencies first. As a result, the build will be slower, because targets with few or no dependencies are "stealing" CPU time from more important targets with more dependencies.
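The arithmetic behind the 16 and 400 figures can be written down as a small sketch. The function name max_processes and its parameters are made up for illustration; it only multiplies the two -P values and is not part of build.pl or dmake.

```python
def max_processes(directory_jobs: int, target_jobs: int) -> int:
    """Worst-case number of concurrent processes for
    'build --all -P<directory_jobs> -- -P<target_jobs>'.

    Assumption: every directory being built can keep all of its
    target slots busy at the same moment.
    """
    return directory_jobs * target_jobs


def oversubscription(directory_jobs: int, target_jobs: int, cores: int) -> float:
    """How many runnable processes compete for each core in the worst case."""
    return max_processes(directory_jobs, target_jobs) / cores


for cores in (4, 20):
    # Use the same -P value for both levels, as in the examples in the text.
    print(cores, max_processes(cores, cores), oversubscription(cores, cores, cores))
```

With the same -P value on both levels, the worst-case oversubscription per core grows linearly with the number of cores, which is why the effect is mild on four cores and much more noticeable on twenty.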
Here is a visualization of the number of dmakes running in a -P9 -- -P1 build:

Here is a visualization of the number of dmakes running in a -P4 -- -P4 build:

Note that 20 or more dmake processes can start and die within one second, and the diagram only uses the last state change in each second. So if N-1 processes are shown running for a -PN build, it is likely that build.pl was just spawning a process at the tick of the second.
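The sampling effect described above can be illustrated with a minimal sketch. The event format (a timestamp in seconds plus the number of running dmakes after the change) and the sample values are assumptions made for this example, not the format of the actual measurement.

```python
def sample_last_per_second(events):
    """Reduce a stream of state changes to one value per second.

    events: iterable of (timestamp_in_seconds, running_count) tuples,
    sorted by timestamp. For each whole second only the last state
    change is kept, as in the diagrams.
    """
    samples = {}
    for timestamp, running_count in events:
        samples[int(timestamp)] = running_count  # later events overwrite earlier ones
    return samples


# Hypothetical -P9 build: a dmake dies at 0.95s and its replacement
# is only spawned at 1.02s, so second 0 is recorded as 8, not 9.
events = [(0.10, 8), (0.40, 9), (0.95, 8), (1.02, 9)]
print(sample_last_per_second(events))
```

In this made-up trace the build is fully loaded almost all of the time, yet the sample for second 0 shows N-1 processes because the tick fell between a process dying and the next one being spawned.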
To identify the bottlenecks in the build process one has to track the number of processes over the time of a build.
Here is a diagram showing the number of parallel dmakes in a -P9 -- -P1 build: P9P1-Timeline
It shows the number of dmakes running and the modules which are being built at the given point in time. The bar representing a module starts at the point in time when it is "announced", i.e. when it becomes buildable because all its dependencies have been delivered. The bar ends at the point in time when the module was delivered to the solver. Note that the start of the bar does not necessarily mean that a process is working on the module: for example, a lot of modules depend on svx, and not every one of them will get a process right after svx has been delivered.
One thing easily identified by examining the diagram is a "critical path": a sequence of modules in which each module follows whichever of its dependencies was delivered last:
(stlport ->) soltools -> xml2cmp -> sal -> salhelper -> registry ->
idlc -> udkapi -> offapi -> offuh -> cppu -> cppuhelper ->
jvmfwk -> stoc -> i18npool -> tools -> unotools -> sot -> vcl -> toolkit -> svtools ->
framework -> basic -> sfx2 -> avmedia -> drawinglayer -> svx -> formula -> sc ->
postprocess -> packimages -> instsetoo_native
One can see how the build process "dries out" quite often along this path as modules wait for their dependencies to be delivered. These are the bottlenecks of the build. Stlport was not used in this build, but had it been used, it would have been another bottleneck.
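The rule used to read the critical path off the diagram (always follow the dependency that was delivered last) can be sketched as a short backward walk. The function name, the data layout and the toy module names and times below are invented for illustration; they are not taken from the measured build.

```python
def critical_path(dependencies, delivered_at, final_module):
    """Walk backwards from final_module, always following the
    dependency that was delivered last.

    dependencies: dict mapping module -> list of modules it depends on
    delivered_at: dict mapping module -> delivery time (any comparable unit)
    Returns the path in build order, ending with final_module.
    """
    path = [final_module]
    current = final_module
    while dependencies.get(current):
        # The dependency delivered last is the one the module had to wait for.
        current = max(dependencies[current], key=lambda module: delivered_at[module])
        path.append(current)
    path.reverse()
    return path


# Toy data (hypothetical modules and delivery times in seconds).
dependencies = {
    "base": [],
    "lib_a": ["base"],
    "lib_b": ["base"],
    "app": ["lib_a", "lib_b"],
}
delivered_at = {"base": 10, "lib_a": 25, "lib_b": 40, "app": 55}

print(" -> ".join(critical_path(dependencies, delivered_at, "app")))
```

Here app could not start before lib_b was delivered, so lib_b rather than lib_a ends up on the path; speeding up lib_a would not shorten this toy build at all.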
Currently, parallelization is not as bad as one might have expected for full builds on a regular developer workstation running Linux. However, the comparison of the -P9 and -P4 builds shows that the current build system has scalability limitations that will become more noticeable as systems with higher parallelization become more common. Next, we will present the same analysis for builds on the Windows platform, where builds are traditionally much slower.
tags: build gullfoss linux openoffice.org parallelization scalability
Comments
Great analysis, very interesting. Thank you very much. Especially your P1P1-Timeline diagram shows where it is worth investing in splitting up or restructuring. Though I am really surprised that we do not have more overhead in simple '-P4 -- -P4' setup. It is not that bad.
Posted by Ruediger Timm on November 03, 2009 at 10:28 AM CET #