Kelly O'Hair's Weblog (blogs.sun.com)
Monday Apr 13, 2009
Source Repository Rules
A few commandments for dealing with software repositories (source bases) and the build results of those repositories.
- There shalt not be binary files in the repository.
Binary files (executables, native libraries, zip files, jar files, etc.) are NOT source and should not be managed in a source repository.
- Keep thy path names simple.
Directory names and filenames in the repositories should never contain blanks or non-printing characters. Certain characters such as '$' should also be avoided.
- There shall be one newline convention.
The contents of all source files should follow the standard Unix conventions on newlines (no ^M's).
- Generated source files shall not be added as managed files.
Source files generated during the build process should not be managed files in a repository.
- All output from the build shall be kept separated from the source.
All files generated during the build should land in a well-defined, output-only directory such as build/ or dist/. The src/ directory should never get written to during the build process.
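As a rough illustration, the first three commandments lend themselves to an automated check over a working copy. The sketch below is only one possible approach and is not taken from any existing tool: the function names, the list of binary suffixes, the skipped metadata directories, and the NUL-byte heuristic for spotting binaries are all assumptions made for the example.

```python
import os
from pathlib import Path

# All three sets are illustrative assumptions, not an official policy.
SKIP_DIRS = {".hg", ".git", ".svn"}   # repository metadata directories
BAD_NAME_CHARS = set(" $")            # blanks and '$'
BINARY_SUFFIXES = {".jar", ".zip", ".exe", ".dll", ".so", ".class"}


def name_is_simple(name: str) -> bool:
    """True if the name has no blanks, '$', or non-printing characters."""
    return all(ch.isprintable() and ch not in BAD_NAME_CHARS for ch in name)


def file_problems(path: Path) -> list[str]:
    """Check one file against the binary-file and newline rules."""
    if path.suffix.lower() in BINARY_SUFFIXES:
        return ["binary"]
    data = path.read_bytes()
    if b"\0" in data:        # crude heuristic: a NUL byte means "not text"
        return ["binary"]
    if b"\r" in data:        # a carriage return is what shows up as ^M
        return ["carriage-return"]
    return []


def check_tree(root: str) -> dict[str, list[str]]:
    """Map each offending path (relative to root) to the rules it breaks."""
    report: dict[str, list[str]] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in dirnames + filenames:
            full = Path(dirpath) / name
            problems = [] if name_is_simple(name) else ["bad-name"]
            if full.is_file():
                problems += file_problems(full)
            if problems:
                report[full.relative_to(root).as_posix()] = problems
    return report
```

Calling check_tree(".") from the top of a working copy returns an empty dictionary when nothing breaks these three rules. A check like this could run before a commit, but how to wire it into a particular version control system is left open here.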
Of course the commandments will change from time to time.
Reminds me of this scene in the Mel Brooks movie "History of the World, Part I". I loved the ending: "These 15 Commandments... oops... 10 Commandments!". ;^)
-kto
Posted at 09:58AM Apr 13, 2009 by kto in Java | Comments[7]

Can you explain the rationale behind 1 and 4?
I know that in theory you can rebuild binaries and generated source, but there may be uncontrolled factors (environment, compiler version, etc.)
Also what about binaries required for an initial build (bootstrapping)?
Posted by Robert Helmer on April 13, 2009 at 11:16 AM PDT #
On 1: Binary files are not source. They bloat a repository and become a burden to anyone attempting to build a product from known origins.
On 4: If files can be generated, then they are not the real source of a product, and they create a maintenance issue: how and when do they need to be regenerated?
Other build dependencies fall into different categories: system components that need to be installed on the machine being used, and components that may be imported from known public or private locations. In my opinion, adding these components to a repository is a mistake because they are just being used to build your product and don't 'define' your product.
I prefer to see the source to products managed by the people and projects that have declared some kind of ownership of those files.
-kto
Posted by Kelly Ohair on April 13, 2009 at 11:54 AM PDT #
On 1:
- binary JARs typically contain no information on who built them, how, when, from which source version and source code provider, or with what patches
- that in turn makes it hard to verify whether, for example, the licensing of a binary file is as advertised, or whether it contains surprises
- in some countries, unwittingly including crypto in your code without going through the corresponding procedures could result in an unpleasant surprise later
On 4:
- if it can be generated once, it can be regenerated any number of times. If it can't be regenerated, then the build process is broken by design. ;)
- if it can be regenerated, there is no need to drag it around in a source code repository, as it is not source code.
- in addition, one avoids spurious diffs caused by people regenerating the same thing over and over again with tiny but irrelevant changes to the generated output, like the order of function declarations in a generated C header file.
Posted by Dalibor Topic on April 14, 2009 at 02:11 AM PDT #
I've run into a few cases where it has been necessary to check in binary files in the test tree. The OpenJDK regression/unit tests need to run on systems that don't have a C compiler or the other tools needed to build the tests from source code. For those cases, I've checked in the source, a README or script to build the binaries, and the resulting binaries for the required architectures. Not sure if that breaks any of these commandments :-)
Posted by Alan on April 14, 2009 at 08:52 AM PDT #
To 1: I also followed this rule, until I checked out some quite old code from a repo and was surprised that it didn't compile at all. The reason: the API of the JAR I had used back then had changed dramatically, and I didn't know which version I had been using at the time, or IF I could even get such an old JAR again.
Refactoring the old code to the new API definitely took some time!
That was when I began to put JARs into the repo. Not very nice, indeed. But it prevents such issues.
Posted by justmy2c on April 14, 2009 at 11:37 PM PDT #
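One possible middle ground for the problem described in this comment, sketched here under assumptions: instead of committing the JAR itself, commit a small text manifest that records the exact file name (including version) and a SHA-256 digest of each third-party JAR, and verify whatever is downloaded against it. The manifest format ("file name, whitespace, digest", with '#' starting a comment) and the function names are hypothetical and made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse '<file name> <sha256>' lines (hypothetical format)."""
    pinned: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if line:
            name, digest = line.split()
            pinned[name] = digest.lower()
    return pinned


def verify(lib_dir: Path, pinned: dict[str, str]) -> list[str]:
    """Return the pinned names that are missing or whose digest differs."""
    bad = []
    for name, expected in sorted(pinned.items()):
        candidate = lib_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(name)
    return bad
```

A build would call verify() on its download directory and stop if the returned list is not empty. This only records which version was used. Whether such an old JAR can still be obtained years later is a separate question that the manifest does not solve.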
For rule #1, I agree completely if those jars are buildable from source that lives in your repository. However, when it comes to third-party jars, they absolutely should be in your repository. There is no reason that developers should need to go locate and download multiple third-party packages. We have even gone so far as to put our entire JBoss server into our repository. This way, when a new developer comes on board, they can simply check out everything from the repository and start developing. There is no need for them to go install JBoss, hunt down log4j, figure out which version of Groovy our code is using, etc.
Posted by Jason on April 15, 2009 at 08:06 AM PDT #
Uhm, Jason - take a look at dependency managers (Ivy, Maven, a few others).
I could agree about keeping a preconfigured application server or similar, but I really prefer to put resources like this on a shared resource (Samba, etc.).
Posted by 89.31.64.8 on April 17, 2009 at 01:22 AM PDT #