Kelly O'Hair's Weblog (blogs.sun.com)
Thursday Oct 08, 2009
Teamware/SCCS history conversion to Mercurial
Originally posted back in December 2007; I've added some new references and some possible strategies at the end.
Silver Falls, Oregon. No, it doesn't use Mercurial, yet.
Just a few notes on converting source file change history from Teamware/SCCS to Mercurial. These are just notes because, in the JDK area and in any Teamware JDK workspaces being converted, we don't plan on converting the old source change history into Mercurial. The major reason is a legal one, and you can imagine what the legal issues are with regard to non-open sources that become open. I won't get into that. But there are some technical issues too, which I will try to cover in case someone decides to attempt such a conversion.
Why convert the revision history?
The complete source history is an extremely valuable asset: being able to know when a change was made years ago, and who made it, is often essential to understanding a problem in a product. Initially we wanted to preserve this source change history and assumed it wouldn't be a difficult job. Most engineers have been upset that our current plans don't include this history conversion, but read on if you are curious about the problems we encountered.
The Basic Idea
The basic idea in doing an 'ideal' source history conversion would be to create a Mercurial changeset for each Teamware putback. That means you need to identify the putback event, the specific SCCS revisions of the files involved, and any file renames or deletes. And since each changeset is built upon the previous changeset, the ordering of the changesets is critical here.
Sounds simple, right? Well, read on; it's not so simple.
The Problems
History Files: You need to understand how the Teamware history file works. The Codemgr_wsdata/history file in a workspace does not propagate, so the details of a putback won't percolate around your tree of workspaces. Each workspace has a history of the Teamware events that happened to it, but no details of anything that happened to the other workspaces. So to get accurate Teamware history you need the history files from the entire tree of integration workspaces (any workspace that might be the target of a putback), including every one that ever existed, and then you need to fold all the events in these history files together in the proper time order. The more complicated the Teamware workspace integration tree, the more difficult this task becomes. The JDK workspaces (there are many different ones) each have 6-20 different integration workspace instances, and some of these workspaces go back quite a few years, so we are talking about some major source change history here.
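A minimal sketch of the folding step, assuming each workspace's Codemgr_wsdata/history file has already been parsed into time-ordered event records whose timestamps are normalized to a single timezone. The WorkspaceEvent record, its fields, and the workspace names are hypothetical; the real history file format would need its own parser. Because each per-workspace list is already sorted, a k-way merge (Python's heapq.merge) is enough to produce one global timeline:

```python
import heapq
from datetime import datetime
from typing import Iterable, List, NamedTuple


class WorkspaceEvent(NamedTuple):
    """Hypothetical parsed entry from one workspace's history file."""
    when: datetime   # assumed already normalized to a single timezone
    workspace: str   # which integration workspace recorded the event
    command: str     # e.g. "putback" or "bringover"
    user: str


def fold_histories(per_workspace: Iterable[List[WorkspaceEvent]]) -> List[WorkspaceEvent]:
    """Merge per-workspace event lists (each sorted by time) into one
    global time-ordered list; ties are broken by workspace name."""
    return list(heapq.merge(*per_workspace, key=lambda e: (e.when, e.workspace)))


if __name__ == "__main__":
    master = [
        WorkspaceEvent(datetime(2004, 4, 1, 10, 0), "jdk-master", "putback", "alice"),
        WorkspaceEvent(datetime(2004, 4, 3, 9, 0), "jdk-master", "putback", "bob"),
    ]
    child = [WorkspaceEvent(datetime(2004, 4, 2, 15, 30), "jdk-tl", "bringover", "carol")]
    for event in fold_histories([master, child]):
        print(event.when, event.workspace, event.command)
```

The merge itself is the easy part; any workspace whose history file is missing simply leaves a silent gap in the timeline.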
SCCS file revision numbers: The Teamware history will just list the files involved in a putback or bringover, not the specific SCCS revision numbers of those files. So matching each file's SCCS revisions to the putback event that delivered them is not trivial. (I think there may have been an option in Teamware to record the SCCS revision numbers in the workspace history file, but it is off by default, which is a shame.) So creating a nice neat Mercurial changeset means you need to somehow match up the file lists and timestamps of the putbacks with the individual SCCS revision numbers of the source files. Unfortunately, SCCS files record a time but no timezone, so anyone who attempts this kind of history conversion will need lots of fuzzy timestamp logic to match the right SCCS revisions with the putbacks. The username is included in both the Teamware history file and the SCCS revisions, so that may also help, except that the integrator who does a putback often isn't the same person who created the SCCS revision.
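One possible shape for that fuzzy matching, sketched under stated assumptions: putbacks and SCCS deltas are assumed to be already parsed into the hypothetical Putback and SccsDelta records below, and tz_slack (14 hours here) is an assumed worst-case clock/timezone skew, not a known value. A delta is assigned to the earliest putback that lists its file and is not clearly earlier than the delta; the username is only used as a tie-breaker, since the putback user is often an integrator:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Putback:
    pid: str               # hypothetical putback identifier
    when: datetime
    user: str              # often the integrator, not the author
    files: FrozenSet[str]  # file list from the Teamware history entry


@dataclass(frozen=True)
class SccsDelta:
    path: str
    rev: str               # e.g. "1.42"
    when: datetime         # naive: SCCS records no timezone
    user: str


def match_deltas(
    deltas: Iterable[SccsDelta],
    putbacks: Iterable[Putback],
    tz_slack: timedelta = timedelta(hours=14),  # assumed worst-case skew
) -> Dict[Tuple[str, str], Optional[str]]:
    """Map each (path, rev) to the putback that most plausibly delivered it.

    A putback is a candidate if it lists the file and is no earlier than the
    delta time minus tz_slack (a delta exists before it is put back, but the
    unknown timezone blurs "before"). The earliest candidate wins; among
    candidates within tz_slack of that earliest one, a putback by the delta's
    own author is preferred. None means no candidate was found.
    """
    ordered = sorted(putbacks, key=lambda p: p.when)
    result: Dict[Tuple[str, str], Optional[str]] = {}
    for d in deltas:
        candidates = [p for p in ordered
                      if d.path in p.files and p.when >= d.when - tz_slack]
        if not candidates:
            result[(d.path, d.rev)] = None
            continue
        earliest = candidates[0].when
        close = [p for p in candidates if p.when - earliest <= tz_slack]
        same_user = [p for p in close if p.user == d.user]
        result[(d.path, d.rev)] = (same_user or close)[0].pid
    return result
```

This is a best guess to review, not ground truth: a revision created shortly before an unrelated putback of the same file, within the slack window, can still be assigned to the wrong putback.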
SCCS Revision Tree: The SCCS revision tree for each file can be a fairly complex graph, depending on how many merges happened to the file. You might be able to use just the top-level SCCS revision, but the SCCS comments on the other revisions contain important information that should be preserved.
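A sketch of how those comments could be kept when only the top-level revision goes into a changeset: collect the comments of every revision reachable from the new top revision but not from the previously converted one. The DeltaTable model is hypothetical, and treating the deltas included by a merge as extra parents is an assumption about how the graph would be built from the SCCS delta table:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model of one SCCS file's delta table:
# revision -> (parent revisions, comment). Merge deltas list the branch
# revisions they include as extra parents (an assumption of this sketch).
DeltaTable = Dict[str, Tuple[Tuple[str, ...], str]]


def ancestors(table: DeltaTable, rev: str) -> Set[str]:
    """All revisions reachable from rev (inclusive) by following parents."""
    seen: Set[str] = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        stack.extend(table[r][0])
    return seen


def comments_between(table: DeltaTable, old_top: str, new_top: str) -> List[str]:
    """Comments of every revision reachable from new_top but not old_top,
    so merged-in branch revisions keep their comments too."""
    new = ancestors(table, new_top) - ancestors(table, old_top)
    # Order by the numeric components of the revision id, e.g. 1.2.1.1 < 1.3.
    ordered = sorted(new, key=lambda r: tuple(int(x) for x in r.split(".")))
    return [f"{r}: {table[r][1]}" for r in ordered]
```

The combined list could then become the body of the Mercurial commit message for that file's part of the changeset.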
Deleted files: Teamware deletes files by moving them from where they are to a top-level deleted_files directory. So they don't really get deleted, just renamed. However, a common practice with many teams is to purge the deleted_files directory once a product reaches a major milestone. So some of the files may actually be permanently gone, and this needs to be taken into consideration: if this has happened, you can't recreate the complete source tree for the earlier points in history.
Empty comments: Empty SCCS revision comments and empty putback comments also create problems if you plan on using these comments, or cookies of information in them such as bug ID numbers or change request numbers, to connect the files to the putback events. More specific SCCS revision comments and more specific putback comments would make this job easier.
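A sketch of using such cookies, assuming bug IDs appear in comments as 7-digit numbers (e.g. 6543210); that pattern is an assumption, so adjust BUG_ID for other conventions. Putbacks and deltas are linked when their comments share a bug ID, and empty comments naturally produce no links, which is exactly the problem described above:

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

# Assumption: bug/change request ids are 7-digit numbers in the comment text.
BUG_ID = re.compile(r"\b(\d{7})\b")

DeltaKey = Tuple[str, str]  # (file path, SCCS revision)


def bug_ids(comment: str) -> Set[str]:
    """Extract the set of bug ids mentioned in a comment (empty if none)."""
    return set(BUG_ID.findall(comment or ""))


def link_by_bug_id(putback_comments: Dict[str, str],
                   delta_comments: Dict[DeltaKey, str]) -> Dict[str, List[DeltaKey]]:
    """Map each putback id to the deltas sharing at least one bug id with it."""
    by_bug: DefaultDict[str, List[DeltaKey]] = defaultdict(list)
    for key, comment in delta_comments.items():
        for bug in bug_ids(comment):
            by_bug[bug].append(key)
    links: Dict[str, List[DeltaKey]] = {}
    for pid, comment in putback_comments.items():
        matched: Set[DeltaKey] = set()
        for bug in bug_ids(comment):
            matched.update(by_bug.get(bug, []))
        links[pid] = sorted(matched)
    return links
```

Links found this way could be cross-checked against the timestamp-based matching rather than used on their own.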
Approaches
We considered multiple approaches to a source revision history conversion. You could come at it from the putback events, using the history files to identify the 'real' changesets, and hope any deleted files are still around; what you'd use as Revision 1 of the files might be a little tricky. Or you could look only at the SCCS revisions and figure out, via timestamps, usernames, and perhaps SCCS comments, which files were changed as a group. Or a combination of both. Or you could come at it from a time perspective, e.g. all the changes that get you from April 1, 2004 to May 1, 2004.
The simple approach of one changeset per SCCS revision isn't really that simple, because Mercurial changesets are ordered. To do it right you'd need to view the Teamware workspace as a large graph of file nodes, each with a small sub-graph of SCCS revisions. Then pick a time T to start Revision 1 of the Mercurial sources, find all the file instances at time T, add these files as a changeset to Mercurial, then repeat that for T+1. Or perhaps T+N, where N is selected by sampling timestamps after T for a quiet time (to avoid picking a time that might split up file changes that happened as a group). Just some wild ideas.
But it just feels wrong: there's no putback data, the files won't be grouped correctly, and the resulting repository would contain inaccurate source states in any of these converted changesets.
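The "T+N at a quiet time" idea can be sketched as choosing snapshot boundaries from the sorted timestamps of all SCCS deltas. This only picks the times; extracting every file's revision as of each boundary is not shown, and step and quiet are hypothetical tuning parameters. Each boundary is at least step after the previous one and is pushed forward until no change follows within quiet, so a burst of related changes stays in one snapshot:

```python
from datetime import datetime, timedelta
from typing import List


def slice_boundaries(change_times: List[datetime], start: datetime,
                     step: timedelta, quiet: timedelta) -> List[datetime]:
    """Pick snapshot times T0 = start, T1, T2, ... for a time-sliced conversion.

    Each boundary is at least `step` after the previous one (or at the next
    change, if nothing changed in that interval), then pushed forward past any
    change that follows within `quiet`, so the first change after a boundary
    is always at least `quiet` later. A snapshot at time T includes changes at T.
    """
    times = sorted(t for t in change_times if t > start)
    boundaries = [start]
    i = 0
    while i < len(times):
        # Never emit an empty slice: jump to the next change if it is later.
        candidate = max(boundaries[-1] + step, times[i])
        while i < len(times) and times[i] <= candidate:
            i += 1  # these changes fall inside the current slice
        while i < len(times) and times[i] - candidate < quiet:
            candidate = times[i]  # too close to the boundary: pull it in
            i += 1
        boundaries.append(candidate)
    return boundaries
```

Even with a quiet-time heuristic, nothing guarantees a boundary never lands in the middle of one logical change, which is part of why this approach feels wrong.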
We never fully explored all the approaches because once the legal constraint came in, there seemed no need to pursue it. It's an interesting and complicated problem, but ultimately one we decided we didn't have to solve.
Conclusion
So the bottom line is that whatever can be created would likely have questionable data if someone asked for the sources as of a particular date, or wanted to know the state of the entire source tree when a given change was made... It's hard to ever be perfect here, and not being perfect could send a few engineers down some deep rabbit holes. :^(
The old history isn't being destroyed; it's just being left in the old Teamware workspaces. So we will still have access to it, just not via Mercurial repositories.
As time passes, we'll build up new and better history in our Mercurial repositories, and maybe by the time I retire, it won't matter much. ;^)
Update: Some Ideas
Jesse's conversion script turns out to be a possibility. He documents the problems with it, but it's certainly a step toward something.
With the OpenJDK6 repositories, which were originally in TeamWare, we had two ways to gain some history. With each build promotion while in TeamWare, we saved a source bundle, so we had a raw snapshot of the source for each build. Using these bundles as the working set files allowed us to start rev0 with the Build 0 source bundles, then, for each build promoted after that, repeat these steps:
- Delete the working set files
- Copy in new working set files from the source bundle for Build promotion N.
- Run: hg addremove ; hg commit --message "Build Promotion N" ; hg tag BuildN
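The three steps above can be scripted roughly like this. It is a sketch, assuming each source bundle has already been unpacked into its own directory and that the caller supplies (build number, directory) pairs oldest first; that layout is hypothetical. One detail worth keeping: hg tag writes .hgtags into the working directory, so the "delete the working set" step must keep .hgtags as well as .hg, or later changesets would lose the tags:

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence, Tuple

# Entries that must survive step 1: Mercurial's metadata, and .hgtags,
# which "hg tag" writes into the working directory.
KEEP = {".hg", ".hgtags"}


def clear_working_set(repo: Path) -> None:
    """Step 1: delete the working set files, keeping the KEEP entries."""
    for entry in repo.iterdir():
        if entry.name in KEEP:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def copy_bundle(bundle_dir: Path, repo: Path) -> None:
    """Step 2: copy an already-unpacked source bundle into the working set."""
    shutil.copytree(bundle_dir, repo, dirs_exist_ok=True)  # Python 3.8+


def hg_commands(build: int) -> List[List[str]]:
    """Step 3: the Mercurial commands for one build promotion."""
    return [
        ["hg", "addremove"],
        ["hg", "commit", "--message", f"Build Promotion {build}"],
        ["hg", "tag", f"Build{build}"],
    ]


def import_builds(repo: Path, bundles: Sequence[Tuple[int, Path]]) -> None:
    """Replay (build number, unpacked bundle directory) pairs, oldest first."""
    repo.mkdir(parents=True, exist_ok=True)
    if not (repo / ".hg").is_dir():
        subprocess.run(["hg", "init"], cwd=repo, check=True)
    for build, bundle_dir in bundles:
        clear_working_set(repo)
        copy_bundle(bundle_dir, repo)
        for cmd in hg_commands(build):
            subprocess.run(cmd, cwd=repo, check=True)
```

With check=True, a bundle identical to the previous one stops the run, because hg commit exits with an error when nothing changed; that is usually worth investigating rather than silently skipping.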
Anyway, just thought I would update this rather old posting.
-kto
Posted at 04:09PM Oct 08, 2009 by kto in General
Friday Oct 02, 2009
JavaFX Survey
Are you a JavaFX user? If so, please take a short JavaFX survey and tell us what you think.
-kto
Posted at 01:29PM Oct 02, 2009 by kto in Java
Wednesday Sep 30, 2009
Parfait: Finding more bugs
The Parfait tool checks C/C++ source code for common systems and security bugs. The OpenSolaris project is using it successfully, and we are currently investigating its use with the OpenJDK project.
Dr. Cristina Cifuentes has more information in "Building a Better Bug-Checker". More publications are available at the Sun Labs site.
If only we could get rid of all compiler warnings, FindBugs errors, and Parfait errors, the world would be squeaky clean. ;^)
-kto
Posted at 11:25AM Sep 30, 2009 by kto in Java
Friday Aug 14, 2009
Warning Hunt: Hudson and Charts on FindBugs, PMD, Warnings, and Test Results
"Drink, ye harpooners! drink and swear, ye men that man the deathful whaleboat's bow -- Death to Moby Dick!"
Once again, my sincere apologies to the whales and whale lovers out there; just remember that Moby Dick is a fictional story, and I am in no way condoning the slaughter of those innocent and fantastic animals.
Hudson and Trends
Hudson is a fantastic tool for watching or monitoring trends in your projects. Although my graphs in this example are not that interesting, you can see how this could help in keeping track of recent test, warnings, FindBugs, and PMD trends.
This post should demonstrate how you can reproduce those charts.
To get Hudson and start it up, do this:
- Download Hudson (hudson.war)
- In a separate shell window, run:
java -jar hudson.war
Congratulations, you now have Hudson running. Now go to http://localhost:8080/ to view your Hudson system. First you need to add some Hudson plugins:
- Select Manage Hudson
- Select Manage Plugins
- Select Available
- Select the check boxes for the following plugins: Findbugs Plugin, Mercurial Plugin, PMD Plugin, and Warnings Plugin
- Select Install at the bottom of the page.
- Select Restart Now
We need a small Java project to demonstrate the charting, one that has been set up to create the necessary charting data. You will need Mercurial installed, so if hg version doesn't work, install one of the Mercurial binary packages first. I will use this small jmake project from kenai.com. Get yourself a Mercurial clone of the repository with:
hg clone https://kenai.com/hg/jmake~mercurial ${HOME}/jmake
and verify you can build it from the command line first:
cd ${HOME}/jmake
ant import
ant
The ant import step will download JUnit, FindBugs, and PMD. Access to these tools could be provided in a variety of ways; this import mechanism just happens to be the way this ant script is set up.
This jmake build.xml script is already set up to create the XML files that Hudson will chart for JUnit, FindBugs, and PMD. You may want to look at the build.xml file, specifically the lines at the end of the file, to see how this is done. Your own build.xml needs to create this raw data (the XML files) for the charting to work. This build.xml file happens to be a NetBeans-configured ant script, so it can be used in NetBeans too.
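Before wiring up Hudson, a quick sanity check that the raw data really exists can save some head scratching. This is a sketch, not part of jmake or Hudson: it assumes the report paths used in the job configuration (build/test/results/TESTS-*.xml, build/pmd.xml, build/findbugs.xml) and assumes the usual element names, namely testsuite elements with tests/failures/errors attributes from ant's junitreport, violation elements from PMD, and BugInstance elements from FindBugs:

```python
import glob
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Optional


def junit_totals(pattern: str) -> Dict[str, int]:
    """Sum tests/failures/errors over every <testsuite> in matching reports."""
    totals = {"tests": 0, "failures": 0, "errors": 0}
    for path in sorted(glob.glob(pattern)):
        root = ET.parse(path).getroot()
        for suite in root.iter("testsuite"):  # iter() includes the root itself
            for key in totals:
                totals[key] += int(suite.get(key, "0"))
    return totals


def count_elements(path: str, tag: str) -> Optional[int]:
    """Count <tag> elements in an XML report, or None if the file is missing."""
    if not Path(path).is_file():
        return None
    return sum(1 for _ in ET.parse(path).getroot().iter(tag))


def summarize(base: str = "build") -> Dict[str, object]:
    """Summarize the reports Hudson will chart, relative to the build dir."""
    return {
        "junit": junit_totals(f"{base}/test/results/TESTS-*.xml"),
        "pmd_violations": count_elements(f"{base}/pmd.xml", "violation"),
        "findbugs_bugs": count_elements(f"{base}/findbugs.xml", "BugInstance"),
    }


if __name__ == "__main__":
    print(summarize())
```

Run it from ${HOME}/jmake after ant; a None entry means that report was never written, so Hudson would have nothing to chart for it.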
Now let's create a Hudson job to monitor your jmake repository. Go to http://localhost:8080/ and:
- Select New Job
- Name the job, I'll use "jmake".
- Select Build a free-style software project.
- Select OK.
Next we need to configure the job. You should now be on the job configuration page; this is what you need to do:
- Under Source Code Management, select Mercurial and enter the path ${HOME}/jmake.
- Under Build Triggers select Poll SCM and enter */5 * * * * so that the repository is checked every 5 minutes.
- Under Build, select Add Build Step, then Invoke Ant, and enter import as the ant target.
- Select Add Build Step again, then Invoke Ant, leaving the ant target blank.
- Select Publish JUnit test result report and provide the path build/test/results/TESTS-*.xml.
- Select Scan for compiler warnings.
- Select Publish PMD analysis report and provide the path build/pmd.xml.
- Select Publish Findbugs analysis report and provide the path build/findbugs.xml
- Finally, select Save at the bottom of the job configuration page.
Lastly, use Build Now a few times to build up some build history before the charts show up.
This Hudson job will automatically run when changes are added to the repository.
Hope this helps. Using Hudson for this is a definite plus; everyone should invest some time in setting these kinds of systems up.
-kto
Posted at 01:21PM Aug 14, 2009 by kto in Java