Kelly O'Hair's Weblog (blogs.sun.com)

Monday Jan 11, 2010

Whack a Mole Testing

Of late I seem to have entered a Twilight Zone game of Whack a Mole with the jdk tests. It appears that the odds that a test can fail on some particular OS or machine, with or without any jdk change, are higher than I ever thought possible. Very frustrating. Why is that? I have a list of possible contributing factors:

  • Some tests are just downright unpredictable and need to be fixed.
  • Minor differences in the environment variable settings can cause failures.
  • Minor differences in the machine or OS configurations can cause failures.
  • Some tests are rarely run on all platform (OS/ARCH) combinations.
  • The ramification of any jdk change is often poorly understood, and all jdk tests, on all platforms are rarely run.
  • The use of jtreg -samevm can influence the stability of a testrun, if any test is not 'samevm' safe.
  • Using the same machine and/or same user to run multiple sets of tests can also influence the stability of a testrun, if any test is using shared resources (like port numbers or shared directories like /tmp).
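The shared /tmp item in the list above is one of the easier ones to sketch a defense for. The helper below is a minimal sketch, not anything the jdk test makefiles do: with_scratch is a hypothetical name, and it assumes mktemp is available and that the command being run honors TMPDIR (many tools do, but that is an assumption for any particular test). It does nothing for shared port numbers.

```shell
#!/bin/sh
# with_scratch CMD [ARGS...]
# Hypothetical helper: run CMD inside a freshly created private scratch
# directory, with TMPDIR pointing at it, then remove the directory.
# Assumes mktemp exists and that CMD actually honors TMPDIR.
with_scratch() {
  scratch=$(mktemp -d "${TMPDIR:-/tmp}/scratch.XXXXXX") || return 2
  rc=0
  # Subshell so the cd and the TMPDIR change do not leak into the caller
  ( TMPDIR="${scratch}" ; export TMPDIR ; cd "${scratch}" && "$@" ) || rc=$?
  rm -rf "${scratch}"
  return ${rc}
}
```

Something like with_scratch sh -c 'echo data > "$TMPDIR/work.txt"' would then write into a directory that no other test run shares, and the exit code of the command is passed back to the caller.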

Maybe I just have too much Fuzz (unpredictable environment) in the test runs? I can certainly nail down many of the above items to increase stability, but until I have some kind of Hudson or continuous build/test system watching every changeset, this will be a difficult task. And to be effective, it would need to build and test all possible platforms, a platform list that seems to be growing lately.

I found this cartoon from the health care industry (that's scary); anyway, I thought it was appropriate for product releases too:

If I knew how to draw cartoons, I'd draw one for the testing matrix.

-kto

Thursday Dec 10, 2009

Mercurial Forest: Pet Shell Trick of the Day

Now be careful with this, but here is a simple shell script that will run Mercurial hg commands on a forest pretty quickly. It assumes that the forest is no deeper than 3 levels, e.g. */*/*/.hg. Every hg command is run in its own parallel shell process, so very large forests might be dangerous. Use at your own risk:

#!/bin/sh

# Shell script for a fast parallel forest command

tmp=/tmp/forest.$$
rm -f -r ${tmp}
mkdir -p ${tmp}

# Remove tmp area on A. B. Normal termination
trap 'rm -f -r ${tmp}' EXIT

# Only look in specific locations for possible forests (avoids long searches)
hgdirs=`ls -d ./.hg ./*/.hg ./*/*/.hg ./*/*/*/.hg 2>/dev/null`

# Derive repository names from the .hg directory locations
repos=""
for i in ${hgdirs} ; do
  repos="${repos} `echo ${i} | sed -e 's@/.hg$@@'`"
done

# Echo out what repositories we will process
echo "# Repos: ${repos}"

# Run the supplied command on all repos in parallel, save output until end
n=0
for i in ${repos} ; do
  n=`expr ${n} '+' 1`
  # Background the whole group (hg command plus cat) so repos run in parallel
  (
    (
      cline="hg --repository ${i} $*"
      echo "# ${cline}"
      eval "${cline}"
      echo "# exit code $?"
    ) > ${tmp}/repo.${n} ; cat ${tmp}/repo.${n}
  ) &
done

# Wait for all hg commands to complete
wait

# Cleanup
rm -f -r ${tmp}

# Terminate with exit 0 all the time (hard to know when to say "failed")
exit 0

Run it like: hgforest status, or hgforest pull -u.
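The script always exits 0 because it is hard to know when to say "failed". One possible answer, as a rough sketch only: record each child's exit code in a file and fail if any of them was non-zero. The run_parallel helper below is hypothetical and not part of hgforest; it takes a command string and a list of directories rather than calling hg, so it can be tried without a forest.

```shell
#!/bin/sh
# run_parallel CMD DIR...
# Hypothetical helper (not part of the script above): run CMD in each DIR
# in parallel, print each directory's output once all have finished, and
# return non-zero if any of the commands failed.
run_parallel() {
  cmd="$1" ; shift
  work=$(mktemp -d "${TMPDIR:-/tmp}/forest.XXXXXX") || return 2
  n=0
  for d in "$@" ; do
    n=$((n + 1))
    (
      rc=0
      ( cd "${d}" && eval "${cmd}" ) > "${work}/out.${n}" 2>&1 || rc=$?
      # Save the exit code so the parent can inspect it after wait
      echo "${rc}" > "${work}/rc.${n}"
    ) &
  done
  # Wait for all background children to complete
  wait
  failed=0
  i=1
  while [ ${i} -le ${n} ] ; do
    cat "${work}/out.${i}"
    [ "$(cat "${work}/rc.${i}")" = "0" ] || failed=$((failed + 1))
    i=$((i + 1))
  done
  rm -rf "${work}"
  [ ${failed} -eq 0 ]
}
```

Unlike the script above, this prints the output in directory order after everything completes, rather than as each one finishes. Whether "any child failed" is the right definition of failure for a forest command is a judgment call.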

As always should any member of your IMF forest be caught or killed, the secretary will disavow all knowledge of your actions. This tape will self-destruct in five seconds. Good luck Jim.

-kto

Monday Dec 07, 2009

Have I become a Fuzz Tester?

Everything was a little fuzzy in Livermore, California this morning:

A very rare snowfall for this area: we are at 500 feet above sea level, and snow was falling as low as 200 feet early this morning. But that's just my "Fuzz" picture intro...

I read an article on Secure Software Testing and it talked about Fuzz Testing, and now I'm wondering if I've become a Fuzz Tester, and what exactly does that mean? :( Should I see a doctor? "Fuzz", what a funny word, back in the 70's that was one of the more polite slang words for the Police.

Years ago, when I worked on C compilers, one of the first things my fellow Fortran compiler developer did to test my C compiler was to feed it a Fortran source file as input. If my compiler core dumped instead of generating an error message, it would have failed the "garbage input" test. I consequently would cd into his home directory and run "f77 *", but I could never get his damn Fortran compiler to crash, maybe it just liked eating garbage. ;^) Anyway, it appears this kind of "garbage input" testing is a form of "Fuzz Testing", feed your app garbage input and see if you can get it to misbehave.
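The "garbage input" test can be sketched in a few lines of shell. This is a toy, not a real fuzzing tool: fuzz_once is a hypothetical name, it assumes /dev/urandom exists, and it relies on the shell convention that an exit status above 128 means the command was killed by a signal (a program could also choose to exit with such a value on its own).

```shell
#!/bin/sh
# fuzz_once CMD [ARGS...]
# Hypothetical helper: feed 4 KB of random bytes to CMD on stdin and
# return non-zero if CMD appears to have died from a signal (a crash)
# rather than exiting on its own, cleanly or with an error.
fuzz_once() {
  rc=0
  dd if=/dev/urandom bs=1024 count=4 2>/dev/null | "$@" > /dev/null 2>&1 || rc=$?
  if [ ${rc} -gt 128 ] ; then
    echo "# killed by signal $((rc - 128))"
    return 1
  fi
  return 0
}
```

An ordinary error exit counts as a pass here, which matches the spirit of the test: complaining about garbage is fine, core dumping is not.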

So back to my primary topic at hand, OpenJDK testing. Lately I've been trying to get to a point where running the jdk tests is predictable. I've done that by weeding out tests that are known failures or unpredictable, and running and re-running the same tests over and over and over, on different systems and with slightly different configurations (-server, -client, -d64).
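The "run it over and over" part is easy to sketch. The rerun helper below is hypothetical and much cruder than what a real test harness does: it runs any command a fixed number of times and counts passes and failures, so a mix of both suggests an unpredictable test.

```shell
#!/bin/sh
# rerun COUNT CMD [ARGS...]
# Hypothetical helper: run CMD COUNT times, print a one-line summary,
# and return non-zero if any run failed.
rerun() {
  runs="$1" ; shift
  pass=0 ; fail=0 ; i=0
  while [ ${i} -lt ${runs} ] ; do
    if "$@" > /dev/null 2>&1 ; then
      pass=$((pass + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  echo "# runs=${runs} pass=${pass} fail=${fail}"
  [ ${fail} -eq 0 ]
}
```

For example, rerun 20 ./sometest.sh, where sometest.sh stands in for whatever runs a single test. A summary with both pass and fail counts above zero is the interesting case.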

After reading that Fuzz article, I've come to the conclusion that I've been doing a type of Fuzz Testing, by varying one of the inputs in random ways, the system itself. But that's a silly thought, that would make us all Fuzz Testers, because who runs tests with the exact same system state every time? Unless you saved a private virtual machine image and restarted the system every time, how could you ever be sure the system state was always the same? And even then, depending on how fast the various system services run, even a freshly started virtual machine could have a different state if you started a test 5 seconds later than the last time. And what about the network? There is no way to do that, well I'm sure someone will post a comment telling me how you can do that. And even if you could, what would be the point? If everything is exactly the same, of course it will do the same thing, right? So there is always Fuzz, and you always want some Fuzz, who really wants to be completely Fuzz-less? My brain is now getting Fuzzy. :^(

When looking at the OpenJDK testing, I just want to be able to run a set of robust tests, tests that are immune to a little "system fuzz", in particular system fuzz created by the test itself or its related tests. These "self induced fuzz failures" seem to be a common problem. Sounds like some kind of disease, H1Fuzz1, keep that hand sanitizer ready. Is it reasonable to expect tests to be free of "self induced fuzz failures"? Or should testing guarantee a fuzz free environment and only run tests one at a time?

I'm determined to avoid tests with H1Fuzz1, or find a cure. So I recently pushed some changes into the jdk7/tl forest jdk repository:

http://hg.openjdk.java.net/jdk7/tl/jdk/rev/af9346401220

More improvements are on the way, and the same basic change is planned for OpenJDK6. This should make it easier to validate your jdk build by running only the jdk regression/unit tests in the repository that are free of H1Fuzz1. They also should run as quickly as possible. You will need a built jdk7 image and also the jtreg tool installed. To download and install the latest jtreg tool do the following:

wget http://www.java.net/download/openjdk/jtreg/promoted/b03/jtreg-4_0-bin-b03-31_mar_2009.zip
unzip jtreg-4_0-bin-b03-31_mar_2009.zip
export JT_HOME=`pwd`/jtreg

Build the complete jdk forest, or just the jdk repository:

gmake
-OR-
cd jdk/make && gmake ALT_JDK_IMPORT_PATH=previously_built_jdk7_home all images

Then run all the H1Fuzz1-free tests:

cd jdk/test && gmake -k jdk_all [PRODUCT_HOME=jdk7_home]

There are various batches of tests, and jdk_all runs all of them. If your machine can handle it, use gmake -j 4 jdk_all to run up to 4 of the batches in parallel. Some batches are run with jtreg -samevm for faster results. The tests in the ProblemList.txt file are not run, and hopefully efforts are underway to reduce the size of this list by curing H1Fuzz1.

As to the accuracy of the ProblemList.txt file, it could be wrong, and I may have slandered some perfectly good tests by accusing them of having H1Fuzz1. My apologies in advance, let me know if any perfectly passing tests made the list and I will correct it. It is also very possible that I missed some tests, so if you run into tests that you suspect might have H1Fuzz1, we can add them to the list. On the other hand, curing H1Fuzz1 is a better answer. ;^)

That's enough Fuzz Buzz on H1Fuzz1.

-kto