Alan Burlison's Work Related Ramblings

All | General | Java | NetBeans | Perl | Solaris
« Previous month (Jan 2006) | Main | Next month (Mar 2006) »

20060325 Saturday March 25, 2006

Why I hate XML, and Ant's use of it in particular

I was googling around for examples of how to sign a JAR file in Ant using the signjar task and came across this monstrosity - 217 lines of turgid XML. How anyone can say Ant is an improvement over make is a mystery to me, and how anyone could ever think that XML is fit for consumption by humans is an even bigger mystery. XML is food for machines, not people. I sincerely hope that the current fad for doing everything in XML (and we have some examples close to home) is going to an early grave.

What I'm actually trying to do is to deploy an application via WebStart from within NetBeans - I've loaded up the NetBeans WebStart module but it really doesn't cut the mustard yet, it doesn't support JAR signing, it hard-codes the name of the JNLP file and it assumes you want a WAR file building. Here's hoping they make the necessary improvements in short order.

 

Posted by alanbur ( Mar 25 2006, 04:26:49 PM GMT ) Permalink Comments [1]

20060324 Friday March 24, 2006

Language wars are so boring

I've just read James Gosling's post on how he got flamed for daring to say that scripting languages aren't a panacea. Whereas the old language crusades used to be fought between the various "compiled languages" it seems the new jihad is between "compiled" and "dynamic" languages. The arguments are much the same, and are just as tedious as they rely on religious fervour rather than pragmatism and common sense.

As James points out there's a huge continuum of both problem spaces and languages that can be used to address them, the trick is to pick the right tool for the job at hand. The point that most of the participants in language wars miss is that in most cases the choice of language should be secondary, the most important deciding factor should be what tools and libraries are available for the language and how closely they address the problem domain. Syntax just isn't that important - I care far less about having to declare types for my variables than I do about having to write a thousands of lines of code that could more properly be replaced with a library function. I also quite like having the dangerous and tedious bits of programming taken care of, such as memory management - an area where dynamic languages and garbage collecting languages such as Java have a clear edge over more traditional choices such as C - however in some circumstances C is still the right choice, when other factors (e.g. predictability or outright speed) are more important.

The day job for the last 18 months has been nearly 100% Perl and heavily regexp based as well as requiring access to MySQL, LDAP and a load of other network-based stuff. At one point I rewrote some of the Perl regexp code in Java. It was easy to do, thanks to Java's good regexp library. Sure it was a bit more wordy, but it worked just fine. That's the point - the power of both Perl an Java isn't really the core language, it's the ecosystem in which they exist - in Perl's case the unspoken hero is CPAN, in Java's case it's the extensive class library that it comes with, along with the vast range of freely available external class libraries. Choosing a language based purely on whether it is "static" or "dynamic" makes about as much sense as choosing a car based on the paint colour - sure it's important, but it's not the most important factor.

Posted by alanbur ( Mar 24 2006, 11:03:59 AM GMT ) Permalink Comments [0]

20060322 Wednesday March 22, 2006

A TWiki rant

As part of the day job I've had to upgrade a TWiki installation to the latest version. This job always fills me with foreboding, as every time I've done it in the past something new and unexpected seems to break. I was upgrading this time in order to fix the security holes in the version of TWiki I was using, all of which were caused by known "Don't do that, it's a security hole!" type code. 4.0.1 is a pretty significant upgrade, a lot of stuff has been changed, and unfortunately some of it is less than fully functional.

The first bit of brokenness was mod_perl related. TWiki runs like a tranquillised slug at the best of times, so it's usually necessary to run it under mod_perl to get reasonable performance. I set up the new version under mod_perl, but kept getting random failures where perl was complaining that it couldn't find the TWiki modules. Dumping out @INC revealed that when this happened the TWiki library path was missing from @INC. A further amount of digging revealed that for some reason I can't quite fathom the TWiki developers decided it would be a good idea to set @INC with the block of code below:

BEGIN {
# Set default current working directory (needed for mod_perl)
if( $ENV{"SCRIPT_FILENAME"} && $ENV{"SCRIPT_FILENAME"} =~ /^(.+)\/[^\/]+$/ ) {
chdir $1;
}
# Set library paths in @INC, at compile time
unshift @INC, '.';
require 'setlib.cfg';
}

(for the non-perl-savvy, @INC is the search path perl uses to find modules). There's only one problem with this, as a little reading of the mod_perl documentation reveals:

When running under mod_perl, once the server is up @INC is frozen and cannot be updated. The only opportunity to temporarily modify @INC is while the script or the module are loaded and compiled for the first time. After that its value is reset to the original one. The only way to change @INC permanently is to modify it at Apache startup.

BEGIN blocks are run as the perl interpreter starts up, so the change to @INC is only there when the script is first compiled. TWiki likes to pull in additional modules on the post-compile-time using various combinations of require, do and eval, mainly to try to speed up its poor performance. However by that time the modules are pulled in the changes made to @INC in the BEGIN block are long gone, and bad things happen. Having each TWiki script change the working directory just so the subsequent require will work seems pretty unfriendly, especially when under mod_perl the process may be reused by a completely different script, and the unshift @INC, '.'; is completely unnecessary as . is already included in perl's default @INC. Cleaning up this mess needed two things - first I went through all the twiki scripts and ripped out all the BEGIN blocks and replaced them with the appropriate use statements. That fixed the case when the scripts were run as vanilla CGI scripts. The second problem was the require statements that were scattered throughout the TWiki modules. I didn't want to go through the code fixing them all, not least because it made future upgrades even more difficult, so I took the distasteful step of wedging the TWiki library paths permanently into @INC when running under mod_perl. This meant it had to be done in the Apache startup file, as explained in the section from the mod_perl documentation above. I added the requisite paths onto the end of @INC rather than putting at the beginning with use lib as I want any upgraded modules I install to override the ones TWiki bundles - the setting of @INC is shared by all the code that runs under mod_perl, which is why TWiki's propensity for using require without fully specifying the paths where the modules live makes it such an bad mod_perl citizen.

What's ironic is both the release notes and the comment in the code above state how mod_perl support has been "improved" in this release (actually, it's worse). I even went on to #twiki on IRC and asked why they didn't just provide an install script to edit the TWiki scripts and put in the proper use and require statements, but I was told that the lead developer "Doesn't believe in install scripts". Hmm...

Today I fell into yet another TWiki bear trap. I copied the pages from my existing install into the new TWiki and checked that I could edit and save pages. Nope, save attempts failed with a message saying that RCS had failed and I should contact my TWiki administrator. As I was the aforementioned administrator and I had no clue what was going wrong the message was less than useful. What I couldn't understand was why I wasn't getting any diagnostic messages, either back to the browser or to the server log files. Spelunking through the source revealed this little coding gem:

        # Note that there doesn't seem to be any way to redirect
# STDERR when using safe pipes.

my $pid = open($handle, '-|');

throw Error::Simple( 'open of pipe failed: '.$! ) unless defined $pid;

if ( $pid ) {
# Parent - read data from process filehandle
local $/ = undef; # set to read to EOF
$data = <$handle>;
close $handle;
$exit = ( $? >> 8 );
} else {
# Child - run the command
open (STDERR, '>'.File::Spec->devnull()) || die "Oh dear";
exec( $path, @args ) ||
throw Error::Simple( 'exec failed: '.$! );
# can never get here
}

So, let's throw all the diagnostic output from all the commands we ever invoke down /dev/null because we (presumably?) haven't read the perl documentation for the open() function that tells us how to do this. Hell, they could have left STDERR alone and the output would have ended up in the server logs, or it could have been redirected to the TWiki log file. It's not as if it is even particularly difficult, all that's needed is:

            open (STDERR, '>&', \*STDOUT) || die $!;

Once I'd fixed that the cause of my edit problems became clear - RCS was complaining that the file was already locked. Huh? From comparing the RCS files between my old and new TWiki installs it appears that they've completely changed the way edit locking is done in the new release. Previously files were locked when they weren't being edited (yes, really), now they are locked when they are being edited, so of course when you copy existing TWiki content into a new install all the locks are the wrong way around and you can't save anything. I looked to see if I could find any mention of this change in locking behaviour in the release notes or on the TWiki site and came up blank, and it appears other people have hit the same issue as well. I find it difficult to believe that they'd make such a major change to the way TWiki works without bothering to mention it, but on the evidence at hand I can only assume that's what has actually happened.

I then had a look at how $path (the thing they were exec-ing above) was being set. The responsible subroutine (sysCommand()) is passed a template argument holding the command and arguments that are to be executed. The template argument is then split up like this:

    $template =~ /(^.*?)\s+(.*)$/;
my $path = $1;
my $pTmpl = $2;

# Build argument list from template
my @args = $this->_buildCommandLine( $pTmpl, %params );

The whole reason the new "Sandbox" class was added to TWiki was to try to plug the gaping security voids in TWiki's handling of external commands, but despite lengthy discussion on the TWiki website about how to fix this, the new version is nearly as broken as the version they had before - in fact it's a CERT alert just waiting to happen. Consider what whould happen if we could fool TWiki into injecting a string of the following form into a call to sysCommand():

"IFS=!;cd!/!&&/bin/rm!-rf!*# "

Note there is one whitespace character at the end of the string. We would end up with $path='IFS=!;cd!/!&&/bin/rm!-rf!*#' and @args would be the empty array. That means we'd be passing a single argument ($path) into perl's exec() function. The manpage for exec() says:

If there is only one scalar argument or an array with one element in it, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is "/bin/sh -c" on Unix platforms, but varies on other platforms)

So we'd pass the whole string to /bin/sh. The assignment to IFS tells the shell to treat "!" as its whitespace character, so the string passed to the shell ends up being treated as as cd / && /bin/rm -rf *. I'm sure you can figure out what happens next. A cursory examination hasn't revealed any obvious way of crafting an exploit, but the code above really is a disaster waiting to happen. You just can't pass around command lines as strings, split them up and use them safely. The only safe course of action is to always explicitly and fully specify the path of the command you are executing.

The other thing you notice when looking through the TWiki source is lots of comments like this:

# SMELL: This is horrible. HORRIBLE!!

# SMELL: security hazard?

# SMELL: WTF is this??? This looks like a really bad hack!

In fact it turns out there's a bad smell every 110 lines on average - and that's just the ones they know about, my guess is you could easily double this based on how many potential issues I've spotted with just a cursory inspection of the code. That is worrying, I really don't understand how they can contemplate releasing something that is so obviously flaky.

Now before all you Open Source zealots start scrolling for the comment box and telling me to 'Fix it yourself if it is so broken', sorry - no. If there is a bug in a Sun, Oracle, Microsoft or any other commercial product users quite rightly expect those responsible for the software to fix it. The Open Source community keeps telling everyone how high quality Open Source software is and how it should be treated the same as commercial offerings. Fine, in that case I get to expect that the people who are involved in the TWiki project should clean up their problems, not me.

The other thing that this tale shows the flip side of is the oft-stated aphorism "Many eyes make all bugs shallow". That may or may not be true, but there are far too many of them in TWiki for my comfort. Most of the issues I've seen do not require Grand Wizard level perl knowledge, just the ability to read existing documentation and think about what you are doing - no clever tricks required. If I was being unkind (and after the last couple of days working with TWiki I'm inclined to be) I'd say that the TWiki code shows all the signs that it has been written by people who know just enough perl to get into trouble and just too little perl to know how to get out of it. And I don't think this points to any deficiency in perl either, like most power tools it's incredibly versatile in the hands of someone who has learned how to use it properly, but in the hands of those who don't read the instructions and take the training it's a quick road to casualty. That leads me neatly on to "Alan's Axiom":

Many eyes may conceivably make all bugs shallow, but use of many inadequately skilled hands will make sure there are a hell of a lot of them to look at.

To sum up, my advice to anyone contemplating deploying TWiki is to think again - my experience is that it's slow, fragile, hard to install, difficult to maintain, insecure and full of dubious coding practices. Certainly my current TWiki installation will be my last. This is of course solely my personal opinion ;-)

Posted by alanbur ( Mar 22 2006, 09:29:05 PM GMT ) Permalink Comments [12]