Alan Burlison's Work Related Ramblings

All | General | Java | NetBeans | Perl | Solaris

20060405 Wednesday April 05, 2006

A sort of perl golf challenge

I've been reviewing some perl code code that is due to go back into Solaris shortly, and one of the routines takes a sorted array of integers and returns a string with contiguous ranges of numbers collapsed and other numbers comma-separated. For example, given an array containing 1, 3, 4, 5, 8, 9, 10, 12 the routine would return the string "1, 3-5, 8-10, 12". The routine iterates over the array looking for and collapsing sequences of numbers. It's 30 lines long, but always being one for taking up a pointless challenge I wondered if I could make it any shorter. Here's what I came up with:

sub collapse
{
        my $str = join(', ', @_);
        while ($str =~ s{\b(\d+), ((??{ $1 + 1 }))\b}{$1-$2}g) {}
        $str =~ s{-(?:\d+-)+}{-}g;
        return ($str);
} 

Now I know that under the proper perl golf rules I could shorten that down by removing whitespace, using implicit assignment and matches against $_ and so forth but I'm more interested in seeing if anyone can come up with a conceptually shorter solution (i.e. one that I can still read ;-). It occured to me that you might be able to do something smart with a recursive regexp, but the requirement to use $1 + 1 to spot a sequence kept stymieing me. However I'm sure some other perl saddo out there will come up with something even shorter and far smarter. Anyone? ;-)

Posted by alanbur ( Apr 05 2006, 10:20:31 PM BST ) Permalink Comments [9]

20060324 Friday March 24, 2006

Language wars are so boring

I've just read James Gosling's post on how he got flamed for daring to say that scripting languages aren't a panacea. Whereas the old language crusades used to be fought between the various "compiled languages" it seems the new jihad is between "compiled" and "dynamic" languages. The arguments are much the same, and are just as tedious as they rely on religious fervour rather than pragmatism and common sense.

As James points out there's a huge continuum of both problem spaces and languages that can be used to address them, the trick is to pick the right tool for the job at hand. The point that most of the participants in language wars miss is that in most cases the choice of language should be secondary, the most important deciding factor should be what tools and libraries are available for the language and how closely they address the problem domain. Syntax just isn't that important - I care far less about having to declare types for my variables than I do about having to write a thousands of lines of code that could more properly be replaced with a library function. I also quite like having the dangerous and tedious bits of programming taken care of, such as memory management - an area where dynamic languages and garbage collecting languages such as Java have a clear edge over more traditional choices such as C - however in some circumstances C is still the right choice, when other factors (e.g. predictability or outright speed) are more important.

The day job for the last 18 months has been nearly 100% Perl and heavily regexp based as well as requiring access to MySQL, LDAP and a load of other network-based stuff. At one point I rewrote some of the Perl regexp code in Java. It was easy to do, thanks to Java's good regexp library. Sure it was a bit more wordy, but it worked just fine. That's the point - the power of both Perl an Java isn't really the core language, it's the ecosystem in which they exist - in Perl's case the unspoken hero is CPAN, in Java's case it's the extensive class library that it comes with, along with the vast range of freely available external class libraries. Choosing a language based purely on whether it is "static" or "dynamic" makes about as much sense as choosing a car based on the paint colour - sure it's important, but it's not the most important factor.

Posted by alanbur ( Mar 24 2006, 11:03:59 AM GMT ) Permalink Comments [0]

20060322 Wednesday March 22, 2006

A TWiki rant

As part of the day job I've had to upgrade a TWiki installation to the latest version. This job always fills me with foreboding, as every time I've done it in the past something new and unexpected seems to break. I was upgrading this time in order to fix the security holes in the version of TWiki I was using, all of which were caused by known "Don't do that, it's a security hole!" type code. 4.0.1 is a pretty significant upgrade, a lot of stuff has been changed, and unfortunately some of it is less than fully functional.

The first bit of brokenness was mod_perl related. TWiki runs like a tranquillised slug at the best of times, so it's usually necessary to run it under mod_perl to get reasonable performance. I set up the new version under mod_perl, but kept getting random failures where perl was complaining that it couldn't find the TWiki modules. Dumping out @INC revealed that when this happened the TWiki library path was missing from @INC. A further amount of digging revealed that for some reason I can't quite fathom the TWiki developers decided it would be a good idea to set @INC with the block of code below:

BEGIN {
# Set default current working directory (needed for mod_perl)
if( $ENV{"SCRIPT_FILENAME"} && $ENV{"SCRIPT_FILENAME"} =~ /^(.+)\/[^\/]+$/ ) {
chdir $1;
}
# Set library paths in @INC, at compile time
unshift @INC, '.';
require 'setlib.cfg';
}

(for the non-perl-savvy, @INC is the search path perl uses to find modules). There's only one problem with this, as a little reading of the mod_perl documentation reveals:

When running under mod_perl, once the server is up @INC is frozen and cannot be updated. The only opportunity to temporarily modify @INC is while the script or the module are loaded and compiled for the first time. After that its value is reset to the original one. The only way to change @INC permanently is to modify it at Apache startup.

BEGIN blocks are run as the perl interpreter starts up, so the change to @INC is only there when the script is first compiled. TWiki likes to pull in additional modules on the post-compile-time using various combinations of require, do and eval, mainly to try to speed up its poor performance. However by that time the modules are pulled in the changes made to @INC in the BEGIN block are long gone, and bad things happen. Having each TWiki script change the working directory just so the subsequent require will work seems pretty unfriendly, especially when under mod_perl the process may be reused by a completely different script, and the unshift @INC, '.'; is completely unnecessary as . is already included in perl's default @INC. Cleaning up this mess needed two things - first I went through all the twiki scripts and ripped out all the BEGIN blocks and replaced them with the appropriate use statements. That fixed the case when the scripts were run as vanilla CGI scripts. The second problem was the require statements that were scattered throughout the TWiki modules. I didn't want to go through the code fixing them all, not least because it made future upgrades even more difficult, so I took the distasteful step of wedging the TWiki library paths permanently into @INC when running under mod_perl. This meant it had to be done in the Apache startup file, as explained in the section from the mod_perl documentation above. I added the requisite paths onto the end of @INC rather than putting at the beginning with use lib as I want any upgraded modules I install to override the ones TWiki bundles - the setting of @INC is shared by all the code that runs under mod_perl, which is why TWiki's propensity for using require without fully specifying the paths where the modules live makes it such an bad mod_perl citizen.

What's ironic is both the release notes and the comment in the code above state how mod_perl support has been "improved" in this release (actually, it's worse). I even went on to #twiki on IRC and asked why they didn't just provide an install script to edit the TWiki scripts and put in the proper use and require statements, but I was told that the lead developer "Doesn't believe in install scripts". Hmm...

Today I fell into yet another TWiki bear trap. I copied the pages from my existing install into the new TWiki and checked that I could edit and save pages. Nope, save attempts failed with a message saying that RCS had failed and I should contact my TWiki administrator. As I was the aforementioned administrator and I had no clue what was going wrong the message was less than useful. What I couldn't understand was why I wasn't getting any diagnostic messages, either back to the browser or to the server log files. Spelunking through the source revealed this little coding gem:

        # Note that there doesn't seem to be any way to redirect
# STDERR when using safe pipes.

my $pid = open($handle, '-|');

throw Error::Simple( 'open of pipe failed: '.$! ) unless defined $pid;

if ( $pid ) {
# Parent - read data from process filehandle
local $/ = undef; # set to read to EOF
$data = <$handle>;
close $handle;
$exit = ( $? >> 8 );
} else {
# Child - run the command
open (STDERR, '>'.File::Spec->devnull()) || die "Oh dear";
exec( $path, @args ) ||
throw Error::Simple( 'exec failed: '.$! );
# can never get here
}

So, let's throw all the diagnostic output from all the commands we ever invoke down /dev/null because we (presumably?) haven't read the perl documentation for the open() function that tells us how to do this. Hell, they could have left STDERR alone and the output would have ended up in the server logs, or it could have been redirected to the TWiki log file. It's not as if it is even particularly difficult, all that's needed is:

            open (STDERR, '>&', \*STDOUT) || die $!;

Once I'd fixed that the cause of my edit problems became clear - RCS was complaining that the file was already locked. Huh? From comparing the RCS files between my old and new TWiki installs it appears that they've completely changed the way edit locking is done in the new release. Previously files were locked when they weren't being edited (yes, really), now they are locked when they are being edited, so of course when you copy existing TWiki content into a new install all the locks are the wrong way around and you can't save anything. I looked to see if I could find any mention of this change in locking behaviour in the release notes or on the TWiki site and came up blank, and it appears other people have hit the same issue as well. I find it difficult to believe that they'd make such a major change to the way TWiki works without bothering to mention it, but on the evidence at hand I can only assume that's what has actually happened.

I then had a look at how $path (the thing they were exec-ing above) was being set. The responsible subroutine (sysCommand()) is passed a template argument holding the command and arguments that are to be executed. The template argument is then split up like this:

    $template =~ /(^.*?)\s+(.*)$/;
my $path = $1;
my $pTmpl = $2;

# Build argument list from template
my @args = $this->_buildCommandLine( $pTmpl, %params );

The whole reason the new "Sandbox" class was added to TWiki was to try to plug the gaping security voids in TWiki's handling of external commands, but despite lengthy discussion on the TWiki website about how to fix this, the new version is nearly as broken as the version they had before - in fact it's a CERT alert just waiting to happen. Consider what whould happen if we could fool TWiki into injecting a string of the following form into a call to sysCommand():

"IFS=!;cd!/!&&/bin/rm!-rf!*# "

Note there is one whitespace character at the end of the string. We would end up with $path='IFS=!;cd!/!&&/bin/rm!-rf!*#' and @args would be the empty array. That means we'd be passing a single argument ($path) into perl's exec() function. The manpage for exec() says:

If there is only one scalar argument or an array with one element in it, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is "/bin/sh -c" on Unix platforms, but varies on other platforms)

So we'd pass the whole string to /bin/sh. The assignment to IFS tells the shell to treat "!" as its whitespace character, so the string passed to the shell ends up being treated as as cd / && /bin/rm -rf *. I'm sure you can figure out what happens next. A cursory examination hasn't revealed any obvious way of crafting an exploit, but the code above really is a disaster waiting to happen. You just can't pass around command lines as strings, split them up and use them safely. The only safe course of action is to always explicitly and fully specify the path of the command you are executing.

The other thing you notice when looking through the TWiki source is lots of comments like this:

# SMELL: This is horrible. HORRIBLE!!

# SMELL: security hazard?

# SMELL: WTF is this??? This looks like a really bad hack!

In fact it turns out there's a bad smell every 110 lines on average - and that's just the ones they know about, my guess is you could easily double this based on how many potential issues I've spotted with just a cursory inspection of the code. That is worrying, I really don't understand how they can contemplate releasing something that is so obviously flaky.

Now before all you Open Source zealots start scrolling for the comment box and telling me to 'Fix it yourself if it is so broken', sorry - no. If there is a bug in a Sun, Oracle, Microsoft or any other commercial product users quite rightly expect those responsible for the software to fix it. The Open Source community keeps telling everyone how high quality Open Source software is and how it should be treated the same as commercial offerings. Fine, in that case I get to expect that the people who are involved in the TWiki project should clean up their problems, not me.

The other thing that this tale shows the flip side of is the oft-stated aphorism "Many eyes make all bugs shallow". That may or may not be true, but there are far too many of them in TWiki for my comfort. Most of the issues I've seen do not require Grand Wizard level perl knowledge, just the ability to read existing documentation and think about what you are doing - no clever tricks required. If I was being unkind (and after the last couple of days working with TWiki I'm inclined to be) I'd say that the TWiki code shows all the signs that it has been written by people who know just enough perl to get into trouble and just too little perl to know how to get out of it. And I don't think this points to any deficiency in perl either, like most power tools it's incredibly versatile in the hands of someone who has learned how to use it properly, but in the hands of those who don't read the instructions and take the training it's a quick road to casualty. That leads me neatly on to "Alan's Axiom":

Many eyes may conceivably make all bugs shallow, but use of many inadequately skilled hands will make sure there are a hell of a lot of them to look at.

To sum up, my advice to anyone contemplating deploying TWiki is to think again - my experience is that it's slow, fragile, hard to install, difficult to maintain, insecure and full of dubious coding practices. Certainly my current TWiki installation will be my last. This is of course solely my personal opinion ;-)

Posted by alanbur ( Mar 22 2006, 09:29:05 PM GMT ) Permalink Comments [12]

20050909 Friday September 09, 2005

DTrace and Perl

After reading Bryan's DTrace and PHP and DTrace and PHP, demonstrated blog entries, I thought I'd better have a crack at doing the same for Perl - after all, anything PHP can do, Perl can do better - right? ;-)

I posted a question on the perl5-porters list about the best way of doing this for Perl, and Dave Mitchell kindly stepped up to help - many thanks to Dave, and he should take most of the credit for figuring out how best to do this.

The PHP DTrace probes instrument the function entry and return points within a PHP script so I thought I'd do the same for Perl. After some to-and-fro-ing with Dave it became apparent that the easiest way was to add DTrace probes to the PUSHSUB and POPSUB macros. As the names suggest, these are called each time a Perl subroutine is entered and exited. This doesn't get all of the potential entry points, for example eval and require but it's a good start, at least for the purposes of prototyping. The first step is to define the DTrace probes that we will be using:

provider perl {
        probe sub__entry(char*, char*, int);
        probe sub__return(char*, char*, int);
};

The arguments are the sub name, the filename and line number. Here's the changes that are required to cop.h for the PUSHSUB case:

        DTRACE_PROBE3(perl, sub__entry, GvENAME(CvGV(cv)),              \
            CopFILE((COP*)CvSTART(cv)), CopLINE((COP*)CvSTART(cv)));

The fiddly bits are the dereferencing of the current cv to extract the sub name, file name and line number. The next bit is to hack Makefile to call dtrace to postprocess the object files before building libperl.so. This builds an extra object file which we then need to include in the file list used to build libperl.so:

$(LIBPERL): $& perl$(OBJ_EXT) $(obj) $(LIBPERLEXPORT)
        dtrace -G -s dtrace.d -o dtrace.o $(obj) miniperlmain.o perl.o
        $(LD) -o $@ $(SHRPLDFLAGS) perl$(OBJ_EXT) $(obj) dtrace.o $(libs)

Job done. There are a couple of caveats however - firstly, adding the probes has a worst-case performance hit of about 5% even when they aren't enabled. The hit depends on what you are doing in your script - the worst case is for a script that does nothing but call an empty sub. The DTrace probes themselves are NOP'd out when they are inactive, but the code used to calculate the sub name, file name and line number and put them on the stack is still executed even when DTrace is inactive. I've discussed this with the DTrace crew and a way to minimise the overhead is to use a DTrace translator - a shim that sits between the application and the DTrace script and gives a stable view of application data strucures. This would take a pointer to the current cv and do the various dereferencing ops to get the sub name, file name and line number only when the probe was actually active rather than all the time, as in the prototype. This is a good idea not just because of the performance improvement, but also because it means that we can provide a stable API that will be immune to changes in the perl interpreter.

The second caveat is that at the moment some tests in the perl test suite fail with the DTrace macros inserted. Most of these happen because the object files that go into some of the extensions aren't postprocessed with DTrace, so if the extension uses the modified PUSHSUB and POPSUB macros it will fail. Two solutions spring to mind - either modify MakeMaker so that module objects are postprocessed with DTrace, or conditionally add the DTrace probes only when compiling libperl. Overall, I think I prefer the second option, at least as a first pass option.

Anyway, without further ado I present a simple example of it in use. The following D script aggregates page faults by the sub that generated them. Page faults that are caused by the perl interpreter when it isn't actually executing perl code (e.g. during startup) are aggregated under PERL

#pragma D option quiet

BEGIN
{
        self->sub = "PERL";
        self->file = "PERL";
}

perl$target:::sub-entry
{
        self->sub = copyinstr(arg0);
        self->file = copyinstr(arg1);
}

perl$target:::sub-return
{
        self->sub = "PERL";
        self->file = "PERL";
}

vminfo:::maj_fault,
vminfo:::zfod,
vminfo:::as_fault
/self->sub != 0/
{
        @pf[self->sub, self->file] = count();
}

END
{
        printa("%@8d  %-20s %s\n", @pf);
}

And here is the output when run on pod2text perl.pod:

       1  import               /demo/lib/Exporter.pm
       1  import               /demo/lib/strict.pm
       1  import               /demo/lib/warnings/register.pm
       1  output               /demo/lib/Pod/Text.pm
       2  GetOptions           /demo/lib/Getopt/Long.pm
       2  bits                 /demo/lib/strict.pm
       2  interpolate          /demo/lib/Pod/Parser.pm
       2  seq_c                /demo/lib/Pod/Text.pm
       2  wrap                 /demo/lib/Pod/Text.pm
       4  as_heavy             /demo/lib/Exporter.pm
       6  verbatim             /demo/lib/Pod/Text.pm
       8  BEGIN                /demo/lib/Pod/Select.pm
       8  BEGIN                /demo/lib/Pod/Usage.pm
      10  BEGIN                /demo/lib/Pod/Text.pm
      11  import               /demo/lib/vars.pm
      13  BEGIN                /demo/lib/Pod/Parser.pm
      18  BEGIN                /demo/lib/vars.pm
      25  BEGIN                pod/pod2text
      28  BEGIN                /demo/lib/Getopt/Long.pm
      51  BEGIN                /demo/lib/warnings.pm
    1580  PERL                 PERL

Of course that's just a really simplistic sample, the full power of DTrace can be used to find out exactly what your perl script is doing sub-by-sub, whether it be in libperl, libc or the kernel itself. Here's an example which shows which perl subs result in kernel locks being taken, the output is for the same pod2text perl.pod command. If you want to know what all the scary numbers mean, the DTrace lockstat provider documentation explains all.

       1 adaptive BEGIN                          /demo/lib/Symbol.pm
       1 adaptive ConfigDefaults                 /demo/lib/Getopt/Long.pm
       1 adaptive DESTROY                        /demo/lib/Config.pm
       1 adaptive FindOption                     /demo/lib/Getopt/Long.pm
       1 adaptive MAIN                           pod/pod2text
       1 adaptive TIEHASH                        /demo/lib/Config.pm
       1 adaptive _pop_input_stream              /demo/lib/Pod/Parser.pm
       1 adaptive _push_input_stream             /demo/lib/Pod/Parser.pm
       1 adaptive begin_input                    /demo/lib/Pod/Parser.pm
       1 adaptive begin_pod                      /demo/lib/Pod/Parser.pm
       1 adaptive cmd_back                       /demo/lib/Pod/Text.pm
       1 adaptive cmd_over                       /demo/lib/Pod/Text.pm
       1 adaptive config                         /demo/lib/Getopt/Long.pm
       1 adaptive end_input                      /demo/lib/Pod/Parser.pm
       1 adaptive end_pod                        /demo/lib/Pod/Parser.pm
       1 adaptive export                         /demo/lib/Exporter.pm
       1 adaptive import                         /demo/lib/Config.pm
       1 adaptive import                         /demo/lib/Getopt/Long.pm
       1 adaptive initialize                     /demo/lib/Pod/Parser.pm
       1 adaptive initialize                     /demo/lib/Pod/Text.pm
       1 adaptive new                            /demo/lib/Pod/Parser.pm
       1 adaptive parse_from_file                /demo/lib/Pod/Parser.pm
       1 adaptive parse_from_filehandle          /demo/lib/Pod/Parser.pm
       1 adaptive was_cutting                    /demo/lib/Pod/InputObjects.pm
       2 adaptive BEGIN                          /demo/lib/Config.pm
       2 adaptive BEGIN                          /demo/lib/Exporter/Heavy.pm
       2 adaptive BEGIN                          /demo/lib/File/Spec.pm
       2 adaptive Configure                      /demo/lib/Getopt/Long.pm
       2 adaptive gensym                         /demo/lib/Symbol.pm
       2 spin     import                         /demo/lib/vars.pm
       2 spin     interpolate                    /demo/lib/Pod/Parser.pm
       2 spin     seq_c                          /demo/lib/Pod/Text.pm
       2 spin     verbatim                       /demo/lib/Pod/Text.pm
       3 adaptive BEGIN                          /demo/lib/File/Spec/Unix.pm
       3 adaptive BEGIN                          /demo/lib/Pod/InputObjects.pm
       3 adaptive BEGIN                          /demo/lib/Pod/ParseLink.pm
       3 spin     __ANON__                       /demo/lib/Pod/Parser.pm
       3 spin     input_file                     /demo/lib/Pod/Parser.pm
       3 spin     parse_text                     /demo/lib/Pod/Parser.pm
       3 spin     preprocess_line                /demo/lib/Pod/Parser.pm
       3 spin     wrap                           /demo/lib/Pod/Text.pm
       4 adaptive BEGIN                          /demo/lib/constant.pm
       4 adaptive mkMask                         /demo/lib/warnings/register.pm
       4 adaptive unimport                       /demo/lib/strict.pm
       6 spin     new                            /demo/lib/Pod/InputObjects.pm
       7 adaptive cmd_head2                      /demo/lib/Pod/Text.pm
       8 adaptive seq_f                          /demo/lib/Pod/Text.pm
       9 adaptive import                         /demo/lib/constant.pm
       9 spin     as_heavy                       /demo/lib/Exporter.pm
      11 adaptive cmd_head1                      /demo/lib/Pod/Text.pm
      12 adaptive ParseOptionSpec                /demo/lib/Getopt/Long.pm
      12 adaptive cmd_item                       /demo/lib/Pod/Text.pm
      12 adaptive item                           /demo/lib/Pod/Text.pm
      12 spin     BEGIN                          /demo/lib/Pod/Select.pm
      15 adaptive seq_i                          /demo/lib/Pod/Text.pm
      18 adaptive heading                        /demo/lib/Pod/Text.pm
      18 adaptive import                         /demo/lib/warnings/register.pm
      18 spin     BEGIN                          /demo/lib/Pod/Usage.pm
      26 spin     BEGIN                          /demo/lib/Pod/Text.pm
      32 adaptive command                        /demo/lib/Pod/Text.pm
      33 adaptive GetOptions                     /demo/lib/Getopt/Long.pm
      34 adaptive seq_c                          /demo/lib/Pod/Text.pm
      35 adaptive left_delimiter                 /demo/lib/Pod/InputObjects.pm
      38 adaptive seq_b                          /demo/lib/Pod/Text.pm
      39 adaptive _infer_text                    /demo/lib/Pod/ParseLink.pm
      39 adaptive _parse_section                 /demo/lib/Pod/ParseLink.pm
      39 adaptive file_line                      /demo/lib/Pod/InputObjects.pm
      39 adaptive parselink                      /demo/lib/Pod/ParseLink.pm
      39 adaptive seq_l                          /demo/lib/Pod/Text.pm
      42 spin     BEGIN                          /demo/lib/Getopt/Long.pm
      42 spin     BEGIN                          /demo/lib/Pod/Parser.pm
      42 spin     BEGIN                          pod/pod2text
      57 adaptive reformat                       /demo/lib/Pod/Text.pm
      57 adaptive textblock                      /demo/lib/Pod/Text.pm
      92 adaptive wrap                           /demo/lib/Pod/Text.pm
      96 spin     BEGIN                          /demo/lib/vars.pm
      99 adaptive import                         /demo/lib/Exporter.pm
      99 adaptive verbatim                       /demo/lib/Pod/Text.pm
     100 adaptive import                         /demo/lib/strict.pm
     104 adaptive output_handle                  /demo/lib/Pod/Parser.pm
     111 adaptive as_heavy                       /demo/lib/Exporter.pm
     113 adaptive interior_sequence              /demo/lib/Pod/Text.pm
     113 adaptive parse_tree                     /demo/lib/Pod/InputObjects.pm
     114 adaptive __ANON__                       /demo/lib/Pod/Parser.pm
     119 adaptive cutting                        /demo/lib/Pod/Parser.pm
     119 adaptive parse_paragraph                /demo/lib/Pod/Parser.pm
     119 adaptive preprocess_paragraph           /demo/lib/Pod/Text.pm
     120 adaptive output                         /demo/lib/Pod/Text.pm
     126 adaptive top                            /demo/lib/Pod/InputObjects.pm
     127 adaptive input_file                     /demo/lib/Pod/Parser.pm
     127 adaptive parse_text                     /demo/lib/Pod/Parser.pm
     133 spin     BEGIN                          /demo/lib/warnings.pm
     148 adaptive right_delimiter                /demo/lib/Pod/InputObjects.pm
     159 adaptive interpolate                    /demo/lib/Pod/Parser.pm
     162 adaptive bits                           /demo/lib/strict.pm
     167 adaptive BEGIN                          /demo/lib/Pod/Select.pm
     181 adaptive _set_child2parent_links        /demo/lib/Pod/InputObjects.pm
     183 adaptive cmd_name                       /demo/lib/Pod/InputObjects.pm
     183 adaptive nested                         /demo/lib/Pod/InputObjects.pm
     220 adaptive BEGIN                          /demo/lib/Pod/Usage.pm
     239 adaptive children                       /demo/lib/Pod/InputObjects.pm
     275 adaptive BEGIN                          /demo/lib/Pod/Text.pm
     328 adaptive BEGIN                          /demo/lib/Pod/Parser.pm
     352 adaptive DESTROY                        /demo/lib/Pod/InputObjects.pm
     352 adaptive _unset_child2parent_links      /demo/lib/Pod/InputObjects.pm
     437 adaptive preprocess_line                /demo/lib/Pod/Parser.pm
     455 adaptive import                         /demo/lib/vars.pm
     474 adaptive new                            /demo/lib/Pod/InputObjects.pm
     496 adaptive BEGIN                          /demo/lib/vars.pm
     643 adaptive append                         /demo/lib/Pod/InputObjects.pm
     670 adaptive BEGIN                          pod/pod2text
     695 adaptive BEGIN                          /demo/lib/Getopt/Long.pm
    1243 adaptive BEGIN                          /demo/lib/warnings.pm
    2946 spin     PERL                           PERL
   51019 adaptive PERL                           PERL

Posted by alanbur ( Sep 09 2005, 02:31:25 PM BST ) Permalink Comments [5]

20041222 Wednesday December 22, 2004

Making a perl filehandle into an object

I'm working on a project that uses both perl and MySQL, and I needed an easy way of piping stuff from perl into mysqlimport so that I could do fast bulk uploads to the database. MySQL is a bit odd in how it does this - it insists that the data has to come from a file who's filename prefix is the same as the table into which you are loading. This precludes a vanilla perl piped open and requires the use of a named pipe - the easiest way being to create a named pipe under /tmp of the form <table name>.<pid>.

I needed several loaders open simultaneously, so the obvious thing was to abstract the functionality into a class. I still wanted to be able to use normal print statements to output the table rows to the loader, so I wanted the filehandle to simultaneously be an object as well. Creating objects in perl is done with the bless operator, which requires a reference to bless - you can't bless a normal scalar value into an object. Fortunately open() in later versions of perl actually gives you a reference to a filehandle:

$ perl -d -e 1

Loading DB routines from perl5db.pl version 1.25
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1):	1
  DB<1> my $a;

  DB<2> open($a, '>', '/dev/null');

  DB<3> x $a
0  GLOB(0x3a40e0)
   -> *main::$a
         FileHandle({*main::$a}) => fileno(3)
  DB<4> bless($a, 'MyClass');

  DB<5> x $a
0  MyClass=GLOB(0x3a40e0)
   -> *main::$a
         FileHandle({*main::$a}) => fileno(3)
  DB<6> 

Having blessed the filehandle into the appropriate class we can then invoke methods on it (e.g. $a->do_stuff()), as well as print to it directly (e.g. print($a "hello\n");). However, having an object without any associated properties isn't really much use - in my case I needed to be able to store the name of the named pipe I was using to communicate with mysqlimport so that I could remove it when the filehandle was closed. It's possible to use the perl tie and overloading mechanisms to do this, but as this is perl and tmtowtdi always applies, and there is in fact a simpler although less obvious way.

To follow this it's first necessary to understand a little about how perl actually stores values internally. Variables come in different types, the common ones being scalars, hashes and arrays, repectively denoted by the leading $, % and @ characters on variables. $a, %a and @a look like they are entirely different variables, but in fact they aren't - they are all slots in a single perl symbol table entry called "a". These symbol table entries are called typeglobs or globs for short and are accessible by using the "*" prefix on a variable, so *a refers to the typeglob where $a, %a and @a all live.

The "a" filehandle that we opened in the example above also has a hash slot, so if we want to store additional attributes on "a" we need some way of getting the associated hash slot in its typeglob. This is actually very easy, although the syntax is a little abstruse:

my $fh;
open($fh, '>', $fifo);
my $self = \%{*$fh};
$self->{fifo} = $fifo;
bless($fh, $class);

Let's pick apart the line that assigns to $self. $fh is actually a reference to a filehandle, so we dereference it with *$ to the entire "fh" glob. The %{...} says we want to access the hash slot of the "fh" glob, and the \ gets us a reference to that, so $self ends up being a reference to the hash slot associated with "fh". Phew. We can then assign to it as a normal hash reference. When we subsequently call a method on the blessed $fh filehandle, we can use exactly the same chant to get back the hash reference and access the data that we put in it. This trick is used by the standard perl IO::Socket class to squirrel away socket attributes, but it's often useful to be able to associate properties with a filehandle yourself so I think this particular technique deserves to be more widely known. If you want further information on how all this stuff hangs together, you should check out the perlref manpage.

Posted by alanbur ( Dec 22 2004, 05:25:38 PM GMT ) Permalink Comments [0]

20041001 Friday October 01, 2004

Perl golf

My friend Stephen has posted a little challenge over on his blog. Basically it's to put a wrapper arond cal(1) so that if it is given a single argument between 1 and 12 or 'now' it prints a 3-month calendar centered around that month in the year, rather than the default behaviour which is to print a calendar for the year 1 through 12 - duh!

I of course took it as a perl golf challenge to produce the smallest (and therefore most unreadable) version possible. I know a few of my friends in perl-land read this blog, so perhaps you can come up with something even smaller than my current 404-byte version? To be fair, obvious hacks such as removing the die or shortening the /usr/bin/cal path are considered to be cheating ;-)

Here's my current effort, line breaks added to stop it sending your browser mental.

$c='/usr/bin/cal';($m,$y)=(localtime)[4,5];$m+=1;$y+=1900;$_=$ARGV[0]||'';
if($_ eq'now'){$a[1]=[$m,$y]}elsif($_=~/^\d+$/&&$_>=1&&$_<=12){$a[1]=[$_,$y]}
else{exec($c,@ARGV)||die($!)}$_=$a[1][0]-1;
@{$a[0]}=$_==0?(12,$a[1][1]-1):($_,$a[1][1]);$_=$a[1][0]+1;
@{$a[2]}=$_==13?(1,$a[1][1]+1):($_,$a[1][1]);
@c=map[map{chomp;$_}`$c @$_`],@a;printf"%-20s   %-20s   %-20s\n",
map shift@$_||'',@c for 1..7

Can you do better?

Update

Thanks to Jason Santos for spotting a bug, and being devious enough to get the count down even further - here's the latest version, see if you can spot the differences:

#!/bin/perl
$c='/usr/bin/cal';($m,$y)=(localtime)[4,5];$m++;$y+=1900;$_=$ARGV[0]||'';
if(/^now$/){$a[1]=[$m,$y]}elsif(/^\d+$/&&$_>=1&&$_<=12){$a[1]=[$_,$y]}
else{exec($c,@ARGV)||die$!}$_=$a[1][0]-1;
@{$a[0]}=$_==0?(12,$a[1][1]-1):($_,$a[1][1]);$_=$a[1][0]+1;
@{$a[2]}=$_==13?(1,$a[1][1]+1):($_,$a[1][1]);
@c=map[map{chop;$_}`$c @$_`],@a;printf"%-20s   %-20s   %-20s\n",
map shift@$_||'',@c for 1..8
Posted by alanbur ( Oct 01 2004, 08:46:42 PM BST ) Permalink Comments [5]

20040916 Thursday September 16, 2004

YAPC::EU roundup #1

Well, my talk yesterday on how we use perl to help in the development of Solaris went down OK - the only heckling was friendly (thanks, Nick ;-) and I was asked if I would release some of the bits I talked about as Open Source, specifically the stuff for allowing you to embed MySQL queries inside TWiki topics, and the mod_perl handler for doing SCCS history browsing. Something to do when I get back to the ranch.

The conference has wireless access, and by a fortunate coincidence my hotel room is directly across the street from one of the rooms we are using in the conference centre, which means I can get a wireless connection from my hotel room, which is a real boon!

I bumped into Abe Timmerman on tuesday evening in The Crown - the only pub I've ever been in that is owned by the National Trust! Abe looks after the Perl Smoke framework - this is a distributed set of machines that run daily regression tests on perl on as many different machines types as possible. After last's years YAPC::EU I scrounged together five old machines and installed Solaris 2.6 through to Solaris 10 on them to add to the smoke pool. I'd been kinda busy and not really paying attention to the test output, and one of the machines had wedged in the test suite back in June and had chewed up over 2000 hours of CPU - oops! The test framework was also in need of an upgrade, so we set to and upgraded it - hopefully by tomorrow we'll see the first sets of results from the new version of Test::Smoke.

I also had a quick chat with Leo Toetsch about what needs to be done on the Parrot front (Parrot is the new interpreter engine that Perl 6 will eventually sit on top of). The first things that would benefit from some attention are getting Parrot to build well with the Sun Forte compilers, and then there's a whole load of stuff to do related to threading support, making sure the x86 JIT works OK on Solaris x86 and improving the sparc JIT, which hasn't received very much attention. So, loads to do once Solaris 10 ships and I get clear of some of my current commitments on the Open Source Solaris project.

I also grabbed Nick Clark to discuss an issue that's been a bit of a pain for us when upgrading perl versions. Perl uses its version number as part of the path under which it stores libperl.so - for example, /usr/perl5/5.8.4/lib/i86pc-solaris-64int/CORE/libperl.so. The problem with this is that if you upgrade to a later but binary compatible version of perl, the version number component of the pathname changes as well. This means that anything that explicitly links against libperl can no longer find it. In general this is anything that embeds a perl interpreter within itself - an example being the Apache mod_perl plugin that we ship as part of Solaris. The obvious fix is to put a symlink somewhere and link against that, however some perl5-porters mailing list discussion will be required to sort this out.

And lastly I managed to get a nice new orange Fotango pen from Leon Brocard - the one I got from him during YAPC::EU 2000 had finally given up the ghost. Reason enough for coming to the conference :-) Posted by alanbur ( Sep 16 2004, 04:59:44 PM BST ) Permalink Comments [0]

20040913 Monday September 13, 2004

Off on a jolly^W important conference

As part of my pennance (and my sins must have been many), I look after the perl we ship in Solaris (yes, I admit it, it's my fault - whatever it is). However along with the heavy responsibility comes the opportunity to go off on various high flying conferences and talk to lots of other Important Open Source People like myself.

However, I've given all that up, and instead I'm going to YAPC::EU tomorrow in glorious...
Belfast. I'm also singing for my supper, in that I'm giving a talk on how we use perl to help in the development of Solaris. My manager will be happy (Hi Allan!) as instead of an expensive flight to Portland to go to even more expensive OSCON he gets off with a 130 Pound air fare (57 Pounds less than it costs to go to Watford on the train) and a paltry 65 Pounds for the conference fee. The reason that it is so cheap is that it is a grass-roots conference, run by the perl hackerati, for the perl hackerati. It's also cheap so that we can spend the bulk of our money on beer (only joking - honest!)

Actually I'm really looking forward to it, the perl crowd are a great bunch and I'm really looking forward to meeting up with my friends (well most of them - Some of the Finnish contingent wimped out this year - you know who you are ;-) IRC is OK, but you can't beat 'facetime' as it is so inelegantly called. Hopefully I'll have caught up on all the scurrilous perl gossip by the time I come back! Posted by alanbur ( Sep 13 2004, 11:01:01 PM BST ) Permalink Comments [0]