Code Complete
Thursday September 30, 2004

Learn mdb in 30 minutes

As part of my involvement with the Blastwave community software project, I maintain SPARC and x86 versions of the Subversion packages for Solaris. When I became the maintainer for this software, I inherited with it a strange bug, which is now causing me problems. I currently use CVS for my day-to-day revision control needs, but I'd like to switch to SVN, partly for the cool factor and partly to evaluate whether it could eventually replace the ClearCase repository my organization uses.

The issue is with the Apache 2 module mod_dav_svn, which enables SVN repository access via the WebDAV (distributed authoring and versioning) protocol. This is a great feature, as then it is possible to take advantage of modules like mod_auth_ldap to manage access control, or mod_perl to provide even more sophisticated functionality. The bug I was tracing occurs any time a file is committed to a SVN repository via DAV. The commit will fail, and the Apache error logs will contain a line or two like this:

[Thu Sep 30 16:00:10 2004] [notice] child pid 18115 exit signal Illegal instruction (4)

The Apache worker thread dies, and is replaced by another thread, waiting for a new request. (It seems to me that this should probably have a higher severity than notice.) On the client side, the svn client outputs this useful message:

svn: Commit failed (details follow):
svn: MKACTIVITY of /svn/!svn/act/7cba529c-56e5-0310-9002-f184ad8e84d2:
Could not read status line: connection was closed by server. (http://server:5957)

So how was I going to track down what was happening with mod_dav_svn if it was being launched by Apache in response to an incoming commit request from my subversion client? I was going to learn how to use mdb, that's how. The first thing I did was learn how to launch a program in the debugger. It's easy: mdb /path/to/program. Apache requires a set of arguments, specifically the -X argument, which forces it to stay in the foreground and such. Finding out how to set arguments was a bit more difficult.

I'm somewhat familiar with the GNU debugger gdb, but this was my first time into mdb or any sort of Solaris debugging of this type (I've done very very low level hardware, firmware and software debugging on the Sun Fire mid-frame server line, but never applications). Using the mdb dcmd ::dcmds lists out the available commands. ::help <dcmd> for any dcmd will give the usage for that dcmd. I picked out the ::run dcmd, which is how arguments are passed into the target executable, and how execution is restarted from the top.

Now that I know this much, I can see what is happening to this module. Once the server is started with the ::run dcmd, I tried to commit a small change to the sample svn repository on my server. As soon as I connect, mdb dumps out with a SIGILL (Illegal Instruction). Using the ::stack dcmd prints out the stack trace:

bash$ mdb -uM /opt/csw/apache2/sbin/httpd 
Loading modules: [ libthread.so.1 libc.so.1 ]
> ::run -X -f /opt/csw/apache2/etc/svn.conf
mdb: stop on SIGILL
mdb: target stopped at:
0x169f08:       unimp     0
mdb: You've got symbols!
Loading modules: [ ld.so.1 ]
> ::stack
0x169f08(fca7bbec, 178010, 1, fff, 176340, ff36d518)
mod_dav_svn.so`dav_svn_get_txn+0x44(177f90, 177eba, 0, 243f0, fedccdd4, fe62949c)
mod_dav_svn.so`dav_svn_prep_activity+0xc(177ee0, 177eba, fe62aaac, fe60a5e4, 177ee0, 0)
mod_dav_svn.so`dav_svn_get_resource+0x5a0(176378, 2f, 0, 0, fca7bda4, 177ee0)
0xfe705564(176378, 800, 0, fca7bda4, 27500, 0)
0xfe70f08c(176378, 1, fe62ac04, 0, 800, fe72c9fc)
ap_invoke_handler+0x178(176378, fe710298, e313c, e2f94, 0, 0)
ap_process_request+0x30(176378, 4, 176378, 0, fe4614a0, 0)
0x2aa60(152ca0, 0, 152cf8, 8ac00, 8ac00, 1000)
ap_process_connection+0xc8(152ca0, 2aa04, e32a8, 0, 152c98, 8d9f0)
0x34790(f81f0, 152bb0, 18, 152ca0, 0, 18)
libthread.so.1`_lwp_start(0, 0, 0, 0, 0, 0)
>

From the stack trace, we can see that the issue is occurring in some unnamed function called by the mod_dav_svn function dav_svn_get_txn. A breakpoint can be set on this function by passing its symbol name to the ::bp dcmd. Now we can try the commit again with a new breakpoint and see where it gets us:

> ::bp dav_svn_get_txn
> ::run -X -f /opt/csw/apache2/etc/svn.conf
mdb: stop at dav_svn_get_txn
mdb: target stopped at:
mod_dav_svn.so`dav_svn_get_txn: save      %sp, -0x90, %sp
mdb: You've got symbols!

Excellent. Now we have control over mod_dav_svn before it hits the illegal instruction. Mdb shows in the output that we have just saved off the stack pointer after dav_svn_get_txn is called by dav_svn_prep_activity. Looking in the actual source code for dav_svn_get_txn shows that the first steps in this function are to initialize and open a Berkeley DB (BDB) database, which is used by this installation of subversion to manage transactions in the repository. Mdb will now allow us to single step through the code using the short dcmd :s which steps one instruction at a time. If single stepping gets into a C library function or other function that is not critical to the current debugging task, specify :s out to step out of the function:

> :s
mdb: target stopped at:
libsvn_subr-1.so.0.0.0`svn_path_join+0x28:      call      +0x26d9c    <PLT=libc.so.1`strlen>
> :s
mdb: target stopped at:
libsvn_subr-1.so.0.0.0`svn_path_join+0x2c:      nop
> :s out
mdb: stop at mod_dav_svn.so`dav_svn_get_txn+0x34
mdb: target stopped at:
mod_dav_svn.so`dav_svn_get_txn+0x34:    mov       1, %o2
>

This shows mdb stepping out of svn_path_join(), which had just called strlen(), and back into dav_svn_get_txn. Quite a bit of single stepping is needed to get to the actual error, as we eventually call into the Apache support libraries. I knew from some previous research that others had found the issue to be inside the aprutil library, so I knew not to step out of any apr_ functions. One of the biggest pains I found with mdb was single stepping. What I really wanted to do was to break on a function and single step through everything within libaprutil (but only libaprutil), printing each disassembled line. I couldn't find out how to do this in 30 minutes using the Solaris Modular Debugger Guide, and I also couldn't find how to repeat the command I had just sent to mdb. What I really needed, and continue to need, as I still haven't found the answer, is a command like gdb '.', which repeats the last command verbatim. With this, I would be able to type :s, then use a single character to step through lines, rather than two (50% efficiency improvement :). Eventually, after carpal tunnel started to set in, I hit the SIGILL:

> :s
mdb: target stopped at:
0xff3515c0:     ld        [%o0 + 0xf0], %l3
> :s
mdb: target stopped at:
0xff3515c4:     jmpl      %l3, %o7
> :s
mdb: target stopped at:
0xff3515c8:     clr       %o1
> :s
mdb: target stopped at:
0x169fc8:       unimp     0
> ::dis -w
0x169fa0:                       unimp     0
0x169fa4:                       unimp     0
0x169fa8:                       unimp     0
0x169fac:                       unimp     0
0x169fb0:                       cb13      -0x440000   <0xffd29fb0>
0x169fb4:                       unimp     0
0x169fb8:                       unimp     0
0x169fbc:                       unimp     0
0x169fc0:                       unimp     0x69
0x169fc4:                       unimp     0
0x169fc8:                       unimp     0      <--- culprit
0x169fcc:                       unimp     0
0x169fd0:                       unimp     0
0x169fd4:                       unimp     2
0x169fd8:                       swapa     [%g4 + -0x18] %asi, %i7
0x169fdc:                       swapa     [%g5 + %l0] 03, %i7
0x169fe0:                       unimp     0x20
0x169fe4:                       unimp     0xa
0x169fe8:                       unimp     0
0x169fec:                       unimp     0
0x169ff0:                       unimp     0

The ::dis dcmd disassembles instructions near the current code pointer. The -w argument shows a window on either side of the current line. One more step and the process would throw a SIGILL and the Apache request would die. A little more breakpoint experimentation resulted in this call sequence:

mod_dav_svn.so`dav_svn_get_txn
    libaprutil.so.0.9.5`apr_dbm_open
        libaprutil.so.0.9.5`apr_posix_perms2mode

It looks like Forte generated a bit of bad code in libaprutil, which could be due to the fact that it is compiled with a high optimization level. This supports the claims of some users on the Subversion user mailing list who indicated that the issues were within libaprutil. One user had recompiled APR, Apache 2 and Subversion with GCC 3.4, and the SIGILL problem went away. The next step for me is to work with the apr and aprutil maintainer to produce some unoptimized versions of the APR libraries.


Upgrade Update

I spent some time last night second-guessing my decision to go with a Socket 939 solution instead of a Socket 754. The AMD64 processor line has an on-die memory controller, like the UltraSPARC-III and later processors. Socket 754 was the first to be introduced, with a single channel DDR controller, while the first 939 processors and boards, with dual-channel DDRCs, arrived earlier this year. According to most reviews, the difference is not that large - a best case 3-5% performance improvement. However, there is a third AMD64 socket type - 940, which is used today by the Athlon FX series of processors and the server based Opteron chips. The plan is to migrate the FX series from 940 to 939, leaving the Opteron in the server space. The Athlon 64/64FX processors will all be unified in the Socket 939 spec, targeted at workstations and enthusiasts. Socket 754 won't go away, as AMD will target this at the main-stream desktop market. The Sempron and Socket A will continue to serve the budget desktop market.

As Socket 939 is a relatively new introduction, processors and motherboards are still expensive. Most of the forum posts I've read on the topic suggest that the price/performance for Socket 939 just isn't there. Of course, most of these posts were from the start of this summer, and since then the price of an AMD64 3500+ has fallen from ~US$500 to ~US$330. The nearest Socket 754 chip comparable to a 3500+ is a 3400+, which comes in at ~US$290. The price difference of ~US$40 seems like a small amount to spend to get a better upgrade path, once AMD starts introducing dual-core processors (assuming that won't be another DDRC change requiring yet another socket change). I finally convinced myself that 939 was the way to go. Case 50% closed.

The other side of the equation is the motherboard. Socket 939 motherboards are also relatively inexpensive, compared to what I have spent in the past to upgrade my system. The MSI board listed in my last AMD Upgrade post is ~US$140, while some of the Socket 754 boards with similar functionality come in ~US$120. The difficult choice here was between the NVidia nForce3 250Gb chipset and the VIA K8T800. Every non-forum review of these two chipsets I have read concludes that both are equal. Anecdotal evidence from forums seems to suggest that nForce is more stable than VIA, but I've been using ABit boards based on VIA chipsets for years without stability issues. One of my chief concerns is support for the chipset in non-Windows environments like Linux and Solaris x86_64. VIA has generic drivers in the Linux kernel, so basic functionality can be had without having to resort to separate drivers. Functionality of nF3 based boards in Windows surpasses that of VIA, but it requires NVidia's proprietary drivers to operate in Linux. At this point, I'm not sure if either chipset is supported in Solaris.

My decision is now down to the final sticking point. Should I go with the VIA, and sacrifice some performance and stability for driverless support under Linux? Or should I go with the nForce3 board, get superior Windows performance for games, and live with the fact that I will, yet again, be out on the cold, harsh, bleeding edge of technology?

If it weren't for the fact that Newegg.com has been down for the last 12-14 hours, I probably would have already made the decision. The extra time I have to mull over the issue is certainly causing me to overthink. I can hardly believe that a site like Newegg.com, which is absolutely critical to their business (it is their business), is down for this length of time. Even a planned outage for the revenue producing face of this company should not even be noticed by the users, much less interrupt purchases and prevent new users from accessing the store.

Maybe it would be a good idea for the Sun sales force to descend on Newegg and show them how e-business is really supposed to work.


Give me caffeine or give me...

No, wait... I just want some coffee. The good graces of Sun keep my brain awash in caffeine from the time I arrive, until I collapse in a lifeless heap around 3:30PM from too much caffeine (not really). However, there are those who would interrupt my flow of this essential element, through their antisocial coffee etiquette faux pas. Here are the rules as I understand them -- there are few, but critical to survival (not really):

  1. When thou com'st upon an unfilled urn of coffee, fill't up.
  2. Thou shalt not drain the urn and not refill because thou shalt "miss thine meeting".
  3. If thine meeting is a grave matter, and lateness brings punishment, thou shalt drink of a lesser beverage (water?), or appoint an acolyte to fill the urn.
  4. Thou shalt not silently drain the urn, and convince thine self that it is truly not empty. It is.

Can't we all just get along?

Wednesday September 29, 2004

RSS/Atom Auto-discovery

My last article on bookmark publishing was picked up on Dave Johnson's Roller blog today, with some interesting ideas for enhancement. One idea I found interesting was RSS auto-discovery. A quick search on Google showed that a few people have expressed ideas about how to use HTML <link> elements to discover alternate content types for a particular page. Dave also suggested that I could output the bookmark hierarchy using Outline Processor Markup Language [OPML], and import the resulting document into Roller (once blogs.sun.com updates to Roller 0.9.9, that is).

As there is only so much I can do between conference calls, writing requirements documents, and planning out my yearly goals, I will focus on idea the first. OPML will have to wait for another article, but if you have extra time, feel free to read ahead.

You might be familiar with the <link> tag. It allows you to specify cascading stylesheets, "favorite" icons, etc. It is also possible to specify additional content types for your page. If you click View -> Page Source on this page, for instance, you would see that I define an alternate content type of application/rss+xml. The path given in the link for this document is relative to the blogs.sun.com server, but it can also be absolute, to pull alternate content from a different server.

There are several different content types that I am interested in for the bookmark publisher script. While RSS is quite common, other formats such as Atom and RDF are popular. I want to ensure that among all of the favorite icons and style sheets, I get only the links to alternate content types. Each link tag has a type attribute, which contains the MIME type of the alternate link. As I search through the link entries, I'll check each one to see that it matches one of my desired content types. As I intend to roll this function back into the bookmark publisher script when I'm done, I'll write the functionality as a subroutine that takes a URI object and returns a hash reference. The hash reference will be keyed by MIME type, and will contain a URI for each verified content type. The function starts out by listing the acceptable content types:

use strict;
use LWP::UserAgent;
use URI;

my $ua = new LWP::UserAgent( env_proxy => 1 );

sub autodiscover {
    my $uri = shift;
    my $map;

    my %ALT_TYPES = map { $_ => 1 } qw(
        application/rss+xml
        application/rdf+xml
        application/atom+xml
        text/xml
    );

In contrast to the accept_bookmark subroutine I described yesterday, this routine will require LWP::UserAgent, which is the full featured object which underlies LWP::Simple. A UserAgent enables far greater control over the request and the response, which makes it ideal for this task. The above code creates a global user agent, which should be used by all sections of the final bookmark publisher script for grabbing files and checking URLs. The autodiscover function then grabs a URI to check, defines the $map of media types to URIs for return, and a list of alternate media types that we want to auto-discover. The map function maps all of the listed content types into the hash with '1' as the value. This makes checking for acceptable media types easier.
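The hash-as-set idiom described above can be tried on its own. This is a minimal sketch using only core Perl; the helper name is_alt_type and the sample types passed to it are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Build a lookup set: each listed MIME type becomes a key with value 1.
my %ALT_TYPES = map { $_ => 1 } qw(
    application/rss+xml
    application/rdf+xml
    application/atom+xml
    text/xml
);

# Hypothetical helper (not in the original script): returns 1 if the
# given MIME type is one we want to auto-discover, 0 otherwise.
sub is_alt_type {
    my $type = shift;
    return exists $ALT_TYPES{$type} ? 1 : 0;
}

# Made-up sample types, as they might appear in <link> elements.
for my $type (qw(application/rss+xml image/x-icon text/css)) {
    printf "%-22s %s\n", $type, is_alt_type($type) ? 'accept' : 'skip';
}
```

A hash lookup keeps the check to a single expression, where a plain list would need a loop or grep for every link examined.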

Now that we have poured the foundation, it is time to check the passed URI to see if it's pointing at anything interesting:

    my $rsp = $ua->get( $uri );
    return {} unless $rsp->is_success;

    # Record the link content type
    my $headers = $rsp->headers;
    my $ctypes  = ref($headers->{'content-type'}) eq 'ARRAY'
        ? $headers->{'content-type'} : [ $headers->{'content-type'} ];

    my ($ctype) = split /;/, $ctypes->[0];
    $map->{$ctype} = $uri;
    $map->{default} = $ctype;

In the accept_bookmark subroutine discussed yesterday, we used the head function of LWP::Simple to check if a link is valid. In order to get a handle on the embedded link tags in the target HTML document, we need to use get instead. Unfortunately, the get function provided by LWP::Simple does not return enough information to enable these links to be processed. The get() method of LWP::UserAgent returns an HTTP::Response object, whose headers() method returns the HTTP::Headers object we are most interested in.

After requesting the target link, we can check to see if the HTTP response code was success, and return an empty hash reference if it's not. This should indicate to the caller that no valid media types were found for the specified URI. Next, we read the response headers, and determine the content type of the returned document (there's no sense guessing). If there is only one content type, the content-type header contains a scalar value, but if more than one type is present (e.g. if there were different content encodings available), the field contains an ARRAY reference. We deal with this by detecting the field type, and forcing the scalar into an array reference. We can then split the content type from any additional information such as encodings or weightings. Type weightings are expressed in the form of q=x where 0 < x <= 1 which indicates preference when multiple types are available. For simplicity, this example discards weighting, but it might be useful in the future. Once we've separated the MIME type out of the header string, we assign the original URI to that content type, and indicate that this content type is the default.
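The step of separating the MIME type from its trailing parameters can be sketched in isolation. The helper below is hypothetical (parse_content_type is not a function from the script or from LWP) and, unlike the example above, it keeps parameters such as q or charset instead of discarding them. It uses core Perl only, and the header values fed to it are invented samples.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Content-Type style header value into the
# bare MIME type and a hash reference of its parameters (charset, q, ...).
sub parse_content_type {
    my $value = shift;

    # The first field is the type; everything after each ';' is a parameter.
    my ($type, @params) = split /\s*;\s*/, $value;
    $type = lc $type;
    $type =~ s/^\s+|\s+$//g;

    my %param;
    for my $p (@params) {
        my ($k, $v) = split /\s*=\s*/, $p, 2;
        next unless defined $v;      # skip anything that is not name=value
        $v =~ s/^"|"$//g;            # drop optional surrounding quotes
        $param{lc $k} = $v;
    }
    return ($type, \%param);
}

my ($type, $param) = parse_content_type('text/html; charset=ISO-8859-1');
print "type=$type charset=$param->{charset}\n";
```

If weightings ever become useful, the q value is then available as $param->{q} without changing how the type itself is extracted.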

Now, I often save links to documents that are not HTML, like PDF documents and images. Obviously, these documents can't contain any link references, as they are not HTML, so we must skip auto-discovery for any content types that are not text/html. This should likely not be limited to just text/html, however, as other valid types like application/xhtml+xml or even text/xml might contain useful information. We can worry about this in the final application -- this script is just for discovery, so it's OK to drop the ball here. Link tags, if they are available in the target document, come in the header field link. Here is our link extraction code:

    # Don't autodiscover for non-html links (e.g. pdf, images)
    if ($ctype eq 'text/html') {
        my $links = ref($headers->{link}) eq 'ARRAY'
            ? $headers->{link} : [ $headers->{link} ];

        foreach my $link (@$links) {
            my ($href, $type) = $link =~ /<(.+?)>;.+; type="(.+?)"$/;
            next unless $href and $type;

            if ($ALT_TYPES{$type}) {

                my $nuri;
                if ($href =~ m#\w+://#) {
                    $nuri = new URI($href);
                }
                else {
                    $nuri = new URI($uri);
                    $nuri->path( $href );
                }

                # Check that the feed actually exists...
                $rsp = $ua->head( $nuri );
                $map->{$type} = $nuri if $rsp->is_success;
            }
        }
    }
    return $map;
}

The first couple of lines should be familiar. The link field works the same as the content-type field, expressed as a scalar value if there is only one link, or an ARRAY reference if there is more than one. Again, we force the scalar into an array reference. Next, we iterate through each of the links in the document. For each link entry, we extract the href attribute value and the MIME type. If we can't find both, then this is not a proper link entry, so we skip on to the next item. If we do properly extract both fields, we can then check the type against our predefined map of content types. If the content type is in the alternate type map, we construct a new URI value for the type, and check that it exists. Note that we first check to see if the link specified an absolute URL (i.e. one with a scheme). If it did not, the link is considered relative to the original site, so we just replace the path segment in the original URI. To be pedantic, the next step is to check that the listed feed actually responds. If it does, the type and source URI are inserted into the map, and returned to the caller.
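The regular expression in the extraction code expects the type attribute to come last in the link entry. As a hedged alternative, here is a sketch of a parser that accepts the attributes in any order. It assumes each entry has the form <href>; name="value"; name="value", which is inferred from the pattern used above and not from any documentation. The helper name parse_link and the sample entry are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one link entry of the assumed form
#   <href>; name="value"; name="value"
# and return the href plus a hash reference of its attributes.
# Returns an empty list if the entry does not start with <href>.
sub parse_link {
    my $link = shift;
    my ($href, $rest) = $link =~ /^\s*<([^>]+)>(.*)$/s
        or return;

    my %attr;
    while ($rest =~ /;\s*([\w-]+)\s*=\s*"([^"]*)"/g) {
        $attr{lc $1} = $2;
    }
    return ($href, \%attr);
}

# Made-up sample entry with the type attribute in the middle.
my ($href, $attr) = parse_link(
    '</roller/rss/comand>; rel="alternate"; type="application/rss+xml"; title="RSS"'
);
print "$href => $attr->{type}\n" if $href;
```

For the relative case, the URI module also offers new_abs, which resolves a reference against a base URI; that may be a more general choice than replacing the path, since it also copes with hrefs that are relative to the current directory.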

When I point my autodiscover function at this blog, I get the following structure back (printed out here with Data::Dumper):

bash$ ./link.pl 
$VAR1 = {
          'default' => 'text/html',
          'application/rss+xml' => bless( do{\(my $o = 'http://blogs.sun.com/roller/rss/comand')}, 'URI::http' ),
          'text/html' => bless( do{\(my $o = 'http://blogs.sun.com/comand')}, 'URI::http' )
        };

This is what I was expecting -- a default type of text/html, because the main page existed, and an alternate type of application/rss+xml which points to the RSS feed for my site. If the main page did not exist, I'd get back an empty structure, and I'd know to skip the site entirely. This might not be the desired sequence of events, however. It might be desirable to attempt auto-discovery even in the case of a broken main link. We'll see how things turn out when I integrate this function back into the main bookmark publisher script.

Tuesday September 28, 2004

Publishing Netscape Bookmarks

For quite some time, I've wanted to publish my large list of bookmarks on the web. The primary reason is to give me a map of information to use when I am not near a browser with my bookmarks. Another reason is to let others benefit from the time I've spent gathering and organizing these links. I could just upload my Netscape bookmarks.html file, as it is just HTML, but there are issues.

Issue the first is that I have Sun internal links sprinkled liberally through my bookmark folder. I could copy my bookmarks.html aside, then manually remove the internal links, but the current version contains 325 bookmarks, and I am impatient. I also don't want to go through this exercise every time I update, add, or delete a bookmark. Issue the second is that some of my links are old and outdated -- documents have moved (or decomposed). I need to identify links that are broken, and either deal with them, or remove them from the bookmark file altogether. Issue the last is that there are some links in my folders which I do not want published to the world. Sure, I trust you to keep where I bank a secret, but not that guy in the office next to you -- he's kinda shady.

So what to do when faced with a big text processing task like this? Whip out Perl, of course! There are several approaches here, as with every task Perl is involved with. My tack starts with a utility called HTML tidy. This utility will take the not very well formed Netscape bookmark HTML and give me well formed XML. In Perl, I prefilter the bookmarks like this:

use strict;
use File::Temp qw/tempfile/;

# Filter STDIN through tidy
my $temp = new File::Temp( UNLINK => 1 );
open TIDY, "| tidy -quiet -asxml 2>/dev/null 1>$temp"
    or die "Failed to open pipe to 'tidy': $!";
print TIDY while (<>);
close TIDY;

Now the file named by $temp contains well-formed XML. Note that the temporary file uses UNLINK => 1. This will cause the tidy formatted XML file to be cleaned up when the program exits or the $temp variable goes out of scope, whichever comes first. Now that I have well formed XML, I can search through the bookmarks programmatically. My weapon of choice for tasks like this is the fine XML::XPath module set written by Matt Sergeant. To begin with, I need to identify the root of the personal toolbar folder:

use XML::XPath;

my $xp = new XML::XPath( filename => $temp );
my $root = $xp->find( '/html/body' );
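As an aside on the temporary file handed to tidy and XML::XPath earlier: the clean-up behavior of UNLINK => 1 can be observed with a short stand-alone sketch. It uses only the core File::Temp module; the variable names and the scratch text are illustrative.

```perl
use strict;
use warnings;
use File::Temp;

my $name;
{
    # UNLINK => 1 asks File::Temp to delete the file when the object
    # is destroyed.
    my $temp = File::Temp->new( UNLINK => 1 );
    $name = "$temp";    # the object stringifies to its filename

    print {$temp} "scratch data\n";
    close $temp;
    print "inside scope:  ", (-e $name ? "exists" : "gone"), "\n";
}

# $temp went out of scope at the closing brace, so the file is removed.
print "outside scope: ", (-e $name ? "exists" : "gone"), "\n";
```

The stringification is what lets the same $temp object be interpolated into the tidy command line and passed as a filename.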

Bookmarks in the file are all organized as HTML definition lists (DL/DD/DT). The very top of the document is inside of the body tag. All of the nodes within the body are now contained in the $root variable. The general pattern for folders and bookmarks within the file is as follows:

 DL
  DD
   H3 -> Folder Title
    DT
     A -> Bookmark
    DT
     A -> Bookmark
    DD
     H3 -> Sub-folder Title

This structure can be arbitrarily deep, therefore the script must be able to handle this. The best way is to process the file using a recursive function:

sub collect_bookmarks {
   my ($ctx, $href) = @_;

   # For each folder root...
   my $f_result = $ctx->find( './dl/dd' );
   foreach my $f_node ($f_result->get_nodelist) {

       # Grab the folder title, skip if no name (separators)
       (my $f_name = $f_node->find( './h3' )) =~ s/^\s*|\s*$//g;
       next unless $f_name;

       # Within this folder, search for bookmark entries
       my $a_result = $f_node->find( './dl/dt/a' );
       foreach my $a_node ($a_result->get_nodelist) {

           # Retrieve and normalize the URL and bookmark title
           my $link = $a_node->getAttribute('href');
           my $title = $a_node->string_value();
           $link =~ s/^\s*|\s*$//g; $link =~ s/\n//g;
           $title =~ s/^\s*|\s*$//g; $title =~ s/\n//g;

           # Store the bookmark unless it's bad
           if (accept_bookmark($title,$link)) {
               $href->{$f_name}{$title} = $link;
           }
           else {
               print "Skipping bookmark: $link\n";
           }
       }
       # Recursive call to process subfolders of this node
       collect_bookmarks($f_node, \%{$href->{$f_name}});
   }
}

We call this with the root folder as:

collect_bookmarks($root, \%BOOKMARKS);

This will identify and recursively process any subfolders that are present in the bookmark file structure. Each iteration passes in a hash reference, which that call will populate with a folder name and one or more bookmarks. Note the call to accept_bookmark(), which makes the final decision on whether a bookmark is good or bad. Bad bookmarks are defined by my own criteria, which filters out broken, invalid and internal links, as well as those that I don't want to publish. The function looks like this:

sub accept_bookmark {
   my ($title, $link) = @_;

   # Parse the link URL
   my $luri = new URI($link);

   # These things are all bad.
   return 0 if ($luri->scheme || '') !~ /^(?:https?|ftp)$/
            or $title =~ $PRIVATE
            or $link =~ $PRIVATE
            or $luri->host =~ /\.(corp|ebay|sfbay|west|central|uk$)/
            or index($luri->host,'.') == -1
            or not head($luri);

   return 1;
}

This uses URI to filter out any file:// links that might be hanging around, and the head() function from LWP::Simple to check links. The $PRIVATE variable is a compiled regular expression (using qr{}) which contains title and link patterns that I don't want to publish.
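For readers who have not used qr{} before, here is a minimal sketch of what a compiled pattern like $PRIVATE could look like. The post does not list the actual patterns, so every alternative below is a hypothetical stand-in, as are the sample titles and link.

```perl
use strict;
use warnings;

# Hypothetical patterns: the original post does not say what $PRIVATE
# contains, so these are stand-ins for illustration only.
my $PRIVATE = qr{
    \b bank \b        # anything mentioning a bank
  | webmail           # personal mail
  | /private/         # a path segment named private
}xi;

# The same compiled pattern can be applied to titles and links alike.
for my $text ('My Bank Login', 'Perl documentation', 'http://example.com/private/x') {
    printf "%-32s %s\n", $text, $text =~ $PRIVATE ? 'withheld' : 'published';
}
```

Compiling the pattern once with qr{} means the alternation is built a single time and reused for each of the several hundred bookmarks.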

Now that I can browse all of the information in the bookmarks, the next step is to write them out in some browseable format. I'm thinking DHTML collapsible lists, but I could just as easily print out simple HTML. I haven't decided yet, but I'll publish the rest of the script in a follow-up post once it is complete.
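As a sketch of the simple HTML option, the nested hash can be walked with another recursive function. This assumes the structure collect_bookmarks builds, where a value is either a URL string (a bookmark) or a nested hash reference (a sub-folder). The render function and the sample data are hypothetical, and titles are not HTML-escaped here, which a real version would need to do.

```perl
use strict;
use warnings;

# Sample data in the assumed shape: a value is either a URL string
# (bookmark) or a nested hash reference (sub-folder).
my %BOOKMARKS = (
    'Perl' => {
        'Docs'    => 'http://example.com/perl/docs',
        'Modules' => { 'URI' => 'http://example.com/perl/uri' },
    },
);

# Hypothetical renderer: return the tree as nested HTML lists.
sub render {
    my ($node, $depth) = @_;
    my $pad  = '  ' x $depth;
    my $html = "$pad<ul>\n";
    for my $name (sort keys %$node) {
        my $value = $node->{$name};
        if (ref $value eq 'HASH') {
            # Folder: emit its name, then recurse into its contents.
            $html .= "$pad  <li>$name\n"
                   . render($value, $depth + 2)
                   . "$pad  </li>\n";
        }
        else {
            # Bookmark: emit a link.
            $html .= qq{$pad  <li><a href="$value">$name</a></li>\n};
        }
    }
    return $html . "$pad</ul>\n";
}

print render(\%BOOKMARKS, 0);
```

Sorting the keys gives a stable order, since Perl hashes do not preserve the order in which bookmarks were inserted.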

PS: Syntax highlighting above was done using Vim 6.3 and the code2html.vim script by Soren Anderson.

Monday September 27, 2004

Take a SWIG

Before I begin, it occurs to me that my posts might be just a bit too long. Maybe I should split stuff over a number of posts (or even days), but when I get on a roll, I can type forever (frequency of posting shows my average motivation level ;). Maybe I'll change the format, but probably not... I'm unpredictable like that.

SWIG, the Simplified Wrapper and Interface Generator, is a great way to wrap C/C++ libraries for use with scripting languages. It supports the favorites - Java, Perl, Tcl, Python, and PHP as well as some newer and perhaps more obscure languages such as Ruby, Guile, MZscheme, Ocaml, and Chicken (huh?). For most standard calling conventions, it is possible to point swig at the header files for a library, and it will spit out a C interface file to that library. I say most standard calling conventions here, because as I've found out over the last couple of weeks, some things are hard to do in a language agnostic fashion.

For instance, consider the case of the Solaris libnvpair(3LIB) interface. In my project, I want to be able to wrap a function from another library which returns a pointer to an nvlist. I don't want to wrap the entire libnvpair interface in all of its glory just to be able to read a nested nvlist. I'm not interested in adding to or modifying the nvlist that is returned to my code, so I will turn the nvlist into a structure which is native to whatever scripting language I'm using. As I mentioned in my last article regarding extension and embedding, this boils down to Perl, and then maybe Python and Java (adoption of Perl for system admin tasks is high, less so for Python; Java is not widely used, but could be).

For Perl, it seems most natural to have the nvlist copied out into a hash reference, which the Perl code would then be able to walk through to make decisions about the returned data. The code to do this is quite interesting, and needs to be added on as inline code in the SWIG interface file. For instance, to create a simple hash reference, here is the Perl API code:

  HV * hash = (HV*)sv_2mortal((SV*)newHV());
  hv_store(hash, "mykey", 5, newSVpv("my value", 0), 0);

This is similar to the following Perl code:

  my $href = {};
  $href->{mykey} = "my value";

Now, for the real application, C code will be required to traverse the nvlist and copy keys and values to an HV. Obviously, the above code is very Perl specific. To do the same thing in Python, I need to traverse the nvlist and create a Python dictionary object, for Java a java.util.HashMap or similar. Each will require a bit of VM magic code to make the translation. Luckily, SWIG is there to save the day, and makes it quite easy to #ifdef each language specific section, as it gives the interface file a once over with the C pre-processor before creating the wrapper. Then I just have to bind each wrapper with the corresponding language headers and libraries, and I get a nice loadable module.

My main concern, as C is really not a core competency in my department, is to keep the interface as clean and understandable as possible. I also want to have the flexibility to experiment and explore with other languages. SWIG gives me the best of both worlds.


x86 Shuffle

I don't work from home that often, so I don't require a SPARC based workstation like I enjoy in my office, but I do enjoy a good game of Doom 3 now and then. To support this habit, I've decided to upgrade my current home system. I built my current system in early 2000, and it has remained future proof up until very recently. Specifically, releases of games such as Doom 3 and Half-Life 2 have required more GL power than my old GeForce 2 32MB card could provide. My current rig:

  • ABit VP6 Dual PIII mainboard
  • 2 x PIII 1GHz Coppermine (C0 stepping)
  • 512MB Corsair CAS3 SDRAM
  • SB Audigy Platinum
  • Netgear MA311 802.11b
  • ATI Radeon 9800PRO 128MB
  • 2 x IBM Deskstar 60GXP 60GB/ATA100/7200RPM/2MB Cache
  • IBM Deskstar 60GXP 40GB/ATA100/7200RPM/2MB Cache
  • Pioneer DVD-106S (Slot Load)
  • LITE-ON LTR-48125W 48x12x48 CD/RW
  • Antec Sonata Case w/380W PS
  • NEC MultiSync FE1250+ 22"
  • Cambridge Soundworks 5.1 Audio
  • Dual boot Windows XP and Fedora Core 2

Bold items are the lucky components to be included in my new system. You might be thinking -- this seems like enough, why upgrade? Well, it all began when I upgraded my video card to the Radeon. The board layout for the VP6 puts CPU1 directly above the AGP slot, which I didn't believe to be that big an issue when I bought the board. When I bought the Radeon, I knew that it would produce a lot more heat, so I bought some nice new Thermaltake copper heatsinks for the P3s. Really, the point was to reduce the overall noise level of the system, which I've since learned is next to impossible with a dual CPU VP6 (larger heat sinks from Zalman were recommended here, but don't fit).

After I moved all of my hardware into the Antec case, added in the Radeon card, and strapped on my shiny new copper heatsink, almost everything came up, which is obviously not acceptable. Now, the VP6 board comes with a RAID controller on board, which allows me to run RAID 0, 1 or 0+1 with up to four drives. Alternately, I can turn off the RAID function, and have two more IDE controllers to use, which allows for up to a total of 8 IDE devices. I had configured my system to run each drive as a channel master, with only the 40GB storage drive and DVD sharing a bus. Normally, when I start the system, it displays the normal hardware probe screen, and then switches to a drive detection screen for the RAID controller (Highpoint HPT370).

So, when I started up the system in a new case with the fantastic Radeon card inside, I was puzzled when the RAID detection screen didn't come up. Switching the card for my old GeForce caused the detection screen to reappear, but I wasn't about to take the Radeon back to the store. I posted a few times on ArsTechnica Forums, probing my initial suspicion that there was some sort of IRQ sharing problem between Mr. RAID controller and Mr. Radeon. Someone on the Ars forums postulated that the Radeon might not be properly releasing resources after initialization, leaving insufficient space in BIOS mapped RAM to allow the RAID controller to initialize. Of course, I am explaining this phonetically, as I really don't fully understand how ACPI initializes PCI cards.

I still haven't figured out how to fully resolve the issue, but I figured that it was partly due to the age difference between the Radeon and my mainboard. Without the HPT370 drivers in Windows XP, the devices attached to the RAID controller do not show up. Installing the drivers makes the devices appear -- Linux gets it right the first time with no extra drivers or configuration. So, despite the fact that I have to do a bit of a jig after a fresh Windows XP install to see my drives, at least I now can access them. Problem solved enough.

Now that the all singing, all dancing Radeon is working under Linux, I made a special effort to find the highest performance, highest resolution, most fantastic and earth shattering screen saver. This came in the form of a translucent dancing torus (4D Hypertorus, for you KDE users). After staring at it for several minutes, I locked the screen and allowed the torus to dance into the night. An hour later, I came back to the system -- locked solid. I thought that it might be an issue with Linux, which has really never worked 100% with my dual CPU VP6 anyway (issues mostly with ACPI), so I rebooted, and had no lockups while I worked for the rest of the evening. The next time I left the system on the screensaver, it locked up again. Hmmm. I switched screensavers, and now I haven't had a lockup for over a month.

The issue turned out to be heat. As I mentioned before, the VP6 puts CPU1 directly above the AGP slot. With a Radeon installed, this is similar to putting CPU1 in a toaster oven. Checking temps on the CPUs after running some 3D demos, I found that CPU0 was hovering around 35-40C, and CPU1 was up around 55C. Heat was rising off the Radeon, and warming the CPU heatsink. There's no opportunity for a larger heatsink, and I really don't want to mess around with liquid cooling for this old system. This was really the last straw for the dual CPU setup.

The new rig will use the bold items in the listing above, with the addition of the following:

  • MSI Neo2 Platinum nForce3 Ultra mainboard
  • AMD Athlon 64 3500+ (Socket 939, Newcastle core, 2.2GHz)
  • 1GB(512x2) OCZ DDR PC3200 (2-3-2-6 1T)

From what I gathered from a whole Sunday afternoon of research and pricing comparisons, the Socket 939 platform seems to be the most future proof for AMD64 desktop systems. Socket 939 is similar to the original AMD64 Socket 754, but provides a dual-channel memory controller on board instead of a single channel controller. It seems that the number of socket types is blooming at the moment in the AMD space. This is due to the on-board memory controller -- every time a new type of RAM appears, silicon must be changed in the CPU. As a result, AMD needs to make sure that the CPU cannot be installed in the wrong system, and their solution is to change the socket interface. From my observation, Sun has sidestepped this by using proprietary RAM, so nearly everything can be held constant to allow multiple models of CPUs to use the same RAM and mainboards.

Anyway, we will see once the new system comes together whether I will finally be free of the upgrade struggle for another four years. I like to upgrade my system, but don't like doing it more than once every couple of years or so, or less for major brain transplants like the above.

20040923 Thursday September 23, 2004

Extending and Embedding Perl

In my experience researching complex topics, such as extending and embedding Perl, it is inevitable that the range of examples available will be large. Unfortunately, the examples that I have found regarding extending Perl fall into two categories. The first category of examples is far too simplistic. These range from the ubiquitous 'hello world' example written in XS, to tutorial examples on how SWIG might wrap the libgd graphics library. These are all very straightforward examples, and should suffice for most real world tasks. Not mine, however.

The other category of example code is mind-blowingly complex. This camp includes most well known examples of Perl extension -- I've searched through the multi-language SWIG bindings for Subversion, and the Perl specific bindings to Tk. Both are stunning examples of how to do extension correctly. Unfortunately, there is a huge overhead to wrapping your mind around projects of this scale. Multi-module, multi-library projects are far, far too complex for what I need to do.

I was really unable to find a happy medium -- I need to expose a number of C structures as Perl objects, reserving the option to expose them to other languages (e.g. Python, Java, Guile). For this, some of the aspects of Subversion bindings are easy to understand. However, I also need to be able to pass callbacks to C from Perl code references to allow the C libraries to pass information to Perl. In Subversion, callbacks are provided by a language specific runtime library which implements a thunk editor. The Subversion thunk editor manages manipulation of the stack such that calling back and forth between Perl and C works according to the Perl stack protocol. Just hearing the word 'thunk' makes me shudder, with horrible visions of Win32 dancing in my head. At any rate, the Subversion method seemed to me to be just a bit more abstraction than I wanted to deal with for my (relatively) simple case.
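To make the thunk idea a little less scary, here is a minimal plain C sketch of the general pattern: the library accepts one generic callback signature plus an opaque pointer (a "baton"), and a small thunk function unpacks the baton and forwards the call to whatever the scripting side registered. This is not Subversion's code, and all of the names (notify_fn, walk_paths, thunk_baton, notify_thunk, count_paths) are hypothetical. In a real Perl binding the baton would hold the code reference and the thunk would go through the Perl stack protocol; a plain function pointer stands in for that here.

```c
#include <assert.h>
#include <stddef.h>

/* Generic callback signature the (hypothetical) C library accepts. */
typedef int (*notify_fn)(void *baton, const char *path);

/* Hypothetical library routine: reports each path, stops on the first error. */
static int walk_paths(const char *const *paths, size_t count,
                      notify_fn notify, void *baton)
{
    assert(notify != NULL);
    for (size_t i = 0; i < count; i++) {
        int err = notify(baton, paths[i]);
        if (err != 0)
            return err;
    }
    return 0;
}

/* What a binding layer might keep in the baton. The function pointer stands
 * in for a script-level callback such as a Perl code reference. */
typedef struct {
    void (*script_callback)(const char *path, int *seen);
    int seen;
} thunk_baton;

/* The thunk: matches notify_fn, unpacks the baton, forwards the call. */
static int notify_thunk(void *baton, const char *path)
{
    thunk_baton *tb = baton;
    if (tb == NULL || tb->script_callback == NULL)
        return -1;                  /* nothing registered to call */
    tb->script_callback(path, &tb->seen);
    return 0;
}

/* Example "script" callback: counts the paths it is shown. */
static void count_paths(const char *path, int *seen)
{
    if (path != NULL)
        (*seen)++;
}
```

The library only ever sees notify_thunk and an opaque pointer, so the same walk_paths could serve any language binding that supplies its own thunk and baton.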

As I was dreaming about how to get my extension project off the ground, I found a massively useful book at my local Barnes & Noble book store: Extending and Embedding Perl by Tim Jenness and Simon Cozens. In my opinion, this book is the most complete and coherent source of Perl extension and embedding knowledge available. It's not terse like the documentation which comes with Perl -- although that continues to be an authoritative source of this info. Instead, it's a well written, easy to read coverage of all the details needed to get the job done. It includes fully documented code samples -- which cover a broad range of projects. Want to find out how to create a new Perl scalar value in C and pass it back to Perl? There is simple example code to demonstrate the technique, along with detailed breakdowns of the output of tools such as Devel::Peek.

Suffice to say that I love this book. I've nearly read it front to back in about a week, which says a lot considering the dry nature of the subject material. However, I'm motivated to learn, as I want to complete my extension project. Over the coming days, I'll outline where I am with the project, and offer up a description of how to solve extension and embedding problems that aren't covered by 99% of available examples. Stay tuned.

20040914 Tuesday September 14, 2004

Sun Open Source Summit

I attended (via webcast + IRC) the Sun Open Source Summit, which was organized by the Sun Software CTO and held at the Santa Clara Auditorium. It's a shame I couldn't be there in person, but c'est la vie (translation: it's not my day job, but I wish it were). I managed to attend all of the afternoon Auditorium tracks, including the Lightning Talks (which were amusing in their brevity). Talks were given on projects such as Rome, an open Atom/RSS toolkit for Java, and of course Tonic, which is the upcoming open source Solaris. Of particular interest was hearing Joerg Schilling, author of cdrtools and star, among others. Joerg is also involved with the Blastwave Community Software project, which I recently joined. Blastwave provides high quality, low touch Solaris packages for a wide variety of Open Source software.

The Auditorium also hosted several main sessions this afternoon which I was able to attend:

  • Open Source Business Models

    “Charging money for access to source code is a tried and true way to make lots of money. But thanks to projects like Apache and Linux, more and more customers (including Sun customers) are demanding code access and a marked procurement preference for F/OSS code. Why is this happening and how will proprietary code producers make the switch to F/OSS while still maintaining profits?”

  • Community vs. Control

    “In many ways creating very high-quality software is all about controlling the engineering environment, certainly what gets checked in and how changes are vetted. Sun's engineering process was designed to deliver proven results to our customers. Yet increasingly there is a competing priority of attracting and engaging communities of outside developers who may not have patience and discipline for the established process. We know from some of our previous projects with the F/OSS community that Sun Control is seen as a bad thing. We are told that giving up control will reap rewards, but what will it do to projects like Solaris?”

  • Sun Lessons

    “This [...] session is a chance to share lessons learned during Sun's many experiments hosting or participating in F/OSS communities. Which assumptions were completely validated and which were so far off base that we had to regroup? How did we solve the challenges that came up? Which practices worked the best? What were the unintended outcomes (both positive and negative)?”

  • Sun and Tonic

    “Tonic is the codename for Sun's OpenSolaris project. Another name we considered for this talk was "The Truth about Tonic". Moderated by Claire Giordano from Solaris. Come listen to DE Andy Tucker, engineering manager Karyn Ritter, community manager Jim Grisanzio, and senior staff engineer Bart Smaalders give an overview of the program, talk about why we are open sourcing Solaris, provide a status update, and answer your questions.”

All of the sessions were very well attended, and discussions usually ended with the F/OSS community members encouraging Sun to embrace open source, with full support of the community. There were so many informed, well articulated discussions that I could not begin to summarize everything that happened during the afternoon sessions. I'll leave this to the real journalists -- primarily James Turner from LinuxWorld Magazine. This is a great time to be involved with Open Source, and a fantastic opportunity for Sun to win back hearts and minds in the broader community. I'm personally excited about the possible interaction between the CSW project and Open Solaris, so watch this space.
