|
So my copy of Beautiful Code showed up last week.
Although I am one of the (many) authors and I have thus had
access to the entire book online for some time, I do all of my pleasure
reading in venues that need the printed page (e.g.
the
J Church) and
have therefore waited for the printed copy to start reading.
Although I have only read the first twelve chapters or so,
it's already clear (and perhaps not at all surprising) that
there are starkly different definitions of beauty here: the book's greatest
strength -- and, frankly, its greatest weakness -- is that the chapters
are so incredibly varied. For one chapter, beauty is a
small and titilating act of recursion; for the next, it's that
a massive and complicated integrated system could be delivered quickly
and cheaply. (I might add that the definition of beauty in my own
chapter draws something from both of these poles: that
in software, the smallest and most devilish details can affect
the system at the largest and most basic levels.)
If one can deal with the fact that the chapters are
widely divergent, and that there is not even a token attempt to weave
them together into a larger tapestry, this book (at least so far, anyway) is
(if nothing else) exceptionally thought provoking; if Oprah were
a code cranking propeller head, this would be the ideal choice for her book club.
Now in terms of some of my specific thoughts that have been provoked:
as I mentioned, quite a few of my coauthors are enamored with the
elegance of recursion.
While I confess that I like writing a neatly recursive routine,
I also find that I frequently end up having to unroll the recursion
when I discover that I must deal with data structures that are bigger
than I anticipated -- and that my beautiful code is resulting (or
can result) in a stack overflow.
(Indeed, I spent several unpleasant days last week
doing exactly this when I discovered that pathologically bad input
could cause blown stacks in some software that I'm working on.)
To take a concrete example, Brian Kernighan has a great chapter in
Beautiful Code about some tight, crisp code
written by Rob Pike to perform basic globbing. And the code is
indeed beautiful. But it's also (at least in a way) busted: it
overflows the stack
on some categories of bad input. Admittedly, one
is talking about very bad input here -- strings that consist of
hundreds of thousands of stars in this case -- but this highlights exactly
the problem I have with recursion: it leaves you with edge conditions
that on the one hand really are edge conditions (deeply
pathological input), but with a
failure mode (a stack overflow) that's just too nasty to ignore.
Now, there are ways to deal with this.
If one can stomach it, the simplest way to deal with this is to
setup a sigaltstack and then
siglongjmp out of a
SIGSEGV/SIGBUS
signal handler. You have to be very careful about doing
this: the signal handler should look at the si_addr field in the
siginfo and comparing it to the stack bounds to confirm
that it's a stack overflow, lest it end up siglongjmp'ing out
of a non-recursion induced SIGSEGV (which, needless to say, would
make a bad problem much worse). While an alternative signal stack solution may
sound hideous to some, at least the recursion doesn't have to go under the knife
in this approach.
If having a SIGSEGV handler to catch
this condition feels uncomfortably brittle (as well it might), or
if one's state cannot be neatly unwound after an arbitrary siglongjmp
(as well it might not),
the code will have to change: either a depth counter will have to be
passed down and failure propagated when depth exceeds a reasonable
maximum, or the recursion will have to be unrolled into iteration.
For most aesthetic senses, none of these options is going to make the
code more beautiful -- but they will make it indisputably more correct.
I was actually curious about where exactly the Pike/Kernighan code would
blow up, so I threw together a little program that uses
sigaltstack along with sigsetjmp/siglongjmp to
binary search to find the shortest input that induces the failure.
My program, which (naturally) includes the Pike/Kernighan code,
is here.
Here are the results of running my program
on a variety of Solaris platforms, with each
number denoting the maximum string length that can be processed
by the Pike/Kernighan code without the possibility of stack overflow.
|
| x86
| SPARC
|
|
| 32-bit
| 64-bit
| 32-bit
| 64-bit
|
| Sun cc, unoptimized
| 403265
| 187225
| 77649
| 38821
|
| gcc, unoptimized
| 327651
| 218429
| 69883
| 40315
|
| Sun cc, optimized
| 327651
| 327645
| 174723
| 95303
|
| gcc, optimized
| 582489
| 524227
| 149769
| 87367
|
As can be seen, there is a tremendous range here,
even across just two
different ISAs, two different data models and two different compilers:
from 38,821 on 64-bit SPARC using Sun cc without optimization
to 582,489 on 32-bit x86 using gcc with optimization -- an order of magnitude difference.
So while recursion is a beautiful
technique, it is one that ends up with the ugliest of implicit dependencies:
on the CPU architecture, on the data model and on the compiler.
And while recursion is still beautiful to me personally, it will always be a beauty
that is more superficial than profound...
(2007-07-16 23:23:02.0/2007-07-11 01:45:14.0)
Permalink
|