James Gosling: on the Java Road


Thursday March 23, 2006


Scripting flamewar

I made some comments recently about scripting languages that generated flame storms in a variety of places, for example on JDJ and Artima. Yes, I did say those things. But there's a lot of context missing: they're the flippant soundbite version of what should have been a long and careful explanation, one that could easily mushroom into a series of PhD theses. Amongst all the flamage there are all kinds of generalizations about "scripting languages" versus "compiled languages". My big problem with a lot of it is simply that these two polarizing categories are a pretty poor way of capturing the distinctions between language designs. The terms are almost as goofy as "Republican" versus "Democrat": taking huge multi-dimensional spaces of choices on different issues, then combining and simplifying them all down to a brutally simple binary choice, is goofy.

Over the years I've built quite a lot of scripting systems. I've also built a number of compilers for non-scripting languages. Given enough beer I'll even admit to having implemented a Cobol compiler for money in the deep dark past. But I've done more scripting systems than non-scripting systems.

There are tradeoffs between the two linguistic spaces all over the place. Someday I'd like to write a long tour through them, but there just aren't enough hours in a day. I have a hard time finding enough time to do even trivial blogging: being truly thoughtful takes a lot of time. But I'll try to cover a few. For now, I'll make the generalization that "scripting language" means one that is interpreted with dynamic runtime typing, and the other camp is languages that are compiled to machine code and have static compile-time typing. This is a broad, over-simplifying generalization, but it matches pretty well what goes on in common conversations.
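To make the two definitions a little more concrete, here is a minimal Java sketch of the same statement, `a = b + c`, evaluated both ways. The class and method names are hypothetical, invented for illustration. The "scripting" path treats values as untyped `Object`s and has to discover at run time what `+` means each time it executes; the statically typed path has its operand types fixed by declarations, so the compiler knows it is integer addition before the program ever runs.

```java
// Hypothetical illustration: "a = b + c" under dynamic and static typing.
final class TypingSketch {

    // Dynamic typing: operands arrive as untyped Objects, so every execution
    // of "+" must inspect the runtime types before deciding what to do.
    static Object dynamicAdd(Object b, Object c) {
        if (b instanceof Integer && c instanceof Integer) {
            return (Integer) b + (Integer) c;              // integer addition
        }
        if (b instanceof String || c instanceof String) {
            return String.valueOf(b) + String.valueOf(c);  // string concatenation
        }
        // A type error is only discovered here, while the program is running.
        throw new IllegalArgumentException("unsupported operand types for +");
    }

    // Static typing: the declarations fix the operand types, so the compiler
    // already knows this is int addition and rejects any other argument types.
    static int staticAdd(int b, int c) {
        return b + c;
    }
}
```

With these definitions `dynamicAdd(2, 3)` yields `5`, `dynamicAdd("a", 1)` yields `"a1"`, and `dynamicAdd(2, 3.0)` fails only when that line actually runs, whereas `staticAdd(2, "3")` is rejected by the compiler.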

Raw execution performance: One of the usual arguments for scripting languages having acceptable performance is that the overhead of interpretation and dynamic typing doesn't matter. The performance of the system is dominated by other factors: typically IO and the language primitives. For example, Perl apps usually spend the majority of their time in file IO and string primitives. I've strongly made this argument in the past, and it's quite valid. But having observed developers' usage patterns, I've seen two common things erode the argument:

  1. Developers start doing things that are outside of what the language primitives are good at. For example, PostScript has great primitives for rendering. So long as you're doing rendering, it flies like the wind. But then someone goes and writes a game that's heavily based on rendering, and a piece of it needs to do collision detection between missiles and targets. Physics in PostScript: a bad idea.
  2. Developers start clamoring for new primitives. Some requests are too specialized to be reasonable ("I want a fast collision engine"); some are rational ("object oriented programming has become the dominant style in PostScript, but the OO model is implemented in PostScript as a library and is slow").
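The collision-detection case in item 1 is worth making concrete. The sketch below is a hypothetical brute-force axis-aligned bounding-box (AABB) test, not code from any real game or PostScript program, and the names `Box` and `Collisions` are made up. It is the kind of inner loop that falls outside a rendering language's primitives: nothing but small comparisons and arithmetic repeated for every missile/target pair. In an interpreter, each of those tiny operations pays the dispatch and dynamic type-check overhead instead of having it amortized inside one big primitive.

```java
// Hypothetical sketch: brute-force missile/target collision detection
// using axis-aligned bounding boxes (AABBs).
final class Box {
    final double x, y, w, h; // top-left corner plus width and height

    Box(double x, double y, double w, double h) {
        this.x = x; this.y = y; this.w = w; this.h = h;
    }

    // Two boxes overlap iff their extents overlap on both the x and y axes
    // (boxes that merely touch along an edge do not count).
    boolean intersects(Box o) {
        return x < o.x + o.w && o.x < x + w
            && y < o.y + o.h && o.y < y + h;
    }
}

final class Collisions {
    // Counts missile/target hits. The O(missiles * targets) loop is made of
    // small arithmetic and comparisons only: cheap when compiled, but every
    // step pays full interpretive overhead in a scripting language.
    static int countHits(Box[] missiles, Box[] targets) {
        int hits = 0;
        for (Box m : missiles) {
            for (Box t : targets) {
                if (m.intersects(t)) {
                    hits++;
                }
            }
        }
        return hits;
    }
}
```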
There are a bunch of ways to respond:
  1. "Buzz off". My personal favorite. Don't use a hammer to tighten a bolt.
  2. Start adding more primitives. This is hopeless: you drown in wave after wave of requests.
  3. Put in a facility for developers to write their own primitives, e.g., link hunks of C code into the interpreter. This can work pretty well, but the linguistic universe schism can be problematic. Not only do developers then need to learn two environments, but because of the cross-language calls each language can impose difficulties on the other (e.g., a language with a good garbage collector has a hard time interacting well with C's malloc/free regime).
  4. Make the "scripting language" fast enough to implement primitives. It tends to end up not looking much like a scripting language. This is roughly the road I went down with Java: to see how close to a scripting experience I could get while being able to compile obvious statements like "a=b+c" into one instruction in the common case. I could have gone down the road of making declarations optional, but I intentionally didn't.
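Option 3 can be sketched in miniature. The following is a hypothetical, PostScript-flavoured postfix interpreter whose primitives live in a lookup table that developers can extend. In a real system the added primitives would typically be C or other native code linked into the interpreter; here Java lambdas stand in for that host-language code, and the class `MiniInterp`, its `register` and `run` methods, and the tiny language itself are all invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of option 3: a tiny postfix stack interpreter whose
// primitives are host-language functions stored in an extensible table.
final class MiniInterp {
    private final Deque<Double> stack = new ArrayDeque<>();
    private final Map<String, Consumer<Deque<Double>>> primitives = new HashMap<>();

    MiniInterp() {
        // Built-in primitives shipped with the interpreter.
        register("add", s -> s.push(s.pop() + s.pop()));
        register("mul", s -> s.push(s.pop() * s.pop()));
    }

    // The extension hook: developers add primitives written in the host language.
    void register(String name, Consumer<Deque<Double>> primitive) {
        primitives.put(name, primitive);
    }

    // Interprets whitespace-separated postfix tokens: known words dispatch
    // through the primitive table, everything else is parsed as a number
    // (unknown words therefore throw NumberFormatException).
    double run(String program) {
        stack.clear();
        for (String token : program.trim().split("\\s+")) {
            Consumer<Deque<Double>> p = primitives.get(token);
            if (p != null) {
                p.accept(stack);
            } else {
                stack.push(Double.parseDouble(token));
            }
        }
        return stack.peek(); // the result is whatever is left on top
    }
}
```

For example, `new MiniInterp().run("2 3 add 4 mul")` evaluates to `20.0`; after `interp.register("sq", s -> { double v = s.pop(); s.push(v * v); })`, the program `"3 sq 4 sq add"` evaluates to `25.0`. Even this toy shows the schism: `sq` is written in a different language, with different rules, from the programs that call it.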
Worrying about scale, evolution and testing: strong compile-time types dramatically improve the ability of the tools (compiler, IDE, ...) to do global checking and manipulation. Some people like this. Some don't. There's a fair amount of evidence that these sorts of facilities are really helpful in dealing with systems that get large, evolve over a long time or require large teams. The most-often-cited reason for preferring dynamic typing is that it can make the development process easier: no need to bother typing all those yucky declarations.
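A small, hypothetical Java sketch of the tooling point: the same account data is modelled once with declared types and once as an untyped string-keyed map, a style common in dynamically typed code. All names (`Account`, `ScaleSketch`, `balanceCents`) are invented for illustration. In the typed version a misspelled field is a compile error, and an IDE can reliably find or rename every use of `balanceCents`; in the map version the misspelled key compiles fine and simply comes back as `null` at run time.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: where a misspelled field name gets caught.
final class Account {
    final String owner;
    final long balanceCents;

    Account(String owner, long balanceCents) {
        this.owner = owner;
        this.balanceCents = balanceCents;
    }
}

final class ScaleSketch {
    // Statically typed: writing a.balanceCent (typo) would not compile, and
    // tools can locate every use of balanceCents across a large codebase.
    static long total(Account[] accounts) {
        long sum = 0;
        for (Account a : accounts) {
            sum += a.balanceCents;
        }
        return sum;
    }

    // Dynamically shaped record: fields are just string keys.
    static Map<String, Object> dynamicAccount(String owner, long balanceCents) {
        Map<String, Object> m = new HashMap<>();
        m.put("owner", owner);
        m.put("balanceCents", balanceCents); // stored as a boxed Long
        return m;
    }

    // A misspelled key compiles fine; get() returns null, which this loop
    // silently skips, so the bug surfaces only as a wrong answer at run time.
    static long totalDynamic(List<Map<String, Object>> accounts, String key) {
        long sum = 0;
        for (Map<String, Object> a : accounts) {
            Object v = a.get(key);
            if (v instanceof Long) {
                sum += (Long) v;
            }
        }
        return sum;
    }
}
```

Here `totalDynamic(accounts, "balanceCent")` quietly returns `0` instead of failing, which is exactly the kind of error that gets harder to track down as a system grows and changes hands.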

There are a variety of middle-ground solutions: