What's this all about?
-- Lewis Carroll
This is the first posting for a blog about computer games, virtual worlds, distributed systems, and concurrent programming. If this seems like a strange combination of subjects, then please read on; I'd like to convince you that the combination is not only natural but necessary.
First, a bit about me. I've been doing distributed computing and object-oriented programming for some time. I started all this at Apollo Computer, where I worked on text libraries (shipping one of the first commercial C++ class libraries not done at Bell Labs) and distributed systems. When Apollo was acquired by Hewlett-Packard I found myself an HP employee. While there, I led a group that implemented a system that became the base of the first CORBA specification.
About 16 years ago, I joined Sun (and more specifically Sun Labs), where I continued working in the area of distributed systems. I did some kibitzing during the design and implementation of Java Remote Method Invocation (RMI), which got me and the rest of my group thrown out of the labs for being overly relevant, and then led the group that did the Jini Networking Technology. Like I say, I've been doing distributed systems for some time.
I came back to the labs about four years ago, looking around for interesting work in distributed systems. I did some work in systems for large-scale medical sensing that, while interesting, couldn't have been deployed in the current environment for medical care in the United States. About the time that I was coming to this realization, I was asked to do some consulting on another project in the lab that, to be honest, I hadn't really paid that much attention to up to that time. But, being between projects, I figured that a little consulting work might be refreshing, and give me time to find some interesting research topic to tackle next.
This is how I got started with Project Darkstar, and how I first encountered the world of large-scale on-line games and virtual worlds. And in fact the project did refresh me, and helped me to find some interesting research problems. But they were all involved with the project itself, which is why I am still a part (in fact, the technical lead) of the project to this day.
The problems faced by those developing on-line games or virtual worlds are all centered on scale. When a game or a world is released, there is no way of telling how popular it will be. You can (and game companies do) try to estimate the number of players, but being off by an order of magnitude or two is not uncommon. If you estimate too low, there are a lot of frustrated players who either find the game too slow or who can't play the game at all. If you estimate too high, you have a lot of (expensive) infrastructure around that isn't being used. What makes this all the more difficult is that current mechanisms for scaling in games and virtual worlds are based on cutting up the geography of the virtual environment and assigning different parts of the geography to different servers. And this cutting up is done as part of the design of the game. So if you get the scaling wrong, you need to change the design of the game, and the source code that implements that design. Not the kind of thing that can be done quickly.
Even if you are able to estimate the total number of players or users accurately, these environments are still open to tremendous fluctuations of load over time. Whether it is the discovery of some new feature in the game that causes everyone to crowd into the same region, or something as random as snow shutting down all of the schools on the East Coast, the difference between normal use and peak use can be as much as a factor of 10. Which makes capacity planning all that much harder.
This sort of problem cries out for a solution using a bunch of machines working together in a way that allows load to be balanced at run-time rather than at compile time. So one of the goals of Project Darkstar is to provide a server infrastructure that allows such balancing without requiring the game programmer to be involved in the workings of the distributed system. That by itself would be pretty challenging, but that is only the first of the project's goals.
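To make the contrast concrete, here is a toy sketch of the two approaches. It is an illustration only, not how Project Darkstar or any real game server is built: the region names, server names, and both functions are made up for the example, and "load" is just a head count. The first function pins each region to a server in a table fixed at design time; the second places each arriving player on whichever server is least loaded at that moment.

```python
# Hypothetical design-time table: each region of the world is pinned to one server.
STATIC_ZONE_MAP = {"forest": "server-a", "desert": "server-b", "city": "server-c"}
SERVERS = sorted(set(STATIC_ZONE_MAP.values()))


def static_assignment(player_regions):
    """Count players per server when the region alone decides the server."""
    load = {server: 0 for server in SERVERS}
    for region in player_regions:
        load[STATIC_ZONE_MAP[region]] += 1
    return load


def runtime_assignment(player_regions):
    """Count players per server when each arrival goes to the least-loaded server."""
    load = {server: 0 for server in SERVERS}
    for _region in player_regions:
        # The region is deliberately ignored; ties go to the first server by name.
        target = min(SERVERS, key=lambda server: load[server])
        load[target] += 1
    return load


# Everyone crowds into the city after a new feature is discovered there.
players = ["city"] * 90 + ["forest"] * 5 + ["desert"] * 5
static_load = static_assignment(players)
balanced_load = runtime_assignment(players)
print("static:  ", static_load)
print("run-time:", balanced_load)
```

With the static table, one server carries 90 of the 100 players while the other two sit nearly idle. The run-time version spreads them evenly, but only because this toy ignores the hard part: players in the same region need to see the same state, which is exactly what makes moving them between machines difficult.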
The second goal is to fully exploit the multi-threaded capabilities available on modern chips without requiring the game programmer to become a concurrent programming wizard. With the possible exception of scientific super-computing, no area of programming has more aggressively ridden the Moore's law curve than games. But now the chip makers (including Sun) are changing the rules. Rather than making the clocks faster, we are producing chips with multiple cores. The argument is that chips are still getting faster, but the way that they are getting faster is that they can do multiple things at the same time. A particular sequence of instructions may not happen any faster, but you can run multiple streams of instructions through the chip at the same time, which means that you can do twice as much (or four times, or 16 times, depending on the number of cores) as you used to do in the same period of time.
All this is well and good, but it assumes that you can actually run parallel tasks to exploit the new chips. For some tasks (serving up web pages, for example) this is pretty easy. But for games, it is a bit more complex. Games (and virtual worlds) ought to be great candidates for multi-core and multi-threaded approaches, since most of what happens in a game (or virtual world) is independent of the other things that are happening, so the whole thing looks embarrassingly parallel. The problem is that these applications are not entirely separate (sometimes players or members of the world do interact), and are very much worried about latency (rather than throughput). Even worse, there are few if any game programmers who understand how to write reliable concurrent programs. This doesn't make game programmers any different from almost all other programmers. But they don't get the benefits of multi-core machines without writing that kind of code.
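One standard way to put a number on "not entirely separate" is Amdahl's law, which is not something specific to games or to Project Darkstar: if a fraction p of the work can run in parallel and the rest must run serially, then n cores give a speedup of 1 / ((1 - p) + p / n). A small sketch, with the function name and the sample fractions chosen only for illustration:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Compare fully independent work with work where some fraction must be serialized.
for fraction in (1.0, 0.9, 0.5):
    print(fraction, round(amdahl_speedup(fraction, 16), 2))
```

So if even a tenth of the work involves interactions that have to be serialized, 16 cores buy a speedup of about 6.4 rather than 16. Note that this is a statement about throughput; it says nothing about the latency that games care about most.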
So the overall goal of Project Darkstar is to produce an infrastructure that will allow game programmers to write their server code as though they were on a single machine running on a single thread, while exploiting multiple threads and being able to share the load on lots of different machines. And that turns out to be a pretty interesting research topic.
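Here is a rough sketch of what that division of labor might look like from the programmer's side. None of this is the Project Darkstar API or its actual mechanism; the Infrastructure class, its submit method, and the trade_gold task are all invented for the example, and simple per-object locks stand in for whatever the real system does. The point is only that the game logic contains no threads and no locks, while the layer underneath runs many such tasks at once and keeps them from stepping on each other.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Infrastructure:
    """Hypothetical stand-in for the server: it owns the threads and the locking."""

    def __init__(self, workers=8):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._locks = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, name):
        # Create each object's lock on first use, one lock per named object.
        with self._locks_guard:
            return self._locks.setdefault(name, threading.Lock())

    def submit(self, task, state, *names):
        """Run task(state, *names) on a worker thread while holding a lock per named object."""
        def run():
            # Acquiring in sorted order means two tasks can never deadlock each other.
            locks = [self._lock_for(name) for name in sorted(set(names))]
            for lock in locks:
                lock.acquire()
            try:
                task(state, *names)
            finally:
                for lock in reversed(locks):
                    lock.release()
        return self._pool.submit(run)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def trade_gold(state, giver, receiver):
    """Game logic, written as if nothing else were running."""
    if state[giver] > 0:
        state[giver] -= 1
        state[receiver] += 1


state = {"alice": 100, "bob": 100, "carol": 100}
infra = Infrastructure()
pairs = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("bob", "alice")]
futures = [infra.submit(trade_gold, state, *pairs[i % len(pairs)]) for i in range(400)]
for future in futures:
    future.result()  # re-raises any exception from the task
infra.shutdown()
total_gold = sum(state.values())
print(total_gold)
```

Most trades touch different players and run in parallel; the ones that share a player are quietly serialized. The sketch stays on one machine and makes each task name the objects it touches up front, both of which are simplifications the real problem does not allow.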
The added bonus is that I've been introduced to the world (and culture) of games and virtual worlds. It is a very different world and a very different culture from the enterprise world that I had been used to, and I must say that it is a lot of fun. Games and virtual worlds are part of the entertainment business, or perhaps part of the education business, but definitely not the same as the usual enterprise business. These differences have led to a number of what I call Margaret Mead moments, when the differences in culture have made communication difficult and I found I needed to think in a new way. Which is a form of research in itself.
Hence this blog. I will be discussing games, virtual worlds, distributed computing, concurrent programming, Project Darkstar, and the community that we are beginning to build around all of these topics. It's a pretty broad set of topics, but fitting them together is at least fun, almost always interesting, and often instructive.