Thursday May 08, 2008

If you read what the president of Sun says, you know that there is a big push inside Sun to move to open source. Recently, I was able to spend some time with engineers from MySQL, talking with them about their experiences developing open-source software. I came away convinced that the advantages of going whole hog far outweigh any risks we might avoid by just dipping our toes into the open-source world. So we at Project Darkstar decided to take the plunge, all the way, and open up what we are doing to the community.

One of the effects of this is that we are going to release the software more often. Some releases will be tested, stable, and supported; these will be like the releases we have been putting out the past year or so. But we also plan on putting together experimental releases, which are for the bleeding edge of the community. These may happen weekly, and will show the current state of the trunk (which may not always be pretty, but we hope will always be interesting). The big change here is that you (the community) don't need to be protected from our code; you can decide for yourself how adventurous you want to be. Better still, you can all see the progress we are making, or see where we aren't making progress (and maybe pitch in).

In this spirit, we are putting out the first release of Project Darkstar that supports running a game on multiple machines. This is a milestone we have been working toward for some time, and it is nice to finally get there. The best part of this release is the service interfaces, which have changed somewhat since the last single-machine release. We believe the interfaces for the various Darkstar services are now pretty stable and should remain unchanged while we continue to work on the multi-machine implementation. But I'll be honest: the performance is not yet what we would like. When run on a single machine, this release runs at about the same speed as the current single-machine release. But when run on multiple machines, overall performance goes down, and adding more machines makes it worse.

We are releasing this now because we want to get it out to the community as quickly as possible. One reason is to let you all see what the new APIs look like, so that we can get your opinions and feedback and you can start using them. But we'd also like you folks to help us find problems in this code, or to write tests that we can use to find those problems. In other words, we want to start turning this into a real open-source project rather than a Sun Labs project that happens to have an open-source license. We think this sort of change is even more important than the multi-machine implementation. It is a change in the way we are trying to think about the Project Darkstar community.

Releasing this version is only the first step. We are also planning to move our code repository out into the open, so that the rest of you can see what we are doing on a day-by-day basis. And we are not just moving the trunk of our development into the open: all of the branches being worked on by the various developers will be available to the community as well. We assume that everyone in the community will have the good sense to distinguish between the code we are experimenting with in our branches and the code we check into the trunk. But by making the branches available, we let you see how we develop the code and get a sense of the directions we are taking. If you want to be even more bleeding edge than our unstable builds, you can always build the trunk yourself (one of our internal rules is that the trunk should always build; we hope to keep that discipline as we move into the open).

We are currently looking into where we can put this repository (our current best guess is somewhere in the Java.net area) and what revision control system we will use (almost certainly Subversion). We also want to see how we can move our bug-tracking and wiki software into a public area so that all of our documents, bugs, and status reports can be open to the community. It will probably take a week or two to get all of this sorted out, but it is coming, and when it becomes available we will let you all know where it is.

We also want to move our design discussions out into the open so that they are both visible and open to the wider community. Towards this end, we have decided to use the mailing lists that already exist, in particular the dev@games-darkstar.dev.java.net list. All of the core Project Darkstar developers have signed up for this list; if you want to see what we are talking about and perhaps join in, sign up (you can ask for either a daily digest or when-posted delivery).

We have also decided not to get bogged down in discussions of process, which in a previous incarnation slowed the momentum of the community. In particular, we decided not to discuss what would be needed for others to become committers to the core codebase. Instead, we will simply follow the usual open-source practice: starting with a set of committers (the current team), accepting and reviewing code contributions from the larger community, and, when someone shows that they are capable and committed to the project, inviting that person in as a committer. Maybe at some point we will need a more formal process, but not right now. In communities, as in code, less is often more.

What we hope to get out of this is an active community that helps us expand the Darkstar offerings in interesting ways. There is more to do in this space than the core team can ever hope to do. There are tools to build, and benchmarks to write, and code that can help debug and profile and tune. Our hope is that we can get others writing some of these, or helping to develop the core, or helping us to think about approaches that we haven't considered. The real hope is that by expanding the set of people involved, we can build a better software offering.

No doubt there is more that I've forgotten to mention. But I'm excited at the thought of becoming more open, and trying to get more of you involved. We have a good team working on Darkstar at present, but it is small. Enlarging it this way seems to be the best thing we can do for the technology. And it should also be a whole lot of fun.

Monday Mar 31, 2008

We have been getting the multi-node version of Darkstar ready for the 1.0 release. In the process of trying to characterize the performance of the system, we have been running into some interesting problems having to do with observing those parts of the system that we have purposely tried to hide...

Friday Feb 29, 2008

"The time has come," the Walrus said, "To talk of many things..."
-- Lewis Carroll

This is the first posting for a blog about computer games, virtual worlds, distributed systems, and concurrent programming. If this seems like a strange combination of subjects, then please read on; I'd like to convince you that the combination is not only natural but necessary.

First, a bit about me. I've been doing distributed computing and object-oriented programming for some time. I started all this at Apollo Computer, where I worked on text libraries (shipping one of the first commercial C++ class libraries not done at Bell Labs) and distributed systems. When Apollo was acquired by Hewlett-Packard, I found myself an HP employee. While there, I led a group that implemented a system that became the basis of the first CORBA specification.

About 16 years ago, I joined Sun (and more specifically Sun Labs), where I continued working in the area of distributed systems. I did some kibitzing during the design and implementation of Java Remote Method Invocation (RMI), which got me and the rest of my group thrown out of the labs for being overly relevant, and then led the group that did the Jini Networking Technology. Like I say, I've been doing distributed systems for some time.

I came back to the labs about four years ago, looking around for interesting work in distributed systems. I did some work in systems for large-scale medical sensing that, while interesting, couldn't have been deployed in the current environment for medical care in the United States. About the time that I was coming to this realization, I was asked to do some consulting on another project in the lab that, to be honest, I hadn't really paid that much attention to up to that time. But, being between projects, I figured that a little consulting work might be refreshing, and give me time to find some interesting research topic to tackle next.

This is how I got started with Project Darkstar, and how I first encountered the world of large-scale on-line games and virtual worlds. And in fact the project did refresh me, and it helped me find some interesting research problems. But they were all bound up with the project itself, which is why I am still a part (in fact, the technical lead) of the project to this day.

The problems faced by those developing on-line games or virtual worlds all center on scale. When a game or a world is released, there is no way of telling how popular it will be. You can (and game companies do) try to estimate the number of players, but being off by an order of magnitude or two is not uncommon. If you estimate too low, there are a lot of frustrated players who either find the game too slow or can't play it at all. If you estimate too high, you have a lot of (expensive) infrastructure sitting around unused. What makes this all the more difficult is that current mechanisms for scaling games and virtual worlds are based on cutting up the geography of the virtual environment and assigning different parts of it to different servers. And this cutting up is done as part of the design of the game. So if you get the scaling wrong, you need to change the design of the game, and the source code that implements that design. Not the kind of thing that can be done quickly.

Even if you are able to estimate the total number of players or users accurately, these environments are still subject to tremendous fluctuations in load over time. Whether it is the discovery of some new feature in the game that causes everyone to crowd into the same region, or something as random as snow shutting down all of the schools on the East Coast, the difference between normal use and peak use can be as much as a factor of 10. That makes capacity planning all the harder.

This sort of problem cries out for a solution that uses a bunch of machines working together in a way that allows load to be balanced at run time rather than at compile time. So one of the goals of Project Darkstar is to provide a server infrastructure that allows such balancing without requiring the game programmer to be involved in the workings of the distributed system. That by itself would be pretty challenging, but it is only the first of the project's goals.

The second goal is to fully exploit the multi-threaded capabilities of modern chips without requiring the game programmer to become a concurrent-programming wizard. With the possible exception of scientific supercomputing, no area of programming has ridden the Moore's law curve more aggressively than games. But now the chip makers (including Sun) are changing the rules. Rather than making clocks faster, we are producing chips with multiple cores. The argument is that chips are still getting faster, but they are doing so by doing multiple things at the same time. A particular sequence of instructions may not run any faster, but you can run multiple streams of instructions through the chip at once, which means that you can do twice as much work (or four times, or 16 times, depending on the number of cores) in the same period of time.

All this is well and good, but it assumes that you can actually run parallel tasks to exploit the new chips. For some tasks (serving up web pages, for example) this is pretty easy. For games, it is a bit more complex. Games (and virtual worlds) ought to be great candidates for multi-core and multi-threaded approaches: most of what happens in a game (or virtual world) is independent of everything else that is happening, so the whole thing is embarrassingly parallel. The problem is that the activities in these applications are not entirely separate (sometimes players or members of the world do interact), and these applications care very much about latency rather than throughput. Even worse, there are few, if any, game programmers who understand how to write reliable concurrent programs. That doesn't make game programmers any different from almost all other programmers. But they can't get the benefits of multi-core machines without writing that kind of code.

So the overall goal of Project Darkstar is to produce an infrastructure that allows game programmers to write their server code as though it were running on a single machine in a single thread, while the infrastructure exploits multiple threads and shares the load across many different machines. And that turns out to be a pretty interesting research topic.

The added bonus is that I've been introduced to the world (and culture) of games and virtual worlds. It is a very different world and a very different culture from the enterprise world that I had been used to, and I must say that it is a lot of fun. Games and virtual worlds are part of the entertainment business, or perhaps part of the education business, but definitely not the same as the usual enterprise business. These differences have led to a number of what I call Margaret Mead moments, when the differences in culture have made communication difficult and I found I needed to think in a new way. Which is a form of research in itself.

Hence this blog. I will be discussing games, virtual worlds, distributed computing, concurrent programming, Project Darkstar, and the community that we are beginning to build around all of these topics. It's a pretty broad set of topics, but fitting them together is at least fun, almost always interesting, and often instructive.

This blog copyright 2009 by Jim Waldo