Miracle #4
All that is left is the actual moving of the tasks themselves. And how hard could that be?
At one level, not that hard at all. Since all data that is used in a task needs to be retrieved from the data store in each task, it doesn't matter where the task executes from the point of view of accessing data. Since we require that all server tasks are written in Java, we can move the tasks themselves since all Java code works the same from machine to machine. In fact, this is the reason that we require Java. This also means that our real requirement has nothing to do with the syntax of the language in which you write your source code. Our real requirement is that your tasks have as their object format a set of valid Java bytecodes that make calls to Java libraries. If you have another language that gives you such bytecodes in which you would rather write your tasks, have at it. It will work with Darkstar.
But not every part of moving tasks is so easy. In particular, the tasks in Project Darkstar are assumed to do some (a lot?) of communication, both through the session (that is, the connection between the game server and the game client) and over channels (that is, the publish/subscribe-like communication mechanism that allows a message to be sent to a set of clients). When the tasks associated with a player or a group of players get moved, we need to move the sessions and channels that are associated with that player or group of players. And this turns out to be not so easy at all. In fact, it turns out to be a research problem all of its own, and the fourth miracle that needs to be performed for us to get to a scalable multi-node implementation.
Like the previous miracles, this one is not being attempted by me. Instead, Ann Wollrath is doing the design and the work on this one. Ann has considerable experience in making the network (nearly) disappear; she wrote the client session and channel services and is the inventor, designer, and implementor of Java RMI. So once again I'm reporting on the work of one of the core Darkstar team members.
The crux of the problem in moving client sessions and channel endpoints is coordinating between the various entities. There is the node mapping service, which initiates the move. There is the client session service and the channel service, both on the node on which the tasks are currently running and on the node to which the tasks will be moved. Finally, there is the client itself (or at least the Darkstar client library) that needs to coordinate with the various servers to move its target from the old server node to the new server node. And while all this is going on, messages continue to be sent, and these need to be queued or forwarded or processed.
The way things are planned now, the whole dance starts when the node mapping service tells the client session service on a node (through a listener registered by those services) that it should prepare for the move of an identity. This service will then prepare for the move, part of which is to coordinate with the channel service, and part of which is to tell the client to prepare for such a move. Once a client has been told to prepare for the move, the client should stop sending any non-move related messages to the server. But between the time that the message to prepare is sent to the client and the time that the client replies that it is prepared, messages sent from the client need to be queued.
Once the channel service is prepared, it tells the client session service which, in turn, once it is ready itself, sends a message back to the node mapping service saying that both are prepared. The node mapping service then updates its mapping, and the move can commence back on the starting node. This requires that the client be told to open a session on the new node, and once this is done that any messages that have been queued be delivered to the client by the new node. The client will use a reconnect key that was supplied by the server to connect to the new node.
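To make the ordering above a little more concrete, here is a minimal sketch of how the old node might track a single session through the prepare and move steps. Everything in it (the class `MovingSessionSketch`, its states, and its method names) is invented for illustration and is not the Darkstar API; messages are plain strings, and a protocol violation simply throws where the real system would more likely disconnect the client.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical old-node view of one client session during a move (not Darkstar's API). */
final class MovingSessionSketch {
    enum State { ACTIVE, PREPARING, PREPARED, MOVED }

    private State state = State.ACTIVE;
    private final List<String> processed = new ArrayList<>();
    private final Queue<String> queued = new ArrayDeque<>();

    /** The node mapping service asked this node to prepare; the client is told to prepare too. */
    void prepareForMove() {
        // A move request during a move should never happen; this sketch just fails.
        expect(State.ACTIVE);
        state = State.PREPARING;
    }

    /** An ordinary (non-move) message arrived from the client. */
    void onClientMessage(String message) {
        switch (state) {
            case ACTIVE:
                processed.add(message);   // normal operation
                break;
            case PREPARING:
                queued.add(message);      // sent before the client saw the prepare message
                break;
            default:
                throw new IllegalStateException("client sent a message while " + state);
        }
    }

    /** The client replied that it is prepared and will send no more ordinary messages. */
    void onClientPrepared() {
        expect(State.PREPARING);
        state = State.PREPARED;
    }

    /** The mapping has been updated; hand the queued messages over for the new node. */
    List<String> completeMove() {
        expect(State.PREPARED);
        state = State.MOVED;
        List<String> handoff = new ArrayList<>(queued);
        queued.clear();
        return handoff;
    }

    State state() { return state; }

    List<String> processed() { return new ArrayList<>(processed); }

    private void expect(State expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}
```

The point of the sketch is only the ordering: messages that arrive between `prepareForMove()` and `onClientPrepared()` are held rather than processed, and they leave the old node in arrival order when the move completes.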
Channels are a bit more complex. Channels themselves don't move, but once a client session is moved all of the channels to which that client was connected need to be updated so that channel messages will get delivered to the session (which is now running on a different node). The old node will notify the channel coordinator (as a return value of a join, leave, or send message) that the session has relocated, and the coordinator will redirect the message to the new node. Joins and leaves on the channel need to be delivered reliably, so they may need to be queued and delivered later, after the move is complete.
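The redirect described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the actual channel service: `ChannelRedirectSketch`, `Node`, and `Coordinator` are invented names, nodes are in-memory objects instead of remote services, and the "session has relocated" answer is modeled as an `Optional` return value holding the name of the new node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical model of a coordinator redirecting channel messages after a session moves. */
final class ChannelRedirectSketch {

    static final class Node {
        private final Map<String, List<String>> delivered = new HashMap<>();
        private final Map<String, String> relocated = new HashMap<>();

        /** Record that a session formerly on this node now lives on another node. */
        void markRelocated(String session, String newNode) {
            relocated.put(session, newNode);
        }

        /** Deliver locally, or report (as the return value) where the session went. */
        Optional<String> deliver(String session, String message) {
            String newNode = relocated.get(session);
            if (newNode != null) {
                return Optional.of(newNode);
            }
            delivered.computeIfAbsent(session, k -> new ArrayList<>()).add(message);
            return Optional.empty();
        }

        List<String> deliveredTo(String session) {
            return delivered.getOrDefault(session, new ArrayList<>());
        }
    }

    static final class Coordinator {
        private final Map<String, Node> nodes = new HashMap<>();
        private final Map<String, String> location = new HashMap<>();

        void addNode(String name, Node node) { nodes.put(name, node); }

        void place(String session, String nodeName) { location.put(session, nodeName); }

        String locationOf(String session) { return location.get(session); }

        /** Send to the last known node; on a "relocated" answer, update and resend once. */
        void send(String session, String message) {
            Optional<String> movedTo =
                nodes.get(location.get(session)).deliver(session, message);
            if (movedTo.isPresent()) {
                location.put(session, movedTo.get());
                // Single hop only; a real system would also have to handle chained moves.
                nodes.get(movedTo.get()).deliver(session, message);
            }
        }
    }
}
```

Note that the sketch says nothing about the reliable queuing of joins and leaves during the move; it only shows how a stale mapping in the coordinator gets corrected by the old node's reply.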
Since a large part of this dance is taking place between the game server nodes and the user's client machine, we will need to expand the set of messages that can be sent between the client and the server. So far, it appears that we will need
- a message from the old node to the client telling the client to prepare for a move to a new node;
- a message from the client to the old node saying that the client is prepared;
- a message from the old node to the client saying that the client needs to connect to the new node at a particular port with a reconnection key;
- a message from the client to the new node, requesting relocation and supplying a relocation key;
- a message from the new node to the client indicating that the relocation request has succeeded;
- a message from the new node to the client saying that the relocation request has failed.
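As a rough illustration of the list above, the six messages could be written down as an enum that records who sends each one and who receives it. The constant names here are my own invention, not the names used in the Darkstar protocol, and the sketch deliberately leaves out payloads such as the port and the key.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical catalog of the six proposed move messages (names are illustrative only). */
final class RelocationMessagesSketch {
    enum Endpoint { CLIENT, OLD_NODE, NEW_NODE }

    enum Message {
        PREPARE_FOR_MOVE(Endpoint.OLD_NODE, Endpoint.CLIENT),
        PREPARED(Endpoint.CLIENT, Endpoint.OLD_NODE),
        MOVE_TO_NODE(Endpoint.OLD_NODE, Endpoint.CLIENT),
        RELOCATE_REQUEST(Endpoint.CLIENT, Endpoint.NEW_NODE),
        RELOCATE_SUCCESS(Endpoint.NEW_NODE, Endpoint.CLIENT),
        RELOCATE_FAILURE(Endpoint.NEW_NODE, Endpoint.CLIENT);

        final Endpoint from;
        final Endpoint to;

        Message(Endpoint from, Endpoint to) {
            this.from = from;
            this.to = to;
        }
    }

    /** All messages a given endpoint may send, in declaration order. */
    static List<Message> sentBy(Endpoint sender) {
        List<Message> result = new ArrayList<>();
        for (Message m : Message.values()) {
            if (m.from == sender) {
                result.add(m);
            }
        }
        return result;
    }
}
```

Writing the directions down this way makes one property of the design easy to see: the client only ever sends two of the six messages, one to each node.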
There are lots of interesting error possibilities in all this. Depending on when the error occurs, the system can simply drop messages (if they are unreliable) or, at the very worst, disconnect the client and let the client connect again, which would establish a new client session. This is a last resort, used only when things have gone really wrong, but sometimes it is the only thing that makes sense (like when a move command comes in during a move, which should never happen, but if it did you couldn't tell where the starting point or end points were).
The other real challenge in all of this is testing to ensure that the implementations are correct. Since most of the more interesting errors involve race conditions, we need to find ways of stopping each side at just the right time, sending the messages on the other side that can lead to some trouble, and then starting things up again. There is already at least one case where the testing code has had to resort to wrapping an existing wrapper class, along with other sorts of Java tricks that should not be described in public. This difficulty shouldn't come as a surprise, since testing concurrent systems is a known hard problem, and we've added in the need to test such a system and thrown in a distributed aspect, as well. The fact that we can do any testing at all is pretty surprising, and a testament to the talent of the people involved.
Hi Jim
Are these experiments being checked into the public repository, or are they private until "complete"? Would love to take a look at what Ann is working on, sounds absolutely awesome.
Thanks
Patrick
Posted by Patrick on July 21, 2009 at 09:31 AM PDT #
Like all our work, this is done in the public repositories... we don't have any private repositories (except for the copies of the workspace on our local disk). So you can see what is going on as it happens...
Posted by James Waldo on July 21, 2009 at 10:18 AM PDT #
I didn't know that you have in your team the inventor of RMI. This API inspired the early versions of the network library on which I am working now: "Jnag". Next time you see her, tell her thx from me :-)
Regards, Vincent Cantin.
Posted by Vincent Cantin on October 08, 2009 at 08:56 AM PDT #