keith thompson

tales from the darkstar


« in the beginning | Main | Two wrongs don't... »
Thursday Sep 25, 2008

Case study: Improving login performance in Project Snowman

Project Snowman is a multiplayer capture-the-flag type game. The goals of the project were to showcase techniques for building a game server using Project Darkstar and to demonstrate the resulting benefits. Part of that demonstration was to show the snowman game server handling hundreds of client logins. To this end a client simulator was created as part of Project Snowman. The client simulator presents a simple GUI which allows the user to move a slider, selecting the number of clients to login to the server and play the game. The first implementation of the client simulator was designed to login clients as fast as possible. When we tested it against the game server the server was swamped after only a couple of hundred logins, performing far below our expectations. The following outlines the investigation into this initial poor login performance and the steps taken to improve it.

Before we dive into the investigation, lets look at the login requirements of the snowman game. Being a demonstration, the snowman game login is relatively simple. The server collects clients, or players, as they login, and once the required number of players-per-game is reached, a game is created and started. There is no consideration of which client joins which game, so login identity or order is unimportant. The simple login requirements naturally led to a fairly simple initial implementation. As a Darkstar programer is aware, when a client logs-in, the loggedIn() method is called on the application's AppListener class. The snowman's app listener class looked something like this: 

public class SnowmanServer implements Serializable, AppListener {

private Collection<SnowmanPlayer> waitingPlayers = // some unordered collection...

public ClientSessionListener loggedIn(ClientSession session) {
AppContext.getDataManager.markForUpdate(this);
SnowmanPlayer player =   new SnowmanPlayer(session));
waitingPlayers.add(player);
if (waitingPlayers.size() == NUM_PLAYERS_PER_GAME) {
startGame(waitingPlayers);
waitingPlayers.clear();
}
return player; // player is a ClientSessionListener
}
}

In this pseudo code, when a client logs-in a player object is created and then added to the list of waiting players. If there are enough players to start a game, startGame() is called, with the set of waiting players. The waiting set is cleared so that the process can repeat.

What the #^@%!& is going on? 

Though the login code above was straightforward, it was not immediately clear why it performed so poorly. One of the more useful tools for determining what the Darkstar server is doing is the profile listeners. These can be found in the package:

com.sun.sgs.impl.profile.listener

and Seth has blog entry that has more detail on using Darkstar's profiling sub-system. The particular profile listener that was useful in our case was SnapshotProfileListener. To enable this listener, add the following to the Darkstar server's properties:

com.sun.sgs.impl.kernel.profile.listeners=com.sun.sgs.impl.profile.listener.SnapshotProfileListener

This listener reports, on ten second intervals, the number of successful tasks, the number of tasks run, and the average size of the task queue. It reports this data on server port 43007 and can be easily monitored remotely using telnet:

telnet server.host 43007

will produce output looks something like this:

Snapshot[period=10000ms]:
 Threads=40  Tasks=9618/9629
 AverageQueueSize=1.90 tasks  

 

The report above is showing very low contention, 9,618 tasks succeeded out of 9,629 and the average queue size during that time was less than two tasks. During our simulated login "storm", the profiling information showed that the average queue size rapidly increased, into the hundreds, while the number of successful tasks was much lower than the number of tasks run. It is not unusual to see tasks failing, aborting and rescheduling tasks is how Darkstar manages data contention between competing tasks. However, the number of successful tasks was a fraction of the tasks started, which means that something was highly contentions resulting in very little progress by the server. One way we verified this was to add a configurable delay to the client simulator. When client logins were spaced 200 ms apart the number of failed tasks was greatly reduced as well as the average queue length. Unfortunately, five logins per second was not very exciting.

Examining the loggedIn() method above it is clear that each client login modifies the SnowmanServer object (the app listener) when it changes the contents of waitingPlayers. Since there is only one app listener in the server, and only one task can modify the same object at a time it makes the app listener a single choke point for all logins. Creating a new player and adding it to the waiting set is trivial, and even though there is contention on the app listener object, these login tasks should finish quickly, and for the most part keep out of each other's way. Starting a new game is a different story. To start a new game the server must create a game object, two flag objects, and some number of robot player objects. Each of which are managed objects. In addition, a channel is created and each "real" player (non-robot) joins that channel. While this lengthy game creation is taking place, no other login tasks  can succeed because the app listener is being modified. Even worse, it is highly likely that occasionally the game creation will fail, throwing away all that was done.

Short tasks are good tasks

Since logging-in without creating a game was quick, it seemed desirable to make all logins that speedy. To achieve this, the game creation was moved to a separate task:

public ClientSessionListener loggedIn(ClientSession session) {

AppContext.getDataManager.markForUpdate(this);
SnowmanPlayer player =   new SnowmanPlayer(session));
waiting.add(player);
if (waiting.size() == NUM_PLAYERS_PER_GAME) {
AppContext.getTaskManager.scheduleTask(new StartGameTask(waiting));
waiting.clear();
}
return player; // player is also ClientSessionListener
}

private class StartGameTask implements Task {

Collection<SnowmanPlayer> waitingPlayers = // some unordered collection...

StartGameTask(Collection<SnowmanPlayer> waiting) {
waitingPlayers.addAll(waiting);
}

public run() {
statrtGame(waitingPlayers);
}
} 

By removing the game creation from the login, we shortened the login task in the case that a game is ready to start. Since there is no contention between game creation tasks, those are unlikely to abort. This simple fix was quite effective in reducing task failures, verified by the profiling information, and allowed a much higher rate of client logins. Note that the collisions between the login tasks was reduced, the contention remained unchanged. The next step was an effort to actually reduce the contention.

Lets get clever 

One of the ideas that has been kicking around the group to deal with the multiple writer problem is to use a series of independent queues, with multiple writers feeding the tops of the queues, and another set of readers emptying the queues. The theory is that if the queues are independent, contention would only occur when the same queue was accessed for write. If there were many queues the chance that two writers hit the same queue at the same time could be small. Mapping this scheme into the snowman server code, we added a set of queues to the app listener object and when a client logged-in, the newly created player was placed into one of the queues. On the back side, a (new) task would read off the queues until it found enough players to start a game. Implementing this was not as straightforward as it sounds. In the app listener you could not just add an array of queues and expect any benefit. This is because changing the state of any one of the queues effectively changes the state of the app listener. We would sill have the single point of contention. Instead, the queues needed to be wrapped in a ManagedObject. This allows the app listener to keep an array of ManagedReferences to the queues. Now the app listener is immutable (in the context of the login task) and no longer a point of write contention (Managed objects, such as the app listener, accessed only for reading can be shared). Ignoring for now the task reading the end of the queues, here is the new app listener and queue objects:

public class SnowmanServer implements Serializable, AppListener {
private ManagedReference<Deque<ManagedReference<SnowmanPlayer>>>[] waitingDeques;

public void initialize(Properties arg0) {
this.waitingDeques = new ManagedReference[NUMDEQUES];
for(int i = 0; i < waitingDeques.length; i++) {
Deque<ManagedReference<SnowmanPlayer>> deque = new ManagedDeque<ManagedReference<SnowmanPlayer>>();
waitingDeques[i] = AppContext.getDataManager().createReference(deque);
}
AppContext.getTaskManager().scheduleTask(new MatchmakerTask(waitingDeques));
}

public ClientSessionListener loggedIn(ClientSession session) {
SnowmanPlayer player =  new SnowmanPlayer(session);
BigInteger id = player.getSnowmanPlayerRef().getId(); // get a ManagedReference to the player
BigInteger index = id.mod(BigInteger.valueOf((long)NUMDEQUES));

waitingDeques[index.intValue()].get().add(player.getSnowmanPlayerRef());
return player;
}

class ManagedDeque<E> extends LinkedList<E> implements ManagedObject {}
}
}

There are a couple of important things to note here. First is the use an array of ManagedReferences to keep track of the queues in the app listener. Since the queues are ManagedObjects their persistence is handled by Darkstar so the app listener does not need to maintain references to the queues directly. So a change to the queue does not translate into a change in the app listener object. A potential trap is in selecting a queue to place the player in. A temptation might be to keep a index in order to insert into the queues round-robin or to keep a Random object to distribute the players randomly. The problem would be that an int index or the Random object's state would change, and therefore the app listener would no longer be read only. Luckily, in this example, the player's (ManagedReference) ID can be used to create an index.

On the other end of the queue, the MatchmakerTask can be implemented several different ways. Here is a fairly straightforward approach:

public class MatchmakerTask implements Task, Serializable
{

private List<ManagedReference<SnowmanPlayer>> waitingPlayers;
private ManagedReference<Deque<ManagedReference<SnowmanPlayer>>>[] waitingDeques;

public MatchmakerTask(ManagedReference<Deque<ManagedReference<SnowmanPlayer>>>[] waitingDeques) {
this.waitingPlayers = new ArrayList<ManagedReference<SnowmanPlayer>>();
this.waitingDeques = waitingDeques;
}

public void run() throws Exception {
boolean playersFound = false;
for(int i = 0; i < waitingDeques.length; i++) {
ManagedReference<SnowmanPlayer> nextPlayer = waitingDeques[i].get().poll();
if(nextPlayer != null) {
playersFound = true;
waitingPlayers.add(nextPlayer);
}
if(waitingPlayers.size() == NUM_PLAYERS_PER_GAME) {
startGame(waitingPlayers);
break;
}
}

// if no players are found in the queue during this iteration
// schedule a delay for the next polling cycle
// otherwise, schedule the next cycle to occur immediately
if(playersFound)
AppContext.getTaskManager().scheduleTask(this);
else
AppContext.getTaskManager().scheduleTask(this, POLLING_INTERVAL);
}
}
} 

Basically, the task loops once through the set of queues collecting players. If enough for a game are found, a game is started. Before the task ends, it schedules itself to be run again. One could imagine changes in this task, like creating the game in its own task or polling the queues in a more round-robin fashion, but the important accomplishment here is the reduction in contention with the client login task. The code improvements to this point have left only these two points of contention: 1) between login tasks when inserting to the same queue, and 2) between login task and the match maker task when inserting and removing, respectively, from the same queue.

Are we done squeezing?

Though the changes so far have achieved a more than satisfactory login rate, there is one more improvement we made. Darkstar provides a utility data structure that we could use as a drop-in replacement for our ManagedDeque. The class com.sun.sgs.app.util.ScalableDeque is an implementation of Deque that provides concurrent, non-contentious, inserts and removals (as long as there is more than one element in the queue). This further reduces the contention between the client login task and the matchmaker task during a login storm.

And there was great rejoicing... 

Yea team!

Project Snowman was created by a bunch of folks, and to give credit where credit is due, Tim Blackman, long time Project Darkstar dude,  had some time ago mentioned to me the idea of using multiple queues to reduce contention. Owen Kellett, who like myself is a bit newer to Darkstar, implemented the multiple queue mechanism in the snowman server. And David Jurgens, our perennial intern, implemented ScalableDeque. Owen has also posted a blog about Project Snowman that is worth reading. I know Chuck Norris would want you to.

Comments:

very helpful, thanks

Posted by 74.99.86.109 on November 25, 2009 at 11:37 PM EST #

Post a Comment:
  • HTML Syntax: NOT allowed

Today's Page Hits: 14