Synchronicity
Random thoughts from a random engineer
Tuesday May 09, 2006
Network Stack Simulator
I've been fascinated by virtual machines and emulators for many years. Ages ago, in my first OS class, I had the good fortune to play with, and build various pieces of, an OS emulator. That class piqued my interest in operating systems and greatly influenced my career direction. Around the spring of 2000, I had to come up with a project idea for one of my networking classes. This time I wanted to attempt something significant, so I decided to implement a network stack simulator.

The goal of the project was to simulate communication between multiple virtual kernels, which could reside on the same physical machine or be spread across the network. These virtual kernels were not full-fledged virtual machines: they did not simulate hardware, and they lacked the usual OS functionality (scheduling, virtual memory, etc.). All they had was a network stack. The virtual kernels were passive entities, driven by commands (in the form of system calls) sent to them from apps running on the host OS. These host OS apps were linked to a special library that directed certain sockets API syscalls to a virtual kernel instead of the underlying OS. The library communicated with a virtual kernel via a pseudo-RPC mechanism, which allowed an app linked to it to, for example, drive a virtual kernel running on a different physical machine. The network stack was based on the 4.4BSD codebase. I chose it mainly because of the available documentation (TCP/IP Illustrated, Vol. 2) and the simplicity of the codebase relative to other modern OSes.
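
The post doesn't specify the message format of the pseudo-RPC mechanism. As a minimal sketch, assuming fixed-size requests that carry only integer arguments, the client library could marshal each intercepted call in network byte order, so that a virtual kernel on a host of different endianness decodes it correctly. The names `vk_req`, `vk_req_encode()`, `vk_req_decode()` and the `VK_SYS_*` numbers are hypothetical.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical syscall numbers for this sketch; not from the original code. */
enum { VK_SYS_SOCKET = 1, VK_SYS_CONNECT = 2, VK_SYS_CLOSE = 3 };

#define VK_MAX_ARGS 6

/* One request from the client library to a virtual kernel (assumed layout). */
struct vk_req {
    uint32_t op;                 /* which syscall to invoke */
    uint32_t nargs;              /* number of valid entries in args[] */
    int32_t  args[VK_MAX_ARGS];  /* integer arguments only, for simplicity */
};

/* Size of an encoded request on the wire: op, nargs, then every arg slot. */
#define VK_REQ_WIRE_SIZE (4 * (2 + VK_MAX_ARGS))

/* Store a 32-bit value in network (big-endian) byte order. */
static void put32(unsigned char *p, uint32_t v)
{
    v = htonl(v);
    memcpy(p, &v, sizeof v);
}

/* Load a 32-bit value stored in network byte order. */
static uint32_t get32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return ntohl(v);
}

/* Serialize a request so that hosts of different endianness agree on it. */
void vk_req_encode(const struct vk_req *r, unsigned char buf[VK_REQ_WIRE_SIZE])
{
    put32(buf, r->op);
    put32(buf + 4, r->nargs);
    for (int i = 0; i < VK_MAX_ARGS; i++)
        put32(buf + 8 + 4 * i, (uint32_t)r->args[i]);
}

/* Inverse of vk_req_encode(); returns -1 if the argument count is malformed. */
int vk_req_decode(const unsigned char buf[VK_REQ_WIRE_SIZE], struct vk_req *r)
{
    r->op = get32(buf);
    r->nargs = get32(buf + 4);
    if (r->nargs > VK_MAX_ARGS)
        return -1;
    for (int i = 0; i < VK_MAX_ARGS; i++)
        r->args[i] = (int32_t)get32(buf + 8 + 4 * i);
    return 0;
}
```

Pointer arguments, such as the `sockaddr` passed to `connect()`, would have to be copied into the message by value, because the virtual kernel cannot dereference the app's address space; this sketch leaves them out.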

A number of technical obstacles had to be overcome to realize this architecture, not least of which was determining exactly what could be borrowed from 4.4BSD and what had to be reimplemented in userland. This was a painstaking process that involved studying every file related to the network stack and identifying its dependencies on the rest of the 4.4BSD kernel. In the end, it turned out that the bulk of the code could be reused without modification. The crucial pieces that had to be reimplemented were:

Dynamic memory allocation - BSD used a data structure called the mbuf to hold packet data. There was a wide variety of macros for manipulating mbufs; some assumed the mbuf was aligned in a certain way, and some assumed it was allocated from a large contiguous region carved into page-sized chunks. All of these semantics were emulated, albeit non-trivially, using vanilla malloc().

Interrupt priority manipulation - The splxxx() functions (where xxx was a priority level) were used throughout the 4.4BSD kernel to provide mutual exclusion for data structures. The idea was that by temporarily raising the interrupt priority level, a thread working on certain data structures could prevent other entities sharing those data structures from preempting it. This was emulated using a global mutex and a counter.

Timers - 4.4BSD had its own callout implementation, but I chose not to use it. Instead, a dedicated thread invoked a number of hardcoded callbacks at periodic intervals. This was in fact sufficient for 4.4BSD's network stack, though it would likely not work for other OSes, whose network stacks depend more heavily on timers.

NIC driver - Only one driver was ported to userland, and it was stripped of all its hardware-dependent code. The driver was made to send and receive Ethernet frames to and from a configurable IP multicast group. Using multicast obviated the need to implement a separate mechanism for steering packets between virtual kernels.
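
For the dynamic memory allocation item above, one alignment assumption is easy to show concretely. 4.4BSD's `dtom()` macro recovers an mbuf header from a pointer into its data by masking off the low address bits, which only works if every mbuf starts on an `MSIZE`-byte boundary (`MSIZE` was 128). The post says vanilla `malloc()` was used; this sketch swaps in `posix_memalign()` for brevity (over-allocating with `malloc()` and rounding up would work too). The simplified `struct mbuf` and the helpers `m_get_emulated()` and `m_free_emulated()` are hypothetical, not the original code.

```c
#include <stdint.h>
#include <stdlib.h>

#define MSIZE 128   /* 4.4BSD mbuf size; dtom() relies on MSIZE alignment */

/* A heavily simplified mbuf: a few header fields plus an in-line data area
 * sized so that the whole structure is exactly MSIZE bytes. */
struct mbuf {
    struct mbuf *m_next;   /* next buffer in this packet's chain */
    char        *m_data;   /* start of valid data */
    int          m_len;    /* bytes of valid data */
    char         m_dat[MSIZE - sizeof(struct mbuf *) - sizeof(char *) - sizeof(int)];
};
_Static_assert(sizeof(struct mbuf) == MSIZE, "mbuf must be exactly MSIZE bytes");

/* mtod: view the mbuf's data as type t. */
#define mtod(m, t)  ((t)((m)->m_data))
/* dtom: any pointer into an mbuf's data -> the mbuf itself, by masking. */
#define dtom(x)     ((struct mbuf *)((uintptr_t)(x) & ~(uintptr_t)(MSIZE - 1)))

/* Allocate one mbuf on an MSIZE boundary so dtom() keeps working in userland. */
struct mbuf *m_get_emulated(void)
{
    void *p;
    if (posix_memalign(&p, MSIZE, sizeof(struct mbuf)) != 0)
        return NULL;
    struct mbuf *m = p;
    m->m_next = NULL;
    m->m_data = m->m_dat;
    m->m_len = 0;
    return m;
}

/* Memory from posix_memalign() is released with plain free(). */
void m_free_emulated(struct mbuf *m)
{
    free(m);
}
```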
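
For the interrupt priority item above, here is a minimal sketch of the global-mutex-plus-counter emulation. It assumes the counter is kept per thread (C11 `_Thread_local`) so that nested calls, such as `splnet()` while already at `splimp()`, don't self-deadlock. Every priority level maps to the same lock, which is coarser than real spl semantics but still gives mutual exclusion. The `spl_raise()` helper and the per-thread counter are assumptions, not details from the original.

```c
#include <assert.h>
#include <pthread.h>

/* One global lock stands in for every interrupt priority level. */
static pthread_mutex_t spl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread nesting depth: 0 means "interrupts enabled" (lock not held). */
static _Thread_local int spl_depth = 0;

/* Shared body of splnet(), splimp(), ...: returns the previous level. */
static int spl_raise(void)
{
    int s = spl_depth;
    if (s == 0)
        pthread_mutex_lock(&spl_lock);   /* only the outermost raise locks */
    spl_depth = s + 1;
    return s;
}

int splnet(void)   { return spl_raise(); }
int splimp(void)   { return spl_raise(); }
int splclock(void) { return spl_raise(); }

/* Restore a level previously returned by an splxxx() call; the lock is
 * released only when the thread drops back to level 0. */
void splx(int s)
{
    assert(s >= 0 && s <= spl_depth);
    int was_held = spl_depth > 0;
    spl_depth = s;
    if (was_held && s == 0)
        pthread_mutex_unlock(&spl_lock);
}
```

Usage mirrors kernel code: `int s = splnet(); /* touch shared state */ splx(s);`.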
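
For the timers item above, a sketch of such a dedicated thread, assuming the hardcoded callbacks are the protocol "fast" and "slow" timeouts, which 4.4BSD runs five and two times per second (every 200 ms and 500 ms). A 100 ms base tick divides both periods. The `fasttimo()`/`slowtimo()` stand-ins, their call counters, and `struct timer_cfg` are hypothetical; a fuller version would likely hold the emulated spl lock around each callback.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for the hardcoded callbacks; they only count calls. */
static int fast_calls, slow_calls;
static void fasttimo(void) { fast_calls++; }   /* e.g. delayed ACKs, every 200 ms */
static void slowtimo(void) { slow_calls++; }   /* e.g. retransmit timers, every 500 ms */

struct timer_cfg {
    long tick_ns;   /* base tick, normally 100 ms = 100000000 ns */
    long nticks;    /* ticks to run before returning; < 0 means forever */
};

/* Thread body: sleep one tick, then fire whichever callbacks are due. */
static void *timer_thread(void *arg)
{
    const struct timer_cfg *cfg = arg;
    struct timespec ts = { cfg->tick_ns / 1000000000L, cfg->tick_ns % 1000000000L };

    for (unsigned long tick = 1;
         cfg->nticks < 0 || tick <= (unsigned long)cfg->nticks; tick++) {
        nanosleep(&ts, NULL);             /* early wakeups by signals are ignored */
        if (tick % 2 == 0)
            fasttimo();                   /* 2 ticks x 100 ms = 200 ms */
        if (tick % 5 == 0)
            slowtimo();                   /* 5 ticks x 100 ms = 500 ms */
    }
    return NULL;
}
```

Sleeping a fixed interval lets drift accumulate over time; `clock_nanosleep()` with `TIMER_ABSTIME` would avoid that at the cost of a little more code.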

The above pieces and the portable BSD stack code made up about 80% of the virtual kernel implementation. The other 20% consisted of the syscall layer and the logic for processing syscalls received from remote clients. For the syscall layer, the bulk of the work involved isolating the syscall code and severing its dependencies on routines and data structures tied to process context. Remote syscalls were processed by a dedicated thread called the proxy thread. When a remote app attached to a virtual kernel, a proxy thread and its context were created. The attach occurred on the first invocation of a sockets syscall and was transparent to the app. The proxy thread's operation can be described as a loop: receive a syscall message, decode it, invoke the syscall with the decoded arguments, and send back the result. When the remote app terminated, the associated connection teardown caused the virtual kernel to implicitly invoke the exit syscall, which cleaned up the proxy thread and any data structures (file descriptors, sockets, etc.) created during its lifetime.
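
The proxy thread's loop described above can be sketched as follows. This is a minimal illustration, not the original code: it assumes one stream connection per attached app, fixed-size requests in host byte order (so both ends must share an architecture), and a toy per-app descriptor table in place of real sockets. `VK_SYS_SOCKET`, `VK_SYS_CLOSE`, `struct vk_proc` and `vk_exit()` are hypothetical names; `vk_exit()` stands in for the implicit exit syscall triggered by connection teardown.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

enum { VK_SYS_SOCKET = 1, VK_SYS_CLOSE = 2 };   /* hypothetical syscall numbers */
#define VK_NFILES 16

struct vk_msg { int32_t op; int32_t arg; };     /* request; the reply reuses .arg */

/* Per-attachment context created when a remote app attaches. */
struct vk_proc {
    int conn;                 /* connection to the remote app */
    int files[VK_NFILES];     /* toy descriptor table: 1 = in use */
    int cleaned;              /* descriptors released by vk_exit() */
};

static int vk_socket(struct vk_proc *p)
{
    for (int fd = 0; fd < VK_NFILES; fd++)
        if (!p->files[fd]) { p->files[fd] = 1; return fd; }
    return -1;                /* table full (EMFILE in a real kernel) */
}

static int vk_close(struct vk_proc *p, int fd)
{
    if (fd < 0 || fd >= VK_NFILES || !p->files[fd])
        return -1;            /* bad descriptor (EBADF) */
    p->files[fd] = 0;
    return 0;
}

/* Implicit exit: release whatever the app left open, then drop the link. */
static void vk_exit(struct vk_proc *p)
{
    for (int fd = 0; fd < VK_NFILES; fd++)
        if (p->files[fd]) { p->files[fd] = 0; p->cleaned++; }
    close(p->conn);
}

/* Read exactly len bytes from a stream; -1 on EOF or error. */
static int read_full(int fd, void *buf, size_t len)
{
    char *b = buf;
    while (len > 0) {
        ssize_t n = read(fd, b, len);
        if (n <= 0)
            return -1;
        b += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Proxy thread: receive, decode, invoke, reply; exit when the app goes away. */
static void *proxy_thread(void *arg)
{
    struct vk_proc *p = arg;
    struct vk_msg m;
    while (read_full(p->conn, &m, sizeof m) == 0) {
        switch (m.op) {
        case VK_SYS_SOCKET: m.arg = vk_socket(p);       break;
        case VK_SYS_CLOSE:  m.arg = vk_close(p, m.arg); break;
        default:            m.arg = -1;                 break;   /* ENOSYS */
        }
        if (write(p->conn, &m, sizeof m) != (ssize_t)sizeof m)
            break;
    }
    vk_exit(p);               /* connection teardown triggers cleanup */
    return NULL;
}
```

For local testing, `socketpair(AF_UNIX, SOCK_STREAM, 0, sv)` can stand in for the network connection: one end goes into `struct vk_proc`, the proxy runs on it via `pthread_create()`, and closing the other end exercises the cleanup path.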

The implementation contained about 30,000 lines of code from 4.4BSD, most of which were unmodified or only slightly modified. I wrote around 3,000 lines myself to tie all the pieces together. The work took about a month to complete, and it was one of the most rewarding projects I've worked on.

May 09 2006, 07:37:17 PM PDT

Comments:

This is quite interesting, and looks similar to Alpine.

Posted by Yu Xiangning on May 10, 2006 at 02:19 AM PDT #
