about the web, software etc. Recursion, n.: see 'Recursion'

Monday Jun 23, 2008

I consider blogs to be "work in progress", but this entry seems to be even more so - and since it's also describing work in progress, somehow recursive :-)

One of the pieces still missing from (Open)Solaris is the capability to forward incoming IP packets to a set of (more than one) hosts from within the kernel, i.e. to do load balancing.

The main benefit of an in-kernel load balancer vs. a userland-based one is the much reduced traffic of networking data ("payload") through the kernel/userland boundary. Traffic across this boundary is known to be expensive, therefore the fact that we incur less of it means that - all other things being equal - we can achieve better performance, both wrt connections per second and wrt throughput.

To address this, we recently created a prototype with very basic load balancing capabilities that we're hoping to put out on opensolaris.org once all the formalities (read: legal stuff) have been completed. You may have seen Sangeeta's email proposing this project for opensolaris: http://www.opensolaris.org/jive/thread.jspa?threadID=64639&tstart=0. We're also going to be soliciting input from people who would like to actively test this prototype.

We realise that a full load balancer product is unlikely to be achievable in a time frame that would make sense for us, given the addressable market, so we're going to concentrate on providing the infrastructure that developers and OEMs need to make the most of the capability we're introducing. (Plans for *when* this is going to happen, and what exactly is going to be in which delivery, aren't quite finalised, so please bear with us ...)

Even before we release the code, I think I can present a short overview of what the prototype consists of. We have:
- the in-kernel forwarding engine ("ilb" = internal load balancer, which we also use as name for the whole project ...)
- the command-line utility ("lbadm").
Things like redundancy (i.e. failover), backend server health checks etc. were not implemented for the proof of concept (POC).

My task was and is to define the requirements for the CLI, and then to design and implement it. While this sounds rather straightforward, the devil's in the detail, as usual. Here are some of the questions asked of the CLI and of the CLI/kernel module combo, along with their answers:

  1. what does the CLI do? (that's the obvious one ;-)
    A: Administer all ILB rules and display associated information.
  2. what is the "unit of currency" the ilb handles?
    A: (as indicated above) a rule. A rule consists of:
      a. a set of conditions to be met by the incoming packet
      b. the destination for a packet that matches the above conditions
      c. additional information for the load balancer.
  3. is there precedent in Solaris for similar functionality (i.e., do we want to look at dladm or perhaps zfs)?
    A: the model we chose to follow is flowadm, which comes with the crossbow project and is not yet in Solaris (see http://dlc.sun.com/osol/netvirt/downloads/20080310/flowadm.1m.txt). The basic structure is

        command subcommand [options] [object]

    and a subcommand is always of the form "verb-object", e.g. "show-flow" or, in the case of lbadm, "create-rule". The object in our case is the rule.
  4. how do we structure the CLI?
    A: for the prototype, the CLI was one monolithic, stand-alone binary.
  5. how does the CLI talk to the kernel?
    A: for communication between CLI and kernel, we created a data structure to contain all the relevant information and defined an ioctl for passing information to and fro.
  6. what about statistics?
    A: currently, the kernel maintains a basic set of kernel statistics (kstats); some of them for the whole module, some on a per-rule basis and some on a per-backend server basis. For the prototype, I created a shell script to read the data via kstat(1) and perform some mangling on them to produce vmstat(1)-like output.
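
To make the rule model (question 2) and the "verb-object" subcommand convention (question 3) a bit more concrete, here is a minimal Python sketch. It is not the lbadm code: the `Rule` class, its field names, the example addresses and the `split_subcommand` helper are all made up for illustration, and the real prototype passes its own data structure to the kernel through an ioctl.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical model of an ilb rule; all field names are assumptions."""
    name: str
    # a. conditions to be met by the incoming packet
    match: dict = field(default_factory=dict)
    # b. destination(s) for a packet that matches the conditions
    backends: list = field(default_factory=list)
    # c. additional information for the load balancer
    extra: dict = field(default_factory=dict)

    def matches(self, packet: dict) -> bool:
        # A packet matches when every condition equals the packet's value.
        return all(packet.get(key) == value for key, value in self.match.items())


def split_subcommand(subcommand: str) -> tuple:
    """Split a "verb-object" subcommand such as "create-rule" into its parts."""
    verb, sep, obj = subcommand.partition("-")
    if not sep or not verb or not obj:
        raise ValueError(f"expected verb-object, got {subcommand!r}")
    return verb, obj


# Example rule: the values are invented, not taken from the prototype.
rule = Rule(
    name="web",
    match={"proto": "tcp", "dst_port": 80},
    backends=["192.168.1.10", "192.168.1.11"],
    extra={"method": "round-robin"},
)
```

With this in place, `split_subcommand("create-rule")` yields the verb and the object separately, which is roughly what a CLI front end needs in order to dispatch to the right handler.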


Some of the additions/modifications which will be implemented by this project:

  • the CLI functionality will be split into a library and a CLI consuming the library. The purpose of this is to enable 3rd parties to make use of this infrastructure.
  • integration of statistics display into lbadm.
  • addition of failover functionality using VRRP.
  • add configuration persistence and integrate with SMF.
  • integration with ipnat configuration (see footnote 1).
  • implement some form of check for the "health" of backend servers
  • enable management of several hosts as single entities (host pools)
  • connection "stickiness"


1) So far, I've not explained one major aspect: load balancing methods and topology. Topologies known in the industry are DSR (direct server return: the load balancer never sees return traffic, or just forwards it back without any modification) and NAT (half vs. full); known methods are round-robin and various forms of connection weighting. ipfilter, which has been in Solaris for quite some time and has been available as an open-source project for much longer, has some NAT functionality. For the prototype, we implemented DSR functionality separate from ipfilter's NAT functionality, and did not integrate the administration of ipnat with lbadm in any way.
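
For readers who haven't met the methods mentioned in this footnote, here is a small Python sketch of round-robin and of one naive form of weighting. It only illustrates the general idea: the function names and backend labels are invented, and the sketch says nothing about how the prototype actually selects a backend.

```python
import itertools


def round_robin(backends):
    """Hand out backends in turn, forever."""
    return itertools.cycle(backends)


def weighted_round_robin(weights):
    """Hand out backends in proportion to integer weights.

    Naive approach (an assumption, not the prototype's method): repeat each
    backend as many times as its weight, then cycle over the expanded list.
    """
    expanded = [backend for backend, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


rr = round_robin(["a", "b", "c"])
first_five = [next(rr) for _ in range(5)]

wrr = weighted_round_robin({"a": 2, "b": 1})
first_six = [next(wrr) for _ in range(6)]
```

Plain round-robin treats all backends alike, while the weighted variant sends backend "a" twice as many connections as "b"; real implementations usually interleave the choices more evenly than this expansion does.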


