about the web, software etc. Recursion, n.: see 'Recursion'

Wednesday Aug 20, 2008

Last weekend, I managed to make good on a promise I made to myself long ago (and got my wife to agree to ;-): I took part in a photography workshop. The instructor was Joe Decker (read more about him here). The whole group consisted of six people, including Joe, so we got good advice whenever we asked for it.

We started off in a parking lot just north of Golden Gate Bridge, and worked our way up Conzelman Road and all the way to Fort Barry. The weather was not quite unexpectedly misty, cold and generally miserable, which was ideal for shooting the battlements from WWII which now stand in disrepair. Not surprisingly, we weren't the only people around with photo gear.

After having seen and shot enough ruins to last us quite some time, we proceeded to the beach next to Fort Cronkhite; the fog had increased in density, and although we were having a good time in general, it was getting a little damp, esp. close to the ocean with the wind blowing spray and mist right into our faces. Here's a picture which I believe captures the atmosphere of the day: lonely tripods.

When and if I find time to properly process the pictures I took, I'll post them on one of my photo pages and drop a link here.

Sunday Jul 13, 2008

One of the requirements that need to be fulfilled by an offering in the load balancer space is the ability to periodically check the health of its back end servers.

The health of a back end server can be defined in several ways:

  1. the ability to respond to ping
  2. the ability to perform a TCP handshake
  3. the ability for a server application (eg. an HTTP server) to respond with meaningful data to a request
  4. (your favourite method here ...)

In addition to the check, there must be an ability to report either the health status of a given server, or the change of status for a given server when this status changes (ie, when a server dies or comes back to life).
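
To make the second kind of check and the "report on change" idea a bit more concrete, here's a minimal sketch in Python. It is not hcd code (that doesn't exist yet); the names `tcp_handshake_ok` and `StatusTracker` are made up for this illustration, and I'm assuming that the first observation of a server is not reported as a change.

```python
import socket
from typing import Callable, Dict, Tuple

Server = Tuple[str, int]  # (host, port); a hypothetical representation


def tcp_handshake_ok(server: Server, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the server can be established."""
    try:
        with socket.create_connection(server, timeout=timeout):
            return True
    except OSError:
        return False


class StatusTracker:
    """Remembers the last known health of each server and reports changes."""

    def __init__(self, check: Callable[[Server], bool],
                 on_change: Callable[[Server, bool], None]) -> None:
        self.check = check
        self.on_change = on_change
        self.status: Dict[Server, bool] = {}

    def poll(self, server: Server) -> bool:
        """Run the check once; call on_change only if the status flipped."""
        alive = self.check(server)
        previous = self.status.get(server)
        self.status[server] = alive
        if previous is not None and previous != alive:
            self.on_change(server, alive)
        return alive
```

A caller would create `StatusTracker(tcp_handshake_ok, some_callback)` and call `poll()` for every server at a fixed interval; swapping in a ping or an application-level check only means passing a different function.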

We examined a few open-source network monitoring tools (I think nagios was among those tools, as well as OpenNMS ... I wasn't too deeply involved in this part, so I don't know the details), but came to the conclusion that none was suited well enough for our purposes, so we decided we'd need to build our own. We still need to finalise the design, but I think I can give a basic outline of what will be required for a health check subsystem within the ILB project, as well as some of the requirements on other parts of the ilb project to accommodate HC:

  • HC will (initially) be private to ilb. 
  • we plan to implement this as a daemon, ie. hcd (health check daemon).
  • lbadm, the tool to administer ilb, will also be the only means to administer hcd.
  • hcd will not maintain any persistent state.
  • for this release, all back end servers for a lb rule will be checked by the same health check.
  • as a consequence of the above, since a server can be part of more than one rule, it must be possible to perform several checks on the same server.
  • ilb will be able to distinguish between permanent removal of a back end server (eg. by an administrator) and temporary removal of a back end server (eg. when it is unreachable over the network) from a rule.
  • hcd will implement some kind of capability to log the fact that a server has died (eg. using syslog).

I drew a crude picture of what I believe represents how hcd fits into the rest of the ilb infrastructure (so far) - I didn't spend much time on it, nor am I the born artist with electronic paint tools, so I'll ask you to excuse the craftsmanship and concentrate on the content ;-)
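
Some of the requirements in the list above (one health check per rule, several checks on the same server if it belongs to several rules, permanent vs. temporary removal, logging) can be sketched in a few lines of Python. This is purely an illustration under my own assumptions: `Rule`, `State` and all method names are hypothetical, and the design isn't final.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class State(Enum):
    ACTIVE = "active"
    UNREACHABLE = "unreachable"  # temporary removal after a failed check


@dataclass
class Rule:
    name: str
    check: Callable[[str], bool]  # the single health check for this rule
    servers: Dict[str, State] = field(default_factory=dict)

    def add_server(self, server: str) -> None:
        self.servers[server] = State.ACTIVE

    def remove_server(self, server: str) -> None:
        """Permanent removal, eg. by an administrator."""
        del self.servers[server]

    def run_checks(self, log: Callable[[str], None]) -> None:
        """Check every server of this rule and log status changes."""
        for server, old in list(self.servers.items()):
            new = State.ACTIVE if self.check(server) else State.UNREACHABLE
            if new != old:
                self.servers[server] = new
                log(f"rule {self.name}: server {server} is now {new.value}")

    def eligible(self) -> List[str]:
        """Servers that may currently receive traffic for this rule."""
        return [s for s, st in self.servers.items() if st is State.ACTIVE]
```

Because each rule keeps its own state per server, the same server can be unreachable for one rule and fine for another. A temporarily removed server stays in `servers`, a permanently removed one doesn't.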

Tuesday Jul 01, 2008

I just saw this - I'm glad to see this happen, and I'm looking forward to more student participation @Sun!

Monday Jun 23, 2008

I consider blogs to be "work in progress", but this entry seems to be even more so - and since it's also describing work in progress, somehow recursive :-)

One of the pieces still missing from (Open)Solaris is the capability to forward incoming IP packets to a set of (more than one) hosts from within the kernel, ie. to do load balancing.

The main benefit of an in-kernel load balancer vs. a userland-based one is the much reduced traffic of networking data ("payload") through the kernel/userland boundary. Traffic across this boundary is known to be expensive, therefore the fact that we incur less of it means that - all other things being equal - we can achieve better performance, both wrt connections per second and wrt throughput.

To address this, we recently created a prototype with very basic load balancing capabilities that we're hoping to put out on opensolaris.org once all the formalities (read: legal stuff) have been completed. You may have seen Sangeeta's email proposing this project for opensolaris: http://www.opensolaris.org/jive/thread.jspa?threadID=64639&tstart=0. We're also going to be soliciting input from people who would like to actively test this prototype.

We realise that a full product offering around a load balancer is unlikely to be achievable within the time it would make sense for us to do so, from the point of view of the addressable market, so we're going to concentrate on providing the infrastructure necessary for developers and OEMs to optimally exploit this capability we're introducing. (Plans on *when* this is going to happen, and what exactly is going to be in which delivery aren't quite finalised, so please bear with us ...)

Even before we release the code, I think I can present a short overview of what the prototype consists of. We have:
- the in-kernel forwarding engine ("ilb" = internal load balancer, which we also use as name for the whole project ...)
- the command-line utility ("lbadm").
Things like redundancy (ie. failover), backend server healthcheck etc. were not implemented for the POC.

My task was and is to define the requirements for, and then design and implement the CLI. While this sounds rather straightforward, the devil's in the detail, as usual. Here are some of the questions being asked of the CLI and of the CLI/kernel module combo, along with their answers:

  1. what does the CLI do? (that's the obvious one ;-)
    A: Administer all ILB rules and display associated information.
  2.  what is the "unit of currency" the ilb handles?
    A: (as indicated above) a rule. A rule consists of:
      a. a set of conditions to be met by the incoming packet
      b. the destination for a packet that matches the above conditions
      c. additional information for the load balancer.
  3.  is there precedent in Solaris for similar functionality (ie, do we want to look at dladm or perhaps zfs)?
    A: the model we chose to follow is flowadm (coming with the crossbow project, not yet in Solaris; see http://dlc.sun.com/osol/netvirt/downloads/20080310/flowadm.1m.txt). The basic structure is

        command subcommand [options] [object]

    and a subcommand always is of the form "verb-object" eg "show-flow" or, in the case of lbadm, "create-rule". The object in our case is the rule.
  4. how do we structure the CLI?
    A: for the prototype, the CLI was one monolithic, stand-alone binary.
  5. how does the CLI talk to the kernel?
    A: for communication between CLI and kernel, we created a data structure to contain all the relevant information and defined an ioctl for passing information to and fro.
  6. what about statistics?
    A: currently, the kernel maintains a basic set of kernel statistics (kstats); some of them for the whole module, some on a per-rule basis and some on a per-backend server basis. For the prototype, I created a shell script to read the data via kstat(1) and perform some mangling on them to produce vmstat(1)-like output.
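
The "command subcommand [options] [object]" structure with "verb-object" subcommands from answer 3 is easy to show in code. The following Python sketch is not how lbadm is implemented; `parse` and `Invocation` are names I made up, and of the subcommands listed only "create-rule" is mentioned above (the other two are assumptions for the sake of the example). Options are deliberately left unparsed.

```python
from typing import List, NamedTuple


class Invocation(NamedTuple):
    verb: str        # eg. "create"
    obj: str         # eg. "rule"
    args: List[str]  # options and object name, left uninterpreted here


# Only "create-rule" appears in the text; the others are hypothetical.
KNOWN_SUBCOMMANDS = {"create-rule", "delete-rule", "show-rule"}


def parse(argv: List[str]) -> Invocation:
    """Split ['create-rule', ...] into verb, object and remaining arguments."""
    if not argv:
        raise ValueError("missing subcommand")
    subcommand, rest = argv[0], argv[1:]
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    verb, obj = subcommand.split("-", 1)
    return Invocation(verb, obj, rest)
```

Called as `parse(sys.argv[1:])`, this gives the rest of the program a verb and an object to dispatch on, which is also the natural seam for splitting the CLI into a library and a thin front end later on.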


Some of the additions/modifications which will be implemented by this project:

  • the CLI functionality will be split into a library and a CLI consuming the library. The purpose of this is to enable 3rd parties to make use of this infrastructure.
  • integration of statistics display into lbadm.
  • addition of failover functionality using VRRP.
  • add configuration persistence and integrate with SMF.
  • integration with ipnat configuration (1).
  • implement some form of check for the "health" of backend servers
  • enable management of several hosts as single entities (host pools)
  • connection "stickiness"


1) So far, I've not explained one major aspect: load balancing methods and topology. Topologies known in the industry are DSR (direct server return - the load balancer never sees return traffic, or just forwards it back without any modification) and NAT (half vs. full); known methods are round-robin or various forms of connection weighting. ipfilter, which has been in Solaris for quite some time and has been available as an opensource project for much longer, has some NAT functionality. For the prototype, we implemented DSR functionality separate from ipfilter's NAT functionality, and in no way integrated the administration of ipnat with lbadm.
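
For readers not familiar with the methods mentioned in the note: here's a small Python sketch of plain round-robin and of one very naive form of weighting. It says nothing about what the kernel module does; the function names and the static integer weights are assumptions I made for the example.

```python
import itertools
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Hand out servers in order, starting over after the last one."""
    return itertools.cycle(servers)


def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Naive weighting: a server with weight w appears w times per cycle."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)
```

Each new connection would simply take `next()` from the iterator. Real implementations usually interleave the weighted servers more evenly, or weigh by the number of open connections, but the principle is the same.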



Thursday Jun 12, 2008

Sometime in 1999 or 2000 (I think ...), when I was driving through Munich (or, more precisely, trying to do so), I had to stop next to a construction site and a brick dropped down from several stories right onto the bonnet of my car and made a significant dent (and gave me a tremendous shock). I guess I got lucky that I'd had to stop where I was, otherwise it may have come in through the windshield and damaged things that were (and are) important to me. Again, luckily I had no passenger with me whom I would have had to console ...

I was lucky enough to find someone from the building company who actually took responsibility, so to get the insurance etc. sorted out was no problem, though it took its time. Once I had the OK to have it fixed, I went about it, and here's a short piece I wrote about an experience I had while my car was being fixed ...

You can drive my baby car

I'd been planning to take my car to be repaired for a few weeks now - the dent in its bonnet was on its way to be something like Marilyn Monroe's beauty spot, something I just cannot have.

Naturally, we'd arranged for me to have a replacement during the time I had to leave my car at the garage, so when I'd taken care and leave of my meanwhile trusted friend, I was taken into custody by a representative of the rental car company, who with his three-piece suit was definitely much better dressed for the snow and the cold than I was with my coat, hat, gloves and once-fur-lined boots. Perhaps the amount of gel in his hair helped keep the cold out ... We took to the road (Frankfurter Ring, full of cars but clear of snow) in a microscopic car and made our way to the rental company, where we'd do the paper work. Getting out of the car, I asked whether there were winter tires on the car, which produced a dubious look and a shake of the head. For safety, I repeated the request.
VW Lupo
After we'd filled in the papers, my representative consulted a colleague about the tire-issue, and got a key from the locker. We then went out to the car park, where he unlocked a car the size of my hat that was painted a ghastly, poisonous lime green, about the colour of very unripe lemons. I almost threw up on the spot, which would have improved the colour no end. It did not get better after he'd cleaned the snow off. Nor were the winter tires much good.

(photo credit: Wikipedia.org - incidentally, this looks very much like the car I had)

Then I set off. As long as I was moving with the crowd, it wasn't so bad, apart from the moment when a fat BMW almost pushed me out of the way - the driver probably didn't even notice me, despite the colour. Then I reached the motorway. Stepping on the accelerator was an experience comparable to almost nothing, well, to nothing, actually, because that's exactly what happened for the first few seconds. Slowly the car would pick up speed, until at about 100 km/h, it would scream its little heart out trying to move faster.

I also noticed that almost the only thing that seems to be standardised apart from the steering wheel is the indicator switch (although this one needed extra strength - they probably think of big blond beefy guys with heavy-weight experience when designing these things); the wind-screen wiper switch works the wrong way (this one goes up for on, mine goes down), and they'd placed the rear-window heater switch next to the emergency signalling light switch, which is a very convenient thing for telling everybody that your rear window is misted up. The heating controls were at a level where I normally don't bend down to during driving, because I had to put my head on the passenger seat to reach it. Perhaps they intend you to get to know your female co-pilot better this way.

The switch to turn off the fresh air from outside thank-you-very-much only works when you're not defrosting the windscreen, which leaves you the choice, when following a big lorry, to either die of suffocation or crash into its rear because you can't see anything, but then, in this car, you'd probably go straight under the lorry and overtake it from below, so to speak.

The cute thing about this car is its fuel consumption. I got it with 7/8ths full, used another 8th, put in 8.85 litres and it was full. Marvellous. You could probably drive it on thimblefuls.

I guess it's lucky this all happened in Winter. Imagine taking such a car to your favourite golf club, some ignorant dolt might put it down the next hole without a second thought.

----

(apologies to the Beatles for borrowing and mangling the famous song title ...) 


Tuesday Apr 22, 2008

Soon after my family and I had moved to the USA, we noticed something which had registered with me only subconsciously during some of my former visits to the country: 

Americans seem to be much more tolerant of noise than Europeans (those that we know, anyway).

This can be seen in the following examples:

  • Highways cut right through residential areas.
  • The windows in the places we looked at before we found our current home are all quite shabby; you can hear the traffic from miles away (esp. from aforementioned highways).
  • The forced-air heating many places use makes a terrible racket - we had to turn it down in our bedroom because we couldn't sleep.
  • Very many cars emit a loud beeping noise when backing up; I can understand the desire to have this in a trailer or truck that's half a mile long, but in a pickup?

Needless to say, this is not the case in the places I lived at before we moved, so some adapting has been going on. Luckily, our kids seem to be very good at this; I don't get the feeling that the noise is causing them to lose any sleep.

I would like to know though if anybody can come up with a good explanation for the above observation. I don't think that American ears are inherently less sensitive than ours, so there must be a more complex answer ...

Tuesday Apr 08, 2008

I'm happy to report that Harley Hahn's recent book, "Harley Hahn's Guide to Unix and Linux", appeared on my doorstep a few days ago. (Picture: front cover of Harley's new book.)

Why is this relevant to me and to Sun?

It's relevant to me because I reviewed the whole book, chapter by chapter, as Harley wrote it. I was by no means the only person to do so, as you will see when you read the acknowledgements, but I'm proud to have been part of the effort. This is a new book following Harley's earlier "Student's Guide to Unix" and "Harley Hahn's Student Guide to Unix"; I've now done this (ie, review a book on Unix for Harley) for the third time, but still there's a lot to be learned from helping (if only a little) on such a project.

I also believe it's relevant to Sun, as I helped convince Harley to use Solaris as a test platform for his examples, besides Linux and FreeBSD (Harley was kind enough to forward me some snippets of my own emails proving this). I need to thank my colleague and former team mate Helmut vom Sondern, as he was kind enough to let Harley access a zone (and was very helpful keeping it accessible from across the Atlantic) on one of his systems to do all those tests (Antoon Huiskens also helped initially).

Initially, I got involved in this effort in 1992, when Harley sent out a message on comp.unix.questions, titled "Request for opinions about a new Unix book" (here's the complete message), which I saw and responded to. Apparently, Harley saw some merit in what I had to say, and I ended up reviewing the whole book (which then became "A Student's Guide to Unix") for him. Since then, I also reviewed the 2nd edition and various other books about the Internet for Harley, though none in as much detail as the Unix books.

I've stopped reading usenet since (mainly for the sheer volume), but it shows what can happen if you're not careful ;-)

You might like to know that the artwork on the book's cover, as shown in the picture, is from one of Harley's own paintings - he has quite a few of them at home! There's still a debate going on whether he's a better author or a better painter.

 

Friday Feb 15, 2008

Research indicates that one's first public entry in a blog should say something about oneself. I'm too conventional (this is the first statement about myself) to break with that, so here's an introduction of sorts ...

 

Why the name?

I've been using the line "Recursion, n.: see 'Recursion'" in my email signature for years. The idea is definitely not mine, but I liked it the moment I saw it (in a book, the title of which I forget, that's a kind of "computer terms for dummies", which I was given for a birthday too long ago to remember). One day, when the occasion for a .sig change came up, it popped into my head, and there it was.

While contemplating a title for this blog, it occurred to me that many things, foremost the Internet (or "the web") and software tend to refer back to themselves - the Internet through its essential mechanism, the link (aka URL), and software in the form of functions, procedures or subroutines (essentially, pieces of code organised in units) referring to themselves. This can be quite tricky, especially if you need to debug it, as it comes in two slightly different shades: direct recursion is the more obvious, as a function calls itself directly (hopefully with modified arguments), whereas indirect recursion refers to the case where, between two invocations of a function, there are one or more other functions in the call stack (a note to the language-conscious reader: English is not my first language. Therefore you may spot something now and then that looks like German written with English words ... consider yourself warned).
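
For those who'd rather see the two shades in code than read about them, here's a tiny Python illustration. The functions are textbook examples I picked for this purpose (and they assume non-negative integers): `factorial` is direct recursion, while `is_even` and `is_odd` only get back to themselves via each other, which is the indirect kind.

```python
def factorial(n: int) -> int:
    """Direct recursion: factorial calls itself with a smaller argument."""
    if n <= 1:
        return 1
    return n * factorial(n - 1)


def is_even(n: int) -> bool:
    """Indirect recursion: is_even calls is_odd, which calls is_even again."""
    if n == 0:
        return True
    return is_odd(n - 1)


def is_odd(n: int) -> bool:
    """The other half of the pair; never calls itself directly."""
    if n == 0:
        return False
    return is_even(n - 1)
```

In a debugger, the call stack for `factorial` shows the same function over and over, whereas for `is_even` the two names alternate. That alternation is what makes the indirect variant harder to spot.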

 

Who am I? (who cares? I do!)

I've been with Sun for quite some time (too long, some would argue) - since November 1998, to be precise. I spent eight years in the support organisation. During that time, it was my job to handle incidents with and on Sun equipment, and more importantly, Solaris, that customers with service maintenance contracts would report. I specialised in crash dump analysis and performance stuff, but of course a lot of other stuff crept in (if any of my former customers thinks the name sounds familiar: you're right ;-).
I managed to transition to the Engineering organisation at the end of 2006, and moved from Bavaria to the Bay Area (hmm ... that's almost the same sound ...) with my family in March 2007. I now work in the Solaris Networking group as a software engineer.

Before joining Sun, I held a few software engineering jobs in Austria, my home country.

 

What do I do (when I'm not working)?

As I indicated above, I have a family (no pictures here, sorry), and I spend most of my free time with them. Luckily, my wife shares my passion for photography, so now and again we manage to get some nice pictures taken (in our opinion, anyway). I'm still thinking about how to publish some of that (or links) here ... stay tuned, but don't hold your breath.
We both have a strong background in traditional Austrian/Bavarian Volksmusik ("folk music" is a translation that IMO does not perfectly fit the bill - if you know German, take the time to read up on it). We've both played in various groups over time, and sorely miss the opportunity to do so in California (if you know someone who's proficient in that area looking for people to join, or if you are such a person, drop me a line!) - two isn't a group (not with our instruments, anyway) and our kids are too young to be any help there.
I also like to read, although I won't bore you here with a list of titles or authors. That does not mean I won't in future entries!


Hmm ... I seem to have inadvertently parked this entry in the Drafts section for over a week. Sorry.

Thursday Feb 07, 2008

The last year, since joining the Engineering organisation at Sun, I've been working on what started as the xen project (a port of the opensource xen project to Solaris) but, for reasons I won't go into, had to be renamed to xVM, a name which has since taken on almost a life of its own and is now an umbrella for a lot of the virtualisation technology at Sun.

A few words on what xVM is and how it works: xVM is a technology for running more than one instance of an operating system (such as (Open)Solaris or Linux) on the same physical machine at the same time. To provide isolation for the running OS instances, also called (guest) domains, or guests, from one another, xVM provides a thin layer called the hypervisor which acts like a virtual machine towards the running domains. The hypervisor exposes a set of interfaces to these domains by means of which the domains can access HW and communicate with one another and the outside world. This implies that the OS in question must be made aware of the fact that it's running on top of the hypervisor; this is called paravirtualisation.
There are also provisions to run an OS completely agnostic of the underlying hypervisor; in xen parlance, this is an HVM or fully virtualised guest. These require an additional emulation layer for I/O, which incurs a performance penalty.
Finally, there are paravirtualised drivers for otherwise unmodified guests to work around the cost of the aforementioned emulation layer.

The hypervisor itself is minimal; for administrative stuff (like starting guest domains) and for things like access to network cards etc, it requires a privileged guest, dom0. All other guest domains are called domUs. (There is work underway to let domUs access eg. I/O HW directly, so the above description is a bit out of date, but it's sufficient for the following discussion.)

Opensolaris.org hosts a xen community page, where you can read a lot of good stuff about this effort.

 

Coming, as I was, from the Networking group, I worked on - surprise! - networking stuff, specifically, on network communication between domains.
Currently, domUs cannot talk to one another directly (something our friends from the LDoms league have already achieved), but have to go via dom0. You can see an illustration of the data flow here (and thx to my colleague David Edmondson for letting me share this). If you're familiar with the way this looks in Linux, you'll notice some difference. This is due to vnics, technology we borrowed from the crossbow project at an early stage, another part of our overall virtualisation strategy that's coming along nicely. More below. A word on names: in Xen parlance, a communication channel consists of a frontend and a backend. The frontend lives in domU, the backend lives in dom0 (therefore xnf and xnb).

There are two distinct mechanisms that can be used for transporting data between domains: page flipping and hypervisor copying. For page flipping, the hypervisor sets aside some of its own (fairly limited) memory, which can then be used to transfer data between domains, involving repeated mappings into and out of the domains' address space. For hypervisor copy (HVcopy), OTOH, guest domains set aside their memory and indicate this memory to the hypervisor, which copies data into this memory at another domain's request (in our case, only dom0 - this means that for now HVcopy is only used for moving data into a domU, not out).
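
The difference between the two mechanisms is mostly about whose memory is being used, and a toy model in Python may help to see that. Please take it for what it is: a sketch that has nothing to do with the real hypervisor interfaces. `FlipPool`, `CopyChannel` and their methods are invented for this illustration, and "pages" are just counters and byte arrays.

```python
from collections import deque
from typing import Deque, Optional


class FlipPool:
    """Toy page flipping: transfers draw on a small pool owned by the hypervisor."""

    def __init__(self, pages: int) -> None:
        self.free = pages

    def send(self, data: bytes) -> Optional[bytes]:
        if self.free == 0:
            return None  # pool exhausted: the transfer fails
        self.free -= 1   # the page is now mapped into the receiving domain
        return data

    def release(self) -> None:
        self.free += 1   # receiver is done, the page goes back to the pool


class CopyChannel:
    """Toy hypervisor copy: the receiving guest posts its own buffers."""

    def __init__(self) -> None:
        self.posted: Deque[bytearray] = deque()

    def post_buffer(self, size: int) -> None:
        self.posted.append(bytearray(size))  # guest memory, not hypervisor memory

    def send(self, data: bytes) -> Optional[bytearray]:
        if not self.posted or len(self.posted[0]) < len(data):
            return None  # no suitable buffer posted by the guest
        buf = self.posted.popleft()
        buf[:len(data)] = data  # the "hypervisor" copies into the guest's buffer
        return buf
```

In the first model a burst of traffic can drain the shared pool regardless of how much memory the guests have; in the second, the limit is whatever the receiving guest chose to post.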

When I joined, only page flipping had been implemented for xVM, but it was found desirable to also have hypervisor copy. The reasons I can dredge up from memory are:

  1. in high-load situations, the hypervisor would run out of pages available for page flipping, which would cause tons of error messages on the console and a significant drop in NW throughput.
  2. some Windows drivers (AFAIK Windows itself has not been paravirtualised) only "speak" HVcopy, and we intend (Open)Solaris to be able to host Windows domUs as well.
  3. expected performance increase.

I implemented HVcopy for xnb and xnf, and am happy to report that we successfully addressed issues 1 and 3. We could not find Windows drivers that would interoperate with Solaris as domU (corrected:) dom0, so that is as yet unresolved.

 

As I mentioned above, the vnic code we got from the crossbow project which we incorporated into the xVM code, and which was putback to Nevada in build 75, was a very early version, and this code has seen some significant change in the crossbow project itself since. Recently, I updated the relevant parts of the xVM code (xnbo, xnbu) to the new "look and feel" of the crossbow API - this means that as of about mid-January, it is possible for the crossbow "bits" to be booted as dom0, to start domUs, and - this is what it's all about, obviously - to successfully move network traffic in and out of the guest domains. As crossbow itself is not a part of Nevada yet, this change has only taken place in the crossbow "gate" (the copy of the Nevada gate where the crossbow development happens). Please refer to the crossbow community pages for when you can download what.


Update: Nicolas Droux pointed me to a more detailed document about network virtualisation in Solaris. Thx.