Saturday Oct 03, 2009

Rationale

 Doing so many benchmarks, profiling and other various performance related activities, I had to find a way to "keep an eye" on things while fetching emails, chatting on IM and the like. Having some experience in past projects with microcontrollers, although on Windows, I figured I could put together a little gizmo to help me keep tabs on my Directory Server.

Bird's Eye View

This is basically a simple setup with a USB Bit Whacker controlled by a Python script, feeding it data crunched from various sources, mainly the Directory Server access log, the garbage collection log and kstats... the result is a useful dashboard where I can see things happen at a glance.

The Meat

Everything starts with the USB Bit Whacker. It's a long story, but to cut it short, a couple of years ago, Kohsuke Kawaguchi put together an orb that could be used to monitor the status of a build / unit tests in Hudson. Such devices are also known as eXtreme Feedback Devices or XFDs. Kohsuke chose to go with the USB Bit Whacker (UBW) because it is a USB 'aware' microcontroller that also draws power from the bus, and is therefore very versatile while remaining affordable ($25 soldered and tested from SparkFun, but you can easily assemble your own). A quick search will tell you that this is a widely popular platform for hobbyists.

On the software side, going all Java would have been quite easy except for the part where you need platform-specific libraries for the serial communication. Sun's javacomm library and rxtx both have pros and cons but in my case, the cons were just too much of a hindrance. What's more, I am not one to inflict pain on myself unless it is absolutely necessary. For that reason, I chose to go with Python. While apparently not as strong as Java on cross-platform support, installing the Python libraries for serial communication with the UBW is trivial and has worked for me right off the bat on every platform I have tried, namely Mac OS, Linux and Solaris. For example, on OpenSolaris all there is to it is:

 $ pfexec easy_install-2.4 pySerial
Searching for pySerial
Reading http://pypi.python.org/simple/pySerial/
Reading http://pyserial.sourceforge.net/
Best match: pyserial 2.4
Downloading http://pypi.python.org/packages/source/p/pyserial/pyserial-2.4.tar.gz#md5=eec19df59fd75ba5a136992897f8e468
Processing pyserial-2.4.tar.gz
Running pyserial-2.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Y8iJv9/pyserial-2.4/egg-dist-tmp-WYKpjg
setuptools
zip_safe flag not set; analyzing archive contents...
Adding pyserial 2.4 to easy-install.pth file

Installed /usr/lib/python2.4/site-packages/pyserial-2.4-py2.4.egg
Processing dependencies for pySerial
Finished processing dependencies for pySerial

That's it! Of course, having easy_install is a prerequisite. If you don't have it, simply install setuptools for your Python distro, which is a 400kB thing to install. You'll be glad you have it anyway.

Then, communicating with the UBW is mind-bogglingly easy. But let's not get ahead of ourselves, first things first:

Plugging The USB Bit Whacker On OpenSolaris For The First Time

The controller will appear as a modem of the old days, and communicating with it equates to sending AT commands. For those of you who are used to accessing Load Balancers or other network equipment through the serial port, this is no big deal.

In the screenshot below, the first ls command output shows that nothing in /dev/term is an actual link; however, the second, which I issued after plugging the UBW into the USB port, shows that a new '0' link has been created by the operating system.


Remember which link your UBW appeared as for our next step: talking to the board.

Your First Python Script To Talk To The UBW

I will show below how to send the UBW the 'V' command which instructs it to return the firmware version, and we'll see how to grab the return value and display it. Once you have that down, the sky is the limit. Here is how:

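import serial

# Serial() opens the port as soon as it is given a device name, so no
# explicit open() call is needed; the timeout keeps readline() from hanging.
ubw = serial.Serial("/dev/term/0", timeout=1)
print("Requesting UBW Firmware Version")
# Encode/decode explicitly so the script behaves the same on Python 2 and 3.
ubw.write("V\n".encode("ascii"))
print("Result=[" + ubw.readline().decode("ascii").strip() + "]\n")
ubw.close()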

Below is the output for my board:

Voila!

That really is all there is to it, you are now one step away from your dream device. And it really is only a matter of imagination. Check out the documentation of current firmware to see what commands the board supports and you will realize all the neat things you can use it for: driving LEDs, Servos, LCD displays, acquiring data, ...

Concrete Example: The OpenDS Weather Station

As I said at the beginning of this post, my initial goal was to craft a monitoring device for OpenDS. Now you have a good idea of how I dealt with the hardware part, but an image is worth a thousand words so here is a snap...

On the software front, well, being a software engineer by trade, that was the easy part so that's almost not fun and I won't go into as much detail, but here is a 10,000ft view:

  • data is collected in a matrix of hash tables
  • each hash table represents a population of data points for a sampling period
  • a separate timer thread pushes a fresh list of hash tables into the matrix so as to reset the counters for a new sampling period
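
A minimal sketch of that layout in Python, for illustration only. This is not the actual script: the names SampleWindow, record, rotate and start_timer are hypothetical, and the locking is an assumption about how the timer thread and the collectors would share the counters.

```python
import threading
from collections import defaultdict


class SampleWindow:
    """Counters for the current sampling period, one hash table per metric."""

    def __init__(self, metrics):
        self._metrics = list(metrics)
        self._lock = threading.Lock()
        self._current = self._fresh()
        self.history = []  # completed periods, oldest first

    def _fresh(self):
        # One hash table (value -> occurrence count) per tracked metric.
        return {name: defaultdict(int) for name in self._metrics}

    def record(self, metric, value):
        """Count one more data point with this value in the current period."""
        with self._lock:
            self._current[metric][value] += 1

    def rotate(self):
        """Close the current period and start a new one with zeroed counters."""
        with self._lock:
            finished, self._current = self._current, self._fresh()
            self.history.append(finished)
        return finished


def start_timer(window, period_seconds, stop_event):
    """Call window.rotate() every period_seconds until stop_event is set."""
    def loop():
        # Event.wait() returns False on timeout and True once the event is set.
        while not stop_event.wait(period_seconds):
            window.rotate()
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Swapping in a whole new set of tables on each rotation, rather than clearing the old ones, means the finished period can be read without holding up the collectors.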

So for example, if we want to track CPU utilization, we only need to keep one metric. The hash table will only have one key/value pair. Easy. Slightly overkill but easy. Now if you want to keep track of transaction response times, the hash table will keep the response time (in ms) as a key and the number of transactions that were processed in that particular response time as the associated value. Therefore, if within one sampling period you have 10,000 operations processed, with 6,000 in 0 ms, 3,999 in 1 ms and 1 in 15 ms, your hash table will only have 3 entries as follows: [ 0 => 6000; 1 => 3999; 15 => 1 ]

This allows for a dramatic compression of the data compared to having a single line with etime for each operation, which would result in 10,000 lines of about 100 bytes.

What's more, this representation of the same information makes it easy to compute the average, extract the maximum value and calculate the standard deviation.
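
As a quick illustration, here is a small sketch that derives those figures straight from a value => count table. The function name summarize is made up for the example, and it uses the population standard deviation, which is an assumption since the post does not say which variant is meant.

```python
import math


def summarize(histogram):
    """Return (count, mean, maximum, standard deviation) for a value -> count table."""
    count = sum(histogram.values())
    if count == 0:
        raise ValueError("empty histogram")
    mean = sum(value * n for value, n in histogram.items()) / count
    maximum = max(histogram)
    # Population variance: each distinct value is weighted by how often it occurred.
    variance = sum(n * (value - mean) ** 2 for value, n in histogram.items()) / count
    return count, mean, maximum, math.sqrt(variance)


# The sampling period from the example above: 10,000 operations, 3 entries.
etimes = {0: 6000, 1: 3999, 15: 1}
count, mean, maximum, stddev = summarize(etimes)
print(count, mean, maximum, round(stddev, 3))
```

For the table above this gives a mean of 0.4014 ms and a maximum of 15 ms, all from three entries rather than 10,000 log lines.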

All that said, the weather station is only sent the last of the samples, so it always shows the current state of the server. And as it turns out, it is very useful, I like it very much just the way it worked out.

 Well, I'm glad to close down the shop, it's 7:30pm .... another busy Saturday

Friday Oct 02, 2009

I just thought I'd make a note of the common things I do and, funnily enough, I think this blog might be the closest thing I have to a sticky note / persistent backup ... so here goes:

 PATH=/usr/bin:/usr/sbin:/usr/gnu/bin:$PATH
# enable power management
# (the append has to run privileged too, so pipe into tee instead of using >>)
echo "S3-support    enable" | pfexec tee -a /etc/power.conf > /dev/null
pfexec pmconfig
pfexec svcadm restart hal

# disable access time update on rpool to minimize disk writes
pfexec zfs set atime=off rpool

# get pkgutil to install community software
pfexec pkgadd -d http://blastwave.network.com/csw/pkgutil_`uname  -p`.pkg

# download and install the flash plug-in for firefox
wget http://fpdownload.macromedia.com/get/flashplayer/current/flash_player_10_solaris_x86.tar.bz2 -O libfp.tar.bz2 --no-check-certificate
bunzip2 libfp.tar.bz2
tar xf libfp.tar
pfexec mv flash_player*/libflashplayer.so /usr/lib/firefox/plugins
rm libfp.tar
rmdir flash_player*

# get perfbar
wget http://blogs.sun.com/partnertech/resource/tools/perfbar.i386 -O perfbar --no-check-certificate
chmod 755 perfbar
nohup ./perfbar &

# configure coreadm
coreadm -g /var/cores/%t-%f -e global

Talking with a friend recently, he told me about his miserable experience trying to get his workstation to work with four monitors.

Now, I was surprised at first because there are lots (ok, maybe not lots, but a sizeable number) of people with quad-head workstations out there, so obviously that seems rather doable. The trick in his case seemed to be heterogeneity: 2 different dual-head cards, and 4 different monitors of different brands and sizes. Additionally, he wanted one of his widescreens tilted in portrait mode for his coding. Nice for browsing as well, but he wanted to be able to have a tall IDE to see more code at once without the need to scroll.

It took me a while not just to get the equipment but to find some spare time to do this as well. I ended up with the following:

  1. a desktop that would lend itself to the experiment
  2. 4 dual head video cards to test combinations
    1. nVidia GTX 280
    2. nVidia Quadro FX 380
    3. nVidia GeForce 9600 GT
    4. nVidia GTS 250
  3. 4 monitors
    1. Sun 24.1"
    2. Dell 22"
    3. Acer 24.3"
    4. Dell 20"
  4. a free Saturday (that was actually the most difficult component to find)

To cut it short, the result is ... drum roll ... it _can_ work once you know what to do and what not to. Here is the final result:

So how do we make that work? Well, first thing is NOT to desperately cling to TwinView. You have to let go of that and fall back on good ol' Xinerama, which does a fine job anyway.

As I said in my previous post, rotating the monitor is only a matter of adding Option "Rotate" "left" in the relevant screen section.

For all the X options explained, I found this quite useful. Dig in there.

What you want to be careful about:

  • if at first both cards are not recognized, worry not. Go to a terminal and issue the following command:

pfexec nvidia-xconfig -a

This will force the nvidia config utility to look across all cards.

Note that if this still doesn't work, issue:

pfexec scanpci

and write down the PCI id for each card. It is the first number right after the pci bus 0x002. In this example, this would translate into

BusID "PCI:2:0:0"

in the device section in xorg.conf

  • look at your /var/log/Xorg.0.log for errors
    • you will see something like

(II) LoadModule: "xtsol"
(WW) Warning, couldn't open module xtsol
(II) UnloadModule: "xtsol"
(II) Failed to load module "xtsol" (module does not exist, 0)

Don't worry, that's a Trusted Solaris extension that X is hardcoded to load even when the OS running is not Trusted Solaris; this has yet to be fixed.

  •  make sure to enable Composite
  • make sure to enable GLX with composite
  • make sure to enable RandRRotation
  • Check /var/adm/messages for IRQ collisions which could result in some funky discrepancies. If you find any, tweak your BIOS to force each PCI slot to a distinct IRQ. The message would look similar to:

unix: [ID 954099 kern.info] NOTICE: IRQ16 is being shared by drivers with different interrupt levels
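
Going back to the BusID step above: scanpci prints its numbers in hexadecimal while xorg.conf expects the BusID fields in decimal, which is easy to get wrong once the bus number goes past 9. The sketch below does the conversion. The line format it parses is an assumption about what scanpci prints, the sample line is made up for illustration, and bus_id is a hypothetical helper name.

```python
import re

# Assumed shape of a scanpci device line; adjust the pattern if your output differs.
SCANPCI_LINE = re.compile(
    r"pci bus (0x[0-9a-fA-F]+) cardnum (0x[0-9a-fA-F]+) function (0x[0-9a-fA-F]+)"
)


def bus_id(scanpci_line):
    """Turn one scanpci device line into an xorg.conf BusID value (decimal fields)."""
    match = SCANPCI_LINE.search(scanpci_line)
    if match is None:
        raise ValueError("not a scanpci device line: %r" % scanpci_line)
    bus, device, function = (int(field, 16) for field in match.groups())
    return "PCI:%d:%d:%d" % (bus, device, function)


# Illustrative input only, not captured from a real machine.
print(bus_id("pci bus 0x0002 cardnum 0x00 function 0x00: vendor 0x10de device 0x05e1"))
```

With the sample line this prints PCI:2:0:0, matching the BusID shown earlier.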

All that said, here is an example of xorg.conf with a single monitor tilted, and everything working pretty well considering that nothing is matched. It does work but doesn't come for free as you can see. There is one drawback however, I have not been able to make Compiz work because apparently the cards would have to have an SLI link between them, but I haven't confirmed that for sure. That's it for today folks!

As usual, I try to give as much away in my titles as I can. This one is no different: it just works....


With 2009.06, you needed to build your own drivers for ethernet and wifi. Pretty much a non starter for 99% of users, understandably so: when it just works for Linux and Windows, why sweat it on OpenSolaris ?

Now that dilemma is behind us: I installed an early access of 2010.02 (OpenSolaris b124) and when the installation was done, everything worked: a whole new  experience for me on OpenSolaris. I almost EXPECT to have to fiddle with a driver, a config file, an SMF service that doesn't start, ..., something.

In this case: nothing! Simultaneously gratifying and almost disappointing. I mean, even on my desktop OpenSolaris required some elbow grease to work  the way I wanted, but in this case, the coveted prize of a functional system would be handed to me without even the hint of a fight ? ... unusual, to say the least.

And that's good. I used to say that Solaris is certainly the best server OS and just as certainly the worst desktop OS, but this one shot has me wondering... maybe the Sun engineers have covered some of the ground that separates OpenSolaris from Linux. Granted, there's still a way to go! Yes, the embedded 1.3 megapixel webcam works, but the quality of the picture could be better and I don't think it is the hardware... to be fair, Sun has to write their own drivers for everything so I'm surprised it worked at all, so that's pretty good!

Now there is one rather big bummer though... it does suspend but doesn't resume. Pretty big issue for a laptop which is, because of its form factor, bound to be used on the go. If I can make it work, I will post here. If you have had success making resume work, drop me a line!



OpenSolaris 2010.02 early access build 124 is really faring pretty well so far. It isn't free of issues, granted, but at the same time, it has improved leaps and bounds on laptop support, especially for netbooks, thanks to a passionate and dedicated team writing up a bunch of device drivers for wifi and network cards found in these little laptops.

Today, I installed build 124 on a Lenovo W700ds.

You probably have never heard of that beast because they probably only sold half a dozen of them, one of which landed on my desk yesterday. The main reason for this success is probably that it weighs a ton (11 lbs or 5 kg!!!) due in part to its main 17" monitor, doubled by a netbook-like 10" monitor that slides out from behind the main one.... here are the specs. Notice they call it "portable power". Transportable would be more accurate. After using this laptop for about an hour now (I'm writing this post on it), I do have to say that it is quite fantastically comfortable, just about as much as a desktop would be... not really surprising if you consider it has a full size keyboard + numeric keypad.

 So, OpenSolaris installs without a glitch, once again the installer just does its job without whining. If you run the device driver utility it will notify you that two devices do not have a driver for solaris, one being the integrated bluetooth card and the other being the fingerprint reader. Not a big deal. Once OpenSolaris is installed, it will boot in Gnome just as on any other machine, but what you really want is the second monitor to work... and there's a trick to that.

 First, the second monitor won't be recognized if you don't pull it all the way out at boot time. Took me a while to figure this one out. To save some mW, the Lenovo folks don't power it unless it's out and that makes it undetectable at first.

Second, once recognized by X, it will actually display sideways. This "companion" display is actually a 16:9 10" netbook display tilted right so that its width resolution (1280x768) almost matches the height resolution of the main display (1920x1200). So all we have to do is to tilt it "left" to compensate for the hardware arrangement. To do so, simply enable the Rotate and Resize option on the graphics card and then tell X to rotate the appropriate screen left. Here's how:

Section "Monitor"

    # HorizSync source: edid, VertRefresh source: edid
    Identifier     "SlideOut"
    VendorName     "Lenovo"
    ModelName      "LEN 2nd Display"
    HorizSync       30.0 - 75.0
    VertRefresh     60.0
    Option         "DPMS"
    Option         "Rotate" "left"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Quadro FX 2700M"
    BusID          "PCI:1:0:0"
    Screen          1
    Option       "RandRRotation"    "on"
EndSection

Note that TwinView must be disabled because TwinView aggregates both displays into a single block. Rotation with TwinView on will result in rotating both displays. So you need to make them two X displays and enable Xinerama.

Here is the final xorg.conf in case you're interested...

Additional notes:

Suspend/Resume works great with this laptop -most of the time- however, it seems that sometimes, you will lose the second display upon resume, I'm not sure why.

Wednesday Sep 30, 2009

Rationale

Why not run your Authentication service in the cloud? This is the first step to having a proper cloud IT. There are numerous efforts under way to ease deploying your infrastructure in the cloud, from Sun and others, from OpenSSO to GlassFish, from SugarCRM to Domino, and on goes the list. Here is my humble contribution for OpenDS.

Bird's Eye View

 Tonight I created my EC2 account and got OpenDS going on the Amazon infrastructure in about half an hour, I will retrace my steps here and point out some of the gotchas.

The Meat

Obviously, some steps must be taken prior to installing software.

First, you need an AWS (Amazon Web Services) account with access to EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service). I will say this about EC2, it is so jovially intoxicating that I would not be surprised to be surprised by my first bill when it comes... but that's good, right? At least for amazon it is, yes.

Then you need to create a key pair, trivial as well. Everything is explained in the email you receive upon subscription.

Once that's done, you can cut to the chase and log on to the AWS management console right away to get used to the concepts and terms used in Amazon's infrastructure. The two main things are an instance and a volume. The names are rather self-explanatory: the instance is a running image of an operating system of your choice. The caveat is that if you shut it down, the next time you start this image, you will be back to the vanilla image. Think of it as a LiveCD. You can't write persistent data to it; if you do, it won't survive a power cycle.

To persist data between cycles, we'll have to rely on volumes for now. Volumes are just what they seem to be, only virtual. You can create and delete volumes at will, of whatever size you wish. Once a volume is created and becomes available, you need to attach it to your running instance in order to be able to mount it in the host operating system. CAUTION: look carefully at the "availability zone" where your instance is running, the volume must be created in the same zone or you won't be able to attach it.
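
The availability zone caveat is easy to script around. Below is a minimal sketch that looks up the instance's zone first and creates the volume there. It assumes an object that behaves like a boto3 EC2 client (boto3 is not what was used for this post, where everything went through the management console), and the function name attach_new_volume is hypothetical.

```python
def attach_new_volume(ec2, instance_id, size_gib, device):
    """Create a volume in the instance's availability zone and attach it.

    `ec2` is expected to behave like a boto3 EC2 client, for example the
    object returned by boto3.client("ec2").
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    zone = reservations[0]["Instances"][0]["Placement"]["AvailabilityZone"]

    # The volume must live in the same zone as the instance or attaching fails.
    volume_id = ec2.create_volume(AvailabilityZone=zone, Size=size_gib)["VolumeId"]

    # Wait until the volume reports "available" before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id, zone
```

Passing the client in, rather than creating it inside the function, keeps credentials and region selection outside the sketch. The device name to use depends on the image you are running, so treat any value you pass as something to verify.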

 Here's a quick overview of the AWS management console with two instances of OpenSolaris 2009.06 running. The reason I have two instances here is that one runs OpenDS 2.0.0 and the other runs DSEE 6.3 :) -the fun never ends-. I'll use it later on to load OpenDS.

My main point of interest was to see OpenDS perform under this wildly virtualized environment. As I described in my previous article on OpenDS on Acer Aspire One, virtualization brings an interesting trend in the market that is rather orthogonal to the traditional perception of the evolution of performance through mere hardware improvements...

In one corner, the heavy weight telco/financial/pharmaceutical company weighing in at many millions of dollars for a large server farm dedicated to high performance authentication/authorization services. Opposite these folks, the ultra small company curled in the other corner, looking at every way to minimize cost in order to simply run the house while allowing to grow the supporting infrastructure as business ramps up.

Used to be quite the headache, that. I mean it's pretty easy to throw indecent amounts of hardware at meeting crazy SLAs. Architecting a small, nimble deployment that is still able to grow later? Not so much. If you've been in this business for some time, you know that every iteration of sizing requires going back to capacity planning and benchmarking, which is too long and too costly most of the time. That's where the elastic approaches can help. The "cloud" (basically, hyped up managed hosting) is one of them.

Our team also has its own, LDAP-specific, approach to elasticity, I will talk about that in another article, let's focus on our "cloud" for now. 

 Once your instance is running, follow these simple steps to mount your volume and we can start talking about why EC2 is a great idea that needs to be developed further for our performance savvy crowd.

In this first snapshot, I am running a stock OpenDS 2.0.0 server with 5,000 standard MakeLDIF entries. This is to keep it comparable to the database I used on the netbook. Same searchrate, sub scope, return the whole entry, across all 5,000.

If this doesn't ring a bell, check out the Acer article. Your basic EC2 instance has about as much juice as a netbook. Now the beauty of it all is that all it takes on my part to improve the performance of that same OpenDS server is to stop my "small" EC2 instance and start a medium one.

Voila!

  I've got 2.5 times the initial performance. I did not change ONE thing on OpenDS, this took 3 minutes to do, I simply restarted the instance with more CPU. I already hear you cry out that it's a shame we can't do this live -it is virtualization after all- but I'm sure it'll come in due course. It is worth noting that even though I could use 80+% of CPU on the small instance of OpenDS, in this case I was only using about 60% so the benefit would likely be greater but I would need more client instances. This imperfect example still proves the point on the ease of use and the elasticity aspect.

The other thing that you can see coming is an image of OpenDS for EC2. I'm thinking it should be rather easy to script 2 things:

1) self-discovery of an OpenDS topology and automatic hook up in the multi master mesh and

2) snapshot -> copy -> restore the db, almost no catch up to do data wise. If you need more power, just spawn a number of new instances: no setup, no config, no tuning. How about that ?

Although we could do more with additional features from the virtualization infrastructure, there is already a number of unexplored options with what is already there. So let's roll up our sleeves and have a serious look. Below is a snapshot of OpenDS modrate on the same medium instance as before with about 25% CPU utilization. As I said before, this thing has had NO fine tuning whatsoever so these figures are with the default, out-of-the-box settings.

  I would like to warmly thank Sam Falkner for his help and advice and most importantly for teasing me into trying EC2 with his FROSUG lightning talk! That stuff is awesome! Try it yourself.

Tuesday Sep 29, 2009

Rationale

I was recently faced with the challenge to track down and eliminate outliers from a customer's traffic and I had to come up with some sort of tool to help in diagnosing where these long response time transactions originated from. Not really rocket science -hardly anything IS rocket science, even rocket science isn't all that complicated, but I digress- yet nothing that I had in the tool box would quite serve the purpose. So I sat down and wrote a tool that would allow me to visually correlate events in real time. At least that was the idea.

Bird's Eye View

This little tool is only meant for investigations and we are working on delivering something better and more polished (code name Gualicho, shhhhhhh) for production monitoring. The tool I am describing in this article simply correlates the server throughput, peak etime, I/O, CPU, Network and Garbage Collection activity (for OpenDS). It is all presented in a sliding line metric, stacked on top of each other, making visual identification and correlation easy. Later on I will adapt the tool to work on DPS, since it is the other product I like to fine tune for my customers.

The Meat

When pointed to the access log and the GC log, here is the text output you get. There is one line per second that is displayed with the aggregated information collected from the access log and garbage collection as well as kstats for network, I/O, CPU.
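
To give an idea of the access log half of that aggregation, here is a minimal sketch that folds response lines into one (throughput, peak etime) pair per second. It is not the tool's code: the regular expression is an assumption about the shape of an OpenDS response line, the sample lines are invented, and per_second is a hypothetical name.

```python
import re

# Assumed shape of a response line; check it against your own access log.
RESPONSE = re.compile(r"^\[([^\]\s]+) [^\]]*\].*\betime=(\d+)")


def per_second(lines):
    """Map each timestamp (one-second resolution) to (operations, peak etime)."""
    stats = {}
    for line in lines:
        match = RESPONSE.search(line)
        if match is None:
            continue  # lines without an etime (requests, connects) are skipped
        second, etime = match.group(1), int(match.group(2))
        ops, peak = stats.get(second, (0, 0))
        stats[second] = (ops + 1, max(peak, etime))
    return stats


# Invented sample lines, for illustration only.
sample = [
    '[29/Sep/2009:10:15:01 -0600] SEARCH REQ conn=4 op=7 msgID=8 base="dc=example,dc=com"',
    "[29/Sep/2009:10:15:01 -0600] SEARCH RES conn=4 op=7 msgID=8 result=0 nentries=1 etime=2",
    "[29/Sep/2009:10:15:01 -0600] SEARCH RES conn=5 op=3 msgID=4 result=0 nentries=1 etime=50",
    "[29/Sep/2009:10:15:02 -0600] BIND RES conn=6 op=0 msgID=1 result=0 etime=1",
]
for second, (ops, peak) in per_second(sample).items():
    print(second, "ops=%d" % ops, "peak_etime=%dms" % peak)
```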


If you looked at it closely, I represented the garbage collection in %, which is somewhat unusual, but after debating how to make this metric available, I decided that all I was interested in was a relative measure of the time spent in stop-the-world GC operations over the time the application itself is running. As I will show in the snapshot below, this is quite effective to spot correlations with high etimes in most cases. To generate this output in the GC log, all you have to do is add the following to your set of JAVA_ARGS for start-ds.java-args in /path/to/OpenDS/config/java.properties:

 -Xloggc:/data/OpenDS/logs/gc.log -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
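
With those two Print flags, the HotSpot JVM writes "Application time: ... seconds" and "Total time for which application threads were stopped: ... seconds" lines to the GC log. The sketch below turns a batch of such lines into a percentage. It is one plausible reading of the metric (stopped time over stopped plus running time), not necessarily the exact formula the tool uses, and gc_pause_percent is a hypothetical name.

```python
import re

STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([\d.]+) seconds"
)
RUNNING = re.compile(r"Application time: ([\d.]+) seconds")


def gc_pause_percent(lines):
    """Percentage of the covered time spent with application threads stopped."""
    stopped = running = 0.0
    for line in lines:
        match = STOPPED.search(line)
        if match:
            stopped += float(match.group(1))
            continue
        match = RUNNING.search(line)
        if match:
            running += float(match.group(1))
    total = stopped + running
    # No matching lines at all means nothing to report for this batch.
    return 100.0 * stopped / total if total else 0.0


# Invented sample covering one second of wall time.
sample = [
    "Application time: 0.9800000 seconds",
    "Total time for which application threads were stopped: 0.0200000 seconds",
]
print("%.1f%%" % gc_pause_percent(sample))
```

Feeding the function the lines logged during each one-second interval gives one figure per second, which is the granularity used in the output described above.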

And then my GUI will show something like:


Don't hesitate to zoom in on this snapshot. The image is barely legible due to blog formatting constraints.

Excuse me if I have not waited 7+ days to take the snapshot for this article but I think this simple snap serves the purpose. You can see that most of the time we spend 2% of the time blocked in GC but sometimes we have spikes up to 8% and when this happens, even though it has little impact on the overall throughput over one second, the peak etime suddenly jumps to 50ms. I will describe in another article what we can do to mitigate this issue, I simply wanted to share this simple tool here since I think it can serve some of our expert community.

Rationale

As far-fetched as it may seem, with the growing use of virtualization and cloud computing, the average image instance that LDAP authentication systems have to run on looks more like your average netbook than a supercomputer. With that in mind, I set out to find a reasonable netbook to test OpenDS on. I ended up with an Acer Aspire ONE with 1GB of RAM. Pretty slim on memory. Let's see what we can get out of that thing!

Bird's Eye View

In this rapid test I have done, I loaded OpenDS (2.1b1) with 5,000 entries (stock MakeLdif template delivered with it), hooked up the netbook to a closed GigE network and loaded it from a corei7 machine with searchrate. Result: 1,300+ searches per second. Not bad for a machine that only draws around 15 Watts!

The Meat 

As usual, some more details about the test but first a quick disclaimer: this is not a proper test or benchmark of the Atom as a platform, it is merely a kick in the tires. I have not measured other metrics than the throughput and only for a search workload at that. It is only to get a "feel" of it on such a lightweight sub-notebook.

In short:

  • Netbook: Acer Aspire One ZG5 - Atom N270 @1.6GHz, 1GB RAM, 100GB HDD
  • OS: OpenSolaris 2009.06
  • FS: ZFS
  • OpenDS: all stock, I did not even touch the JAVA options which I usually do
  • JAVA: 1.6 Update 13

The little guy in action, perfbar shows the CPU is all the way up there with little headroom...


Monday Aug 24, 2009

Rationale

A number of customers I talk to have a hugely diverse ecosystem of applications relying on the LDAP infrastructure for authentication, single sign-on and also user-specific configuration storage. Very few have a strictly controlled environment with a reduced set of well-known clients.

One cause of trouble I have seen many times over stems from client applications that are not robust and handle the protocol poorly. There is an easy way to grow confidence in your infrastructure and ecosystem at the same time: after setting up the prototype and before you go in production, during the QA stage, try to spend some time intentionally injecting errors in your traffic. You'll immediately see if clients start blowing up left and right!

Bird's Eye View

To cut to the chase, this plug-in sits on DS as a pre-operation search plug-in. You can "create" any entry simply by adding a configuration parameter to the plug-in. For example, if you want to have DS return "no such entry" (Error 32) for cn=nosuch,dc=example,dc=com, as shown below:

all you would have to do (once the plug-in is properly set up) is:

dsconf set-plugin-prop arbitrary-response argument+:cn=nosuch,dc=example,dc=com#32#0

The Meat

I honestly have no idea why I have not shared this small tool earlier. I wrote this plug-in years ago for Directory Server 5.2 and later on recompiled it against DS 6.x on OpenSolaris. Currently it is built for Solaris 9/10/OpenSolaris x86/x64. If you want it on another platform, let me know and I'll spin it for you.

To install this plugin, simply unzip the file and then follow the instructions in the bundled README file. The sequence of commands will work for DS 6.x.

In its current version (1.1b) the plug-in can inject errors as well as delays into an arbitrary response. This means that you can easily test how connection idle timeouts are managed by your client applications connection pooling mechanism, if any.

Injecting delay is done through the third parameter of the plug-in. For example, to return a valid response with error code 0 after 15 seconds, you would have to add the following argument to the plug-in:

dsconf set-plugin-prop arbitrary-response argument+:cn=ok,dc=example,dc=com#0#15
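
Judging from the two examples, each argument reads as base DN, result code and delay in seconds, separated by '#'. The sketch below parses that shape, which can be handy when generating a batch of test rules from a script. It is inferred from the examples only and is not the plug-in's own parser; Rule and parse_rule are hypothetical names.

```python
from collections import namedtuple

Rule = namedtuple("Rule", "base_dn result_code delay_seconds")


def parse_rule(argument):
    """Split 'DN#resultCode#delay' into its three fields."""
    # Split from the right so a '#' inside the DN itself is left alone.
    base_dn, code, delay = argument.rsplit("#", 2)
    return Rule(base_dn, int(code), int(delay))


print(parse_rule("cn=nosuch,dc=example,dc=com#32#0"))
print(parse_rule("cn=ok,dc=example,dc=com#0#15"))
```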

Useful things not in this version

  1. I will probably add a 4th parameter which will represent the probability that the error is returned, otherwise, just pass on the request to DS core

  2. Ability to interpret regular expressions in the base DN part of the plug-in argument

That's it for today!

Tuesday Aug 18, 2009

Rationale

Recently engaged on a project where a Directory Server farm capable of delivering 100,000 searches per second was required, I was faced with skepticism about the ability of DPS to keep up in such a demanding environment. I had to prove ten times over that our proposed architecture would fit the bill and quantify how much headroom it would provide. Here are some of the things we observed along the way.

Bird's Eye View

 In this article I will quickly talk about a theoretical performance of DPS under ideal circumstances, you'll see that the setup is quite far from a real deployment from the hardware to the way the stack is set up.

The Meat

The results

I won't make you read through the whole thing if you do not want to so here goes: 

Throughput

DPS Throughput

Response Time

DPS Response Time

Setup

The box I'm developing and running tests on these days is somewhat unusual and the proper disclaimer is necessary: it is a Core i7 (975 EE) very slightly overclocked to 3.9GHz, fitted with 12GB of 1600MHz DDR3 RAM. It runs OpenSolaris 2009.06 with ZFS. There is a 10,000 rpm 320GB drive and an Intel Extreme 32GB SSD. All in all a nice rig but not a server either. The closest server, if I had to give a comparable, would be a beefed up Sun Fire X2270. Keep in mind that my desktop is single socket but about twice as fast on the CPU clock rate.

To load this, I use SLAMD. One server, 7 clients (2 Sun Fire T5120, 4 Sun Fire X2100, 1 Sun Fire X4150). For the record, this client set has generated north of 300,000 operations per second in other tests I ran in the past. Clients in this test campaign were always loaded at less than 15% CPU, so results are not skewed by CPU pressure on the client side.

On the software side (middleware I should say) we have a Sun Ultra 27 running OpenDS with a null back-end and my rig running my workspace of DPS 7 plus my little sauce...

Why a null back-end on OpenDS? that is not representative of real life!

Well, quite true, but the whole point here is to push DPS to the limit, not the back-end, not the network or any other component of the system. So a null back-end is quite helpful here as it simulates a perfect back-end that responds immediately to any request DPS sends. Quite handy if you come to think of it, because what you have as a result is an overview of the overhead of introducing DPS between your clients and your servers under heavy load. The load is actually all the hardware I had could take: the CPU is almost completely used, with idle time varying frantically between 1% and 7%. Keep in mind as well that DPS runs in a JVM and at these rates, garbage collections are almost constant.

Here is what the setup looks like:

That's all I had for you guys tonight! 

Friday Aug 07, 2009

Rationale

After releasing the first version of the throughput throttling, most customers seemed interested in at least kicking the tires and evaluating it. As it turned out, though, because throttling choked traffic across the board, it would have required deploying an extra instance of DPS in the topology for the sole purpose of choking traffic down to a level acceptable to the business. While some felt that was simple enough to do, others found it to be a show stopper. I therefore wrote this new plug-in, which leverages the distribution algorithm API to narrow the scope of traffic throttling to a single data view, bringing a whole new dimension of flexibility to this feature.

Bird's Eye View

This new wad not only provides a new, more flexible throttling facility to your LDAP Proxy Server, it also comes with a CLI that makes it trivial to configure and change your settings on the fly. The README has some instructions to get you going in no time, but I will provide a quick overview here as well.

The Meat

First things first, you need to unzip the package, which will give you the following files:

$ find Throttling
Throttling
Throttling/throttleadm
Throttling/Throttling.jar
Throttling/README

As you can see, pretty trim.

The CLI will ease mainly 3 things:

  1. Setting up data views to be able to throttle traffic (throttleadm configure)
  2. Configuring the throughput limits to enforce (per data view and operation type, e.g.: throttleadm throttle root\ data\ view mod 200 )
  3. Checking the current configuration (throttleadm list)
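To make item 2 more concrete, here is a rough sketch of what a limit "per data view and operation type" amounts to. It is not the plug-in's code: the class name, the fixed one-second window and the injectable clock are all assumptions made for illustration, and the actual plug-in may well use a different algorithm.

```python
import time
from collections import defaultdict


class ViewThrottle:
    """Hypothetical fixed one-second-window limiter keyed by (data view, operation)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock                           # injectable so the sketch can be tested
        self._limits = {}                             # (view, op) -> max operations per second
        self._windows = defaultdict(lambda: [0, 0])   # (view, op) -> [second, count]

    def set_limit(self, view, op, rate):
        # Similar in spirit to: throttleadm throttle <dataViewName> <operation> <rate>
        self._limits[(view, op)] = rate

    def allow(self, view, op):
        """Return True if one more operation fits in the current one-second window."""
        key = (view, op)
        limit = self._limits.get(key)
        if limit is None:
            return True                               # nothing configured for this pair
        second = int(self._clock())
        window = self._windows[key]
        if window[0] != second:                       # a new second started: reset the counter
            window[0], window[1] = second, 0
        if window[1] >= limit:
            return False                              # over the limit: the caller delays or rejects
        window[1] += 1
        return True
```

Because the limit is looked up on every call, changing it with set_limit takes effect on the very next operation, which is the kind of on-the-fly behavior the CLI is meant to expose.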

Here is the complete usage for this little tool:

$ ./throttleadm --help
Checking for binaries presence in path
.
.
.
.
throttleadm usage:
  throttleadm list
  throttleadm configure <dataViewName>
  throttleadm unconfigure <dataViewName>
  throttleadm choke <dataViewName>
  throttleadm unchoke <dataViewName>
  throttleadm throttle <dataViewName> <operation> <rate>
for example:
  To list the data views configured for throttling
  throttleadm list

  To set up a data view to use throttling
  throttleadm configure root\ data\ view

  To set up a data view back to its original state
  throttleadm unconfigure root\ data\ view

  To enable throttling on the default data view (provided the data view has been properly configured)
  throttleadm choke root\ data\ view

  To disable throttling on the default data view
  throttleadm unchoke root\ data\ view

  To change or set the maximum search throughput on the default data view to 20
  throttleadm throttle root\ data\ view search 20

When you change the settings, you can see them enforced right away. In the example below, I initially set the bind throughput limit to 200 per second. The left window has an authrate running and, in the right window, while the authrate is running, I lower the throughput limit to 100 for 4 seconds and then set it back to 200. See how that works below:

Finally, here is a quick snap of the output of the CLI for the throttling status.

 $ ./throttleadm -D cn=dpsadmin -w /path/to/password-file list
Checking for binaries presence in path
Will now take action list with the following environmentals:
-host=localhost -port=7777 -user=cn=dpsadmin -password file=/path/to/password-file
VIEW NAME          - THROTTLED -  ADD - BIND -  CMP -  DEL -  MOD - MDN - SRCH
ds-view            -      true -   12 -  200 -   13 -   14 -    1 -  16 -  112
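If you want to feed this status into a script, the table is easy enough to parse. The sketch below is only an illustration built from the sample output above: the function name is made up, and it assumes that columns are always separated by a dash surrounded by spaces and that view names never contain such a separator.

```python
import re

# The two table lines from the sample output above.
SAMPLE = """\
VIEW NAME          - THROTTLED -  ADD - BIND -  CMP -  DEL -  MOD - MDN - SRCH
ds-view            -      true -   12 -  200 -   13 -   14 -    1 -  16 -  112
"""


def split_row(line):
    """Split one table line on ' - ' style separators."""
    return re.split(r"\s+-\s+", line.strip())


def parse_throttle_list(text):
    """Return {view name: {column: value}} from 'throttleadm list' style output."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Skip any preamble (e.g. "Checking for binaries presence in path").
    start = next(i for i, ln in enumerate(lines) if ln.startswith("VIEW NAME"))
    header = split_row(lines[start])
    views = {}
    for line in lines[start + 1:]:
        cells = split_row(line)
        row = dict(zip(header[1:], cells[1:]))
        views[cells[0]] = {
            name: (value == "true") if name == "THROTTLED" else int(value)
            for name, value in row.items()
        }
    return views


status = parse_throttle_list(SAMPLE)
print(status["ds-view"]["BIND"])
```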

 That's it !

As usual, don't hesitate to come forth with questions, remarks, requests for enhancements.


Thursday Jun 25, 2009

Rationale

Access logs can cause some performance-focused users some discomfort. The one main thing usually making logs a performance hog is the fact that entries must be ordered somehow. In our products, the ordering is chronological. Here is an easy way to alleviate the issue if you're on Solaris and have a spare drive.

Bird's Eye View

ZFS Intent Log (or ZIL) can be configured on a separate disk to help synchronous performance.

You will find lots of literature on the matter out there, including Neil and Brendan's blogs for example.

The Meat

So you heard about all the great benefits you can get with SSDs but don't have one yet (Go get one!) or don't have enough that you can dedicate one to your logs?

Worry not!

All you need to do is create a ramdisk drive that will be used for ZIL when we create our access-log-dedicated ZFS pool. Here's how:

$ ramdiskadm -a zil-drive 512m
$ zpool create log-pool c8d1 log /dev/ramdisk/zil-drive

For DPS, all you need to do is:

$ dpconf set-access-log-prop log-file-name:/log-pool/access

It's just as simple for DS, do:

$ dsconf set-log-prop access path:/log-pool/access 

And OpenDS is no more complicated to configure, do:

$ dsconfig -n set-log-publisher-prop --publisher-name "File-Based Access Logger" --set log-file:/log-pool/access

OR use the interactive command, simply do: 

$ dsconfig

and follow

  • 20) log publisher
  • 3)  View and edit an existing Log Publisher
  • 1)  File-Based Access Logger
  • 3)  log-file              logs/access
  • 2)  Change the value
  • type /log-pool/access, hit return
  • type "f" to finish and apply
  • restart OpenDS bin/stop-ds;bin/start-ds
  • I know it looks like more work, but the nice thing about the interactive dsconfig is that it gives you context and you will get familiar with other aspects of the server


Caveats

    In the rare event that a server configured as described here loses power, the ZIL -being on a ramdisk- will be lost. This does not however corrupt the data stored on the disk and upon restart, all you would have to do is add the ZIL on a newly created ramdisk again. This can of course be automated to be done at boot time so that you do not need to do it yourself at every power cycle.

Rationale

    After discussing the article I posted yesterday with someone, they asked me: "What was the best performance you ever had with OpenDS?" and though I couldn't really answer off the top of my head, I dug in my archives from the last benchmark and found what I think was my best run so far.

Bird's Eye View

    To put it bluntly, about 120,000 operations per second @ <2ms. This was done while I was tuning OpenDS for the 10 Million entries benchmark on the Intel Nehalem-based Sun Blade x6270, so I had the whole setup in place: 10M entries, with searches spanning the entire DB. Some of the Java tunings are bleeding edge, as I will detail in the next section.

The Meat

Environment

    As I said earlier, this is the same environment as described in my previous entry except for Java.

Java

    The JVM arguments are as follows: -d64 -server -Xbootclasspath/p:/path/to/SecretSauce.jar -XX:+UseNUMA -XX:LargePageSizeInBytes=2m -XX:+UseCompressedOops -XX:+AggressiveOpts -XX:+UseBiasedLocking -Xms6g -Xmx6g -Xmn4g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=85 -XX:MaxTenuringThreshold=1

    It's all pretty much business as usual but some of them call for explanation:

  • -Xbootclasspath/p:/path/to/SecretSauce.jar: One of our engineers, our lead for OpenDS core actually, has found a significant performance improvement in one of the JVM's core classes. This SecretSauce.jar contains his patched and improved version that overrides the JVM's own at run time. This makes a big difference in lowering GC pause times.
  • -XX:+UseNUMA: this is simply because the Sun Blade x6270 is a NUMA architecture and using this switch tells the JVM to be clever about memory and cache locality.
  • -XX:+UseCompressedOops: This lets you benefit from the larger heap size of the 64-bit JVM (not quite as big as the full 64-bit address space, but bigger than that of the 32-bit JVM) while retaining 32-bit-like performance. The best of both worlds. Works beautifully. And it is being improved ...
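The sizing flags are not explained above, so here is a small sketch of the arithmetic they imply. It assumes the usual HotSpot meaning of these options (-Xmx as total heap, -Xmn as young generation size, and CMSInitiatingOccupancyFraction as a percentage of the old generation); the helper names are made up for the example.

```python
# Only the sizing-related flags from the argument list above.
JVM_ARGS = "-Xms6g -Xmx6g -Xmn4g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=85"


def to_mib(size):
    """Convert a JVM size such as '6g' or '512m' to MiB."""
    factor = {"k": 1 / 1024, "m": 1, "g": 1024}[size[-1].lower()]
    return int(size[:-1]) * factor


def heap_layout(args):
    """Derive old generation size and the CMS start threshold from the flags."""
    heap = young = fraction = None
    for arg in args.split():
        if arg.startswith("-Xmx"):
            heap = to_mib(arg[4:])
        elif arg.startswith("-Xmn"):
            young = to_mib(arg[4:])
        elif arg.startswith("-XX:CMSInitiatingOccupancyFraction="):
            fraction = int(arg.split("=", 1)[1])
    old = heap - young                      # what is left of the heap after the young generation
    return {
        "heap_mib": heap,
        "young_mib": young,
        "old_mib": old,
        "cms_start_mib": old * fraction / 100,
    }


print(heap_layout(JVM_ARGS))
```

Under those assumptions, the 6 GB heap leaves a 2 GB old generation next to the 4 GB young generation, and a concurrent collection cycle starts once the old generation is about 85% full.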

Results 

Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
35428814 | 116160.046 | 580800.230 | 10313.696 | -0.037
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
60861940 | 35428857 | 1.718 | 580800.934 | 0.119 | 0.023

Caveats

    So, now that I told you all my secrets, you're wondering why I didn't use those settings for the benchmark? Because the benchmark is supposed to give me numbers on what could be achieved in a production environment, and in this case, using our patched JVM core class and a somewhat experimental or at least relatively new memory addressing mode of  the JVM isn't what I would advise to a customer about to go live.

All these bleeding edge settings only give us a 12% boost overall, and I don't think that is worth the risk. But this shows that we are paving the way for ever-increasing performance on OpenDS. Tomorrow, these settings will all be well proven and safe for production. Just hang in there.

Wednesday Jun 24, 2009

Rationale

Long have we heard that the new Nehalem-based Sun x86 systems would bring a significant performance boost over the AMD Opterons still ruling the land to this day. The whole idea of the test was to see, in the particular case of Directory Services, and even more specifically of OpenDS, how this translated into throughput, response time and all the good things we (meaning the seriously loony LDAP geeks) like to look at...

Bird's Eye View

 On this single blade, OpenDS achieves over 93,000 search operations per second and over 17,000 modification operations per second. Under lighter load (but still significant, with throughput always above 70,000 ops/sec), OpenDS delivers sub-millisecond response time.

Sounds too good to be true? Then read further...

To sum it up as Benoit did in his post, this would give you, in a fully populated 6000 chassis, the ability to process almost A MILLION REQUESTS PER SECOND in a well integrated, highly available and easily manageable package. It does NOT get any better from any vendor out there as of today.
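For the record, the back-of-the-envelope math behind that figure can be sketched as follows. The 10 blades per chassis is an assumption on my part about a fully populated Sun Blade 6000, and the sketch also assumes throughput scales linearly from one blade to the next, which no test in this post verifies.

```python
# Figure quoted above: "over 93,000 search operations per second" on a single blade.
SEARCHES_PER_BLADE = 93_000

# Assumption: a fully populated chassis holds 10 blades.
BLADES_PER_CHASSIS = 10


def chassis_throughput(per_blade, blades):
    """Aggregate throughput, assuming perfectly linear scaling across blades."""
    return per_blade * blades


print(chassis_throughput(SEARCHES_PER_BLADE, BLADES_PER_CHASSIS))
```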

Special thanks to Benoit Chaffanjon and his team for making this equipment available to us on short notice. Their support, reactivity and in-depth knowledge of all things benchmark is what makes them top-notch and an indispensable component of our success.

The Meat

Maybe you have already heard about Benoit's previous benchmark of DSEE (6.3.1) on Nehalem. If you haven't, read it, it'll give you all the background you need to read these results here. I tried to stick as much as I could to his bench, and I think I did a pretty good job at that. The main intentional difference between our two benches is that in his, searches only span 1 million entries of the 10 million entry database. In mine, searches span the whole 10 million entries. In practice, he's right to do his benchmarks the way he does, as it better reflects the reality of how most customers end up consuming data, but mine is more stressful on the system.

Setup

Hardware

Software

Tunings

Hardware

None

Software

Solaris
  • Cap the ZFS ARC size to ( SYSTEM MEMORY * 0.95 ) - OPENDS JVM HEAP SIZE
  • Disable ZFS cache flush since the storage takes care of that for us and has persistent cache (4GB of NVRAM)
  • Put ZFS ZIL on a dedicated SSD
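The ARC cap formula from the first bullet is simple enough to script. In this sketch the memory and heap sizes are hypothetical placeholders (the blade's actual figures are not given here), and the zfs_arc_max line in the comment is the usual /etc/system tunable as I remember it, so double-check it against your Solaris release.

```python
GIB = 2 ** 30


def arc_cap_bytes(system_bytes, heap_bytes):
    """( SYSTEM MEMORY * 0.95 ) - OPENDS JVM HEAP SIZE, in bytes (integer math)."""
    return system_bytes * 95 // 100 - heap_bytes


# Hypothetical example values: 24 GiB of RAM and a 6 GiB JVM heap.
cap = arc_cap_bytes(24 * GIB, 6 * GIB)

# The result would typically end up in /etc/system as something like:
#   set zfs:zfs_arc_max = <cap>
print(cap, hex(cap))
```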

Other things to consider doing:

    • use jumbo frames if returning whole entries, YMMV depending on your most frequent access patterns. I haven't tried this time around for lack of time but this should be interesting in reducing the network overhead. As we'll see later, OpenDS on this blade can max out a gigabit Ethernet connection.
Java

With very high volumes like we are seeing here, say above 80k ops/sec, you will likely want to bump request handlers and worker threads a notch to cope with the frenzy of the traffic. When you do so, the 32-bit JVM will quickly become too small no matter what tunings you try. Even though the 64-bit JVM is not as space efficient for cache and all other aspects of memory access, it will provide an extremely stable environment for OpenDS even under heavy client traffic. I have been able to attach 10,000 hyper-clients (as in clients continuously sending traffic with no pause between requests) to OpenDS without a problem.

To cut to the chase, the settings:

OpenDS

Worker Threads: 32
Connection Handlers: 16


As I have said previously, you may want to dial these values depending on a couple of factors:

  • How many clients you have at peak
  • How quickly your client applications open their connections (bursts or ramped up?)
  • How frantic a client is on each connection on average

If you have 5,00 clients opening 100 connections all at once, you will likely want to have more connection handlers to be able to cope with the suddenness of the pattern. This will however come at a performance cost (that we have yet to appropriately profile) under more normal circumstances.

If you have a few frantic clients, these values will be about right, though you may want to bump up the number of worker threads a bit. This too is suboptimal under normal circumstances.

Note: regardless of the access pattern, these settings will be adequate to serve whatever load you throw at the server. I'm only pointing out ways to improve the performance a bit. In particular, these tips will help keep the request backlog on a leash.

Import

Importing our 10M entries took 14'59", which averages at 11,120 entries per second.
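As a quick sanity check on that average, the division can be scripted; the function name is just for illustration. It works out to roughly 11,123 entries per second, which the figure above rounds to 11,120.

```python
def import_rate(entries, minutes, seconds):
    """Average entries imported per second over the given wall-clock duration."""
    return entries / (minutes * 60 + seconds)


# 10M entries in 14'59", i.e. 899 seconds.
rate = import_rate(10_000_000, 14, 59)
print(round(rate))
```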

Search Performance

These tests mainly aim at determining the maximum throughput that can be achieved. As such, they tend to load the servers with an artificially high number of concurrent clients, inflating the response time compared to what can be expected under more normal production conditions. In the last section of each test (Lighter Load), I will show what the response time looks like with lighter loads and lower overall throughput.
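One way to see how concurrency inflates response time is Little's law: at steady state, the average number of requests in flight equals throughput multiplied by average response time. The sketch below applies it to the exact-search, one-attribute figures reported further down. It is my own reading of the numbers, not something SLAMD reports, and it assumes the runs were in steady state.

```python
def requests_in_flight(ops_per_second, avg_response_ms):
    """Little's law: L = throughput * latency (latency converted to seconds)."""
    return ops_per_second * avg_response_ms / 1000.0


# Exact search, 1 attribute returned (figures from the result tables in this post).
heavy = requests_in_flight(93660.281, 2.665)   # heavy load, maximum throughput
light = requests_in_flight(92274.374, 1.080)   # lighter load

print(round(heavy), round(light))
```

Throughput barely changes between the two runs, while the estimated number of requests in flight drops by more than half, and so does the response time.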

 Exact Search

 Return 1 Attribute
Heavy Load, Maximum Throughput
Actual Duration: 1839 seconds (30m 39s)
Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
169056808 | 93660.281 | 468301.407 | 5590.951 | -0.004
Exceptions Caught
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
0 | 0.000 | 0.000 | 0.000 | 0.000
Entries Returned
Total | Avg Value | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
169056809 | 1.000 | 93660.282 | 468301.410 | 0.000 | 0.000
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
450590169 | 169056809 | 2.665 | 468301.410 | 0.189 | -0.006
Lighter Load
Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
28143684 | 92274.374 | 461371.869 | 3791.935 | -0.040
Exceptions Caught
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
0 | 0.000 | 0.000 | 0.000 | 0.000
Entries Returned
Total | Avg Value | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
28143684 | 1.000 | 92274.374 | 461371.869 | 0.000 | 0.000
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
30399915 | 28143685 | 1.080 | 461371.885 | 0.055 | 0.023

Return whole entry
Heavy Load, Maximum Throughput
Actual Duration: 1839 seconds (30m 39s)
Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
151991059 | 84205.573 | 421027.864 | 5264.386 | -0.006
Exceptions Caught
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
0 | 0.000 | 0.000 | 0.000 | 0.000
Entries Returned
Total | Avg Value | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
151991061 | 1.000 | 84205.574 | 421027.870 | 0.000 | 0.000
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
360407639 | 151991065 | 2.371 | 421027.881 | 0.183 | 0.022

Lighter Load
Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
21896817 | 71792.843 | 358964.213 | 4125.281 | -0.020
Exceptions Caught
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
0 | 0.000 | 0.000 | 0.000 | 0.000
Entries Returned
Total | Avg Value | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
21896817 | 1.000 | 71792.843 | 358964.213 | 0.000 | 0.000
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
15177289 | 21896817 | 0.693 | 358964.213 | 0.047 | 0.023

Sub Scope Search

Return 1 Attribute
Heavy Load, Maximum Throughput
Actual Duration: 1838 seconds (30m 38s)
Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
169252464 | 93768.678 | 468843.391 | 6339.082 | -0.012
Exceptions Caught
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
0 | 0.000 | 0.000 | 0.000 | 0.000
Entries Returned
Total | Avg Value | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
169252464 | 1.000 | 93768.678 | 468843.391 | 0.000 | 0.000
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
270122894 | 169252465 | 1.596 | 468843.393 | 0.140 | 0.022
Lighter Load
Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
24902860 | 81648.721 | 408243.607 | 4020.767 | -0.011
Exceptions Caught
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
0 | 0.000 | 0.000 | 0.000 | 0.000
Entries Returned
Total | Avg Value | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
24902860 | 1.000 | 81648.721 | 408243.607 | 0.000 | 0.000
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
15166324 | 24902860 | 0.609 | 408243.607 | 0.039 | 0.023

Return Whole Entry
Heavy Load, Maximum Throughput
Actual Duration: 1839 seconds (30m 39s)
Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
152888061 | 84702.527 | 423512.634 | 6003.399 | -0.008
Exceptions Caught
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
0 | 0.000 | 0.000 | 0.000 | 0.000
Entries Returned
Total | Avg Value | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
152888064 | 1.000 | 84702.529 | 423512.643 | 0.000 | 0.000
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
270188257 | 152888064 | 1.767 | 423512.643 | 0.154 | 0.013

Lighter Load
Searches Completed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
22151207 | 72626.908 | 363134.541 | 3680.320 | -0.007
Exceptions Caught
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
0 | 0.000 | 0.000 | 0.000 | 0.000
Entries Returned
Total | Avg Value | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
22151207 | 1.000 | 72626.908 | 363134.541 | 0.000 | 0.000
Search Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
15179772 | 22151207 | 0.685 | 363134.541 | 0.041 | 0.023

Modifications Performance

Modifications Performed
Count | Avg/Second | Avg/Interval | Std Dev | Corr Coeff
15687496 | 17334.250 | 86671.249 | 2048.109 | 0.015
Modify Time (ms)
Total Duration | Total Count | Avg Duration | Avg Count/Interval | Std Dev | Corr Coeff
126643779 | 15687499 | 8.073 | 86671.265 | 1.435 | -0.201

Tuesday Jun 23, 2009

Rationale

    Many institutions, companies and organizations have security policies in place to keep security under control with a homogeneous environment. One of those guidelines mandates that no credentials be shared between any two employees. When that is the case, cn=Directory Manager looks like a gaping hole in violation of such policies.

The other fact that bothers regulators about this user is that it is not subject to Access Controls. It can therefore, by design, bypass any carefully designed access restriction policy. While this can sometimes be useful for performance reasons, it is incompatible with a quest for absolute security.

There are institutions where this is not tolerable.
Here is a painless way to stay compliant.

Bird's Eye View

    The idea behind this tip is to disable "cn=Directory Manager", knowing that a number of things are perfectible about this user, the main one being that one could run a brute force attack on it. Knowing the user name, which is left at its default more often than not, only makes things worse. So the number 1 thing would be to change the user name to some other value. But that would still allow brute force attacks.

The other thing that can be done is to null the directory manager password, which, combined with mandatory bind passwords (the require-bind-pwd-enabled server property), effectively renders "cn=Directory Manager" unusable.

The Meat

  1. create a random password and store it in a file on the host that is readable only by root.
        e.g. store pd80wu709@w87-3WQJX%mjx097hc&50 in /path/to/cryptic-directory-manager-password.
        Note: Do not use echo or cat or anything of that sort to write the file, as this could be sniffed. Use an editor like vi, joe or whatever is most convenient.
  2. create a random user. The only constraint is that it should be a valid DN (see RFC 2253), and even that rule can be bent a bit...
       e.g. store tr-d7=9gcxf7tu in /path/to/cryptic-directory-manager-dn
       Note: take the same precautions as in step 1
  3. create the instance with this user and password. Never use the same user and password for any 2 Directory Server instances.
      e.g. dsadm create -D `cat /path/to/cryptic-directory-manager-dn` -w /path/to/cryptic-directory-manager-password -p xyz -P zyx </path/to>/instance
  4. delete the cryptic password file
  5. delete the cryptic dn file
  6. edit </path/to>/instance/config/dse.ldif and remove the value of the nsslapd-rootpw so that its contents are blank
    e.g.: nsslapd-rootpw:
  7. start the instance
    e.g. dsadm start </path/to>/instance


Your directory manager is effectively unusable and has little to no chance of having been compromised at any point of creating or starting the instance.[ if you really want absolute security, use a small program that will quietly output a randomly-generated password to file with 600 rights ]
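As for the "small program" mentioned in the brackets, a minimal sketch could look like the following. The function name, the password length and the alphabet are arbitrary choices of mine; the points that matter are that the file is created with 600 rights from the start and that the password never shows up on screen or in a shell history.

```python
import os
import secrets
import string


def write_random_password(path, length=32):
    """Write a random password to a new file created with mode 600; print nothing."""
    alphabet = string.ascii_letters + string.digits + "@%&-_"
    password = "".join(secrets.choice(alphabet) for _ in range(length))
    # O_EXCL refuses to reuse an existing file; 0o600 means owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(password)
```

Run as root, a call such as write_random_password("/path/to/cryptic-directory-manager-password") would stand in for the manual editing in step 1.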

Note that for an already created instance, you can simply do steps 6 & 7 (restarting the instance rather than just starting it), which is nice and easy. The only addition in that case is to check that the require-bind-pwd-enabled property is on.
  e.g.
    $ dsconf get-server-prop require-bind-pwd-enabled
    require-bind-pwd-enabled  :  on

Since at this point your directory manager is disabled, you will need to use an account like cn=admin,cn=Administrators,cn=config as your dsconf user.
Simply export LDAP_ADMIN_USER=cn=admin,cn=Administrators,cn=config or use dsconf <command> -D cn=admin,cn=Administrators,cn=config ...

Caveats

    When following this procedure, you will end up with a server that only has "regular" users. This is mostly good but has a handful of shortcomings, such as not being able to repair ACIs. Since all your users, including the administration accounts, are now subject to ACI evaluation, you could end up in a state where all your administration accounts are locked out. Care must be taken to keep an administration account with well-calibrated Access Controls. There are also some additional troubleshooting operations that must (per the code) be performed by directory manager.

This blog copyright 2009 by arnaud