Saturday February 24, 2007
Dusting off ElHam

ElHam is a filesystem testing tool designed to detect corruption, be multiprotocol, and stress a filesystem. It isn't designed to be a benchmark. I'm in the middle of debugging a really nasty NFSv4 bug (read that to mean we haven't a real clue as to what is going on or how to reproduce it), and I need to generate sufficient load on a test system.

So I went and got ElHam from SourceForge.net. I wrote it when I was at NetApp as a tool we could use internally to get multiprotocol lock testing, generate metadir traffic, and to hand out to customers for corruption testing. As such, we stuck a BSD license on it and hung it off of SourceForge.net.

It still needs work done on it. For example, I figured out that it wasn't detecting big-endianness. I also have to make a pass through it and make sure that I capture the return values from all function calls and check that they are valid. One of the things you need for corruption testing is early detection of problems.
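To make those two to-do items concrete, here is a minimal Python sketch. It is not ElHam's code (which isn't shown in this post); the function names `detect_byte_order` and `checked_write` are hypothetical and exist only for illustration. The first probes how the host lays out an integer, the second refuses to silently accept a short write.

```python
import os
import struct
import sys


def detect_byte_order():
    """Return 'big' or 'little' by looking at how a native integer is laid out."""
    # Pack the 32-bit value 1 in native byte order and see where the 1 byte lands.
    native = struct.pack("=I", 1)
    return "little" if native[0] == 1 else "big"


def checked_write(fd, data):
    """Write data to fd and fail loudly if fewer bytes were written than asked for."""
    written = os.write(fd, data)
    if written != len(data):
        raise IOError(f"short write: {written} of {len(data)} bytes")
    return written


if __name__ == "__main__":
    # The probe should agree with what the interpreter itself reports.
    print(detect_byte_order(), sys.byteorder)
```

The point of `checked_write` is the early detection mentioned above: an unchecked short write today looks like corruption tomorrow.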

Sometimes in trying to detect corruption, you can get a false positive because of client side caching. If your focus is strictly on the server, i.e., you are testing a filer, that is bad. So you might be tempted to turn off client side caching. Things also appear to go faster that way, but again, ElHam is not designed to be a benchmark.

The other evil with turning off client side caching is that it effectively negates both locking in general and NFSv4 delegations. ElHam is designed to have multiple readers and writers, both local and remote, changing files in a directory tree. Client side caching issues are something it should have to live with.

Anyway, multiple instances (from different architectures and OSes) are possible because ElHam records what is supposed to be in every data block. So when another instance comes along, it is able to compute what should be in the data block and then it can see if the on-disk image is corrupt. I need to write a small application to inject corruption - this will help me get signatures to show people what ElHam has detected.
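The idea of computing what a block should contain, and of a small corruption injector, can be sketched as follows. This is only an illustration under loud assumptions: the 4096-byte block size, the `(file_id, block_no, generation)` key, and the SHA-256 based pattern are all made up here and are not ElHam's actual on-disk format. The fields are packed big-endian so that instances on different architectures derive identical bytes.

```python
import hashlib
import struct

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def expected_block(file_id, block_no, generation):
    """Deterministically compute the bytes a block should contain."""
    # Fixed-width big-endian fields: every architecture derives the same seed.
    seed = struct.pack(">QQQ", file_id, block_no, generation)
    out = bytearray()
    counter = 0
    while len(out) < BLOCK_SIZE:
        out += hashlib.sha256(seed + struct.pack(">I", counter)).digest()
        counter += 1
    return bytes(out[:BLOCK_SIZE])


def find_corruption(on_disk, file_id, block_no, generation):
    """Return the offset of the first bad byte, or None if the block is intact."""
    want = expected_block(file_id, block_no, generation)
    for offset, (got, exp) in enumerate(zip(on_disk, want)):
        if got != exp:
            return offset
    if len(on_disk) != len(want):
        # A truncated or oversized block differs where the shorter one ends.
        return min(len(on_disk), len(want))
    return None


def inject_corruption(block, offset):
    """Return a copy of block with one bit flipped at offset."""
    damaged = bytearray(block)
    damaged[offset] ^= 0x01
    return bytes(damaged)
```

Feeding the output of `inject_corruption` to `find_corruption` gives a known-bad case, which is the kind of signature the injector is meant to produce.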

The current big issue is that ElHam is designed to push a filesystem to capacity and back off; i.e., reads and writes in the face of a full filesystem are interesting. To aid in that testing, it is best that the 'data', 'meta', and 'history' (see ElHam docs) directories each be on a different filesystem. Well, yesterday I had all three on the same filesystem and it got full. So I'm trying to reproduce that and see what is happening.
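For a feel of what "push to capacity and back off" might look like, here is a rough Python sketch. It is an assumption about the general technique, not a description of how ElHam does it: the function `fill_then_back_off`, its parameters, and the "give back half" policy are all hypothetical. It treats `ENOSPC` (and `EDQUOT`, where the platform defines it) as "full" and re-raises anything else.

```python
import errno
import os

# Errors treated as "out of space"; EDQUOT covers quota limits where defined.
FULL_ERRNOS = {errno.ENOSPC, getattr(errno, "EDQUOT", errno.ENOSPC)}


def fill_then_back_off(fd, chunk, max_chunks, write=os.write, truncate=os.ftruncate):
    """Append chunks until the filesystem reports full, then shrink the file by half.

    Returns (bytes_written, hit_full). The write and truncate callables are
    injectable so the logic can be exercised without filling a real disk.
    """
    total = 0
    for _ in range(max_chunks):
        try:
            total += write(fd, chunk)
        except OSError as err:
            if err.errno not in FULL_ERRNOS:
                raise  # anything else is a real failure, not "full"
            # Back off: give half of the space back so other writers can proceed.
            truncate(fd, total // 2)
            return total, True
    return total, False
```

Keeping the "full" errors separate from every other `OSError` matters here, because a full filesystem is an expected event in this kind of test while any other error is a finding.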

A really neat way to do this is to use ZFS to create different filesystems and then set quotas to control how much space each filesystem is allowed:

# zfs create zoo/elham 
# zfs set sharenfs=on zoo/elham
# zfs create zoo/elham/data
# zfs create zoo/elham/meta
# zfs create zoo/elham/history
# zfs list zoo/elham/*
NAME                USED  AVAIL  REFER  MOUNTPOINT
zoo/elham/data     36.7K   654G  36.7K  /zoo/elham/data
zoo/elham/history  36.7K   654G  36.7K  /zoo/elham/history
zoo/elham/meta     36.7K   654G  36.7K  /zoo/elham/meta
# zfs set quota=2G zoo/elham/data
# zfs set quota=20G zoo/elham/meta
# zfs set quota=20G zoo/elham/history
# zfs list zoo/elham/*
NAME                USED  AVAIL  REFER  MOUNTPOINT
zoo/elham/data     36.7K  2.00G  36.7K  /zoo/elham/data
zoo/elham/history  36.7K  20.0G  36.7K  /zoo/elham/history
zoo/elham/meta     36.7K  20.0G  36.7K  /zoo/elham/meta

Note that I give the 'history' and 'meta' filesystems a much larger quota. I don't want to run out of space on them.
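While the test runs, it helps to watch how close each filesystem is to its limit. A minimal sketch using Python's standard `shutil.disk_usage`; the helper name `usage_report` is hypothetical, and the assumption is that the quota shows up in the reported totals the same way it shows up in the AVAIL column above.

```python
import shutil


def usage_report(mount_points):
    """Map each path to the fraction of its filesystem in use (0.0 to 1.0)."""
    report = {}
    for path in mount_points:
        usage = shutil.disk_usage(path)
        report[path] = usage.used / usage.total if usage.total else 0.0
    return report
```

For the layout above it would be called as `usage_report(["/zoo/elham/data", "/zoo/elham/meta", "/zoo/elham/history"])`, with the 'data' entry expected to climb toward 1.0 first.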

I'm going to kick off several instances of ElHam and see if I can fill this puppy up.


Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily