The dot in ... --- ...

Chris Gerhard's Weblog

Tuesday October 14, 2008

Bromptons on Eurostar

When taking a Brompton on Eurostar, make sure you have a cover for it or they won't let you on the train. Quite why I don't know; since I do have a cover I, believe it or not, could not be bothered to get into an argument with them.

Once the bike is covered they have no problem with it. The next gripe would be Google Maps: it does not offer routes for cyclists and, at least in France, does not show hills. Specifically, the hill between the station and the Sun office in Velizy.


( Oct 14 2008, 04:10:18 PM BST )

Friday October 10, 2008

Log in to SunSolve just once a day

Go on. 7½ hours before you need to use SunSolve, log in and then just leave that tab alone until you need it. Why? Because you can!

As promised, the horribly short idle timeout has been increased from 30 minutes to 8 hours and the session time from 2 hours to 24.

Also, as I have just been reminded, it affects blogs.sun.com too. Sweet.


( Oct 10 2008, 07:34:03 PM BST )

Gathering performance statistics with scsi.d

While scsi.d is good for looking at SCSI packets and seeing those raw CDBs, not many people are really interested in what a SCSI packet looks like; well, not enough people if you ask me. What is much more interesting is how long the SCSI packets are taking. scsi.d tells you this for each packet, but aggregating the data would be more useful.

: e2big.eu TS 81 $; pfexec /usr/sbin/dtrace -Cs scsi.d -D QUIET -D PERF_REPORT -D REPORT_TARGET \
-D REPORT_LUN -n 'tick-1m {printa(@); clear(@); exit(0) }'
Hit Control C to interrupt

  qus                                                       1
           value  ------------- Distribution ------------- count    
          131072 |                                         0        
          262144 |@@@@                                     25       
          524288 |@@@@@@@@@@@@                             68       
         1048576 |@@@@@@                                   34       
         2097152 |                                         2        
         4194304 |@@@                                      19       
         8388608 |@@@@@                                    29       
        16777216 |@@@@                                     22       
        33554432 |@@@@@@                                   35       
        67108864 |                                         1        
       134217728 |                                         0        

  fp                                                        2
           value  ------------- Distribution ------------- count    
          262144 |                                         0        
          524288 |                                         3        
         1048576 |                                         1        
         2097152 |@@                                       15       
         4194304 |@@@@@@@@                                 67       
         8388608 |@@@@@@@@@@                               81       
        16777216 |@@@@@@@@                                 65       
        33554432 |@@@@@@@@                                 66       
        67108864 |@@@                                      27       
       134217728 |                                         0        

  fp                                                        0
           value  ------------- Distribution ------------- count    
           65536 |                                         0        
          131072 |                                         27       
          262144 |@                                        485      
          524288 |@@@@@@                                   2901     
         1048576 |@@@@@                                    2203     
         2097152 |@@@@@@@                                  3204     
         4194304 |@@@@@@@@@                                4087     
         8388608 |@@@@@@@@                                 3978     
        16777216 |@@@                                      1606     
        33554432 |@                                        570      
        67108864 |                                         123      
       134217728 |                                         45       
       268435456 |                                         0        

  fp                                                        3
           value  ------------- Distribution ------------- count    
           65536 |                                         0        
          131072 |                                         41       
          262144 |@                                        493      
          524288 |@@@@@@                                   2926     
         1048576 |@@@@                                     2157     
         2097152 |@@@@@@                                   3228     
         4194304 |@@@@@@@@@                                4461     
         8388608 |@@@@@@@@@                                4561     
        16777216 |@@@                                      1634     
        33554432 |@                                        510      
        67108864 |                                         116      
       134217728 |                                         52       
       268435456 |                                         2        
       536870912 |                                         0        

  scsi_vhci                                                 0
           value  ------------- Distribution ------------- count    
          131072 |                                         0        
          262144 |@                                        588      
          524288 |@@@@@                                    4807     
         1048576 |@@@@@@                                   5423     
         2097152 |@@@@@@@                                  6609     
         4194304 |@@@@@@@@@                                8627     
         8388608 |@@@@@@@@@                                8641     
        16777216 |@@@                                      3289     
        33554432 |@                                        1088     
        67108864 |                                         239      
       134217728 |                                         97       
       268435456 |                                         2        
       536870912 |                                         0        


: e2big.eu TS 82 $; 

All the new options are supplied via -D flags to dtrace and they are:

Option Name     Description
QUIET           Be quiet. Don't report any packets seen. Useful when you only want a performance report.
PERF_REPORT     Produce a per-HBA performance report when the script completes. The report is an aggregation held in @, so it can be printed at regular intervals using a tick probe, as in the above example but without the call to exit().
REPORT_TARGET   When producing a performance report, include the target to produce a per-target report.
REPORT_LUN      When producing a per-target report, include the LUN to produce a per-LUN report.
DYNVARSIZE      Pass this value to the #pragma D option dynvarsize= option. E.g.: -D DYNVARSIZE=64m


The latest version of the script, version 1.15, is here: http://blogs.sun.com/chrisg/resource/scsi_d/scsi.d-1.15


( Oct 10 2008, 12:54:12 PM BST )

Monday October 06, 2008

Incremental back up of Windows XP to ZFS

I am forced to have a Windows system at home which, thankfully, only very occasionally gets used. However, even though everything that gets on it is virus scanned, all email is scanned before it gets near it, and none of the users are administrators, I still like to keep it backed up.

Given I have a server on the network which has ZFS file systems with spare capacity, I decided that I could do this just using the dd(1) command, which I have written about before. Using that to copy the entire disk image to a file on ZFS allows me to back the system up. However, if I snapshot the backup file system and then back up again, every block gets rewritten and so takes up space on the server even if it has not changed (roll on dedup). To stop this I have a tiny program that mmap()s the entire backup file and then only updates the blocks that have changed.

I call it syncer for no good reason:

#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <string.h>
#include <sys/mman.h>
#include <stdio.h>
#include <sys/time.h>
/*
 * Build by:
 *              cc -m64 -o syncer syncer.c
 */

/*
 * Match this to the file system record size.
 */
#define BLOCK_SIZE (128 * 1024)
#define KILO 1024
#define MEG (KILO * KILO)
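/*
 * Time scaling factors. Note the names: USEC works out as the number of
 * nanoseconds in a second, which is the unit gethrtime() counts in.
 */
#define MSEC (1000LL)
#define NSEC (MSEC * MSEC)
#define USEC (NSEC * MSEC)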

static long block_size;

char *
map_file(const char *file)
{
        int fd;
        char *addr;
        struct stat buf;

        if ((fd = open(file, O_RDWR)) == -1) {
                return (NULL);
        }

        if (fstat(fd, &buf) == -1) {
                close(fd);
                return (NULL);
        }

        block_size = buf.st_blksize;

        addr = mmap(0, buf.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        /* mmap() signals failure with MAP_FAILED, not NULL. */
        return (addr == MAP_FAILED ? NULL : addr);
}
off64_t
read_whole(int fd, char *buf, int len)
{
        int count;
        int total = 0;

        while (total != len && 
                (count = read(fd, &buf[total], len - total)) > 0) {
                total+=count;
        }
        return (total);
}
static void
print_amount(char *str, off64_t value)
{
        if (value < KILO) {
                printf("%s %8lld ", str, value);
        } else if (value < MEG) {
                printf("%s %8lldK", str, value/(KILO));
        } else {
                printf("%s %8lldM", str, value/(MEG));
        }
}
int
main(int argc, char **argv)
{
        char *buf;
        off64_t offset = 0;
        off64_t update = 0;
        off64_t count;
        off64_t tcount = 0;
        char *addr;
        long bs;
        hrtime_t starttime;
        hrtime_t lasttime;

        if (argc == 1) {
                fprintf(stderr, "Usage: %s outfile\n", *argv);
                exit(1);
        }
        if ((addr = map_file(argv[1])) == NULL) {
                exit(1);
        }
        bs = block_size == 0 ? BLOCK_SIZE : block_size;
        if ((buf = malloc(bs)) == NULL) {
                perror("malloc failed");
                exit(1);
        }

        print_amount("Block size:", bs);
        printf("\n");
        fflush(stdout);

        starttime = lasttime = gethrtime();
        while ((count = read_whole(0, buf, bs)) > 0) {
                hrtime_t thistime;
                if (memcmp(buf, addr+offset, count) != 0) {
                        memcpy(addr+offset, buf, count);
                        update+=count;
                }
                madvise(addr+offset, count, MADV_DONTNEED);
                offset+=count;
                madvise(addr+offset, bs, MADV_WILLNEED);
                thistime = gethrtime();
                /*
                 * Only update the output after a second so that is readable.
                 */
                if (thistime - lasttime > USEC) {
                        print_amount("checked", offset);
                        printf(" %4lld M/sec ", ((hrtime_t)tcount * USEC) /
                                (MEG * (thistime - lasttime)));
                        print_amount(" updated", update);
                        printf("\r");
                        fflush(stdout);
                        lasttime = thistime;
                        tcount = 0;
                } else { 
                        tcount += count;
                }
        }
        printf("                                            \r");
        print_amount("Read: ", offset);
        printf(" %lld M/sec ", (offset * NSEC) /
                (MEG * ((gethrtime() - starttime)/MSEC)));
        print_amount("Updated:", update);
        printf("\n");
        /* If nothing is updated return false */
        exit(update == 0 ? 1 : 0);
}



Then a simple shell function to do the back up and then snapshot the file system:

function backuppc
{
	ssh -o Compression=no -c blowfish pc pfexec /usr/local/sbin/xp_backup | time ~/lang/c/syncer /tank/backup/pc/backup.dd && \
	pfexec /usr/sbin/zfs snapshot tank/backup/pc@$(date +%F)
}

Running it I see that only 2.5G of data was actually written to disk, and yet thanks to ZFS I have a complete disk image and have not lost the previous disk images.


: pearson FSS 17 $; backuppc
665804+0 records in
665804+0 records out
Read:     20481M 9 M/sec Updated:     2584M 

real    35m50.00s
user    6m27.98s
sys     2m43.76s
: pearson FSS 18 $; 

( Oct 06 2008, 06:31:57 PM BST )

Friday October 03, 2008

What is that system call doing?

Getting back on topic, here is a nice short bit of DTrace.

Sometimes by the time I get to see an issue the “where on the object” question is well defined, and in two recent cases that came down to “Why is system call X slow?”. The two system calls were not the same in each case but the bit of D to find the answer was almost identical in both cases.

Faced with a system call that is taking a long time you have to understand the three possible reasons this can happen:

  1. It has to do a lot of processing to achieve its results.

  2. It blocks for a long time waiting for an asynchronous event to occur.

  3. It blocks for a short time but many times waiting for asynchronous events to occur.

So it would be really nice to be able to see where a system call is spending all its time. The starting point for such an investigation is that when in the system call there are only two important states: the thread is either running on a CPU or it is not. Typically when it is not, it is because it is blocked for some reason. So I use the DTrace sched provider's on-cpu and off-cpu probes to see how much time the system call spends blocked, and then print out stacks if it is blocked for more than a given amount of time.

Here it is running against a simple mv(1) command:

$ pfexec /usr/sbin/dtrace -s syscall-time.d -c "mv .d .x"
dtrace: script 'syscall-time.d' matched 17 probes
dtrace: pid 26118 has exited
CPU     ID                    FUNCTION:NAME
  3  79751                     rename:entry rename(.d, .x)
  3  21381                    resume:on-cpu Off cpu for: 1980302
              genunix`cv_timedwait_sig+0x1c6
              rpcmod`clnt_cots_kcallit+0x55d
              nfs`nfs4_rfscall+0x3a9
              nfs`rfs4call+0xb7
              nfs`nfs4rename_persistent_fh+0x1eb
              nfs`nfs4rename+0x482
              nfs`nfs4_rename+0x89
              genunix`fop_rename+0xc2
              genunix`vn_renameat+0x2ab
              genunix`vn_rename+0x2b

  3  79752                    rename:return 
  on-cpu                                            
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           3        
           65536 |@@@@@@@@@@                               1        
          131072 |                                         0        

  off-cpu                                           
           value  ------------- Distribution ------------- count    
          131072 |                                         0        
          262144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@              2        
          524288 |                                         0        
         1048576 |@@@@@@@@@@@@@                            1        
         2097152 |                                         0        

rename times on: 205680 off: 2625604 total: 2831284


$ 

From the aggregations at the bottom of the output you can see that the system call went off-cpu three times, and on one of those occasions it was off CPU for long enough that my limit of 1000000 nanoseconds (1 ms) was reached and so a stack trace was printed. It also becomes pretty clear where that system call spent all its time. It was a “rename” system call and I'm on an NFS file system, so it has to wait for the server to respond, and that server is going to have to make sure it has updated some non-volatile storage.


Here is the script:


#!/usr/sbin/dtrace -s
/* run using dtrace -p or dtrace -c */
syscall::rename:entry
/ pid == $target /
{
	self->traceme = 1;
	self->ts = timestamp;
	self->on_cpu = timestamp;
	self->total_on = 0;
	self->total_off = 0;
	printf("rename(%s, %s)", copyinstr(arg0), copyinstr(arg1));
}
sched:::off-cpu
/ self->traceme == 1 /
{
	self->off_cpu = timestamp;
	self->total_on += self->off_cpu - self->on_cpu;
}
sched:::off-cpu
/ self->traceme == 1 /
{
	 @["on-cpu"] = quantize(self->off_cpu - self->on_cpu);
}
sched:::on-cpu
/ self->traceme == 1 /
{
	self->on_cpu = timestamp;
	 @["off-cpu"] = quantize(self->on_cpu - self->off_cpu);
	 self->total_off += self->on_cpu - self->off_cpu;
}
/* if off for more than a millisecond (1000*1000 ns) print a stack */
sched:::on-cpu
/ self->traceme == 1 && timestamp - self->off_cpu > 1000*1000 /
{
	printf("Off cpu for: %d", self->on_cpu - self->off_cpu);
	stack(10);
}
sched:::off-cpu
/ self->traceme == 1 && timestamp - self->on_cpu > 1000*1000 /
{
	printf("On cpu for: %d", self->off_cpu - self->on_cpu);
	stack(10);
}
syscall::rename:return
/self->traceme/
{
	self->traceme = 0;
	self->total_on += timestamp - self->on_cpu;
	@["on-cpu"] = quantize(timestamp - self->on_cpu);
	printa(@);
	printf("%s times on: %d off: %d total: %d\n",probefunc, self->total_on,
	self->total_off, timestamp-self->ts);
	self->on_cpu = 0;
	self->off_cpu = 0;
	self->total_on = 0;
	self->total_off = 0;
}

( Oct 03 2008, 02:02:50 PM BST )

Thursday October 02, 2008

Blogs, wikis and email for announcements

I've been having a conversation with some colleagues about how to communicate announcements within Sun and rather than do this via email I thought I would go a bit off topic for this blog and post it here.

Why I think a blog is a good place to put announcements.

Email

Email provides a fantastic one to one or one to many communication medium but only if the “many” are all known to you. Sending out large announcements via email is likely to cause large numbers of the potential audience to ignore you. If you want this sort of broadcast medium I would suggest a blog.

Blogs

Blogs, being largely write once and then read only, are a great way to put out announcements. Users can subscribe to them using the blog's RSS feed and consume them either in a blog reader or even in Thunderbird, so that they look like emails. Or you can read them directly via the web or via a blog reading web site like google.com/reader. They get indexed by search engines and so can be found by occasional readers, while at the same time those who want to can subscribe to them to get the latest news or views when they are posted.

Wikis

There are wikis and there are wikis, so to some degree the question as to when they are good and bad depends on the wiki. However, I've not found a wiki yet that has a really good RSS feed for handling announcements. They are great for cooperative working or for community documentation (e.g. Wikipedia), but for announcements they lack concise RSS feeds or notification methods: either they suffer from too much noise (changing a single typo results in the entire page being in the RSS feed) or they don't send enough updates (the feed is per page, so having an announcement per page does not produce a good stream of announcements). Some wikis allow you to build complex RSS feeds based on search criteria, which may allow a suitable feed to be built, but this is really for a power user and so I've not found it suitable for announcements.


( Oct 02 2008, 05:21:49 PM BST )

Tuesday September 30, 2008

SunSolve et al. session timeouts increasing

The hot news around here is that the session timeouts for SunSolve and the other tools that use the authentication system on sun.com are going to be increased to something approaching reasonable. The current 30 minute idle and 2 hour session timeouts will be increased to 8 hours idle and 24 hours for the session. Not quite the 14 days and 90 days I would have chosen, but nonetheless a welcome step in the right direction.

If all goes well the change should happen on October 9th. I wish it were sooner, but nonetheless the prospect is exciting enough for me to pre-announce it here, not that anyone will read it!

A big thank you to those who are making it happen.


( Sep 30 2008, 04:19:06 PM BST )

Wednesday September 24, 2008

Decoding NFS v2 and v3 file handles.

This entry has been sitting in my draft queue for over a year, mainly because it is no longer relevant, as NFSv4 should have rendered the script useless. The rest of this entry refers to NFSv2 and NFSv3 file handles only.

How can you decode an NFS file handle?

NFS file handles are opaque, so only the server that hands them out can draw firm conclusions from them. However, since the implementation in SunOS has not changed, it is possible to write a script that will turn a file handle that has been handed out by a server running Solaris into an inode number and device. Hence way back when I wrote that script, and only today someone made good use of it, so here it is for everyone.

The script had not been touched in over 10 years until I added the CDDL, but should still be able to understand messages files and snoop -v output and then decode the file handles.


This snoop was taken while accessing the file “passwd” that was in /export/home on the server:


: s4u-10-gmp03.eu TS 19 $; /usr/sbin/snoop -p 3,3 -i /tmp/snoop.cg13442 -v |  decodefh | grep NFS
RPC:  Program = 100003 (NFS), version = 3, procedure = 4
NFS:  ----- Sun NFS -----
NFS:  
NFS:  Proc = 4 (Check access permission)
NFS:  File handle = [8CB2]
NFS:   0080000000000002000A000000019DAC03419521000A000000019DA96E637436
decodefh: SunOS NFS server file handle decodes as: maj=32,min=0, inode=105900
NFS:  Access bits = 0x0000002d
NFS:    .... ...1 = Read
NFS:    .... ..0. = (no lookup)
NFS:    .... .1.. = Modify
NFS:    .... 1... = Extend
NFS:    ...0 .... = (no delete)
NFS:    ..1. .... = Execute
NFS:  

Now, taking this information to the server, you need to find the shared file system that has major number 32 and minor number 0, and then look for the file with inode number 105900:


# share
-               /export/home   rw   ""  
# df /export/home
/                  (/dev/dsk/c0t0d0s0 ):13091934 blocks   894926 files
# ls -lL /dev/dsk/c0t0d0s0
brw-r-----   1 root     sys       32,  0 Aug 22 15:11 /dev/dsk/c0t0d0s0
# find /export/home -inum 105900
/export/home/passwd
# 

Clearly this is a trivial example but you get the idea.

The script also understands messages files:

$ grep 'nfs:.*702911' /var/adm/messages | head -2 | decodefh          
Sep 21 03:14:34 vi64-netrax4450a-gmp03 nfs: [ID 702911 kern.notice] (file handle: d41cd448 a3dd9683 a00 2040000 1000000 a00 2000000 2000000)
decodefh: SunOS NFS server file handle decodes as: maj=13575,min=54344, inode=33816576
Sep 21 08:34:11 vi64-netrax4450a-gmp03 nfs: [ID 702911 kern.notice] (file handle: d41cd448 a3dd9683 a00 2040000 1000000 a00 2000000 2000000)
decodefh: SunOS NFS server file handle decodes as: maj=13575,min=54344, inode=33816576
$ 

And finally it can take the file handle from the command line:


$ decodefh 0080000000000002000A000000019DAC03419521000A000000019DA96E637436   
0080000000000002000A000000019DAC03419521000A000000019DA96E637436
decodefh: SunOS NFS server file handle decodes as: maj=32,min=0, inode=105900
$ 

So here is the script: http://blogs.sun.com/chrisg/resource/decodefh.sh

Remember this will only work for filehandles generated by NFS servers running Solaris and only for NFS versions 2 & 3. It is possible that the format could change in the future but at the time of writing and for the last 13 years it has been stable.


( Sep 24 2008, 07:06:28 AM BST )

Tuesday September 23, 2008

A first

Today the entire family left the house in the morning by bike.

It does not get much better than that.


( Sep 23 2008, 05:40:42 PM BST )

Sunday September 21, 2008

More exciting than it should have been

We did a 63 mile round trip via Henfold Lakes, but out via a strange route taking in Ripley, Newlands Corner and then Cranleigh. When descending what is a very exciting hill towards Cranleigh at 40mph I had the added thrill of hitting a large stone in the road and having my front tyre deflate instantly. As I braked as hard as I could using my back brake and slowed very slowly, I was able to see the tyre not coming off the rim but also not doing much in the way of letting me steer around the bend that was approaching. The odd thing was that what went through my mind was the question “Am I using the rear brake?”, which I was. Thankfully the rear brake was able to overcome the 1:7 hill and bring me to a halt before the tyre came off or I hit anything. A tribute to the ability of Continental GP4000s to be ridden when flat.

When I went to put the wheel back on after fixing the tube, the spring on the front brake decided to break. I can't really complain since it is 9 years old, but it is the first time I have ever had a brake spring fail (Campagnolo Record). I should be able to get a new spring if my local bike shop comes good with stocking Campagnolo spares, something they say they are going to do. The failure meant that if I used the front brake I had to manually spring the callipers apart for the rest of the ride. Not hard to do, but enough to mean I won't be commuting on the bike again this year. I now have quite a large number of things to fix on my summer bike, although not enough to let me upgrade to the new 11 speed Campagnolo group set, alas.

The rest of the ride was uneventful and we were able to take advantage of the Indian summer we appear to be having.


( Sep 21 2008, 07:08:44 PM BST )

Saturday September 20, 2008

Office Advert

There is an amazing advert running on local radio at the moment. The premise is that you don't need to waste money on expensive brand name trainers, but to get the best out of education you have to have Microsoft Office 2008. The irony of the ad is that you would be wasting your money not on brand name trainers but instead on a “brand name” office suite.

Of course for infinitely less, yes free, you can have OpenOffice.org.

I know I've been here before but it is worth repeating.


( Sep 20 2008, 06:30:33 PM BST )

Sunday September 14, 2008

To the Devil's Punchbowl

Six riders, good weather and a great cafe. We went out by what is not the usual route, via Old Woking, Normandy, Ash Green and The Sands. The return trip was south of Guildford via Albury, where we managed to lose two riders. One insisted we not wait for him, and the other did not see us turn back towards Dorking so we could go up Coombe Bottom, and so went up Newlands. Then even on the descent from Coombe Bottom we got split up, but managed to regroup as it turned out the slower riders were in front and so were caught.

Ended up doing 79 miles and No RAIN!!!!!


( Sep 14 2008, 06:17:54 PM BST )

Saturday September 13, 2008

Native CIFS and Samba on the same system

I've been using Samba at home for a while now but would like to migrate over to the new CIFS implementation provided by Solaris. Since there are some subtle differences in what each service provides*, this means a slower migration.

Obviously you can't configure both services to run on the same system, so to get around this I am going to migrate all the SMB services into a zone running on the server and then allow the global zone to act as the native CIFS service.

So I configured a zone called, rather dully, “samba” with loopback access to all the file systems that I share via SMB and added the additional privilege “sys_smb” so that the daemons could bind to the SMB service port.

zonecfg:samba> set limitpriv=default,sys_smb

The end command only makes sense in the resource scope.
zonecfg:samba> commit
zonecfg:samba> exit

Now you can configure the zone in the usual way to run Samba. I simply copied the smb.conf and smbpasswd files from the global zone using zcp.


Once that was done and Samba enabled in SMF, I could then enable the native CIFS server in the global zone and have the best of both worlds.



*) The principal difference I see is that the native SMB service does not cross file system mount points. So if you have a hierarchy of file systems you have to mount each one on the client. With Samba you can just mount the root and it will see everything below.


( Sep 13 2008, 05:25:24 PM BST )

Wednesday September 10, 2008

Why I won't have a career in sporting predictions

This morning a colleague, let's call him Adrian, popped round to ask me an important question:

Will Lance return to professional riding and ride in the Tour?

I said definitely not. He has just signed up for the doping tests so he can compete in the Leadville 100.....

An hour later another colleague sent me a URL via IM: http://news.bbc.co.uk/sport1/hi/other_sports/cycling/7605378.stm

It is going to be interesting.....but clearly I don't have a career in sports predictions.


( Sep 10 2008, 08:56:27 PM BST )

Sunday September 07, 2008

Upgrading disks

Having run out of space in the root file systems and being close to full on the zpool, the final straw was being able to get two 750GB SATA drives for less than £100; that, and knowing that snapshots no longer cause resilvering to restart, which greatly simplifies the data migration. So I'm replacing the existing drives with new ones. Since the enclosure I have can only hold three drives, this involved a two-stage upgrade so that at no point was my data on fewer than two drives. The first stage was to install one drive and label it:

partition> print
Current partition table (unnamed):
Total disk cylinders available: 45597 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm   39383 - 41992       39.99GB    (2610/0/0)    83859300
  1 unassigned    wm       0                0         (0/0/0)              0
  2     backup    wu       0 - 45596      698.58GB    (45597/0/0) 1465031610
  3 unassigned    wm       0                0         (0/0/0)              0
  4 unassigned    wm   36773 - 39382       39.99GB    (2610/0/0)    83859300
  5 unassigned    wm   45594 - 45596       47.07MB    (3/0/0)          96390
  6 unassigned    wm   36379 - 36772        6.04GB    (394/0/0)     12659220
  7 unassigned    wm       3 - 36378      557.31GB    (36376/0/0) 1168760880
  8       boot    wu       0 -     0       15.69MB    (1/0/0)          32130
  9 alternates    wm       1 -     2       31.38MB    (2/0/0)          64260

partition>

These map to the partitions from the original set up, only they are bigger. I'm confident that by the time the 40GB root disks are too small I will have migrated to ZFS for root. So this looks like a good long term solution.
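
As a quick sanity check on the new label, the Size column can be recomputed from the Blocks column. This is only a sketch: it assumes the Blocks column counts 512-byte blocks and that "GB" means 2**30 bytes, and the function name is made up for illustration.

```python
# Assumption: the Blocks column counts 512-byte blocks and "GB" means 2**30 bytes.
BYTES_PER_BLOCK = 512

def blocks_to_gb(blocks):
    """Convert a block count from the partition table into binary gigabytes."""
    return blocks * BYTES_PER_BLOCK / 2**30

# Block counts copied from the partition table above (slice number -> blocks).
slices = {0: 83859300, 4: 83859300, 6: 12659220, 7: 1168760880}

for number, blocks in slices.items():
    print(f"slice {number}: {blocks_to_gb(blocks):.2f} GB")
```

Under those assumptions the results agree with the sizes format printed for slices 0, 4, 6 and 7.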

pearson # dumpadm -d /dev/dsk/c2d0s6
      Dump content: kernel pages
       Dump device: /dev/dsk/c2d0s6 (dedicated)
Savecore directory: /var/crash/pearson
  Savecore enabled: yes
pearson # metadb -a -c 3 /dev/dsk/c2d0s5
pearson # egrep c2d0 /etc/lvm/md.tab
d12 1 1 /dev/dsk/c2d0s0
d42 1 1 /dev/dsk/c2d0s4
pearson # metainit d12
d12: Concat/Stripe is setup
pearson # metainit d42
d42: Concat/Stripe is setup
pearson # metattach d0 d12
d0: submirror d12 is attached
pearson # 

Now wait until the disk has completed resyncing. While you can resync several partitions in parallel, this causes the disk heads to move more so overall it is slower. Left to just do one partition at a time it is really quite quick:

                 extended device statistics                 
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
cmdk0   357.2    0.0 18321.8    0.0  2.6  1.1   10.4  52  58 
cmdk1     0.0  706.4    0.0 36147.4  1.0  0.5    2.2  23  27 
cmdk2   350.2    0.0 17929.6    0.0  0.4  0.3    2.1  12  15 
md1      70.0   71.0 35859.2 36371.5  0.0  1.0    7.1   0 100 
md3       0.0   71.0    0.0 36371.5  0.0  0.3    3.8   0  27 
md15     35.0    0.0 17929.6    0.0  0.0  0.6   16.5   0  58 
md18     35.0    0.0 17929.6    0.0  0.0  0.1    4.3   0  15 
pearson # metastat d0 
d0: Mirror
    Submirror 0: d10
      State: Okay         
    Submirror 1: d11
      State: Okay         
    Submirror 2: d12
      State: Resyncing    
    Resync in progress: 70 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 20482875 blocks (9.8 GB)

d10: Submirror of d0
    State: Okay         
    Size: 20482875 blocks (9.8 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c1d0s0          0     No            Okay   Yes 


d11: Submirror of d0
    State: Okay         
    Size: 20482875 blocks (9.8 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c5d0s0          0     No            Okay   Yes 


d12: Submirror of d0
    State: Resyncing    
    Size: 83859300 blocks (39 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c2d0s0          0     No            Okay   Yes 


Device Relocation Information:
Device   Reloc  Device ID
c1d0   Yes      id1,cmdk@AST3320620AS=____________3QF09GL1
c5d0   Yes      id1,cmdk@AST3320620AS=____________3QF0A1QD
c2d0   Yes      id1,cmdk@AST3750840AS=____________5QD36N5M
pearson # 

Once complete do the other root disk:

pearson # metattach d4 d42 
d4: submirror d42 is attached
pearson # 

Finally attach slice 7 to the zpool:

pearson # zpool attach -f tank c1d0s7 c2d0s7
pearson # zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 252h52m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s7  ONLINE       0     0     0
            c5d0s7  ONLINE       0     0     0
            c2d0s7  ONLINE       0     0     0

errors: No known data errors
pearson # 

The initial estimate is more pessimistic than reality but it still took over 11 hours to complete. The next thing was to shut the system down and replace one of the old drives with the second new one. Once this was done the remaining slices in use on the old drive could be detached and, in the case of the metadevices, cleared.

: pearson FSS 4 $; zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed after 11h8m with 0 errors on Sat Sep  6 20:58:05 2008
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c5d0s7  ONLINE       0     0     0
            c2d0s7  ONLINE       0     0     0

errors: No known data errors
: pearson FSS 5 $; 
: pearson FSS 5 $; metastat
d6: Mirror
    Submirror 0: d62
      State: Okay         
    Submirror 1: d63
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12659220 blocks (6.0 GB)

d62: Submirror of d6
    State: Okay         
    Size: 12659220 blocks (6.0 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c5d0s6          0     No            Okay   Yes 


d63: Submirror of d6
    State: Okay         
    Size: 12659220 blocks (6.0 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c2d0s6          0     No            Okay   Yes 


d4: Mirror
    Submirror 0: d42
      State: Okay         
    Submirror 1: d43
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 83859300 blocks (39 GB)

d42: Submirror of d4
    State: Okay         
    Size: 83859300 blocks (39 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c5d0s4          0     No            Okay   Yes 


d43: Submirror of d4
    State: Okay         
    Size: 83859300 blocks (39 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c2d0s4          0     No            Okay   Yes 


d0: Mirror
    Submirror 0: d12
      State: Okay         
    Submirror 1: d13
      State: Okay         
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 83859300 blocks (39 GB)

d12: Submirror of d0
    State: Okay         
    Size: 83859300 blocks (39 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c5d0s0          0     No            Okay   Yes 


d13: Submirror of d0
    State: Okay         
    Size: 83859300 blocks (39 GB)
    Stripe 0:
        Device   Start Block  Dbase        State Reloc Hot Spare
        c2d0s0          0     No            Okay   Yes 


Device Relocation Information:
Device   Reloc  Device ID
c5d0   Yes      id1,cmdk@AST3750840AS=____________5QD36N5M
c2d0   Yes      id1,cmdk@AST3750840AS=____________5QD3EQEX
: pearson FSS 6 $; 

The old drive is still in the system but currently only has a metadb on it:

: pearson FSS 6 $; metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1d0s5
     a    p  luo        8208            8192            /dev/dsk/c1d0s5
     a    p  luo        16400           8192            /dev/dsk/c1d0s5
     a    p  luo        16              8192            /dev/dsk/c5d0s5
     a    p  luo        8208            8192            /dev/dsk/c5d0s5
     a    p  luo        16400           8192            /dev/dsk/c5d0s5
     a       luo        16              8192            /dev/dsk/c2d0s5
     a       luo        8208            8192            /dev/dsk/c2d0s5
     a       luo        16400           8192            /dev/dsk/c2d0s5
 r - replica does not have device relocation information
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/lvm/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 t - tagged data is associated with the replica
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors
 B - tagged data associated with the replica is not valid
: pearson FSS 7 $; 

I'm tempted to leave the third disk in the system so that the disk suite configuration will always have a quorum if a single drive fails. However since the BIOS only seems to be able to boot from the first disk drive this may be pointless.
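
The quorum reasoning can be put into numbers. The sketch below assumes the majority rule described in the Solaris Volume Manager documentation: the system keeps running with at least half of the replicas, panics with fewer than half, and needs more than half to reboot unattended. The function name and the status strings are invented for illustration.

```python
def replica_status(total, surviving):
    """Classify a state database under the assumed SVM majority rule."""
    if 2 * surviving > total:
        return "quorum"            # majority left: runs and reboots normally
    if 2 * surviving == total:
        return "runs, no reboot"   # exactly half: stays up, cannot boot unattended
    return "panic"                 # fewer than half of the replicas remain

REPLICAS_PER_DISK = 3  # three replicas per drive, as in the metadb -i output above

for disks in (3, 2):
    total = disks * REPLICAS_PER_DISK
    surviving = (disks - 1) * REPLICAS_PER_DISK  # one whole drive fails
    print(f"{disks} disks, one fails: {surviving}/{total} -> "
          f"{replica_status(total, surviving)}")
```

With three drives a single failure leaves 6 of 9 replicas, which is still a majority. With two drives it leaves exactly half, which is the case keeping the third disk would avoid.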

I'm now keenly interested in bug 6592835 “resilver needs to go faster” since if a disk did fail I don't fancy waiting more than 24 hours after I have sourced a new drive for the data to sync when the disks fill. The disk suite devices managed to drive the disk at over 40MB/sec while ZFS achieved 5MB/sec.
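
To put rough numbers on that worry, here is a back-of-the-envelope estimate of how long a full copy of slice 7 would take at each rate. It assumes the rates quoted are megabytes per second (taken here as 2**20 bytes), that the slice is completely full, and that the rate stays constant throughout. The function name is illustrative only.

```python
SLICE7_BLOCKS = 1168760880   # slice 7 from the partition table
BYTES_PER_BLOCK = 512        # assumption: 512-byte blocks

def hours_to_copy(blocks, mb_per_sec):
    """Hours needed to copy `blocks` at a constant rate in MB/sec (2**20 bytes)."""
    seconds = blocks * BYTES_PER_BLOCK / (mb_per_sec * 2**20)
    return seconds / 3600

for label, rate in (("ZFS resilver", 5), ("disk suite resync", 40)):
    hours = hours_to_copy(SLICE7_BLOCKS, rate)
    print(f"{label} at {rate} MB/sec: {hours:.1f} hours")
```

On those assumptions a full slice takes well over a day at 5MB/sec, against about four hours at 40MB/sec.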


( Sep 07 2008, 09:37:39 PM BST )

20080831 Sunday August 31, 2008

Laughing in the face of weather forecasters....

There were six in Molesey this morning braving the fog and laughing in the face of the weather forecast. Those six made it to Epsom before the fog cleared to be replaced by the thunder and heavy rain that had been forecast. The idea of climbing and riding over Epsom Downs in a thunderstorm did not fill anyone with joy. So after briefly taking shelter in Epsom and failing to find a café open we rode back as fast as our legs would allow. If the people at the Met Office could have seen us they would have been in stitches.

Got home very very wet having ridden 23 miles.

Next weekend we are supposed to be cycling to Yeovil, which is about 120 miles. I hope the weather is better.


( Aug 31 2008, 01:55:24 PM BST )

20080830 Saturday August 30, 2008

Chainrings made of cheese

I rode my blue bike to work this week for two reasons. First I needed to carry more things home than usual and so had a pannier and second to make sure it was all ready for the winter. It turns out that all is not well. Having replaced the chain and cassette at the end of the winter, I now know that the Shimano Ultegra chain rings are only marginally more hard wearing than the 105 rings.

Prior to the bike's last major rebuild the 105 rings had only lasted one winter (3000 miles or so) and were replaced with TA rings. When the whole chain set was replaced it had Ultegra rings on so I left them. The Ultegra rings have lasted three winters (about 10,000 miles), not that bad until you compare them with my summer bike. It is 9 years old and has done over 36,000 miles and still has the same chain rings (Campagnolo Record). The maintenance schedule of both bikes is the same, new chain and cassette after 3,500 miles. Yes the summer bike is lighter and does not go out in the winter but I still don't think that explains the drastic difference in the wear. I can't help but agree with those that claim Shimano chain rings are made of grey cheese.

Luckily I have a spare set of chain rings, Stronglight ones, that I spent a happy hour fitting.


( Aug 30 2008, 06:48:41 PM BST )

20080828 Thursday August 28, 2008

It scrubbed up good

Since the home server has been snapping regularly I have had to choose between snapshots and scrubbing and I chose snapshots. User error is more likely than hardware failures and scrubbing is really about seeing those errors sooner so you don't get an unrecoverable failure due to having two problems at once. However I would rather not have to choose.

So I was particularly pleased to see that build 94 contains the fix for this bug:

6343667 scrub/resilver has to start over when a snapshot is taken

So today the home server had its first scrub in years and it scrubbed up well:

: pearson FSS 5 $; pfexec zpool status
  pool: tank
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 12h42m with 0 errors on Thu Aug 28 20:12:36 2008
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s7  ONLINE       0     0     0
            c5d0s7  ONLINE       0     0     0

errors: No known data errors
: pearson FSS 6 $; 

When I upgrade the pool, after the other live upgrade boot environment can support this pool version, there is the promise of a faster scrub, but this scrub happened during the day and I also backed up the pool using zfs_backup at the same time.