« December 2009
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
Today
XML

Neat blogs

Navigation

Editing

Powered by Roller Weblogger.

statcounter.com

clustrmaps.com

Locations of visitors to this page

technorati.com

20090731 Friday July 31, 2009
Bug in remove code

We've got a bug in our remove code - I didn't introduce it in my recent set of changes, we can just say it has always been there. The bug is that when we remove a file, we eventually fill up the dataset on the DSes.

I'm going to want to fix it, mainly because I want to get my head around how we are going to examine files on the DSes.

By this, I mean that the inode information lives on the MDS. You can browse the directory structure on it and see what files are there. You can't do that on the DS. The DS may have the concept of an inode, but it doesn't care at all about directory structure, file names, etc. The first file may have a layout of 4 and the second a layout of 8. And it may be the case that those files only intersect on this dataset.

To find dangling files, conceptually we need to run scanners on both the MDS and the DSes. For each file file on the MDS, we can look at the odl and check to see if there is a corresponding file on the DS. On the DS, we have to make sure that the MDS has that file existing and the dataset is in the corresponding layout.

Scratch that, we just need a scanner on the DS - I'm starting to describe a parallelized fsck. So, on the DS, we scan the datasets and for each file we find, we check with the MDS:

  1. File does not exist on MDS - means we have garbage to clean up. (Or it could mean that we have inode corruption on the MDS.)
  2. File does exist on the MDS:
    1. Layout contains this dataset - a valid file.
    2. Layout does not contain this dataset - most likely a reuse of the inode number and it inidicates we have garbage from a prior generation of the file.

A colleague just pointed out 6792701 Removing large holey file does not free space which I like as a culprit for a couple of reasons:

  1. The remove code on our part is pretty simple.
  2. The gate is at onnv_117 and the fix is in onnv_118.

But I'm not sure that they are holey. Our server sets NFL4_UFLG_DENSE, which should inform the clients to not create holes.

In any event, the bug will give me some clues as to how to triage the issue.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
More memory leaks, but this time on the DS

Okay, my fix for the memory leaks on the MDS went fine. Now I've got a problem on the DS:

> ::findleaks
CACHE             LEAKED           BUFCTL CALLER
ffffff01c682a860       1 ffffff01d49a0068 dserv_mds_do_reportavail+0x210
ffffff01c68262e0  520364 ffffff01ea054458 dserv_nnode_from_fh_ds+0x5b
ffffff01c68262e0 1258470 ffffff01f3ed5db0 dserv_nnode_from_fh_ds+0x5b
ffffff01c68262e0 1343031 ffffff01f3ed5e88 dserv_nnode_from_fh_ds+0x77
ffffff01c68262e0  435756 ffffff01ee4b65f8 dserv_nnode_from_fh_ds+0x77
ffffff01c68262e0       1 ffffff01de76e558 mds_compound+0x54
ffffff01c68262e0       3 ffffff01e83df668 mds_compound+0x54
ffffff01c682b2e0       4 ffffff01d99cea30 mds_compound+0x193
ffffff01c6828020       2 ffffff01dcd9f8f8 mds_get_server_impl_id+0x30
ffffff01c68262e0       2 ffffff01dd4350d8 mds_get_server_impl_id+0x58
ffffff01c6826b20       2 ffffff01d7153290 mds_get_server_impl_id+0x8a
ffffff01c6828860       1 ffffff01d8805120 modinstall+0x129
ffffff01c6828860       1 ffffff01d7fe7a98 modinstall+0x129
ffffff01c682b2e0       1 ffffff01ddd5a870 modinstall+0x129
ffffff01c6828020       1 ffffff0f82dd4b68 rpc_init_taglist+0x25
ffffff01c6828020   20145 ffffff020b8337a0 rpc_init_taglist+0x25
ffffff01c6828020       1 ffffff01d613e538 rpc_init_taglist+0x25
ffffff01c6828020   11460 ffffff01e20b1de8 rpc_init_taglist+0x25
ffffff01c6828020       1 ffffff0f82de32e0 rpc_init_taglist+0x25
ffffff01c6828020       1 ffffff0f82de1708 rpc_init_taglist+0x25
ffffff01c6828020       2 ffffff01ddda3018 rpc_init_taglist+0x25
ffffff01c68265a0       2 ffffff01dc8f41b0 tohex+0x32
ffffff01c6828020       3 ffffff01dcda06c0 xdr_array+0xae
ffffff01c6828020       1 ffffff01df04e7f8 xdr_array+0xae
ffffff01c68262e0 1083945 ffffff01edde37b0 xdr_bytes+0x70
ffffff01c68282e0       1 ffffff01f31f9e60 xdr_bytes+0x70
ffffff01c68285a0       1 ffffff01e1798598 xdr_bytes+0x70
ffffff01c68262e0  694948 ffffff01f3ed5c00 xdr_bytes+0x70
------------------------------------------------------------------------
           Total 5368151 buffers, 86904136 bytes

Even though we still see a high number of leaks in xdr_bytes(), which we expect to indicate that the mds_sid is not getting cleaned up, I'll argue that this is to be expected with the high number of leaks in dserv_nnode_from_fh_ds(). I.e., if the file handle is not being nuked, we can't expect the mds_sid component to also be zapped.

> ffffff01ea054458$<bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
ffffff01ea054458 ffffff01e9f71d50      89f9dc8f5e7 ffffff01d7aa93a0
                 ffffff01c68262e0 ffffff01c747f9c0 ffffff01cfd78cd8
                 kmem_cache_alloc_debug+0x283
                 kmem_cache_alloc+0xa9
                 kmem_alloc+0xa3
                 dserv_nnode_from_fh_ds+0x5b
                 nnode_from_fh_v41+0x45
                 mds_op_putfh+0xe3
                 rfs41_op_dispatch+0x72
                 mds_compound+0x1d4
                 rfs41_dispatch+0x17c
                 dispatch_dserv_nfsv41+0x93
                 svc_getreq+0x20d
                 svc_run+0x197
                 svc_do_run+0x81
                 nfssys+0xa0e

The issue is actually very simple to understand - prior to my changes, the dnk_sid field was not being used. But that field is a mds_sid, which requires a memory allocation to use:

 426 static nnode_error_t
 427 dserv_nnode_from_fh_ds(nnode_t **npp, mds_ds_fh *fhp)
 428 {
 429         dserv_nnode_key_t dskey;
 430         nnode_key_t key;
 431         uint32_t hash;
 432 
 433         if (fhp->vers < 1)
 434                 return (ESTALE); /* XXX badhandle */
 435         if (fhp->fh.v1.mds_fid.len < 8) /* XXX stupid */
 436                 return (ESTALE);
 437         if (fhp->fh.v1.mds_fid.len > DS_MAXFIDSZ)
 438                 return (ESTALE);
 439         /* XXX cannot do sid until mds_sid is sane */
 440         dskey.dnk_sid = NULL;
 441         dskey.dnk_fid = &fhp->fh.v1.mds_fid;
 442 
 443         hash = dserv_nnode_hash(&dskey);
 444 
 445         key.nk_keydata = &dskey;
 446         key.nk_compare = dserv_nnode_compare;
 447 
 448         return (nnode_find_or_create(npp, &key, hash, fhp, dserv_nnode_build));
 449 }

I changed this to be:


 425 static nnode_error_t
 426 dserv_nnode_from_fh_ds(nnode_t **npp, mds_ds_fh *fhp)
 427 {
 428         dserv_nnode_key_t dskey;
 429         nnode_key_t key;
 430         uint32_t hash;
 431 
 432         if (fhp->vers < 1)
 433                 return (ESTALE); /* XXX badhandle */
 434         if (fhp->fh.v1.mds_fid.len < 8) /* XXX stupid */
 435                 return (ESTALE);
 436         if (fhp->fh.v1.mds_fid.len > DS_MAXFIDSZ)
 437                 return (ESTALE);
 438 
 439         /*
 440          * XXX: Should we be mallocing in the middle of a
 441          * cache? Cache entry does not have enough space.
 442          * What is the use of the mds_sid here?
 443          */
 444         dskey.dnk_sid = kmem_alloc(sizeof (mds_sid), KM_SLEEP);
 445         dskey.dnk_sid->len = fhp->fh.v1.mds_sid.len;
 446         dskey.dnk_sid->val = kmem_alloc(dskey.dnk_sid->len,
 447             KM_SLEEP);
 448 
 449         bcopy(fhp->fh.v1.mds_sid.val, dskey.dnk_sid->val,
 450             dskey.dnk_sid->len);
 451 
 452         dskey.dnk_fid = &fhp->fh.v1.mds_fid;
 453 
 454         hash = dserv_nnode_hash(&dskey);
 455 
 456         key.nk_keydata = &dskey;
 457         key.nk_compare = dserv_nnode_compare;
 458 
 459         return (nnode_find_or_create(npp, &key, hash, fhp, dserv_nnode_build));
 460 }

The problem here is that both dskey and key are pulled off of the stack. We don't expect them to have a life after dserv_nnode_from_fh_ds() and before my changes, there was no need to release any memory.

The key is only used here in the find case:

637         rw_enter(&bucket->nb_lock, rw);
638         found = avl_find(&bucket->nb_tree, &key, &where);
639         if (found) {

And indeed, we can easily follow where the compare function has to be from line 457 above:


 310 static int
 311 dserv_nnode_compare(const void *va, const void *vb)
 312 {
 313         const dserv_nnode_key_t *a = va;
 314         const dserv_nnode_key_t *b = vb;
 315         int rc;
 316 
 317         NFS_AVL_COMPARE(a->dnk_sid->len, b->dnk_sid->len);
 318         rc = memcmp(a->dnk_sid->val, b->dnk_sid->val, a->dnk_sid->len);
 319         NFS_AVL_RETURN(rc);
 320 
 321         NFS_AVL_COMPARE(a->dnk_fid->len, b->dnk_fid->len);
 322         rc = memcmp(a->dnk_fid->val, b->dnk_fid->val, a->dnk_fid->len);
 323         NFS_AVL_RETURN(rc);
 324 
 325         return (0);
 326 }

And I added that code on 317-319!

Edit Note: And I just realized, I'm supplying only one of the two keys. Where does the other come from? It turns out that it does get created in:

static nnode_error_t
dserv_nnode_build(nnode_seed_t *seed, void *vfh)
{
        dserv_nnode_key_t *key = NULL;
...
        key->dnk_sid = kmem_alloc(sizeof (mds_sid), KM_SLEEP);
        key->dnk_sid->len = fhp->fh.v1.mds_sid.len;
        key->dnk_sid->val = kmem_alloc(key->dnk_sid->len,
            KM_SLEEP);

And that code does take care of releasing the memory correctly with dserv_nnode_key_free()!

It is not used in the create case:

674         rc = nnode_build(npp, data, nnbuild);
675         if (rc == 0)
676                 avl_insert(&bucket->nb_tree, *npp, where);

So the fix will be:

 457         ne = nnode_find_or_create(npp, &key, hash, fhp, dserv_nnode_build);
 458          
 459         kmem_free(dskey.dnk_sid->val, dskey.dnk_sid->len);
 460         kmem_free(dskey.dnk_sid, sizeof (mds_sid));
 461  
 462         return (ne);
 463 }

Ouch, I thought I was done! Remember earlier I stated that the xdr_bytes would be from this code? Well I decided to challenge that statement:

> ffffff01edde37b0$<bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
ffffff01edde37b0 ffffff01edc60168      89f9c4733bc ffffff01d7aa93a0
                 ffffff01c68262e0 ffffff01c9a25b80 ffffff01cca20030
                 kmem_cache_alloc_debug+0x283
                 kmem_cache_alloc+0xa9
                 kmem_alloc+0xa3
                 xdr_bytes+0x70
                 xdr_mds_sid+0x21
                 xdr_ds_fh_v1+0x68
                 xdr_ds_fh+0x3f
                 xdr_decode_nfs41_fh+0xdd
                 xdr_snfs_argop4+0x5e
                 xdr_COMPOUND4args_srv+0xf4
                 svc_authany_wrap+0x22
                 svc_cots_kgetargs+0x41
                 dispatch_dserv_nfsv41+0x5d
                 svc_getreq+0x20d
                 svc_run+0x197

In retrospect, "D'oh!", we can see we are not decoding the xdr here. But then, in my defense, I made that statement before we looked at the code!

Ahh, looks like we have some hand written XDR code:

4034 static bool_t
4035 xdr_snfs_argop4_free(XDR *xdrs, nfs_argop4 **arrayp, int len)
4036 {
...
4056                 case OP_PUTFH:
4057                         if (array[i].nfs_argop4_u.opputfh.object.nfs_fh4_val !=
4058                             NULL) {
4059                                 kmem_free(array[i].nfs_argop4_u.opputfh.object.
4060                                     nfs_fh4_val,
4061                                     array[i].nfs_argop4_u.opputfh.object.
4062                                     nfs_fh4_len);
4063                         }

Which does not free the mds_sid portion of the file handle. This hand encoding is a nightmare to unravel. I'm going to have to pass in the minorversion:

5022                                 (void) xdr_snfs_argop4_free(xdrs, &objp->array,
5023                                     objp->array_len, objp->minorversion);

to determine whether or not to check for the mds_sid. If it is a minorversion, then I'll need to check whether it is a normal NFSv41 filehandle or a DS filehandle:


4057                 case OP_PUTFH:
4058                         if (array[i].nfs_argop4_u.opputfh.object.nfs_fh4_val ==
4059                             NULL)
4060                                 continue;
4061 
4062                         if (minorversion != 0) {
4063                                 struct mds_ds_fh        *dsfh =
4064                                     (struct mds_ds_fh *)array[i].nfs_argop4_u.
4065                                     opputfh.object.nfs_fh4_val;
4066                                     
4067                                 /*  
4068                                  * Is it really a DS filehandle?
4069                                  */
4070                                 if (fh4->type == FH41_TYPE_DMU_DS) {
4071                                         mds_sid *sid = &dsfh->fh.v1.mds_sid;
4072                                         if (sid->val) {
4073                                                 kmem_free(sid->val, sid->len);
4074                                         }
4075                                 }
4076                         }
4077                         
4078                         kmem_free(array[i].nfs_argop4_u.opputfh.object.
4079                             nfs_fh4_val,
4080                             array[i].nfs_argop4_u.opputfh.object.
4081                             nfs_fh4_len);

The key to understanding this code (and the headache) is to look at the definitions of our filehandles:

typedef struct {
        nfs41_fh_type_t type;
        nfs41_fh_vers_t vers;
        union {
                nfs41_fh_v1_t v1;
                /* new versions will be added here */
        } fh;
} nfs41_fh_fmt_t;

struct mds_ds_fh {
        nfs41_fh_type_t type;           /* MDS or DS? */
        ds_fh_version vers;             /* OTW version number */
        union {
                ds_fh_v1 v1;
                /* new versions will be added here */
        } fh;
};
typedef struct mds_ds_fh mds_ds_fh;

You can typecast nfs_fh4_val in several ways, i.e.:

4063                                 struct mds_ds_fh        *dsfh =
4064                                     (struct mds_ds_fh *)array[i].nfs_argop4_u.
4065                                     opputfh.object.nfs_fh4_val;

but you always have to check the type to make sure it is valid. The key to understanding this comes out of nfs41_filehandle_xdr.c:

132 bool_t
133 xdr_decode_nfs41_fh(XDR *xdrs, nfs_fh4 *objp)
134 {
135 
136         uint_t otw_len;
137         uint_t type;
138 
139         ASSERT(xdrs->x_op == XDR_DECODE);
140 
141         objp->nfs_fh4_val = NULL;
142         objp->nfs_fh4_len = 0;
...
150         /* Get the filehandle type */
151         if (!xdr_enum(xdrs, (enum_t *)&type))
152                 return (FALSE);
153 
154         switch (type) {
155         case FH41_TYPE_NFS: {
156                 nfs41_fh_fmt_t *nfhp = NULL;
157 
158                 nfhp = kmem_zalloc(sizeof (nfs41_fh_fmt_t), KM_SLEEP);
159                 nfhp->type = FH41_TYPE_NFS;
...
170         case FH41_TYPE_DMU_DS: {
171                 struct mds_ds_fh *dfhp = NULL;
172 
173                 dfhp = kmem_zalloc(sizeof (struct mds_ds_fh), KM_SLEEP);
174                 dfhp->type = FH41_TYPE_DMU_DS;

Since both structures have the type as the first field, after the typecast, we will be able to determine if that was valid to do.

So the first memory leak I fixed was one I had added. The second one was a pre-existing one. Running findleaks has saved my bacon here.

And of course, I'm going to have to check one more time to make sure that my fixes plugged the leaks!


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

20090730 Thursday July 30, 2009
Update spe implementation notes

I've update the spe implementation notes to version 1.1. The big changes from the version 1.0 from last year are:

  1. Policies use npools
  2. /etc/npool.spe to store the npool definitions
  3. Use of mds_sids and not guuids to describe datasets
  4. Uber-default policy, which is to construct a layout of all of the currently reporting datasets.

Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
Checking for memory leaks in the MDS

One of the things I've been doing is spamming the MDS with file creates and deletes. I've been doing it for a day and now I want to see if the new code is leaking memory:

> ::findleaks    
CACHE             LEAKED           BUFCTL CALLER
ffffff01c68262e0       1 ffffff0203ff9e70 mds_compound+0x54
ffffff01c68262e0       3 ffffff01eb07ada0 mds_compound+0x54
ffffff01c682b2e0       3 ffffff01de7f1818 mds_compound+0x193
ffffff01c6828020       1 ffffff020136ba50 mds_get_server_impl_id+0x30
ffffff01c68262e0       1 ffffff0203ff9a38 mds_get_server_impl_id+0x58
ffffff01c6826b20       1 ffffff01eb624ca8 mds_get_server_impl_id+0x8a
ffffff01c6828860       1 ffffff01e3fa6898 modinstall+0x129
ffffff01c6828860       1 ffffff01d82cd960 modinstall+0x129
ffffff01c6828860       1 ffffff01d61493a0 modinstall+0x129
ffffff01c682b2e0       1 ffffff02f1f72da8 modinstall+0x129
ffffff01c6828860       1 ffffff01e3fa6970 modinstall+0x129
ffffff01c682b2e0       1 ffffff01de7f13e0 modinstall+0x129
ffffff01c6828020       1 ffffff01fed32cd8 rpc_init_taglist+0x25
ffffff01c6828020       2 ffffff01e6bb68c0 rpc_init_taglist+0x25
ffffff01c6828020       1 ffffff020136bdb0 rpc_init_taglist+0x25
ffffff01c6828020    1311 ffffff01d742e4a0 rpc_init_taglist+0x25
ffffff01c68265a0       1 ffffff01f6a1ecb0 tohex+0x32
ffffff01c6828020       2 ffffff020136b390 xdr_array+0xae
ffffff01c68285a0       1 ffffff01d67cc068 xdr_bytes+0x70
ffffff01c68262e0 1154050 ffffff01eb07c7c8 xdr_bytes+0x70
ffffff01c68282e0       1 ffffff0200f380e0 xdr_bytes+0x70
ffffff01c68262e0 3979053 ffffff01f9f569b0 xdr_bytes+0x70
------------------------------------------------------------------------
           Total 5134439 buffers, 82195096 bytes

I knew about the rpc_init_taglist() from before my changes, so that leaves the xdr_bytes, and yes, those are leaking:

> ffffff01f9f569b0$<bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
ffffff01f9f569b0 ffffff01fa04fc68     2a7a9936fecc ffffff01d84574e0
                 ffffff01c68262e0 ffffff01c70614c0                0
                 kmem_cache_alloc_debug+0x283
                 kmem_cache_alloc+0x164
                 kmem_alloc+0xa3
                 xdr_bytes+0x70
                 xdr_mds_sid+0x21
                 xdr_ds_fh_v1+0x68
                 xdr_ds_fh+0x3f
                 xdr_ds_fh_fmt+0x3b
                 get_mds_ds_fh+0x46
                 ds_checkstate+0x3f
                 nfs_ds_cp_dispatch+0x9e
                 svc_getreq+0x20d
                 svc_run+0x197
                 svc_do_run+0x81
                 nfssys+0xa0e

> ffffff01f9f569b0$<bufctl_audit
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
ffffff01f9f569b0 ffffff01fa04fc68     2a7a9936fecc ffffff01d84574e0
                 ffffff01c68262e0 ffffff01c70614c0                0
                 kmem_cache_alloc_debug+0x283
                 kmem_cache_alloc+0x164
                 kmem_alloc+0xa3
                 xdr_bytes+0x70
                 xdr_mds_sid+0x21
                 xdr_ds_fh_v1+0x68
                 xdr_ds_fh+0x3f
                 xdr_ds_fh_fmt+0x3b
                 get_mds_ds_fh+0x46
                 ds_checkstate+0x3f
                 nfs_ds_cp_dispatch+0x9e
                 svc_getreq+0x20d
                 svc_run+0x197
                 svc_do_run+0x81
                 nfssys+0xa0e

BTW: I had enabled detailed accounting earlier with setting in /etc/system:

set kmem_flags=0xf

I think what is going on is that I assumed the space in the ds_filehandle was fixed and that there was no allocation going on for the mds_sid when it was xdr decoded. There are several complications to this:

  1. Could have been leaking before - i.e., I didn't establish a baseline.
  2. Why is it only on the ds_checkstate?
    1. Who says it is?
    2. Client treats FH as opaque, wouldn't decode it and especially not as OpenSolaris specific code.
    3. Need to check the DS for memory leaks!

Okay, from code inspection, the DS would have the same issue. I created a free routine and went through and applied it everywhere. Now when we add a new field with a memory allocation, we can make a change in only 1 place to clean it up.

Also, the leak was happening before - we just never bothered to check for it or track it down.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
Hung system, need core

I've got a hung v20z and I want to get a core. Luckily I had it configured to drop down to kmdb and I had a chatline available to people in my group. :-<

The trick is get the LOM to work with you. In this case, CTRL-e c l 0 (that is an 'L' and not a '1') got the LOM console to force the system to drop down into kmdb. And from there, it was easy to generate the core:

Jul 29 12:35:41 pnfs-17-22 last message repeated 1 time
[halt aborted]
[ignored]
[halt sent]

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ scsi_vhci crypto mac cpc uppc neti sd ptm ufs unix 
cpu_ms.AuthenticAMD.15 sv mpt zfs krtld s1394 sppp rdc nca uhci ii hook lofs 
genunix idm ip nsctl logindmux nsmb sdbc usba specfs pcplusmp nfs md random 
cpu.generic sctp arp stmf sockfs smbsrv ]
[1]> $

Now I need to figure out why the system was wedged!

And the core was corrupt!


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

20090729 Wednesday July 29, 2009
Almost finished with the kspe DTrace script for tracing

Well, I ran into problems in dumping the contents of the mds_sid array, DTrace does not iterate. To fix this, I decided to use a thread counter in the code for mds_ds_path_to_mds_sid:entry. I then increment it in the mds_ds_path_to_mds_sid:return. This allows me to enumerate the data set names.

#!/usr/sbin/dtrace -s

nfsv4:::spe-i-check_open
{
 	printf("%d (%d) from %s is checking %s",
            (uid_t)arg0, (gid_t)arg1, stringof(arg3), stringof(arg2));
}

nfsv4:::spe-i-policy_eval
{
 	printf("Policy %d %s with error %d from %s",
            (uint_t)arg0, (boolean_t)arg1 ? "matched" : "did not match",
            (int)arg2, stringof(arg3));
}

::nfs41_spe_allocate:entry
{
 	self->addr = (struct netbuf *)arg2;
	self->stripe_count = (count4 *)arg4;
	self->unit_size = (uint32_t *)arg5;
	self->mds_sids = (mds_sid **)arg6;

	self->loaded_sids = 0;
}

::nfs41_spe_allocate:return
/args[1] == 0/
{
 	printf("Policy has %d stripes and %u block size",
            *self->stripe_count, *self->unit_size);
}

::nfs41_spe_allocate:return
/args[1] != 0/
{
 	printf("No matching policy");
}

::mds_ds_path_to_mds_sid:entry
{
 	self->ustring = (utf8string *)arg0;
	self->ss_name = stringof(self->ustring->utf8string_val);
	self->mds_sid = (struct mds_sid *)arg1;
}

::mds_ds_path_to_mds_sid:return
/args[1] == 0/
{
 	ss_name = (char *)alloca(self->ustring->utf8string_len + 1);
	 bcopy(self->ustring->utf8string_val, ss_name,
            self->ustring->utf8string_len);
    	 ss_name[self->ustring->utf8string_len + 1] = '\0';
	printf("mds_sid[%d] = %s", self->loaded_sids++,
            stringof(ss_name));
}

::mds_ds_path_to_mds_sid:return
/args[1] != 0/
{
}

The remaining problem was that my first attempt to dump the utf8string resulted in garbage at the end of the output. Which makes sense, as such strings are not NUL terminated. Here is the original code:

::mds_ds_path_to_mds_sid:entry
{
 	u = (utf8string *)arg0;
  	self->ss_name = stringof(u->utf8string_val);
	self->mds_sid = (struct mds_sid *)arg1;
}

::mds_ds_path_to_mds_sid:return
/args[1] == 0/
{
 	printf("mds_sid[%d] = %s", self->loaded_sids++, self->ss_name);
}

There was an interesting thread of discussion over at: [dtrace-discuss] getting extra characters with printf(copyin(a, b)). I tried the approaches suggested by both Chad and Adam, but no luck. I get on getting:

dtrace: error on enabled probe ID 7 (ID 63585: fbt:nfssrv:mds_ds_path_to_mds_sid:return): invalid address (0xffffff01f6a103f8) in action #2 at DIF offset 56

Which to be fair, I more than expected. The issue is that the copyin and copyinstr work on user land memory and the stuff I want to dump is in the kernel. Where that thread really helped me though was it pointed me to using bcopy(). With that in play, things went well:

  1   3502 nfs41_spe_allocate:spe-i-check_open 0 (0) from 10.1.233.12 is checking /pnfs1/nfs41/pnfs-4-02/2/spek.txt
  1   3501 nfs41_spe_allocate:spe-i-policy_eval Policy 1000 did not match with error 0 from 10.1.233.12
  1   3501 nfs41_spe_allocate:spe-i-policy_eval Policy 2000 matched with error 0 from 10.1.233.12
  1  63585    mds_ds_path_to_mds_sid:return mds_sid[0] = pnfs-17-22:pnfs1/ds1n
  1  63585    mds_ds_path_to_mds_sid:return mds_sid[1] = pnfs-17-22:pnfs1/ds8n
  1  63585    mds_ds_path_to_mds_sid:return mds_sid[2] = pnfs-17-22:pnfs2/ds2n
  1  63585    mds_ds_path_to_mds_sid:return mds_sid[3] = pnfs-17-23:pnfs1/ds3n
  1  63585    mds_ds_path_to_mds_sid:return mds_sid[4] = pnfs-17-23:pnfs1/ds7n
  1  63585    mds_ds_path_to_mds_sid:return mds_sid[5] = pnfs-17-23:pnfs2/ds4f
  1  63585    mds_ds_path_to_mds_sid:return mds_sid[6] = pnfs-4-01:pnfs1/d10n
  1  63585    mds_ds_path_to_mds_sid:return mds_sid[7] = pnfs-4-01:pnfs1/ds4
  1  62825        nfs41_spe_allocate:return Policy has 8 stripes and 1000 block size

Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
Adding some DTrace probes to the kspe and finding a very nasty bug

It is time to get rid of the remaining debug printfs in the kspe code. The answer is DTrace.

I've added the following:

        /*
         * Give the client, uid, gid, and path.
         */
        DTRACE_NFSV4_4(spe__i__check_open, uid_t, pat.uid,
            gid_t, pat.gid, char *, dir_path, char *, address);

Which will allow me to determine the relevant parameters to doing a check in the policies. And:

                /*
                 * Give the policy id, match, result, and client.
                 */
                DTRACE_NFSV4_4(spe__i__policy_eval, uint_t, sp->sp_id,
                    boolean_t, b, int, result, char *, address);

Will allow me to see the result of each policy lookup. I could probably do this with a function return, I would just have to pass in the address. But this is just as much an exercise in learning more about DTrace as anything else.

The last piece I will do as a function return. But I will lose the client address!

[root@pnfs-17-24 ~]> more spe.d 
#!/usr/sbin/dtrace -s

nfsv4:::spe-i-check_open
{
        co_uid = (uid_t)arg0;
        co_gid = (gid_t)arg1;
        co_path = (char *)arg2;
        co_addr = (char *)arg3;

        printf("%d (%d) from %s is checking %s",
            co_uid, co_gid, stringof(co_path), stringof(co_addr));
}

nfsv4:::spe-i-policy_eval
{
        pe_id = (uint_t)arg0;
        pe_match = (boolean_t)arg1;
        pe_res = (int)arg2;
        pe_addr = (char *)arg3;

        printf("Policy %d %s with error %d from %s",
            pe_id, pe_match ? "matched" : "did not match",
            pe_res, stringof(pe_addr));
}

Yields:

[root@pnfs-17-24 ~]> ./spe.d
dtrace: script './spe.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0   3502 nfs41_spe_allocate:spe-i-check_open 0 (0) from /pnfs2/pnfs/pnfs-4-02/spek4.txt is checking 0.0.0.0
  0   3501 nfs41_spe_allocate:spe-i-policy_eval Policy 1000 did not match with error 0 from 0.0.0.0

Okay, my address and path is backwards on the first one. And the addresses look wrong. The uid and gid are actually probably okay, I'm doing my tests as root after all.

And a quick sanity check shows I am right:

[root@pnfs-17-24 ~]> ./spe.d 
dtrace: script './spe.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  1   3502 nfs41_spe_allocate:spe-i-check_open 3533 (10) from 0.0.0.0 is checking /pnfs1/nfs41/ds1tmds3.snoop
  1   3501 nfs41_spe_allocate:spe-i-policy_eval Policy 1000 did not match with error 0 from 0.0.0.0
  1   3501 nfs41_spe_allocate:spe-i-policy_eval Policy 2000 matched with error 0 from 0.0.0.0

Turns out that the passed in address is bogus:

[root@pnfs-17-24 ~]> Jul 28 21:12:24 pnfs-17-24 nfssrv: Address is 4294901760 for FFFF0000

This was working at one point, maybe. :->

I ended up doing some further investigation and added another probe:

nfsv4:::mds-i-create_address
{
        ca_req = (struct svc_req *)arg0;
        ca_addr = (struct netbuf *)arg1;

        a = (char *)ca_req->rq_xprt->xp_xpc.xpc_rtaddr.buf;
        iaddr = 0;
        iaddr = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3];
        r1 = (iaddr & (0xff << 24)) >> 24;
        r2 = (iaddr & (0xff << 16)) >> 16;
        r3 = (iaddr & (0xff << 8)) >> 8;
        r4 = iaddr & 0xff;

        printf("Address is %u.%u.%u.%u", r1, r2, r3, r4);
}

It is late and DTrace does not do iteration. The code mimics what I do in C. It gives a consistent answer:

  0   3564 mds_createfile_get_layout:mds-i-create_address Address is 255.255.255.254
  0   3478 nfs41_spe_allocate:spe-i-check_open 0 (0) from 255.255.255.254 is checking /pnfs1/nfs41/ddsssss

Hmm, it is not changing as I change the client I'm making the request from.

I don't think it is working in OpenSolaris...

I think the bug isn't in the code, but in the way I am unpacking the netbuf. I'm looking at 4 bytes and the buffer is typically 16 bytes.

Yes, I ended up looking at our (OpenSolaris) source for __inet_taddr2uaddr() to figure out what was going on. I needed to go from:

1094         /*
1095          * XXX: Assuming IPv4!
1096          */
1097         a = (char *)addr->buf;
1098         for (i = 0; i < 4; i++) {
1099                 iaddr |= a[i] << ((3-i) * 8);
1100         }
1101 
1102         pat.addr = iaddr;

To:

1111         struct sockaddr_in      *sa;
...
1115         sa = (struct sockaddr_in *)(addr->buf);
1116 
1117         /*
1118          * XXX: Assuming IPv4!
1119          */
1120         a = (unsigned char *)&sa->sin_addr;
1121         for (i = 0; i < 4; i++) {
1122                 iaddr |= a[i] << ((3-i) * 8);
1123         }
1124 
1125         pat.addr = iaddr;

And we see this output:

  0  19974 nfs41_spe_allocate:spe-i-check_open 0 (0) from 10.1.233.12 is checking /pnfs1/nfs41/jk19

Well, I still need to finish up the DTRace script to dump the found layout upon return. And I should figure out how to dump the client address via DTrace.

The important lesson here is to test the stuff you think you've already tested. I know I tested the address logic in the stand alone prototype. And I thought I also tested it in a special kernel version, but I must not have done so. In fact, it makes sense with some of the debug logic I pulled out earlier this week.

In all, I'm happy, I fixed a very nasty bug.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
DTrace probe not firing

I'm trying to track down whether the client address is ever being set in a NFS request. I've checked with build 117, 112, 109, 85, and now I'm trying 79a. I've got a VMWare image running on a laptop. But for some reason my probe isn't loading:

# ./req.d
dtrace: failed to compile script ./req.d: line 3: probe description ::rfs_dispatch:entry does not match any probes
# dtrace -f rfs_dispatch
dtrace: invalid probe specifier rfs_dispatch: probe description ::rfs_dispatch: does not match any probes

A clue can be found here:

# share
#

The clue is that with no shares loaded, then the 'nfssrv' module is not loaded. If we create a share, we see:

# share -F nfs -o rw,anon=0 /export/home
# dtrace -f rfs_dispatch
dtrace: description 'rfs_dispatch' matched 2 probes
^C

# ./req.d
dtrace: script './req.d' matched 1 probe

We have success!


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

20090728 Tuesday July 28, 2009
Showing relationships between files, layouts, and mpds

I've got two clients doing some simple file copies, file removes, and directory removes. After a while, I stop both test scripts and unmount the mds. We can see the relationship between files, layouts, and device tables (mpd):

[root@pnfs-17-24 ~]> ./rlays.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_Layout_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x50
    refcnt = 0x99
    refcnt = 0x49
[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt | wc -l
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
     303
[root@pnfs-17-24 ~]> ./rmpd.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_mpd_entry_cache|::print  struct rfs4_dbe  
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2

I just fixed a bug where a layout request invalidated the old layout. So we can see here that there are over 300 files with state which are still active, there are 3 corresponding layouts, each with a different usage, and 3 mpds, each with a hold from their respective layout. The 3 layouts correspond to the 3 policies in effect in the spe. There is no round robin scheduling going on, which might see more layouts in use.

I haven't fixed it yet, and this example doesn't show it, but each layout create is going to cause a corresponding mpd create. Which is okay for now, the only difference we could support would be a different stripe unit size.

And we can see that the files have been harvested, but the layouts have not:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt | wc -l
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
       0
[root@pnfs-17-24 ~]> ./rlays.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_Layout_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
[root@pnfs-17-24 ~]> ./rmpd.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_mpd_entry_cache|::print  struct rfs4_dbe  
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2

The layouts are ripe for plucking, with a refcnt of 1, it means that they are in the table, but no one else references them. Let's see if we can quickly use them again:

[root@pnfs-17-24 ~]> ./rlays.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_Layout_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x5
    refcnt = 0xb
    refcnt = 0x4
[root@pnfs-17-24 ~]> ./rmpd.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_mpd_entry_cache|::print  struct rfs4_dbe  
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2

Now I don't have proof here that these are the same entries in the db, but what I wanted to point out is that I never invalidate entries in either the layout or mpd tables. Instead, I rely on the fact that they can not go away until all external references are gone. Based on timing, I'd say we did reuse the entries, otherwise I would expect to see 6 mpds and not 3 of them. I.e., they do not get reused and if there were new layout entries, we'd expect to see 6 mpds.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
Tracking down a nasty refcnt bug with rfs4_file_t, or am I?

I'm interested in that layout and mpd entries get harvested as they become stale. Because of the prior assumption that there was only ever 1 of each, it never was an issue.

If we recall our mappings, each mpd can belong to several layouts, each layout can belong to several files, etc. So when I discovered that mpds were sticking in memory with a refcnt of 2, I checked the layouts, saw that they had a refcnt of 2, and then checked the files:

[root@pnfs-17-24 ~]> more rlays.sh 
#!/bin/sh -x
echo '::walk mds_Layout_entry_cache|::print  struct rfs4_dbe' | mdb -k
[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2
    refcnt = 0x2

Now, some of these would go away, but in the end, several would stick in memory. I actually found 2 nasty rfs4_file_t rele bugs and fixed them, but the issue stayed.

I thought my last fix did the trick, I turned my attention away for a bit as I worked on my MacBook Pro, and the file references went away. I rebooted and tried again. No such luck, that was when I got the above.

I unmounted the mds from my two clients and went away to study the code, trying to find out how an mds_op_close() did a file rele which would lead to the layout rele.

I was going to reboot, without any luck, and try the case where I created the files and did nothing else - the remove code is suspect to me still. Well, before I did that, I checked the file refcnts:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k+  
echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1
    refcnt = 0x1

My guess is that somehow the clients are causing state to be clung to on the server. It is only on the unmount that a final rele takes place. It has been some time since I did that last check:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k + echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 

[root@pnfs-17-24 ~]> ./rlays.sh 
+ mdb -k 
+ echo ::walk mds_Layout_entry_cache|::print  struct rfs4_dbe 
[root@pnfs-17-24 ~]> ./rmpd.sh 
+ mdb -k 
+ echo ::walk mds_mpd_entry_cache|::print  struct rfs4_dbe  
{
    lock = [
        {
            _opaque = [ 0 ]
        },
    ]
    refcnt = 0x1

Whoops, I expected the mpds to be out of memory as well and after a bit of patience, they are. :->

[root@pnfs-17-24 ~]> ./rmpd.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_mpd_entry_cache|::print  struct rfs4_dbe  

A quick test of creating 4 files sees:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x5
    refcnt = 0x5
    refcnt = 0x5
    refcnt = 0x5

If we then unmount, the snoop is mainly the following:

[root@pnfs-4-02 ~]> grep RETURN\) xxx
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 8 (DELEGRETURN)
NFS:  Op = 51 (LAYOUTRETURN)
NFS:  Op = 51 (LAYOUTRETURN)

And indeed, we see the refcnts get decremented by two for each file:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 
    refcnt = 0x3
    refcnt = 0x3
    refcnt = 0x3
    refcnt = 0x3

And after a while:

[root@pnfs-17-24 ~]> ./rfps.sh | grep refcnt
+ mdb -k 
+ echo ::walk mds_File_entry_cache|::print  struct rfs4_dbe 

I'm actually happy that this strange behavior occurred. I wouldn't have taken the time to painstakingly poured over every file rele and I wouldn't have thus found the two bugs. The first was in error handling in the remove case and the second was in file delegation recalls. The first was probably hit or miss to being found. The second was a timebomb.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

20090727 Monday July 27, 2009
How spe interacts with file opens

The spe is a simple policy engine, which dictates the stripe size, the block size, and the set of devices used during file creation - these three items form the layout. The impact of that statement is subtle, but it sets some expectations. The spe code only fires on a file creation, so if a file already exists, then we read in the on disk layout (odl) to get the layout.

So, if files were created under a different set of policies, then there will be no problem accessing the files under the new set of policies.

Here is a file we created before we even enabled spe:

[root@pnfs-17-21 ~]> cp nfs4_vnops.c  /pnfs/pnfs-17-24/redo.c
[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/redo.c
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 13
Layout [0]:
        Layout obtained at: Sat Jul 25 17:51:55:18886 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 4, stripe unit: 32768
        Stripe [0]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [1]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [2]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [3]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK

I've rebooted the client and mds many times, I've added spe policies, but I can check to see that we get the exact same layout. (Note, I've also changed the mount point to be the root of the mds, so expect the path to be different!)

[root@pnfs-4-02 ~]> touch /pnfs/pnfs-17-24/pnfs2/pnfs/redo.c
[root@pnfs-4-02 ~]> nfsstat -l /pnfs/pnfs-17-24/pnfs2/pnfs/redo.c
Layout unacquired
[root@pnfs-4-02 ~]> cat /pnfs/pnfs-17-24/pnfs2/pnfs/redo.c > /dev/null
[root@pnfs-4-02 ~]> nfsstat -l /pnfs/pnfs-17-24/pnfs2/pnfs/redo.c
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 1
Layout [0]:
        Layout obtained at: Mon Jul 27 11:38:40:673328 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 4, stripe unit: 32768
        Stripe [0]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [1]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [2]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [3]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK

One thing to note right off the bat is that 'touch' does not cause the file layout to be loaded. That is simple, the 'touch' command is a meta-data operation and can be serviced on the mds. It does not need to open the file. I'd have to look at the snoop output to be sure, but we look like we are getting the same layout.

Now, if I do the same copy as before, we see:

[root@pnfs-4-02 ~]> nfsstat -l /pnfs/pnfs-17-24/pnfs2/pnfs/redome.c
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 116
Layout [0]:
        Layout obtained at: Mon Jul 27 11:43:30:691639 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 4, stripe unit: 3968
        Stripe [0]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [1]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [2]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [3]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK

We just happen to get a stripe count of 4, but notice that the stripe unit is much smaller. We can tell that the extension of '.c' caused the first policy to fire:

[root@pnfs-17-24 ~]> more /etc/policies.spe 
1000, 4, 4k, pool17, ext == c
2000, 8, 1k, pool17:pool4, path == /pnfs1
3000, 3, 8k, pool4:pool17, path == /pnfs2

Okay, what if we call it redome.h? That should cause the last policy to match. Note that the test of the path attribute is based on the path of the mds and not the client. Also, if we have a mount point under the root of the mds, we still get the complete path when we do the check.

[root@pnfs-4-02 ~]> cp nfs4_vnops.c /pnfs/pnfs-17-24/pnfs2/pnfs/redome.h
[root@pnfs-4-02 ~]> nfsstat -l /pnfs/pnfs-17-24/pnfs2/pnfs/redome.h
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 64
Layout [0]:
        Layout obtained at: Mon Jul 27 11:46:20:549943 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 3, stripe unit: 8000
        Stripe [0]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [1]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [2]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK

And it does! We can see it in both the stripe count and stripe unit.

And these files will have those same layouts as long as they live.

Well, I need to go figure out some more unit tests to break my code...


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
More on getting the kspe code to generate an array of mds_sid

Okay, I've forced the ds_dataset_name to be present at creation. And I'm getting pretty hashes. I'm still not generating a mds_sid array.

When we store 'pnfs-4-01:pnfs1/d10':

Jul 27 10:59:45 pnfs-17-24 nfssrv: utf8_hash of |pnfs-4-01:pnfs1/d10|[20] is 55811466

And when we try to search for it:

Jul 27 11:04:51 pnfs-17-24 nfssrv: utf8_hash of |pnfs-4-01:pnfs1/ds10|[21] is 111623246

The lengths are different, which will cause a bit shift. And we see that one is 'd10' and the other is 'ds10'. Well, I'll change '/etc/npools.spe' as I don't have renaming a dataset name <-> mds_sid working yet.

[root@pnfs-4-01 ~]> zfs rename pnfs1/d10 pnfs1/ds10
cannot rename 'pnfs1/d10': dataset is busy

And there could be other problems.

But anyway, fix the dataset name, and we have spe, I repeat, we have spe!

Here we can see a file creation without any policy firing:

[root@pnfs-4-02 ~]> mount -o vers=4 pnfs-17-24:/ /pnfs/pnfs-17-24
[root@pnfs-4-02 ~]> cp 1234.32k.raw /pnfs/pnfs-17-24/pnfs1/nfs41/spek4.txt
[root@pnfs-4-02 ~]> nfsstat -l !$
nfsstat -l /pnfs/pnfs-17-24/pnfs1/nfs41/spek4.txt
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 4
Layout [0]:
        Layout obtained at: Mon Jul 27 11:04:51:725322 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 10, stripe unit: 32768
        Stripe [0]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [1]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [2]:
                tcp:pnfs-4-01.Central.Sun.COM:10.1.233.11:47009 OK
        Stripe [3]:
                tcp:pnfs-4-01.Central.Sun.COM:10.1.233.11:47009 OK
        Stripe [4]:
                tcp:pnfs-4-01.Central.Sun.COM:10.1.233.11:47009 OK
        Stripe [5]:
                tcp:pnfs-4-01.Central.Sun.COM:10.1.233.11:47009 OK
        Stripe [6]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [7]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [8]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [9]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK

And here we can see the same file copy resulting in not only a different number of stripes, but to a different dataset order!

[root@pnfs-4-02 ~]> umount /pnfs/pnfs-17-24
[root@pnfs-4-02 ~]> mount -o vers=4 pnfs-17-24:/ /pnfs/pnfs-17-24
[root@pnfs-4-02 ~]> cp 1234.32k.raw /pnfs/pnfs-17-24/pnfs1/nfs41/spek5.txt
[root@pnfs-4-02 ~]> nfsstat -l /pnfs/pnfs-17-24/pnfs1/nfs41/spek5.txt
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 140
Layout [0]:
        Layout obtained at: Mon Jul 27 11:27:06:306959 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 8, stripe unit: 960
        Stripe [0]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [1]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [2]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [3]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [4]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [5]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [6]:
                tcp:pnfs-4-01.Central.Sun.COM:10.1.233.11:47009 OK
        Stripe [7]:
                tcp:pnfs-4-01.Central.Sun.COM:10.1.233.11:47009 OK

Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
First real test of kspe!

So, I started off a test run with the kspe. It allows me to decide which datasets are going to be used in a layout. (see my slides from Oklahoma City OpenSolaris User Group (OKCOSUG) presentation). I set up a simple set of npools and policies:

[root@pnfs-17-24 /etc]> more npools.spe 
pool17 pnfs-17-22:pnfs1/ds1 pnfs-17-23:pnfs1/ds3 pnfs-17-22:pnfs2/ds2 pnfs-17-23
:pnfs2/ds4 pnfs-17-23:pnfs1/ds7 pnfs-17-22:pnfs1/ds8
pool4 pnfs-4-01:pnfs1/ds4 pnfs-4-01:pnfs2/ds5 pnfs-4-01:pnfs2/d9 pnfs-4-01:pnfs1
/ds10
[root@pnfs-17-24 /etc]> more policies.spe 
1000, 4, 4k, pool17, ext == c
2000, 8, 1k, pool17:pool4, path == /pnfs1
3000, 3, 8k, pool4:pool17, path == /pnfs2

Note that a dataset identifier is a combination of host name and the zfs filesystem. I wrote the code and I still struggle with this is not a path name, it is a zfs name. I.e., it is not ':/' as a connector. 'pnfs2/ds2' is the zfs name and not '/pnfs2/ds2'!

And struggled a bit to load them, until I found my kspe implementation notes.

And right off the bat, I see my policies are not matching and I've got a core!

[root@pnfs-4-02 ~]> cp 1234.32k.raw /pnfs/pnfs-17-24/pnfs1/nfs41/spe_1234.32k.raw.txt

yields

[root@pnfs-17-24 ~]> rc = 0, eval = 0, id = 1000
rc = 0, eval = 1, id = 2000
WARNING: spe: 1000 8 

panic[cpu1]/thread=ffffff01d74bf4e0: BAD TRAP: type=e (#pf Page fault) rp=ffffff0008162d70 addr=0 occurred in module "nfssrv" due to a NULL pointer dereference
...
[1]> $c
kmdb_enter+0xb()
...
nfssrv`mds_layout_hash+0x19(ffffff0008162fe0)
...
[1]> ffffff0008162fe0::print layout_core_t
{
    lc_stripe_unit = 0x3e8
    lc_stripe_count = 0x8
    lc_mds_sids = 0
}

Which tells me that the kspe code did not generate a mds_sid array. Again, no problem, I just wrote that code last week and this is the first test of it!

So the bug was a stupid error:

                        return (mds_sids ? 0 : ENOENT);
versus
                        return (*mds_sids ? 0 : ENOENT);

Success was being returned with an empty array of mds_sids. There is still a bug, we should be finding matches, but at least it is not so nasty.

The bug is that these comparisons are failing:

Jul 27 01:37:45 pnfs-17-24 nfssrv: Comparing policy npool |pool17| at 7 to global |pool17| at 7
Jul 27 01:37:45 pnfs-17-24 nfssrv: Comparing policy npool |pool17| at 7 to global |pool4| at 6
Jul 27 01:37:45 pnfs-17-24 nfssrv: Comparing policy npool |pool4| at 6 to global |pool17| at 7
Jul 27 01:37:45 pnfs-17-24 nfssrv: Comparing policy npool |pool4| at 6 to global |pool4| at 6
Jul 27 01:37:45 pnfs-17-24 nfssrv: spe_map_npools_to_mds_sids: No matching npools!

The first and last should match. Ah, they do (as shown by additional debug logic), which shows me barking up a wrong tree!

D'oh! I found it! If we look at this code:

spe_map_npools_to_mds_sids(kspe_state_t *kspe, spe_policy *sp,
...
        spe_npool       *sn;
        spe_npool       *np;

        /*
         * For each npool in the policy, find it in the
         * list of npools, and start assigning datasets.
         */
        for (sn = sp->sp_npools; sn != NULL; sn = sn->next) {
                for (np = kspe->ks_npools; np; np = np->next) {
                        cmp = utf8_compare(&np->sn_name, &sn->sn_name);
                        if (cmp == 0) {
                                /*
                                 * Now we fill in entries in the *mds_sids
                                 * array.
                                 */
                                for (ss = sn->sn_dses; ss; ss = ss->next) {

We see I was lazy in assuming that sn and np were the same thing. Note that they are the same type of object, but the sn points to the npools in the policy and np points to the npools in the global list. The point is that a npool can be in multiple policies. So rather than store the list of datasets in the policies (which is a nightmare for updating) or pointers to npools in the policies (which sounds good and I can't remember why not!) we store the datasets in the global list.

So instead of searching a list of datasets in the global list, we search an empty list in the policy. :->

And I'm at the next bug:

Jul 27 02:11:04 pnfs-17-24 nfssrv: mds_ds_path_to_mds_sid returned an error!

We're not finding a match as we search the ds_guid_info database. When I had a search issue a couple nights ago, it turned out to be the hash function. If we look at what we have:

        instp->ds_guid_info_dataset_name_idx =
            rfs4_index_create(instp->ds_guid_info_tab,
            "DS_guid-dataset-name-idx", mds_str_hash,
            ds_guid_info_dataset_name_compare, ds_guid_info_dataset_name_mkkey,
            FALSE);

We are using mds_str_hash on an utf8_string - which is a length and a string. I think that will be problematic. I'm not too happy with the hash functions here in general, but for now all I care about is that they are consistent.

They aren't consistent in this case and it is getting late...

As it boots:

Jul 27 03:33:41 pnfs-17-24 nfssrv: utf8_hash of empty str

As we search:

Jul 27 03:34:35 pnfs-17-24 nfssrv: utf8_hash of |pnfs-17-22:pnfs1/ds1|[21] is 111612431

We ought to see a non-NULL addition and we ought to see 10 messages, not 1.

Bzzt! That is an expectation and not code. The debug statements are correct. If we look at the entry create routine, which is called just before the hash function, we see:

static bool_t
ds_guid_info_create(rfs4_entry_t u_entry, void *arg)
{
...
        pgi->ds_dataset_name.utf8string_val = NULL;
        pgi->ds_dataset_name.utf8string_len = 0;

We don't have our hands on that info -- or do we?

We do - I can fix this - but not now. It will wait until tomorrow, because I've got to sleep...


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

20090726 Sunday July 26, 2009
File contents not matching

So after a reboot, a diff of a file copied to the server is showing problems. There are several triage steps to take:

  1. See if this happens with the old server - this will isolate whether or not it is an existing bug.
  2. See if it happens across an unmount - this will isolate whether the reboot is doing something.
  3. See if it happens from another client - this will also isolate whether it is the reboot or not.
  4. Try it such that one write goes to each DS and not to each datatset - which will tell us if the writes are wrong, i.e., the FH decode is not working correctly.

I can show the problem thusly:

[root@pnfs-17-21 ~]> cp 1234.32k.raw /pnfs/pnfs-17-24/dc_1234.32k.raw
[root@pnfs-17-21 ~]> diff 1234.32k.raw /pnfs/pnfs-17-24/dc_1234.32k.raw | wc -l
       0

How do I know that the client is even getting the bits back from the server. If they are still in local buffers, this may not be happening. So, lets try from another client:

[root@pnfs-4-02 ~]> diff 1234.32k.raw /pnfs/pnfs-17-24/dc_1234.32k.raw | wc -l
     132

So it isn't the reboot and it probably is not the ODL.

Okay, I just configured a stock community and repeated the same test and there was no loss of data. Clearly either my changes or changes I failed to do have caused the problem. I.e., either I caused a bug or I exposed an existing bug - one that the new feature set enabled. Either way, I have to fix it before I can integrate my changes.

I'm beginning to think it is in code which has yet to be implemented:

[thud@ultralord nfs]> grep dnk_sid *
dserv_server.c:	dskey.dnk_sid = &fhp->fh.v1.mds_sid;
[thud@ultralord nfs]> grep dnk_fid *
dserv_server.c:	CRC32(rc, key->dnk_fid->val, key->dnk_fid->len,
dserv_server.c:	NFS_AVL_COMPARE(a->dnk_fid->len, b->dnk_fid->len);
dserv_server.c:	rc = memcmp(a->dnk_fid->val, b->dnk_fid->val,
dserv_server.c:	    a->dnk_fid->len);
dserv_server.c:	data->dnd_fid = key->dnk_fid;
dserv_server.c:	dskey.dnk_fid = &fhp->fh.v1.mds_fid;
dserv_server.c:	key->dnk_fid = &key->dnk_real_fid;

I enabled the setting of 'dnk_sid', but no one uses it. I bet what is happening is that both 32k chunks of the files are being written on the same dataset...

With some debug printfs, we can see that the DSes are picking different object sets to save the data:

Jul 26 21:28:46 pnfs-17-22 nfssrv: Accessing pnfs2/ds2/EF25FBCF08FA1BD7 for D336969B13CB6C0E000000000000001E
Jul 26 21:28:46 pnfs-17-22 nfssrv: Accessing pnfs1/ds1/EF25FBCF08FA1BD7 for 86FE860F0E0B87D4000000000000001E

So I implemented the code for 'dnk_sid' and now I'm seeing:

[root@pnfs-4-02 ~]> diff 1234.32k.raw /pnfs/pnfs-17-24/slow | wc -l
       0

It also works now across a reboot.

The bad thing here is that I'm making it up on the fly, I don't understand the nnode code. I'm going to have to get a code review just on that chunk.

Actually, strike that. I know there was a bug, missing code to be exact. I know my fix works, and I understand the changes I made. I don't understand how those changes work in the bigger picture of the nnode code. Oh, I understand that when we tried to access the two different datasets on the same DS, we would sometimes get one instead of the other. I'm still going to talk with the author, I want to make sure my fixes are optimal.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
Testing On Disk Layouts

Last night, I did the following test of file creation:

[root@pnfs-17-21 ~]>  cp nfs4_vnops.c /pnfs/pnfs-17-24/Pnfs4_vnops.c
[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/Pnfs4_vnops.c
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 13
Layout [0]:
        Layout obtained at: Sat Jul 25 02:55:17:818063 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 4, stripe unit: 32768
        Stripe [0]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [1]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [2]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [3]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
[root@pnfs-17-21 ~]> ls -la  /pnfs/pnfs-17-24/Pnfs4_vnops.c
-rw-r--r--   1 root     root      409540 Jul 25 02:55 /pnfs/pnfs-17-24/Pnfs4_vnops.c
[root@pnfs-17-21 ~]> diff nfs4_vnops.c /pnfs/pnfs-17-24/Pnfs4_vnops.c

So we have pNFS!

I then went to bed, knowing that the next set of testing to do was what we call on disk layouts or odl. When we create a file, we generate a layout based on all of the currently reporting datasets (ignore kspe, which I will soon enable and start testing). When we generate the create layout, we need to stuff it somewhere such that a client which opens the file either for appending or reading can get at the contents. So we store the layout on disk, hence odl.

The system we had in place made assumptions like all devices would be there, they would report in the same order, etc. Well, with using the mds_sids as keys, I've changed that.

To test things, I've rebooted the client and the servers. I want to make sure that there is nothing in memory. Soon, I'll add a new DS to make sure the number of reporting datasets does not have an impact. But for now, what happens?

[root@pnfs-17-21 ~]> diff nfs4_vnops.c /pnfs/pnfs-17-24/Pnfs4_vnops.c2
Binary files nfs4_vnops.c and /pnfs/pnfs-17-24/Pnfs4_vnops.c2 differ
[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/Pnfs4_vnops.c2
Layout unacquired

I don't need snoop, I need to start examining the odl code. And to be honest, I was expecting problems here.

I don't think the layout was written to disk. For now, we plaop it down in '/var':

[root@pnfs-17-24 ~]> cd /var/nfs/v4_state/layouts
[root@pnfs-17-24 layouts]> ls -la
total 6
drwxr-xr-x   3 daemon   daemon       512 Jun 26 17:27 .
drwxr-xr-x   5 daemon   daemon       512 Jul 23 16:30 ..
drwxr-xr-x   2 daemon   daemon       512 Jun 26 17:27 2d90005
[root@pnfs-17-24 layouts]> cd 2d90005/
[root@pnfs-17-24 2d90005]> ls -la
total 4
drwxr-xr-x   2 daemon   daemon       512 Jun 26 17:27 .
drwxr-xr-x   3 daemon   daemon       512 Jun 26 17:27 ..

Found the problem!

#ifdef PERSIST_LO_ENABLED

And it wasn't enabled!

IF I enable it and try the experiment again:


[root@pnfs-17-21 ~]> diff nfs4_vnops.c /pnfs/pnfs-17-24/a13.c | wc -l
   21433
[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/a13.c
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 14
Layout [0]:
        Layout obtained at: Sat Jul 25 17:49:59:853319 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 4, stripe unit: 32768
        Stripe [0]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [1]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [2]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [3]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK

I should have gotten the layout from before the write. But it looks wrong. They are out of order, see:

[root@pnfs-17-21 ~]> cp nfs4_vnops.c  /pnfs/pnfs-17-24/redo.c
[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/redo.c
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 13
Layout [0]:
        Layout obtained at: Sat Jul 25 17:51:55:18886 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 4, stripe unit: 32768
        Stripe [0]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [1]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [2]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [3]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK

Reboot and we see:

[root@pnfs-17-21 ~]> nfsstat -l /pnfs/pnfs-17-24/redo.c
Number of layouts: 1
Proxy I/O count: 0
DS I/O count: 14
Layout [0]:
        Layout obtained at: Sat Jul 25 17:55:33:205567 2009
        status: UNKNOWN, iomode: LAYOUTIOMODE_RW
        offset: 0, length: EOF
        num stripes: 4, stripe unit: 32768
        Stripe [0]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [1]:
                tcp:pnfs-17-23.Central.Sun.COM:10.1.233.193:47009 OK
        Stripe [2]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK
        Stripe [3]:
                tcp:pnfs-17-22.Central.Sun.COM:10.1.233.192:47009 OK

So the good news is that odl works, the bad news is just not right.

We can actually see this in action. I wrote a simple program to write 32k of 'a', 32k of 'b', 32k of 'c', and 32k of 'd's to a file. I then copied that file over.

[root@pnfs-17-21 ~]> cp 1234.32k.raw /pnfs/pnfs-17-24/b1234.32k.raw
[root@pnfs-17-21 ~]> diff 1234.32k.raw /pnfs/pnfs-17-24/b1234.32k.raw
[root@pnfs-17-21 ~]> reboot
...
[root@pnfs-17-21 ~]> mount -o vers=4 pnfs-17-24:/pnfs2/pnfs /pnfs/pnfs-17-24
[root@pnfs-17-21 ~]> diff 1234.32k.raw /pnfs/pnfs-17-24/b1234.32k.raw
Binary files 1234.32k.raw and /pnfs/pnfs-17-24/b1234.32k.raw differ

The first diff had to have gotten the same layout as before. Now when we reboot, we see a difference. And since the pattern is so simple, we can see what might be going on:

[root@pnfs-17-21 ~]> grep a /pnfs/pnfs-17-24/b1234.32k.raw | wc -l
       0
[root@pnfs-17-21 ~]> grep b /pnfs/pnfs-17-24/b1234.32k.raw | wc -l
      31
[root@pnfs-17-21 ~]> grep c /pnfs/pnfs-17-24/b1234.32k.raw | wc -l
       0
[root@pnfs-17-21 ~]> grep d /pnfs/pnfs-17-24/b1234.32k.raw | wc -l
      64

We need to snoop the filehandles from the create and then from the later read.

cre:  fh[0] = (72) 00000002000000010000000000000000000000002FE9040D00000010312CE43624C45DB0000000000000001E00000008EF25FBCF08FA1BD70000000A2B0000000000E40400000000
odl:  fh[0] = (72) 00000002000000010000000000000000000000002FE9040D00000010312CE43624C45DB0000000000000001E00000008EF25FBCF08FA1BD70000000A2B0000000000E40400000000
cre:  fh[1] = (72) 00000002000000010000000000000000000000002FE9040D0000001044CF9A4C47798276000000000000001E00000008EF25FBCF08FA1BD70000000A2B0000000000E40400000000
odl:  fh[1] = (72) 00000002000000010000000000000000000000002FE9040D0000001044CF9A4C47798276000000000000001E00000008EF25FBCF08FA1BD70000000A2B0000000000E40400000000
cre:  fh[2] = (72) 00000002000000010000000000000000000000002FE9040D00000010D336969B13CB6C0E000000000000001E00000008EF25FBCF08FA1BD70000000A2B0000000000E40400000000
odl:  fh[2] = (72) 00000002000000010000000000000000000000002FE9040D00000010D336969B13CB6C0E000000000000001E00000008EF25FBCF08FA1BD70000000A2B0000000000E40400000000
cre:  fh[3] = (72) 00000002000000010000000000000000000000002FE9040D0000001086FE860F0E0B87D4000000000000001E00000008EF25FBCF08FA1BD70000000A2B0000000000E40400000000
odl:  fh[3] = (72) 00000002000000010000000000000000000000002FE9040D0000001086FE860F0E0B87D4000000000000001E00000008EF25FBCF08FA1BD70000000A2B0000000000E40400000000

And they match. The problem isn't in the filehandles we hand out.

I'm going to end it there for now...


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily