Monday July 27, 2009
First real test of kspe!

I started a test run with kspe, which lets me decide which datasets are going to be used in a layout (see my slides from the Oklahoma City OpenSolaris User Group (OKCOSUG) presentation). I set up a simple set of npools and policies:

[root@pnfs-17-24 /etc]> more npools.spe 
pool17 pnfs-17-22:pnfs1/ds1 pnfs-17-23:pnfs1/ds3 pnfs-17-22:pnfs2/ds2 pnfs-17-23:pnfs2/ds4 pnfs-17-23:pnfs1/ds7 pnfs-17-22:pnfs1/ds8
pool4 pnfs-4-01:pnfs1/ds4 pnfs-4-01:pnfs2/ds5 pnfs-4-01:pnfs2/d9 pnfs-4-01:pnfs1/ds10
[root@pnfs-17-24 /etc]> more policies.spe 
1000, 4, 4k, pool17, ext == c
2000, 8, 1k, pool17:pool4, path == /pnfs1
3000, 3, 8k, pool4:pool17, path == /pnfs2

Note that a dataset identifier is a combination of the host name and the zfs filesystem name. I wrote the code and I still struggle to remember that the second part is a zfs name, not a path name. That is, the connector is ':' and not ':/'. 'pnfs2/ds2' is the zfs name, not '/pnfs2/ds2'!
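To make the host:zfs-name convention concrete, here is a minimal user-space sketch of splitting such an identifier. It is not the kspe parser; `split_dataset_id` and its buffer-based interface are hypothetical names of my own, and rejecting a leading '/' is just a guard against the path-name mistake described above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "host:zfsname" at the first ':' into caller-supplied buffers.
 * Returns 0 on success, or -1 if the identifier is malformed (no ':',
 * an empty host or name, a zfs name that starts with '/') or if a
 * buffer is too small. Hypothetical helper, not part of kspe.
 */
int
split_dataset_id(const char *id, char *host, size_t hostsz,
    char *zfs, size_t zfssz)
{
	const char *colon = strchr(id, ':');
	size_t hlen, zlen;

	if (colon == NULL)
		return (-1);

	hlen = (size_t)(colon - id);
	zlen = strlen(colon + 1);

	/* A zfs name such as "pnfs2/ds2" never begins with a slash. */
	if (hlen == 0 || zlen == 0 || colon[1] == '/')
		return (-1);
	if (hlen >= hostsz || zlen >= zfssz)
		return (-1);

	memcpy(host, id, hlen);
	host[hlen] = '\0';
	memcpy(zfs, colon + 1, zlen + 1);	/* includes the NUL */
	return (0);
}
```

With this sketch, "pnfs-17-22:pnfs2/ds2" splits into "pnfs-17-22" and "pnfs2/ds2", while "pnfs-17-22:/pnfs2/ds2" is rejected.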

I struggled a bit to load the npools and policies, until I found my kspe implementation notes.

And right off the bat, I see my policies are not matching and I've got a core!

[root@pnfs-4-02 ~]> cp 1234.32k.raw /pnfs/pnfs-17-24/pnfs1/nfs41/spe_1234.32k.raw.txt

yields

[root@pnfs-17-24 ~]> rc = 0, eval = 0, id = 1000
rc = 0, eval = 1, id = 2000
WARNING: spe: 1000 8 

panic[cpu1]/thread=ffffff01d74bf4e0: BAD TRAP: type=e (#pf Page fault) rp=ffffff0008162d70 addr=0 occurred in module "nfssrv" due to a NULL pointer dereference
...
[1]> $c
kmdb_enter+0xb()
...
nfssrv`mds_layout_hash+0x19(ffffff0008162fe0)
...
[1]> ffffff0008162fe0::print layout_core_t
{
    lc_stripe_unit = 0x3e8
    lc_stripe_count = 0x8
    lc_mds_sids = 0
}

Which tells me that the kspe code did not generate an mds_sids array (lc_mds_sids is 0). Again, no problem: I just wrote that code last week and this is the first test of it!

So the bug was a stupid error. I had written:

                        return (mds_sids ? 0 : ENOENT);
when I meant:
                        return (*mds_sids ? 0 : ENOENT);

Success was being returned with an empty array of mds_sids. There is still a bug, since we should be finding matches, but at least it is not so nasty.
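The difference between the two checks is easy to miss, so here is a minimal standalone sketch of it. This is not the kspe code; `sid_t`, `alloc_sids`, `fill_sids_buggy`, and `fill_sids_fixed` are hypothetical names. The array comes back through an out parameter: testing the out parameter itself is always true, while testing what it points at catches the empty result.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mds_sid entry. */
typedef struct sid {
	int id;
} sid_t;

/*
 * Allocate `count` entries and return them through *sidsp.
 * With count == 0 nothing is allocated and *sidsp is set to NULL.
 */
static void
alloc_sids(sid_t **sidsp, size_t count)
{
	*sidsp = (count > 0) ? calloc(count, sizeof (sid_t)) : NULL;
}

/* Buggy: tests the out parameter, which the caller always supplies. */
int
fill_sids_buggy(sid_t **sidsp, size_t count)
{
	alloc_sids(sidsp, count);
	return (sidsp ? 0 : ENOENT);
}

/* Fixed: tests the array that the out parameter points at. */
int
fill_sids_fixed(sid_t **sidsp, size_t count)
{
	alloc_sids(sidsp, count);
	return (*sidsp ? 0 : ENOENT);
}
```

Calling either function with a count of zero leaves the caller holding a NULL array; only the fixed version reports ENOENT instead of success.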

The bug is that these comparisons are failing:

Jul 27 01:37:45 pnfs-17-24 nfssrv: Comparing policy npool |pool17| at 7 to global |pool17| at 7
Jul 27 01:37:45 pnfs-17-24 nfssrv: Comparing policy npool |pool17| at 7 to global |pool4| at 6
Jul 27 01:37:45 pnfs-17-24 nfssrv: Comparing policy npool |pool4| at 6 to global |pool17| at 7
Jul 27 01:37:45 pnfs-17-24 nfssrv: Comparing policy npool |pool4| at 6 to global |pool4| at 6
Jul 27 01:37:45 pnfs-17-24 nfssrv: spe_map_npools_to_mds_sids: No matching npools!

The first and last comparisons should match. Ah, they do (as shown by additional debug logic), which means I was barking up the wrong tree!

D'oh! I found it! If we look at this code:

spe_map_npools_to_mds_sids(kspe_state_t *kspe, spe_policy *sp,
...
        spe_npool       *sn;
        spe_npool       *np;

        /*
         * For each npool in the policy, find it in the
         * list of npools, and start assigning datasets.
         */
        for (sn = sp->sp_npools; sn != NULL; sn = sn->next) {
                for (np = kspe->ks_npools; np; np = np->next) {
                        cmp = utf8_compare(&np->sn_name, &sn->sn_name);
                        if (cmp == 0) {
                                /*
                                 * Now we fill in entries in the *mds_sids
                                 * array.
                                 */
                                for (ss = sn->sn_dses; ss; ss = ss->next) {

We see I was lazy in assuming that sn and np were the same thing. They are the same type of object, but sn points to the npools in the policy and np points to the npools in the global list. The point is that an npool can be in multiple policies. So rather than store the list of datasets in the policies (which is a nightmare for updating) or pointers to npools in the policies (which sounds good and I can't remember why not!), we store the datasets in the global list.

So instead of searching the list of datasets in the global list (np->sn_dses), the code searches the empty list in the policy (sn->sn_dses). :->
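As a rough illustration of the policy list versus global list distinction, the sketch below walks the datasets through the global entry. It is user-space code with simplified, hypothetical structures (`ds_t`, `npool_t`, `count_policy_datasets`) and plain `strcmp` in place of `utf8_compare`; it only counts datasets rather than filling in an mds_sids array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified versions of the structures in the post. */
typedef struct ds {
	const char *name;
	struct ds *next;
} ds_t;

typedef struct npool {
	const char *sn_name;
	ds_t *sn_dses;		/* only populated in the global list */
	struct npool *next;
} npool_t;

/*
 * Count the datasets reachable for a policy. For every npool named in
 * the policy, find the npool of the same name in the global list and
 * walk the global entry's datasets (np), not the policy's (sn).
 */
size_t
count_policy_datasets(const npool_t *policy, const npool_t *global)
{
	const npool_t *sn, *np;
	const ds_t *ss;
	size_t n = 0;

	for (sn = policy; sn != NULL; sn = sn->next) {
		for (np = global; np != NULL; np = np->next) {
			if (strcmp(np->sn_name, sn->sn_name) != 0)
				continue;
			/* Walking sn->sn_dses here would find nothing. */
			for (ss = np->sn_dses; ss != NULL; ss = ss->next)
				n++;
		}
	}
	return (n);
}
```

Because the policy copies of an npool carry only a name, a policy naming pool17 still sees every dataset attached to the global pool17 entry.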

And I'm at the next bug:

Jul 27 02:11:04 pnfs-17-24 nfssrv: mds_ds_path_to_mds_sid returned an error!

We're not finding a match as we search the ds_guid_info database. When I had a search issue a couple nights ago, it turned out to be the hash function. If we look at what we have:

        instp->ds_guid_info_dataset_name_idx =
            rfs4_index_create(instp->ds_guid_info_tab,
            "DS_guid-dataset-name-idx", mds_str_hash,
            ds_guid_info_dataset_name_compare, ds_guid_info_dataset_name_mkkey,
            FALSE);

We are using mds_str_hash on a utf8_string, which is a length and a string. I think that will be problematic. I'm not too happy with the hash functions here in general, but for now all I care about is that they are consistent.

They aren't consistent in this case and it is getting late...
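To show what "consistent" means for a counted string, here is a small sketch. It does not reproduce mds_str_hash or the real utf8_string type; `counted_str_t`, `hash_bytes`, `cstr_hash`, and `counted_str_hash` are hypothetical, and the multiply-by-33 hash is just a convenient stand-in. The point is that a hash over a C string and a hash over a length plus bytes agree only when both cover exactly the same bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counted string: a length plus a byte pointer. */
typedef struct counted_str {
	size_t len;
	const char *val;
} counted_str_t;

/* Simple multiply-by-33 hash over exactly `len` bytes. */
static uint32_t
hash_bytes(const char *p, size_t len)
{
	uint32_t h = 5381;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 33 + (unsigned char)p[i];
	return (h);
}

/* Hash a NUL-terminated string; the terminator is not hashed. */
uint32_t
cstr_hash(const char *s)
{
	return (hash_bytes(s, strlen(s)));
}

/* Hash a counted string; an empty or NULL value hashes zero bytes. */
uint32_t
counted_str_hash(const counted_str_t *s)
{
	if (s == NULL || s->val == NULL)
		return (hash_bytes("", 0));
	return (hash_bytes(s->val, s->len));
}
```

If the stored length counts a trailing NUL while the other side hashes a plain C string, the two values differ and lookups in a hashed index miss, even though the names print identically.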

As it boots:

Jul 27 03:33:41 pnfs-17-24 nfssrv: utf8_hash of empty str

As we search:

Jul 27 03:34:35 pnfs-17-24 nfssrv: utf8_hash of |pnfs-17-22:pnfs1/ds1|[21] is 111612431

We ought to see a non-NULL addition and we ought to see 10 messages, not 1.

Bzzt! That is an expectation and not code. The debug statements are correct. If we look at the entry create routine, which is called just before the hash function, we see:

static bool_t
ds_guid_info_create(rfs4_entry_t u_entry, void *arg)
{
...
        pgi->ds_dataset_name.utf8string_val = NULL;
        pgi->ds_dataset_name.utf8string_len = 0;

We don't have our hands on that info -- or do we?

We do - I can fix this - but not now. It will wait until tomorrow, because I've got to sleep...


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

Trackback URL: http://blogs.sun.com/tdh/entry/first_real_test_of_kspe