Sunday October 12, 2008
Restarting with mds_gather_devs

Time to pick back up on that analysis, while remembering that ds_addr is not the same thing as ds_addr_t.

mds_gather_devs

Note, we are in usr/src/uts/common/fs/nfs/nfs41_state.c...

So mds_gather_devs does the work of stuffing the layout. It gets called for every entry found in the instp->ds_addr_tab:

    	ds_addr_t	*dp = (ds_addr_t *)entry;
...
    	if (gap->dex < gap->max_devs_needed) {
    		gap->lo_arg.lo_devs[gap->dex] = rfs4_dbe_getid(dp->dbe);
    		gap->dev_ptr[gap->dex] = dp;
    		gap->dex++;
    	}

So we keep on reading ds_addr_t data structures until we have enough.
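To make the gather step concrete, here is a minimal user-space sketch of the same pattern: a walker calls a callback on every entry, and the callback only records entries while it still needs more. Everything prefixed with sketch_ is a hypothetical stand-in that I made up for illustration (the plain id field replaces rfs4_dbe_getid(dp->dbe), and sketch_walk replaces rfs4_dbe_walk); it is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ds_addr_t; only what the sketch needs. */
typedef struct {
    uint32_t id;    /* stands in for rfs4_dbe_getid(dp->dbe) */
} sketch_ds_addr_t;

#define SKETCH_MAX_DEVS 8

/* Hypothetical stand-in for the gather arguments (gap in the excerpt). */
typedef struct {
    size_t max_devs_needed;
    size_t dex;
    uint32_t lo_devs[SKETCH_MAX_DEVS];
    sketch_ds_addr_t *dev_ptr[SKETCH_MAX_DEVS];
} sketch_gather_args_t;

/* Walker callback: record the entry only while more devices are needed. */
static void
sketch_gather_devs(void *entry, void *arg)
{
    sketch_ds_addr_t *dp = entry;
    sketch_gather_args_t *gap = arg;

    assert(gap->max_devs_needed <= SKETCH_MAX_DEVS);
    if (gap->dex < gap->max_devs_needed) {
        gap->lo_devs[gap->dex] = dp->id;
        gap->dev_ptr[gap->dex] = dp;
        gap->dex++;
    }
}

/* Stand-in for the table walk: call cb on every entry, in order. */
static void
sketch_walk(sketch_ds_addr_t *tab, size_t n,
    void (*cb)(void *, void *), void *arg)
{
    for (size_t i = 0; i < n; i++)
        cb(&tab[i], arg);
}
```

Note that the callback is still invoked for every entry in the table; once dex reaches max_devs_needed it simply does nothing. Walking a three-entry table with max_devs_needed set to 2 leaves the first two ids in lo_devs and ignores the third.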

Now, how is that table populated? We are looping over these entries in the NFSv4 state tables:

    	rw_enter(&instp->ds_addr_lock, RW_READER);
    	rfs4_dbe_walk(instp->ds_addr_tab, mds_gather_devs, &args);
    	rw_exit(&instp->ds_addr_lock);

So we need to look for instp->ds_addr_tab or instp->ds_addr_idx. And in usr/src/uts/common/fs/nfs/ds_srv.c, we find mds_ds_addr_update which does:

    ds_status
    mds_ds_addr_update(ds_owner_t *dop, struct ds_addr *dap)
    {
    	struct mds_adddev_args darg;
    	bool_t create = FALSE;
    	ds_addr_t *devp;
...
    	if ((devp = (ds_addr_t *)rfs4_dbsearch(mds_server->ds_addr_uaddr_idx,
    	    (void *)dap->addr.na_r_addr,
    	    &create, NULL, RFS4_DBS_VALID)) != NULL) {
    		MDS_SET_DS_FLAGS(devp->dev_flags, dap->validuse);
    		rw_exit(&mds_server->ds_addr_lock);
    		return (stat);
    	}

Note how we are calling the ds_addr_t a devp; perhaps a better structure name would be ds_dev_addr_t.

So, if we find one in mds_server->ds_addr_tab (via the mds_server->ds_addr_uaddr_idx which is a secondary index to ds_addr_idx), then we return. Else:

    	darg.dev_netid = kstrdup(dap->addr.na_r_netid);
    	darg.dev_addr  = kstrdup(dap->addr.na_r_addr);

    	/* make it */
    	devp = (ds_addr_t *)rfs4_dbcreate(mds_server->ds_addr_idx,
    	    (void *)&darg);

    	if (devp) {
    		devp->ds_owner = dop;
    		MDS_SET_DS_FLAGS(devp->dev_flags, dap->validuse);
    		list_insert_tail(&dop->ds_addr_list, devp);
    	} else
    		stat = DSERR_INVAL;

Here we grab the info out of the ds_addr and create a new entry. Note that it is devp->ds_owner that is likely to hold the addressing info I am interested in.
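The overall shape of mds_ds_addr_update is a search-or-create: look the address up by its universal address, refresh the flags if it is already known, otherwise make a new entry tied to the owner. A minimal user-space sketch of that shape follows. The sketch_ names, the fixed-size array standing in for the state table, the integer owner id, and the -1 return standing in for DSERR_INVAL are all assumptions of mine. The sketch also simply overwrites the flags, since the excerpt does not show what MDS_SET_DS_FLAGS really does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_TAB_MAX 16

/* Hypothetical, simplified stand-in for ds_addr_t. */
typedef struct {
    char netid[16];
    char uaddr[64];
    unsigned flags;
    int owner;      /* stands in for devp->ds_owner */
} sketch_addr_t;

/* Hypothetical stand-in for the ds_addr table and its uaddr index. */
typedef struct {
    sketch_addr_t ents[SKETCH_TAB_MAX];
    size_t count;
} sketch_tab_t;

/* Returns 0 on success, -1 on failure (stand-in for DSERR_INVAL). */
static int
sketch_addr_update(sketch_tab_t *tab, int owner, const char *netid,
    const char *uaddr, unsigned flags)
{
    assert(tab != NULL && netid != NULL && uaddr != NULL);

    /* Search by universal address; a hit only refreshes the flags. */
    for (size_t i = 0; i < tab->count; i++) {
        if (strcmp(tab->ents[i].uaddr, uaddr) == 0) {
            tab->ents[i].flags = flags;
            return (0);
        }
    }

    /* Not found: create a new entry, if it fits. */
    if (tab->count == SKETCH_TAB_MAX ||
        strlen(netid) >= sizeof (tab->ents[0].netid) ||
        strlen(uaddr) >= sizeof (tab->ents[0].uaddr))
        return (-1);

    sketch_addr_t *devp = &tab->ents[tab->count++];
    strcpy(devp->netid, netid);
    strcpy(devp->uaddr, uaddr);
    devp->flags = flags;
    devp->owner = owner;
    return (0);
}
```

As in the excerpt, a hit on an existing address leaves its owner alone; only the create path records who owns the entry.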

    typedef struct {
    	rfs4_dbe_t	*dbe;
    	time_t		last_access;
    	char		*identity;
    	ds_id		ds_id;
    	ds_verifier	verifier;
    	uint32_t	dsi_flags;
    	list_t		ds_addr_list;
    	list_t		ds_guid_list;
    } ds_owner_t;

So we have lists of ds_addr and ds_guid. But that ds_guid_list is currently only created and never populated.

Time to digress and attack this from a different angle.

Looking at the NFSv4.1 pNFS Devices and File Layout Structures

This may no longer be accurate, but Robert Gordon, before he passed on (to another company), left us with this image (from the Server Design Document):

http://opensolaris.org/os/project/nfsv41/documentation/nfsv41_server/d13_layout_devices.jpg

This says quite clearly that while it may be the spe's job to generate layouts, in order to do so you need to construct a device list. Up until now, I've been working from a months-old statement that I need to "just generate the stripe width, stripe unit size, and an array of guids". Implicit in that is that someone else would write the logic, because it was trivial, to morph that into a layout.

And you know, I keep on looking for an explicit mapping to occur between the selection of the layout and the device list; it is the title of this series of blog articles. It may not be occurring because of the immaturity of the code. I.e., everything up to now is predicated on there being a fixed number of DSes and a fixed number of data stores. And the relationships just work: if you only have 1 entry in a list because there is only 1 data store, then all of the other associated lists will also only have 1 entry.
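For contrast, here is a sketch of what an explicit mapping could look like: given an array of guids, produce the matching device ids in the same order, and fail loudly when a guid has no known device. This is purely hypothetical. Nothing like it appears in the code read so far, the sketch_ names are mine, and a guid is modeled as a 64-bit integer only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of a data store guid with a device id. */
typedef struct {
    uint64_t guid;
    uint32_t dev_id;
} sketch_guid_map_t;

/*
 * Fill devs[] with the device id for each guid, in order.
 * Returns 0 on success, -1 if some guid has no known device.
 */
static int
sketch_devs_for_guids(const sketch_guid_map_t *map, size_t nmap,
    const uint64_t *guids, size_t nguids, uint32_t *devs)
{
    assert(map != NULL && guids != NULL && devs != NULL);

    for (size_t i = 0; i < nguids; i++) {
        size_t j;

        for (j = 0; j < nmap; j++) {
            if (map[j].guid == guids[i]) {
                devs[i] = map[j].dev_id;
                break;
            }
        }
        if (j == nmap)
            return (-1);    /* no device known for this guid */
    }
    return (0);
}
```

With a single guid and a single device, pairing by position and looking up by guid give the same answer, which is why the current code can get away without the lookup. With two or more entries, only the lookup is guaranteed to pair them correctly.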

There is still a lot of work to do to make this implementation a product.

Anyway, the picture spells out a lot of what is in the spec. The other way to attack this would be to look at a snoop trace during a create.

But any way you slice it, there is no magic happening to tie a guid to a device list.

I'm going to have to expand the scope of my project.


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily
