|
NFSv4 Client's Recovery Messages
NFSv4 Client's Recovery Messages for Post-Mortem Analysis
When something goes awry with NFS, one of the typical first things we ask
for is a snoop trace. This works ok when the problem is reproducible
and you're talking to other developers - but when a non-reproducible failure
occurs on a production system, snoop traces just don't cut it.
This is especially true when it comes to post-mortem analysis (say of
a panic or hard hang). It would be quite useful to know exactly what was
going on in the system that lead to the crash.
NFSv4
is a stateful protocol (much like NFSv3's side locking protocol NLM). So
when a server fails (such as reboot or lease timeout), recovery of the
accumulated state needs to happen. We decided we would like to keep track of
when recovery wasn't 100% successful and how we got to that point to aid in
post-mortem analysis.
I did this by adding an in-kernel queue to store "facts" and "events". Events
are direct actions the nfs clients takes; whereas facts are
interesting tidbits that help determine what actions are to be taken, but
don't directly cause recovery actions -- facts are intended to be supplemental.
Having this information proved quite useful when i fixed
6225426. I knew the code and had an inclination that it might be related to
the client receiving NFS4ERR_BAD_SEQID. Luckily, we had already added the
"event" RE_BAD_SEQID, and all i had to do was use the mdb dcmd
nfs4_diag to tell me if the client did receive that error -- i
didn't have to ask for a snoop trace + reproduction. From the core file, i
just type:
> ::nfs4_diag -s
*********************************************
vfs: 3052ca3c600 mi: 306d9183000
mount point: /tools/somemntpt
mount from: server:/export/tools
Messages queued:
=============================================
2005 Feb 2 01:51:38: fact RF_ERR
...
2005 Feb 2 01:52:42: event RE_BAD_SEQID
2005 Feb 2 01:52:42: event RE_END
2005 Feb 2 01:52:42: fact RF_DELMAP_CB_ERR
>
So let's say you wanted to track when a filesytem migrated in NFS... how would
you add that ability? (note: the nfsv4 working group is currently discussing a
protocol to do replication and migration, so Solaris doesn't officially support
migration directly within NFS yet).
We can use OpenSolaris to peer
into the code...
1. Adding a new fact name
You'll find the facts defined in nfs4_clnt.h.
We can go ahead and add a fact named RF_MIGRATE:
typedef enum {
RF_BADOWNER,
RF_ERR,
RF_RENEW_EXPIRED,
RF_SRV_NOT_RESPOND,
RF_SRV_OK,
RF_SRVS_NOT_RESPOND,
RF_SRVS_OK,
RF_DELMAP_CB_ERR,
RF_MIGRATE
} nfs4_fact_type_t;
2. Add a callout using the new fact
Then wherever you need to record this fact, call nfs4_queue_fact()
and pass in your newly defined RF_MIGRATE. This function can be
found in nfs4_client_debug.c.
So a probable place to record this fact is within errs_to_action() in
nfs4_recovery.c.
void
errs_to_action(recov_info_t *recovp,
nfs4_server_t *sp, mntinfo4_t *mi, stateid4 *sidp,
nfs4_lost_rqst_t *lost_rqstp, int unmounted, nfs_opnum4 op,
nfs4_bseqid_entry_t *bsep)
...
} else {
recovp->rc_error = geterrno4(stat);
switch (stat) {
...
case NFS4ERR_STALE:
action = NR_STALE;
break;
case NFS4ERR_MOVED:
action = NR_MIGRATE;
nfs4_queue_fact(RF_MIGRATE, mi, 0, 0, 0, FALSE, NULL
0, NULL);
break;
default:
....
}
}
...
}
3. Plugging into the kernel
Within
set_fact(), you can set the nfs4_rfact_t
(a structure to hold the fields of a "fact").
In our simple example, all we wish to store is when the migration actually
happened. That information is actually already stored in
nfs4_debug_msg_t (along with which server and mountpoint recovery was
active on - see nfs4_clnt.h
). So we just need to store the fact type and don't have to store any
additional information within the nfs4_rfact_t itself.
typedef enum {
RM_EVENT,
RM_FACT
} nfs4_msg_type_t;
typedef struct nfs4_debug_msg {
timespec_t msg_time;
nfs4_msg_type_t msg_type;
char *msg_srv;
char *msg_mntpt;
union {
nfs4_rfact_t msg_fact;
nfs4_revent_t msg_event;
} rmsg_u;
nfs4_msg_status_t msg_status;
list_node_t msg_node;
} nfs4_debug_msg_t;
The nfs4_msg_type_t determines whether the rmsg_u is for
a fact or event.
static void
set_fact(nfs4_fact_type_t id, nfs4_rfact_t *fp, nfsstat4 stat4,
nfs4_recov_t raction, nfs_opnum4 op, bool_t reboot, int error,
vnode_t *vp)
{
rnode4_t *rp1;
switch (id) {
...
case RF_DELMAP_CB_ERR:
fp->rf_op = op;
fp->rf_stat4 = stat4;
rp1 = VTOR4(vp);
fp->rf_rp1 = rp1;
if (rp1 && rp1->r_svnode.sv_name)
fp->rf_char1 = fn_path(rp1->r_svnode.sv_name);
else
fp->rf_char1 = NULL;
break;
case RF_MIGRATE:
break;
default:
zcmn_err(getzoneid(), CE_NOTE, "illegal fact %d", id);
break;
}
}
Then we wish to know if this fact was generated by a successful reply
from the server and mark that in
successful_comm() (in our case,
migration will happen only after receiving NFS4ERR_MOVED, which means we got
a legit reply). Note: the reason on why we need to know if it was a successful
reply is outside the scope of our example.
static int
successful_comm(nfs4_debug_msg_t *msgp)
{
if (msgp->msg_type == RM_EVENT) {
...
} else {
switch (msgp->rmsg_u.msg_fact.rf_type) {
...
case RF_DELMAP_CB_ERR:
case RF_MIGRATE:
return (1);
case RF_SRV_NOT_RESPOND:
case RF_SRVS_NOT_RESPOND:
return (0);
default:
return (0);
}
}
}
Add the new fact to
get_facts() so we can print out a summary
"fact sheet". The fact sheet is a way of condensing up to NFS4_MSG_MAX
facts into a one line summary. In our simple example, we don't actually need
to save anything to the fact sheet:
static int
get_facts(nfs4_debug_msg_t *msgp, nfs4_rfact_t *ret_fp, char **mnt_pt,
mntinfo4_t *mi)
{
...
switch (cur_fp->rf_type) {
...
case RF_DELMAP_CB_ERR:
break;
case RF_MIGRATE:
break;
default:
zcmn_err(getzoneid(), CE_NOTE,
"get facts: illegal fact %d", cur_fp->rf_type);
break;
}
...
}
Add to
queue_print_fact() to actually print this information in
/var/adm/messages. Note: only when something goes really bad (like an open
had to be closed on the user or siglost) do we actually dump the queue's
contents to /var/adm/messages - normally we just keep it in kernel memory.
static void
queue_print_fact(nfs4_debug_msg_t *msg, int dump)
{
nfs4_rfact_t *fp;
zoneid_t zoneid;
fp = &msg->rmsg_u.msg_fact;
zoneid = getzoneid();
switch (fp->rf_type) {
...
case RF_DELMAP_CB_ERR:
fp->rf_op = op;
fp->rf_stat4 = stat4;
rp1 = VTOR4(vp);
fp->rf_rp1 = rp1;
if (rp1 && rp1->r_svnode.sv_name)
fp->rf_char1 = fn_path(rp1->r_svnode.sv_name);
else
fp->rf_char1 = NULL;
break;
case RF_MIGRATE:
zcmn_err(zoneid, CE_NOTE, "![NFS4][Server: %s][Mntpt: %s]"
"file system migrated", msg->msg_srv, msg->msg_mntpt);
break;
default:
...
}
...
}
4. Plugging into mdb
The last thing to do is to add this new fact to the mdb dcmd nfs4_diag
via nfs4_diag.c. Unfortunately, the nfs mdb module is encumbered
code so i can't show the source (yet!), but i can name the functions.
Add this to fact_to_str():
case RF_MIGRATE:
return ("RF_MIGRATE");
Then just like the kernel function queue_print_fact(), we
add the human readable output to
nfs4_fact_print():
case RF_MIGRATE:
mdb_readstr(buf, sizeof (buf), (uintptr_t)fp->rf_char1);
mdb_printf("[NFS4]%Y: file system migrated\n",
msg->msg_time.tv_sec);
break;
When the nfs4_diag.c file does become unencumbered, i'll update this
blog... until then, feel free to contact me to make changes.
5. mdb output
#mdb -k
> ::nfs4_diag
*********************************************
vfs: 6000095c900 mi: 3000d2ad000
mount point: /mnt
mount from: fsh-weakfish:/export/tmp
Messages queued:
=============================================
[NFS4]2005 May 25 17:14:36: file system migrated
>
>
> ::nfs4_diag -s
*********************************************
vfs: 6000095c900 mi: 3000d2ad000
mount point: /mnt
mount from: fsh-weakfish:/export/tmp
Messages queued:
=============================================
2005 May 25 17:14:36: fact RF_MIGRATE
> $q
So now, if there was ever a problem with any server's implementation of
migration, we would have a clue on the Solaris client of why things went bad.
Technorati Tag: OpenSolaris
Technorati Tag: Solaris
Technorati Tag: nfs
Technorati Tag: mdb
(2005-11-14 17:03:02.0/2005-06-14 08:30:39.0)
Permalink
Trackback: http://blogs.sun.com/erickustarz/en_US/entry/nfsv4_client_s_recovery_messages
|