Alok Aggarwal's Weblog


Tuesday June 14, 2005

 Debugging on Sparc


While debugging x86/x64 crash dumps has been discussed fairly extensively
in various places (most recently here), I haven't come across any
resources that talk about debugging sparc dumps (other than numerous
bug reports). Now that OpenSolaris is live, it will be easier for
developers outside Sun to debug problems. Hence this entry.

Most of the time when you get a crash dump from a kernel panic, it's
either during development (in which case it's easy to debug because
you *know* exactly what caused the code to fail) or it's while the code is
in production. It's harder to debug when you're given a dump obtained
from a production machine primarily because first you need to find out
what caused the code to fail and second you need to simulate the failure
in the lab.

Often, finding the root cause entails figuring out what parameters were
passed to functions and what the local variables looked like at a
certain point in time. I'll walk through an example to demonstrate how
function arguments and local variables can be excavated from a dump.

Parameter passing on sparc - A brief overview

Unlike x86, which passes function arguments on the stack, and x64, which
passes most of them in registers, sparc uses register windows to pass
parameters. The caller places outgoing arguments in %o0, %o1 .. %o5, and
once the callee executes its save instruction they appear to the callee
as %i0, %i1 .. %i5, with %i0 holding the first parameter and so on. If
there are more than six input parameters to a function, parameters after
the sixth are passed on the stack. %i6 contains the frame pointer (%fp).

Local variables are allocated at an offset to the frame pointer.
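
To make the windowing a bit more concrete, here is a toy Python model. It is only a sketch: window overflow and spilling to the stack are ignored, and the RegisterWindows class and its methods are names I made up for illustration. The one property it shows is that whatever the caller writes to its %o registers is what the callee sees in its %i registers after save, including the caller's %sp (%o6) turning into the callee's %fp (%i6).

```python
class RegisterWindows:
    """Toy model of sparc register windows (hypothetical, no spill/fill)."""

    def __init__(self):
        # One window = 8 in, 8 local and 8 out registers.
        self.windows = [{"i": [0] * 8, "l": [0] * 8, "o": [0] * 8}]

    @property
    def current(self):
        return self.windows[-1]

    def save(self):
        # The new window's ins ARE the old window's outs (same list object),
        # mirroring the fact that they are the same physical registers.
        self.windows.append({"i": self.current["o"], "l": [0] * 8, "o": [0] * 8})

    def restore(self):
        # Going back to the caller's window; its outs still hold whatever
        # the callee left in its ins (for example a return value in %i0).
        self.windows.pop()


cpu = RegisterWindows()
cpu.current["o"][0:3] = [0x10, 0x20, 0x30]   # caller sets up %o0..%o2
cpu.current["o"][6] = 0x2a101220941          # caller's %sp lives in %o6
cpu.save()                                   # callee prologue
print([hex(v) for v in cpu.current["i"][0:3]], hex(cpu.current["i"][6]))
cpu.current["i"][0] = 0                      # callee sets its return value
cpu.restore()
print(cpu.current["o"][0])                   # caller reads it from %o0
```

The same sharing is why an argument register can be lost: if the callee reuses %i2 as scratch space, the original argument is gone from the window.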

Stack Format

The frame structure is defined in the system file usr/include/sys/frame.h
and it looks as follows -

struct frame {
	long fr_local[8];		/* saved locals */
	long fr_arg[6];			/* saved arguments [0 - 5] */
	struct frame *fr_savfp;		/* saved frame pointer */
	long fr_savpc;			/* saved program counter */
#if !defined(__sparcv9)
	char *fr_stret;			/* struct return addr */
#endif	/* __sparcv9 */
	long fr_argd[6];		/* arg dump area */
	long fr_argx[1];		/* array of args past the sixth */
};

So the input parameters are in the fr_arg array.
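
If you want to know where each member sits within the frame, the layout can be mirrored with Python's ctypes. This is a sketch under the assumption of a 64-bit (sparcv9) kernel, where long and pointers are both 8 bytes and fr_stret is compiled out; the Frame class and arg_slot_offset helper are my own names, not anything from the Solaris headers.

```python
import ctypes


class Frame(ctypes.Structure):
    """64-bit (sparcv9) view of struct frame: every member is 8 bytes wide."""
    _fields_ = [
        ("fr_local", ctypes.c_int64 * 8),   # saved %l0..%l7
        ("fr_arg", ctypes.c_int64 * 6),     # saved %i0..%i5
        ("fr_savfp", ctypes.c_uint64),      # saved %i6 (frame pointer)
        ("fr_savpc", ctypes.c_int64),       # saved %i7 (return address)
        ("fr_argd", ctypes.c_int64 * 6),    # arg dump area
        ("fr_argx", ctypes.c_int64 * 1),    # args past the sixth
    ]


def arg_slot_offset(n):
    """Byte offset of the saved copy of %i<n> from the start of the frame."""
    if not 0 <= n < 6:
        raise ValueError("only %i0..%i5 are saved in fr_arg")
    return Frame.fr_arg.offset + n * ctypes.sizeof(ctypes.c_int64)


for name, _ctype in Frame._fields_:
    print(f"{name:9s} offset {getattr(Frame, name).offset:#x}")
print("saved %i2 at offset", hex(arg_slot_offset(2)))
print("sizeof(struct frame) =", ctypes.sizeof(Frame))
```

So the saved %i2 is the third 8-byte slot after the eight saved locals.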

Excavating arguments with an NFSv4 bug

Using the bug 6268686 as an example and referencing OpenSolaris, let's look
at the stack trace that resulted in the panic -

> $C

000002a1012203d1 vpanic(1295800, 7aabd868, 7aabd880, 851, 2400, 2a1012210fc)
000002a101220481 assfail+0x74(7aabd868, 7aabd880, 851, 18c6000, 1295800, 0)
000002a101220531 nfs4_make_dotdot+0x4f4(2a101220df8, 2388873b24c20,
fffffffffffffff8, 301412eb920, 2a101221238, 1)
000002a101220941 nfs4lookupnew_otw+0x7d8(301ef0d4dc0, 2a101221530,
2a101221528, 301412eb920, df8475800, 38285c955c0)
000002a101220a71 nfs4_lookup+0x114(301ef0d4dc0, 2a101221530, 2a101221528,
301412eb920, 0, 391f52d44a8)
000002a101220b41 fop_lookup+0x28(301ef0d4dc0, 2a101221530, 2a101221528,
7aa69c2c, 0, 600045703c0)
000002a101220c01 lookuppnvp+0x344(2a1012217f0, 0, 600045703c0, 2a101221528,
2a101221530, 6000008dbc0)
000002a101220e41 lookuppnat+0x120(301ef0d4dc0, 0, 1, 0, 2a101221930, 0)
000002a101220f01 lookupnameat+0x5c(0, 0, 1, 0, 2a101221930, 0)
000002a101221011 vn_openat+0x164(1, 400, 1, 1, 0, 1)
000002a1012211d1 copen+0x260(ffffffffffd19553, 87aa3, 0, 50400, 0, 1)
000002a1012212e1 syscall_trap32+0x1e8(87aa3, 0, 50400, 0, 0, 0)
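
Each frame in the $C output above starts with the frame pointer, followed by the function, the offset within it and six argument slots. When I want to work with those numbers outside mdb, a small parser helps. The following is a rough Python sketch; the regular expression and the parse_frame name are my own and only cover lines shaped like the ones in this trace.

```python
import re

# <16 hex digit frame pointer> <function>[+<offset>](<comma separated hex args>)
FRAME_RE = re.compile(
    r"^(?P<fp>[0-9a-f]{16})\s+(?P<func>\w+)"
    r"(?:\+(?P<off>0x[0-9a-f]+))?\((?P<args>.*)\)$"
)


def parse_frame(text):
    """Parse one $C frame, which may be wrapped over two lines."""
    flat = " ".join(text.split())          # undo the line wrapping
    m = FRAME_RE.match(flat)
    if m is None:
        raise ValueError(f"not a $C frame line: {text!r}")
    args = [int(a.strip(), 16) for a in m["args"].split(",") if a.strip()]
    return {
        "fp": int(m["fp"], 16),
        "func": m["func"],
        "offset": int(m["off"], 16) if m["off"] else 0,
        "args": args,
    }


frame = parse_frame("""000002a101220531 nfs4_make_dotdot+0x4f4(2a101220df8, 2388873b24c20,
fffffffffffffff8, 301412eb920, 2a101221238, 1)""")
print(frame["func"], hex(frame["fp"]), [hex(a) for a in frame["args"]])
```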

To set the context for this bug, we were trying to look up a directory and it
so happened that we ended up calling nfs4_make_dotdot to get an rnode. The
comments in the code explain fairly well under what circumstances this function
is called -

/*
* nfs4_make_dotdot() - find or create a parent vnode of a non-root node.
*
* Our caller has a filehandle for ".." relative to a particular
* directory object. We want to find or create a parent vnode
* with that filehandle and return it.
.. snip

As the comments say, we had a filehandle for ".." relative to the directory
object we were trying to look up. So, to start off, what was the pathname we
were trying to look up? To determine this, we'd like to know what arguments
were passed into the nfs4_make_dotdot function. Checking the source, the
function is defined in uts/common/fs/nfs/nfs4_subr.c as -

int
nfs4_make_dotdot(nfs4_sharedfh_t *fhp, hrtime_t t, vnode_t *dvp,
cred_t *cr, vnode_t **vpp, int need_start_op)

The interesting bit is the directory vnode pointer, dvp. It is the third
argument, so it's passed in the %i2 register. If we can find dvp, we'll also
know the path we're dealing with here.

64-bit sparc has a notion of stack bias: the value held in %sp and %fp is
2047 (0x7ff) bytes below the real frame address, so you need to add the stack
bias to the frame pointer in order to get the actual data of the stack frame.
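
The arithmetic is trivial, but it's easy to forget. A minimal Python sketch (the function name is mine), using the frame pointer that $C printed for nfs4_make_dotdot:

```python
STACK_BIAS = 0x7FF  # 2047, the 64-bit sparc stack bias


def unbias(fp):
    """Return the real address of the frame that a biased %fp/%sp refers to."""
    return fp + STACK_BIAS


fp = 0x000002a101220531          # frame pointer shown by $C
frame_addr = unbias(fp)
print(hex(frame_addr))
# The biased value is odd, while the real frame address is 16-byte aligned.
print(fp & 1, frame_addr % 16)
```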

Applying that to the frame pointer for nfs4_make_dotdot and dumping out
the frame, we have -

> 000002a101220531+0x7ff::print struct frame
{
fr_local = [ 0, 0, 0x381a4c49000, 0x2a101221108, 0x2a101220ee0,
0x2a101221118, 0x7aabd800, 0x7aabd800 ]
fr_arg = [ 0x2a101220df8, 0x2388873b24c20, 0xfffffffffffffff8,
0x301412eb920, 0x2a101221238, 0x1 ]
fr_savfp = 0x2a101220941
fr_savpc = 0x7aa6b474
fr_argd = [ 0x1, 0x5bc679f3060, 0, 0x2a101221250, 0x200000000,
0x5bc679f3178 ]
fr_argx = [ 0 ]
}
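
A quick sanity check on the third fr_arg slot in the dump above is to read it as a signed 64-bit number. This small Python sketch (the helper name is mine) does only that; it says nothing about how the value got there.

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit register value as a signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


saved_i2 = 0xfffffffffffffff8    # fr_arg[2] from the frame dump
print(as_signed64(saved_i2))     # a small negative number, not a pointer
```

A vnode pointer would look like the other kernel addresses in the dump (0x301..., 0x2a1...), so a small negative number in that slot is not a pointer we can follow.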

%i2 here (0xfffffffffffffff8) looks bogus, darn! Let's back up one function
higher to nfs4lookupnew_otw and see if we can fish dvp out of its frame easily.
A quick look at the source in uts/common/fs/nfs/nfs4_vnops.c and -

static int
nfs4lookupnew_otw(vnode_t *dvp, char *nm, vnode_t **vpp, cred_t *cr)

The same dvp we're looking for should be in %i0, provided it has not been
overwritten. Dump out the frame -

> 000002a101220941+0x7ff::print struct frame
{
fr_local = [ 0x391f52d4450, 0x381a4c49000, 0x2388873c342a0, 0x600074432a8,
0x2388873b24c20, 0, 0x1, 0x391f52d44e8 ]
fr_arg = [ 0x301ef0d4dc0, 0x2a101221530, 0x2a101221528, 0x301412eb920,
0xdf8475800, 0x38285c955c0 ]
fr_savfp = 0x2a101220a71
fr_savpc = 0x7aa69d40
fr_argd = [ 0x38f6901b110, 0x311fe247dc0, 0x2a101221704, 0x2a1012216fc,
0x2a101220c71, 0x7aa7ad20 ]
fr_argx = [ 0x2a101221264 ]
}

Quick check to see if it's been overwritten -

> nfs4lookupnew_otw::dis!grep i0
[ .. elided ]
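
The grep is a manual scan for instructions that write the register. The same idea can be sketched in Python if you save the ::dis output to a file. This is a rough heuristic resting on assumptions of mine: it takes the last operand to be the destination, skips a hand-picked list of mnemonics that don't write one, and ignores implicit writes (call sets %o7, save/restore switch windows). Since the nfs4lookupnew_otw disassembly is elided here, the sample input is the nfs4_make_dotdot fragment shown later in this entry.

```python
# Mnemonics whose last operand is not a destination register (partial list).
NO_DEST = {"cmp", "tst", "call", "ba", "nop", "ret", "retl", "jmp"}


def writers_of(disassembly, reg):
    """Return addresses of instructions whose destination operand is reg."""
    hits = []
    for line in disassembly:
        addr, _, insn = line.partition(":")
        parts = insn.split(None, 1)
        if len(parts) < 2:
            continue                       # no operands at all
        mnemonic, operands = parts
        if mnemonic in NO_DEST or mnemonic.startswith("st"):
            continue                       # stores write memory, not reg
        dest = operands.split(",")[-1].strip()
        if dest == reg:
            hits.append(addr.strip())
    return hits


sample = [
    "nfs4_make_dotdot+0x294: add %fp, 0x797, %o4",
    "nfs4_make_dotdot+0x2a0: mov %l1, %i1",
    "nfs4_make_dotdot+0x2a4: call +0x15c38 nfs4_end_fop",
    "nfs4_make_dotdot+0x2ac: ld [%fp + 0x7bb], %i3",
    "nfs4_make_dotdot+0x2b0: cmp %i3, 0",
]
print(writers_of(sample, "%i0"))   # empty list: nothing here writes %i0
print(writers_of(sample, "%i1"))
```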

It's not overwritten, we're in luck! Double check to see if it's a vnode.

> 0x301ef0d4dc0::whatis
301ef0d4dc0 is 301ef0d4dc0+0, bufctl 301ecba50c8 allocated from vn_cache

It sure is a vnode. Dumping out the path is now easy -

> 301ef0d4dc0::print vnode_t v_data |::print rnode4_t r_svnode.sv_name
r_svnode.sv_name = 0x391d8a24090
> 0x391d8a24090::print nfs4_fname_t fn_parent fn_name
fn_parent = 0x3b6565ba920
fn_name = 0x3292727cbe0 "uts"
> 0x3b6565ba920::print nfs4_fname_t fn_parent fn_name
fn_parent = 0x353987b1550
fn_name = 0x4f42a102fe0 "src"
> 0x353987b1550::print nfs4_fname_t fn_parent fn_name
fn_parent = 0

We're operating on ./src/uts and this isn't handled correctly in the lookup
handling routine (it's fixed now).
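
The three ::print commands above walk a linked list from the leaf name up to the root via fn_parent. Here is the same walk as a Python sketch. The Fname class is a stand-in that models only the two members printed above, and since the dump doesn't show a name for the topmost entry I model it as None and render it as ".".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fname:
    """Stand-in for the fn_name / fn_parent members of nfs4_fname_t."""
    fn_name: Optional[str]
    fn_parent: Optional["Fname"] = None


def path_of(leaf):
    """Follow fn_parent up to the root and join the components in order."""
    parts = []
    node = leaf
    while node is not None:
        parts.append(node.fn_name if node.fn_name is not None else ".")
        node = node.fn_parent
    return "/".join(reversed(parts))


root = Fname(fn_name=None)           # 0x353987b1550, fn_parent = 0
src = Fname("src", fn_parent=root)   # 0x3b6565ba920
uts = Fname("uts", fn_parent=src)    # 0x391d8a24090
print(path_of(uts))
```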

As I mentioned earlier, local variables are stored at an offset from the
frame pointer. Now that we have the frame pointer, we can dig out the local
variables. The variable of interest in this case was the error structure
declared on the stack in nfs4_make_dotdot.

A close look at the disassembly of the function and we can see -

nfs4_make_dotdot+0x27c: mov %l2, %o0
nfs4_make_dotdot+0x280: mov 0xc, %o3
nfs4_make_dotdot+0x284: call +0x15c58 nfs4_end_fop
nfs4_make_dotdot+0x288: mov %l7, %o5
nfs4_make_dotdot+0x28c: ba -0xd0 nfs4_make_dotdot+0x1bc
nfs4_make_dotdot+0x290: cmp %i5, 0
nfs4_make_dotdot+0x294: add %fp, 0x797, %o4
nfs4_make_dotdot+0x298: mov %l2, %o0
nfs4_make_dotdot+0x29c: mov 0xc, %o3
nfs4_make_dotdot+0x2a0: mov %l1, %i1
nfs4_make_dotdot+0x2a4: call +0x15c38 nfs4_end_fop
nfs4_make_dotdot+0x2a8: mov %l1, %o5
nfs4_make_dotdot+0x2ac: ld [%fp + 0x7bb], %i3 <------
nfs4_make_dotdot+0x2b0: cmp %i3, 0

that it's stored at %fp + 0x7bb, where %fp is the fr_savfp in
nfs4_make_dotdot's frame. The 0x7bb offset already includes the stack bias,
so nothing more needs to be added. Dump it out -

> 0x2a101220941+0x7bb::print nfs4_error_t
{
error = 0
stat = 0t10006 (NFS4ERR_SERVERFAULT)
rpc_status = 0 (RPC_SUCCESS)
}
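
For the record, here is the address arithmetic behind that command, plus a sketch of how the three members decode. I'm assuming each member of nfs4_error_t is a 4-byte big-endian integer (sparc is big-endian), and the raw bytes below are rebuilt from the values mdb printed, not read from the dump.

```python
import struct

STACK_BIAS = 0x7FF

fp = 0x2a101220941        # fr_savfp from nfs4_make_dotdot's frame
biased_offset = 0x7bb     # from: ld [%fp + 0x7bb], %i3

address = fp + biased_offset
# Relative to the real (unbiased) frame address the local sits below it.
unbiased_offset = biased_offset - STACK_BIAS
print(hex(address), unbiased_offset)


def decode_nfs4_error(raw):
    """Decode 12 bytes as (error, stat, rpc_status), assumed 4 bytes each."""
    error, stat, rpc_status = struct.unpack(">iII", raw)
    return {"error": error, "stat": stat, "rpc_status": rpc_status}


# Bytes reconstructed from the printed values, for illustration only.
raw = struct.pack(">iII", 0, 10006, 0)
print(decode_nfs4_error(raw))
```

The computed address is the same value that shows up as the last argument to vpanic in the stack trace.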

This reveals a secondary problem in the code which is that there are no
checks for errors like NFS4ERR_SERVERFAULT (again, now fixed).




( Jun 14 2005, 12:53:42 PM EDT / Jun 14 2005, 11:50:01 AM EDT ) Permalink Comments [0]
Trackback: http://blogs.sun.com/aalok/entry/debugging_on_sparc
