Friday July 08, 2005 | Richard McDougall's Weblog Commentary from Race Control |
|
Tracing the Solaris 10 File System Interface Here's a quick script to trace activity though the central file system interface. Until there is a general file system provider, this script should serve as a basic framework help construct other file system tracing scripts.
# ./voptrace.d /tmp Event Device Path RW Size Offset fop_putpage - /tmp//filebench/bin/i386/fastsu - 4096 4096 fop_inactive - /tmp//filebench/bin/i386/fastsu - 0 0 fop_putpage - /tmp//filebench/xanadu/WEB-INF/lib/classes12.jar - 4096 204800 fop_inactive - /tmp//filebench/xanadu/WEB-INF/lib/classes12.jar - 0 0 fop_putpage - /tmp/filebench1.63_s10_x86_sparc_pkg.tar.Z - 4096 7655424 fop_inactive - /tmp/filebench1.63_s10_x86_sparc_pkg.tar.Z - 0 0 fop_putpage - /tmp//filebench/xanadu/WEB-INF/lib/classes12.jar - 4096 782336 fop_inactive - /tmp//filebench/xanadu/WEB-INF/lib/classes12.jar - 0 0 fop_putpage - /tmp//filebench/bin/amd64/filebench - 4096 36864
The source is below:
#!/usr/sbin/dtrace -s
/*
* Trace the vnode interface
*
* USAGE: voptrace.d [/all | /mountname ]
*
* Author: Richard McDougall
*
* 7/8/2005
*/
#pragma D option quiet
:::BEGIN
{
printf("%-15s %-10s %51s %2s %8s %8s\n",
"Event", "Device", "Path", "RW", "Size", "Offset");
self->trace = 0;
self->path = "";
}
::fop_*:entry
/self->trace == 0/
{
/* Get vp: fop_open has a pointer to vp */
self->vpp = (vnode_t **)arg0;
self->vp = (vnode_t *)arg0;
self->vp = probefunc == "fop_open" ? (vnode_t *)*self->vpp : self->vp;
/* And the containing vfs */
self->vfsp = self->vp ? self->vp->v_vfsp : 0;
/* And the paths for the vp and containing vfs */
self->vfsvp = self->vfsp ? (struct vnode *)((vfs_t *)self->vfsp)->vfs_vnodecovered : 0;
self->vfspath = self->vfsvp ? stringof(self->vfsvp->v_path) : "unknown";
/* Check if we should trace the root fs */
($1 == "/all" ||
($1 == "/" && self->vfsp && \
(self->vfsp == `rootvfs))) ? self->trace = 1 : self->trace;
/* Check if we should trace the fs */
($1 == "/all" || (self->vfspath == $1)) ? self->trace = 1 : self->trace;
}
/*
* Trace the entry point to each fop
*
*/
::fop_*:entry
/self->trace/
{
self->path = (self->vp != NULL && self->vp->v_path) ? stringof(self->vp->v_path) : "unknown";
self->len = 0;
self->off = 0;
/* Some fops has the len in arg2 */
(probefunc == "fop_getpage" || \
probefunc == "fop_putpage" || \
probefunc == "fop_none") ? self->len = arg2 : 1;
/* Some fops has the len in arg3 */
(probefunc == "fop_pageio" || \
probefunc == "fop_none") ? self->len = arg3 : 1;
/* Some fops has the len in arg4 */
(probefunc == "fop_addmap" || \
probefunc == "fop_map" || \
probefunc == "fop_delmap") ? self->len = arg4 : 1;
/* Some fops has the offset in arg1 */
(probefunc == "fop_addmap" || \
probefunc == "fop_map" || \
probefunc == "fop_getpage" || \
probefunc == "fop_putpage" || \
probefunc == "fop_seek" || \
probefunc == "fop_delmap") ? self->off = arg1 : 1;
/* Some fops has the offset in arg3 */
(probefunc == "fop_close" || \
probefunc == "fop_pageio") ? self->off = arg3 : 1;
/* Some fops has the offset in arg4 */
probefunc == "fop_frlock" ? self->off = arg4 : 1;
/* Some fops has the pathname in arg1 */
self->path = (probefunc == "fop_create" || \
probefunc == "fop_mkdir" || \
probefunc == "fop_rmdir" || \
probefunc == "fop_remove" || \
probefunc == "fop_lookup") ?
strjoin(self->path, strjoin("/", stringof(arg1))) : self->path;
printf("%-15s %-10s %51s %2s %8d %8d\n",
probefunc,
"-", self->path, "-", self->len, self->off);
self->type = probefunc;
}
::fop_*:return
/self->trace == 1/
{
self->trace = 0;
}
/* Capture any I/O within this fop */
io:::start
/self->trace/
{
printf("%-15s %-10s %51s %2s %8d %8u\n",
self->type, args[1]->dev_statname,
self->path, args[0]->b_flags & B_READ ? "R" : "W",
args[0]->b_bcount, args[0]->b_blkno);
}
Technorati Tag: OpenSolaris Technorati Tag: Solaris Technorati Tag: DTrace ( Jul 08 2005, 04:05:55 PM PDT ) Permalink Comments [1]
Using DTrace for Memory Analysis Following on from yesterday's post on using prstat to look at memory slow-downs, here is a more advanced way of investigating. The DTrace probes help us identify all sources of paging in the system, and give us the ability to drill down quickly to identify cause and affect. With DTrace, you can probe more deeply into the sources of activity observed with higher-level memory analysis tools. For example, if you determine that a significant amount of paging activity is due to a memory shortage, you can determine which process is initiating the paging activity. In another example, if you see a significant amount of paging due to file activity, you can drill down to see which process and which file is responsible. DTrace allows for memory analysis through a vminfo provider, and, optionally, through deeper tracing of virtual memory paging with the fbt provider. The vminfo provider probes correspond to the fields in the "vm" named kstat: a probe provided by vminfo fires immediately before the corresponding vm value is incremented. The table below from the DTrace guide lists the probes available from the VM provider. A probe takes the following arguments:
For example, if you should see the following paging activity with vmstat, indicating page-in from the swap device, you could drill down to investigate.
sol8# vmstat -p 3
memory page executable anonymous filesystem
swap free re mf fr de sr epi epo epf api apo apf fpi fpo fpf
1512488 837792 160 20 12 0 0 0 0 0 8102 0 0 12 12 12
1715812 985116 7 82 0 0 0 0 0 0 7501 0 0 45 0 0
1715784 983984 0 2 0 0 0 0 0 0 1231 0 0 53 0 0
1715780 987644 0 0 0 0 0 0 0 0 2451 0 0 33 0 0
sol10$ dtrace -n anonpgin'{@[execname] = count()}'
dtrace: description 'anonpgin' matched 1 probe
svc.startd 1
sshd 2
ssh 3
dtrace 6
vmstat 28
filebench 913
Using DTrace to Estimate Memory Slowdowns You can use Using DTrace to, we can directly measure time elapsed time around the page-in probes when a process is waiting for page-in from the swap device, as in this example.
sched:::on-cpu
{
self->on = vtimestamp;
}
sched:::off-cpu
/self->on/
{
@oncpu[execname] = sum(vtimestamp - self->on);
self->on = 0;
}
vminfo:::anonpgin
{
self->anonpgin = 1;
}
:::pageio_setup:return
{
self->wait = timestamp;
}
:::pageio_done:entry
/self->anonpgin == 1/
{
self->anonpgin = 0;
@pageintime[execname] = sum(timestamp - self->wait);
self->wait = 0;
}
END
{
normalize(@oncpu, 1000000);
printf("Who's on cpu (milliseconds):\n");
printa(" %-50s %15@d\n", @oncpu);
normalize(@pageintime, 1000000);
printf("Who's waiting for pagein (milliseconds):\n");
printa(" %-50s %15@d\n", @pageintime);
}
With an aggregation by execname, you can we can look to see who is being held up by paging the most.
sol10$./whospaging.d ^C Who's on cpu (milliseconds): svc.startd 1 loop.sh 2 sshd 2 ssh 3 dtrace 6 vmstat 28 pageout 60 fsflush 120 filebench 913 sched 84562 Who's waiting for pagein (milliseconds): filebench 230704 The DTrace script displays the amount of time the program spends doing useful work compared to the amount of time it spends waiting for page-in. The next script measures the elapsed time from when a program stalls on a page in from the swap device (anonymous page ins) and when it resumes for a specific pid target, specified on the command line. sched:::on-cpu
/pid == $1/
{
self->on = vtimestamp;
}
sched:::off-cpu
/self->on/
{
@time["
In the following example, the program spends 0.9 seconds doing useful work, and 230 seconds waiting for page-ins.
sol10$ /pagingtime.d 22599 dtrace: script './pagingtime.d' matched 10 probes ^C 1 2 :END Time breakdown (milliseconds): <on cpu> 913 <paging wait> 230704
Technorati Tag: OpenSolaris Technorati Tag: Solaris Technorati Tag: DTrace ( Jun 30 2005, 01:15:58 PM PDT ) Permalink Comments [1]
I have created a repository for DTrace scripts that we have been referencing. Today it's just a download, we will be adding some more verbose descriptions of what the tools do shortly.
Technorati Tag: OpenSolaris Technorati Tag: Solaris Technorati Tag: DTrace ( Apr 19 2005, 07:23:38 AM PDT ) Permalink Comments [2]
Using DTrace to Estimate File System Cache Hit Ratio We now have a method of collecting one of the most significant performance statistics for a file system in Solaris -- the cache hit rate in the file system page cache. By using DTrace with probes at the entry and exit to the file system, we can collect the logical io's into the file system and physical io's from the file system into the device I/O subsystem. These two statistics give us insight into how effective the file system cache is being, and if there might be grounds for adding physical memory in order to increase the amount of file system level caching.
#!/usr/sbin/dtrace -s
#pragma D option quiet
::fop_read:entry
/self->trace == 0 && (((vnode_t *)arg0)->v_vfsp)->vfs_vnodecovered/
{
vp = (vnode_t*)arg0;
vfs = (vfs_t *)vp->v_vfsp;
mountvp = vfs->vfs_vnodecovered;
uio = (uio_t*)arg1;
self->path=stringof(mountvp->v_path);
@rio[stringof(mountvp->v_path), "logical"] = count();
@rbytes[stringof(mountvp->v_path), "logical"] = sum(uio->uio_resid);
self->trace = 1;
}
::fop_read:entry
/self->trace == 0 && (((vnode_t *)arg0)->v_vfsp == `rootvfs)/
{
vp = (vnode_t*)arg0;
vfs = (vfs_t *)vp->v_vfsp;
mountvp = vfs->vfs_vnodecovered;
uio = (uio_t*)arg1;
self->path="/";
@rio[stringof("/"), "logical"] = count();
@rbytes[stringof("/"), "logical"] = sum(uio->uio_resid);
self->trace = 1;
}
::fop_read:return
/self->trace == 1/
{
self->trace = 0;
}
io::bdev_strategy:start
/self->trace/
{
@rio[self->path, "physical"] = count();
@rbytes[self->path, "physical"] = sum(args[0]->b_bcount);
}
tick-5s
{
trunc (@rio, 20);
trunc (@rbytes, 20);
printf("\033[H\033[2J");
printf ("\nRead IOPS\n");
printa ("%-60s %10s %10@d\n", @rio);
printf ("\nRead Bandwidth\n");
printa ("%-60s %10s %10@d\n", @rbytes);
trunc (@rbytes);
trunc (@rio);
}
Using this script, we probe the amount of logical bytes into the file system though the new Solaris 10 file system fop layer. We count the physical bytes using the io provider. Running the script, we can see the logical and physical for a file system which can be used to calculate the hit rate:
Read IOPS /data1 physical 287 /data1 logical 2401 Read Bandwidth /data1 physical 2351104 /data1 logical 5101240 The /data1 file system on this machine is doing 2401 logical iops, and 287 physical - i.e. a hit rate of 2401 / (2401 + 287) = 89%. It is also doing 5.1MB/s logical and 2.3MB/s physical. We can also do this at the file level:
#!/usr/sbin/dtrace -s
#pragma D option quiet
::fop_read:entry
/self->trace == 0 && (((vnode_t *)arg0)->v_path)/
{
vp = (vnode_t*)arg0;
uio = (uio_t*)arg1;
self->path=stringof(vp->v_path);
self->trace = 1;
@rio[stringof(vp->v_path), "logical"] = count();
@rbytes[stringof(vp->v_path), "logical"] = sum(uio->uio_resid);
}
::fop_read:return
/self->trace == 1/
{
self->trace = 0;
}
io::bdev_strategy:start
/self->trace/
{
@rio[self->path, "physical"] = count();
@rbytes[self->path, "physical"] = sum(args[0]->b_bcount);
}
tick-5s
{
trunc (@rio, 20);
trunc (@rbytes, 20);
printf("\033[H\033[2J");
printf ("\nRead IOPS\n");
printa ("%-60s %10s %10@d\n", @rio);
printf ("\nRead Bandwidth\n");
printa ("%-60s %10s %10@d\n", @rbytes);
trunc (@rbytes);
trunc (@rio);
}
Technorati Tag: OpenSolaris Technorati Tag: Solaris Technorati Tag: DTrace ( Apr 13 2005, 09:36:27 PM PDT ) Permalink Comments [0]
|
|
||||||||||||||||||||||||||||||||||||