As part of its I/O scheduling, ZFS has a tunable called 'zfs_vdev_max_pending'. It limits the maximum number of I/Os we can send down per leaf vdev. This is NOT the maximum per filesystem or per pool. Currently the default is 35. This is a good number for today's disk drives; however, it is not a good number for storage arrays that are really composed of many disks but exported to ZFS as a single device.
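To make the storage-array concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the array spreads outstanding I/Os evenly across its disks, which real arrays will only approximate; the function name and the 10-disk figure are purely illustrative.

```python
def per_disk_queue_depth(max_pending: int, disks_behind_device: int) -> float:
    """Average outstanding I/Os per physical disk.

    Assumption: the array distributes the I/Os it receives evenly
    across all of the disks behind the single exported device.
    """
    if disks_behind_device <= 0:
        raise ValueError("disks_behind_device must be positive")
    return max_pending / disks_behind_device


# A plain disk drive gets the whole per-leaf-vdev limit to itself.
print(per_disk_queue_depth(35, 1))   # 35.0

# A 10-disk array exported as one device shares the same limit.
print(per_disk_queue_depth(35, 10))  # 3.5

# Doubling the limit doubles the average depth each disk can see.
print(per_disk_queue_depth(70, 10))  # 7.0
```

Under that assumption, each disk in the array sees only a small fraction of the queue depth a standalone drive would get, which is why the default can leave an array underused.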
This limit is a really good thing when you have a heavy I/O load as described in Bill's "ZFS vs. The Benchmark" blog.
But if you've created, say, a 2-device mirrored pool, where each device is really a 10-disk storage array, and you think that ZFS just isn't doing enough I/O for you, here's a script to see if that's true:
#!/usr/sbin/dtrace -s

/* The queue returned an I/O to issue (non-NULL return value). */
vdev_queue_io_to_issue:return
/arg1 != NULL/
{
        @c["issued I/O"] = count();
}

/* The queue declined to issue an I/O (NULL return value). */
vdev_queue_io_to_issue:return
/arg1 == NULL/
{
        @c["didn't issue I/O"] = count();
}

/* On every attempt, record how many I/Os are already pending. */
vdev_queue_io_to_issue:entry
{
        @avgers["avg pending I/Os"] = avg(args[0]->vq_pending_tree.avl_numnodes);
        @lquant["quant pending I/Os"] = quantize(args[0]->vq_pending_tree.avl_numnodes);
        @c["total times tried to issue I/O"] = count();
}

/* Same statistics, restricted to attempts with more than 349 pending I/Os. */
vdev_queue_io_to_issue:entry
/args[0]->vq_pending_tree.avl_numnodes > 349/
{
        @avgers["avg pending I/Os > 349"] = avg(args[0]->vq_pending_tree.avl_numnodes);
        @quant["quant pending I/Os > 349"] = lquantize(args[0]->vq_pending_tree.avl_numnodes, 33, 1000, 1);
        @c["total times tried to issue I/O where > 349"] = count();
}

/* bail after 5 minutes */
tick-300sec
{
        exit(0);
}
If you see the "avg pending I/Os" hitting your vq_max_pending limit, then raising the limit would be a good thing. The way to do that used to be per vdev, but we now have a single global way to change all vdevs:
heavy# mdb -kw
Loading modules: [ unix genunix specfs dtrace cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp arp usba fctl nca lofs zfs random nfs cpc fcip logindmux ptm sppp ipc ]
> zfs_vdev_max_pending/E
zfs_vdev_max_pending:
zfs_vdev_max_pending: 35
> zfs_vdev_max_pending/W 0t70
zfs_vdev_max_pending: 0x23 = 0x46
> zfs_vdev_max_pending/E
zfs_vdev_max_pending:
zfs_vdev_max_pending: 70
>
The above will change the max # of pending requests to 70, instead of 35.
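A note on the number formats in that session: mdb reads bare numbers as hexadecimal by default, so the `0t` prefix is what makes `70` decimal, and the write confirmation shows the old and new values in hex. The small Python sketch below just checks that arithmetic; the helper names are made up for illustration and are not part of mdb.

```python
def mdb_decimal(value: int) -> str:
    """Format a value the way it is typed into mdb as explicit decimal (0t prefix)."""
    return f"0t{value}"


def mdb_hex(value: int) -> str:
    """Format a value in hex, as mdb prints it in the write confirmation."""
    return f"0x{value:x}"


old_limit, new_limit = 35, 70

# Matches the confirmation line printed after the /W command.
print(f"{mdb_hex(old_limit)} = {mdb_hex(new_limit)}")  # 0x23 = 0x46

# The argument to type after /W for a decimal 70.
print(mdb_decimal(new_limit))  # 0t70

# Without the 0t prefix, "70" would be read as hex.
print(int("70", 16))  # 112
```

In other words, leaving off `0t` would set the limit to 112 rather than 70.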
Having people tune variables is never desirable, and we'd like 'vq_max_pending' (among others) to be set dynamically; see:
6457709 vdev_knob values should be determined dynamically.
(2008-03-03 14:19:42.0/2006-08-07 11:22:50.0)
Posted by Jason Larke on August 08, 2006 at 11:17 AM PDT #
Nope, it's an excellent question... I imagine you're running U2? There's a bug in the CTF data that breaks this script on U2, and there's no workaround.
Posted by eric kustarz on August 08, 2006 at 11:30 AM PDT #
Posted by Ivan Debnar on September 06, 2006 at 10:10 AM PDT #
Posted by Ivan Debnar on September 06, 2006 at 10:21 AM PDT #
I have
ffffffff936b2a80 HEALTHY - root
ffffffff936b2000 HEALTHY - mirror
ffffffff936b2540 HEALTHY - /dev/dsk/c6t3d0s0
ffffffff936b3ac0 HEALTHY - /dev/dsk/c8t22260001552EFE2Cd0s0
... etc ...
Should the 'vq_max_pending' be changed for the "mirror" device as well, or only for the components? And what about the 'root' component?
Thanks for the enlightenment.
Posted by Ivan Debnar on September 06, 2006 at 02:42 PM PDT #
It should matter only to the leaf vdevs, so you don't have to worry about modifying the "mirror" vdev or the "root" vdev.
The other questions have been taken to zfs-discuss. Thanks, as that's the best way to get questions answered!
Posted by eric kustarz on September 07, 2006 at 11:56 AM PDT #