UFS in Solaris 10
I spent the better part of the last year getting to know UFS. I think we are on a first name basis now :-). Thus, I begin my blog debut with some interesting UFS bugs and how they were fixed.
UFS had many improvements integrated into Solaris 10 and Solaris 9 9/04: bug fixes, logging on by default, and general robustness improvements. In this post I will talk about three specific bug fixes that affect the UFS tunable maxcontig, and therefore aspects of UFS performance.
4639871 Logging ufs fails to boot from ATA drive on Ultra-10 if maxphys is too large
4638166 Ultra 5/10 panics with simba and pci errors if logging enabled and maxphys > 1MB
4349828 Inconsiderate tuning of maxcontig causes scsi bus to hang
As a result of these bugs, UFS in Solaris 10 and Solaris 9 9/04 was modified to change the values that could be used to set maxcontig and subsequently the value used for the maximum transfer size when I/O was issued.
Previously, an inconsiderate value for either maxcontig or maxphys (in /etc/system) could hang the system. The filesystem I/O request size was calculated from the value set for maxcontig, and the maximum transfer size of the underlying device was never considered when UFS calculated the size of an I/O transfer.
In UFS, the filesystem cluster size, for both reads and writes, is set to the value set for maxcontig. The filesystem cluster size is used to determine:
- The maximum number of logical blocks contiguously
laid out on disk for a UFS filesystem before inserting
a rotational delay.
- When to read ahead and/or write behind, and by how much, if
sequential I/O is detected. The algorithm that determines
sequential read-ahead in UFS is broken, so system
administrators use the maxcontig value to tune their
filesystems to achieve better random I/O performance.
- The UFS filesystem cluster size also indicates how many pages to attempt to push out to disk at a time. It also determines the frequency of pushing pages because in UFS pages are clustered for writes, based on the filesystem cluster size.
How These Bugs Were Fixed:
1) The UFS filesystem cluster size (maxcontig) and the I/O transfer size were separated, removing the dependency that was causing systems to hang. UFS will no longer allow a setting of maxcontig to interrupt or hang any I/O requests to the device. UFS will always issue I/O requests that are <= the maximum transfer size of the device hosting the filesystem.
The UFS filesystem cluster size is still set using the value indicated for maxcontig. The I/O transfer size will be set in UFS as shown below.
2) The value for rotational delay (gap in mkfs(1M), -d in tunefs(1M)) no longer makes sense. Today's devices are very sophisticated and do not need a delay artificially built in via software. As noted above, the value of maxcontig determines the number of contiguous blocks placed on disk before space is inserted to account for rotational delay. The rotational delay value has been obsoleted in Solaris 10 and Solaris 9 9/04 and now defaults to 0, ensuring contiguous allocation.
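To make the effect of fix 1) concrete, here is a minimal sketch of the arithmetic involved once the cluster size and the transfer size are separate. It is not UFS source code, and this post does not describe how UFS actually breaks up a cluster. The helper name requests_for_cluster and the idea of counting capped requests per cluster are assumptions made purely for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a UFS function): how many I/O requests
 * would be needed to move one filesystem cluster if every request
 * is capped at max_xfer bytes.
 */
static size_t
requests_for_cluster(size_t cluster_bytes, size_t max_xfer)
{
	if (cluster_bytes == 0 || max_xfer == 0)
		return (0);	/* nothing to transfer, or no usable limit */

	/* Ceiling division: a partial final chunk still needs a request. */
	return ((cluster_bytes + max_xfer - 1) / max_xfer);
}
```

Under these assumptions, a 1MB cluster on a device limited to 128KB per transfer would go out as eight requests instead of a single oversized one, while a 64KB cluster on the same device would still be one request.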
Transfer size of I/O requests in UFS:
The device that hosts the filesystem will be queried for the maximum transfer size it can handle, and the UFS I/O transfer size will default to this if the information is obtainable. If the device does not support reporting its maximum transfer size, the maximum transfer size will be set using:
- min(maxphys, ufs_maxmaxphys).
- ufs_maxmaxphys is currently set to 1MB.
If, however, the user sets the value of maxcontig to be less than the maximum device transfer size, UFS will honor the value of maxcontig as the maximum size for data transfers on this device.
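The selection rules above can be sketched as a small C function. This is a simplified model, not the actual UFS implementation. The function name ufs_io_xfer_size, the convention that dev_max == 0 means "the device could not report a limit", and passing the block size as a parameter are all assumptions made for the example. Only maxphys, ufs_maxmaxphys (1MB), and maxcontig come from the description above.

```c
#include <stddef.h>

/* 1MB cap described above; the macro name is chosen for this sketch. */
#define	UFS_MAXMAXPHYS	((size_t)1024 * 1024)

/*
 * Hypothetical model of how the UFS I/O transfer size is chosen.
 *   dev_max    - device-reported maximum transfer size in bytes,
 *                or 0 if the device cannot report one (assumption)
 *   maxphys    - system maxphys setting in bytes
 *   maxcontig  - filesystem cluster size in logical blocks
 *   bsize      - logical block size in bytes (assumed parameter)
 */
static size_t
ufs_io_xfer_size(size_t dev_max, size_t maxphys, size_t maxcontig,
    size_t bsize)
{
	size_t xfer = dev_max;
	size_t cluster = maxcontig * bsize;

	/* Fallback when the device reports nothing: min(maxphys, 1MB). */
	if (xfer == 0)
		xfer = (maxphys < UFS_MAXMAXPHYS) ? maxphys : UFS_MAXMAXPHYS;

	/* A smaller maxcontig is honored as the transfer ceiling. */
	if (cluster < xfer)
		xfer = cluster;

	return (xfer);
}
```

For example, with a device that reports a 128KB limit and maxcontig set to 8 blocks of 8KB, this model yields 64KB transfers. With no device-reported limit and maxphys set to 8MB, the result is capped at 1MB regardless of how large maxcontig is.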
maxcontig:
The default value is determined from the disk drive's maximum transfer size as noted above. Any positive integer value is acceptable when setting this parameter, via tunefs(1M) or mkfs(1M).