Data Processing
Valdis's Weblog
Monday Feb 12, 2007
Improving I/O throughput for T2000 servers
If you are using T2000 servers:

While tuning and benchmarking an application on a T2000 running Solaris 10 with QFS 4.5 and a Sun StorageTek 6140 (ST6140), we doubled I/O throughput (MB/s), and application performance doubled with it. The ST6140 has a theoretical bandwidth of 800 MB/s; by increasing pci-max-read-request and tuning the filesystem and disk volume block sizes, the T2000 was able to drive our storage at about 700 MB/s. This was an I/O-intensive application, so the same gains may not be seen in other environments. I still think raising the parameter from its default could be useful for general applications. However, do not increase it to its maximum value, as this could do more harm than good.

A relatively unknown qlc.conf parameter, pci-max-read-request, has a very low default setting. This restricts I/O performance, which makes the T2000 server slow and prevents it from exploiting the full capabilities of Sun storage. It can be fixed by setting the parameter to a higher value. This applies only to T2000 servers using PCIe.

Details below:

My recommendation: Set pci-max-read-request=2048 on all T2000 servers.

Parameter explanation:

Set in "/kernel/drv/qlc.conf" (the path may appear as "/sysconfig/drv/qlc.conf" in a link or in Explorer output):

#Name: PCI max read request override;
#Type: Integer, bytes; Range: 128, 256, 512, 1024, 2048, 4096
#Usage: This field specifies the value to be used for the PCI max read request, overriding the value programmed by the system.
#NOTE: The minimum value is 128 bytes; if this variable does not exist or is not equal to 128, 256, 512, 1024, 2048 or 4096, the ISP2xxx
# defaults to values specified by the system.
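
As a rough illustration of the rule in the comment above, here is a minimal Python sketch that checks a candidate value against the allowed list before it goes into the file. The helper name is hypothetical, and the `name=value;` entry syntax is my assumption based on other Solaris driver .conf files, so check it against the qlc.conf shipped with your driver.

```python
# Hypothetical helper for illustration; not part of Solaris or the qlc driver.
# Allowed sizes are taken from the qlc.conf comment block quoted above.
ALLOWED_READ_REQUEST_SIZES = (128, 256, 512, 1024, 2048, 4096)


def qlc_read_request_line(size_bytes: int) -> str:
    """Build a pci-max-read-request entry, rejecting values the driver would ignore."""
    if size_bytes not in ALLOWED_READ_REQUEST_SIZES:
        raise ValueError(
            f"{size_bytes} is not one of {ALLOWED_READ_REQUEST_SIZES}; "
            "the driver would fall back to the system default"
        )
    # Assumed entry syntax (name=value;), as used in other Solaris driver .conf files.
    return f"pci-max-read-request={size_bytes};"


print(qlc_read_request_line(2048))
```

The point of the check is that a mistyped value is not an error as far as the driver is concerned: according to the note above, it simply reverts to the system-programmed value, so the tuning silently does nothing.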
Background:
When you write data to a target, you are reading from PCI. The issue is that the T2K defaults this to 128 bytes. For best performance it would have to be set to 4096, but it's not that simple. The Sun PCI folks warn that if we set it too high we eat up resources needed by other devices on the bus, such as Ethernet. There is currently a project in the PCI team to make this process automatic. In the meantime, folks have set this to 2048 without seeing any issues.

Other things to check for anyone optimizing I/O throughput on Sun servers in general: these are the Solaris parameters that need attention if you want to improve throughput. Only change and test them during development, not on production systems. As always, use good project management and change control procedures when implementing any kernel changes.

Allow Solaris and the Solaris disk drivers to perform the maximum size I/Os you anticipate. Setting these numbers lower than what the application and file system prefer results in fragmentation of I/Os. Setting them higher consumes more memory and may cause some very old disk or channel hardware to become unstable, but for recent machines with large memory footprints and recent disks this is typically not an issue. The system must be rebooted for these changes to take effect:

Set the maximum I/O size for Solaris by editing the /etc/system file.
# /etc/system
# this sets maximum physical I/O size to 8MB
set maxphys = 0x800000

Set the maximum I/O size for fibre channel disks by editing the /kernel/drv/ssd.conf file:
ssd_max_xfer_size = 0x800000;

You have to set up /kernel/drv/sd.conf for individual SCSI disks:
name="sd" class="scsi"
sd_max_xfer_size=0x800000
target=3 lun=0;
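
To make the 0x800000 values above less opaque, here is a small Python sketch of the arithmetic. It converts a size in MB to the hex form used in the settings and counts how many pieces a large request is split into when the limit is smaller than the request. The 128 KB limit in the example is only an illustrative low value, not a claim about the default on any particular system.

```python
KIB = 1024
MIB = 1024 * KIB


def pieces_needed(request_bytes: int, max_transfer_bytes: int) -> int:
    """Number of I/Os a single request is split into (ceiling division)."""
    return -(-request_bytes // max_transfer_bytes)


max_io = 8 * MIB
print(hex(max_io))  # the value used for maxphys and *_max_xfer_size above

# An 8 MB application write against an illustrative 128 KB limit
# versus the 8 MB limit configured above.
print(pieces_needed(8 * MIB, 128 * KIB))
print(pieces_needed(8 * MIB, max_io))
```

With the limit raised to 8 MB the same write goes down as a single I/O, which is the fragmentation effect described above.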

Solaris queue depth: how many SCSI commands can be queued to a LUN.
Set this too low and you do not get maximum performance; set it too high and you can get device overruns, which cause recovery processes to redrive/retry I/O and so cause a slowdown. Normally stay with 64 (NB: check with your storage supplier, as 3rd party equipment is often not as well optimized for multithreading as Sun Storage).

ssd_max_throttle=64
(change sd_max_throttle for SCSI disks)
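
One way to get a feel for what a queue depth means together with the transfer size is a back-of-the-envelope upper bound on the data that can be outstanding to one LUN. The Python sketch below is only that arithmetic, using the values from this post; real workloads rarely fill every queue slot with a maximum-size I/O, so treat the result as a worst case rather than a measurement.

```python
MIB = 1024 * 1024


def max_outstanding_bytes(queue_depth: int, max_transfer_bytes: int) -> int:
    """Worst-case bytes in flight to one LUN if every queued command is maximum size."""
    return queue_depth * max_transfer_bytes


worst_case = max_outstanding_bytes(64, 0x800000)
print(worst_case // MIB, "MiB per LUN at most")
```

If the array behind the LUN cannot absorb anything like that, it is a hint that the throttle (or the transfer size) is set higher than the device can handle.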

There may be other configuration files and option settings for non-Sun host bus adapters and drivers. See the third-party documents for more.

Caveat: 3rd party arrays often do not multi-thread and scale in performance as well as Sun Storage does. With large queues it is possible to "overrun" 3rd party storage. We have often had to reduce the queue depth (ssd_max_throttle) for slower 3rd party devices. I will not get into this, as 3rd party or aftermarket suppliers will often disagree with me.

This is a personal recommendation, which will hopefully save me from documenting it elsewhere and having to explain it several times a month.

This is not a Sun Engineering or official Sun Solaris patch, fix, etc.
Posted at 10:06PM Feb 12, 2007 by Valdis Filks in Technical  |  Comments[15]

Comments:

Your tweak above would seem to be appropriate for most systems wanting to put a lot of IO. No ?

Posted by Sean O'Neill on February 13, 2007 at 05:30 AM CET #

Exactly, however we never know how all of our servers are used. I have already had conversations with people discussing exactly this. If your application is I/O intensive and sequential with large files (>1MB) and blocksizes (>64KB) then this will fit. However, if it is OLTP with small blocksizes, e.g. 4KB, then you may not see the difference. Generally, though, a happy value for "pci-max-read-request" should be about 2048, definitely higher than the default of 128.

Posted by Valdis Filks on February 13, 2007 at 10:04 AM CET #

does this apply to T2K only, what about x86 based systems with qlc drivers? also you do not talk about MPXIO..is there any tuning here to increase IOPS or Throughput? thanks

Posted by miles on February 13, 2007 at 06:40 PM CET #

I recommend that PCI-MAX-READ-REQUEST be changed on T2K; I have not had the chance to check on others. All other Solaris parms (maxphys, xfer_size, max_throttle) are applicable on X86, SPARC, X64 systems. I do not talk about MPXIO as the blog would end up too big. When I have used this, no changes were required to MPXIO. I always use MPXIO on Solaris, never had any performance problems, it is transparent, scales and is free. Which is important as other vendors charge.

Posted by Valdis Filks on February 14, 2007 at 09:46 AM CET #

Do you mean /kernel/drv/qlc.conf instead of /sysconfig/drv/qlc.conf ?

Posted by Judy Leach on March 22, 2007 at 06:54 PM CET #

Why is this T2K specific? I am not a Solaris guy, I thought the settings in the qlc.conf come with Solaris or when the hardware (FC HBA) is installed and recognized by Solaris?

Posted by Kern Chang on April 07, 2007 at 07:23 PM CEST #

How do you tell if you are using PCIe?

Posted by Glen on April 18, 2007 at 05:56 PM CEST #

Answer for Judy: It is set in /kernel/drv/qlc.conf, the above reference may be a link or output from explorer.

Posted by Valdis Filks on April 23, 2007 at 04:45 PM CEST #

Answer for Kern: It is T2K specific due to the bus design. Yes, all parms come with the HBA driver when it is installed, however with Solaris we can change and tune these parms. This is a great benefit as Solaris is extremely adaptable. It is also a disadvantage if you do the wrong thing.

Posted by Valdis Filks on April 23, 2007 at 04:48 PM CEST #

PCIe question: Best way is to see how many connectors your card has, then look it up in a place like: http://www.techweb.com/encyclopedia/defineterm.jhtml?term=PCIExpress or http://en.wikipedia.org/wiki/PCIe. When I am on a plane, with colleagues or family, to alleviate the boredom I often explain what plane model we are in. I actually cannot tell and I am not a plane expert, but I read the plane model number from the emergency instructions in the seat pocket in front. Annoying, amusing, but a bit of lateral thinking goes a long way. Similarly, you can read the card or the server/HBA instructions.

Posted by Valdis Filks on April 23, 2007 at 05:03 PM CEST #

Does the qlc change work on Netra T2000? That is what I have but I think the bus is different. I know it is the same MB but I do have PCI-E slots. Also, wonder if this would work for the 1394 card I have.

Our applications do quite a lot of ethernet I/O but I would like to try to improve performance on those systems by setting the pci-max-read-request to 2048. Will this help with ethernet as well?

Posted by Don Weeks on September 09, 2007 at 06:42 PM CEST #

Apologies for the late reply Don, I have not tested this on a Netra T2000 but would expect it to work. Netra boxes are NEBS qualified/certified but the computer architecture is very similar. This will not work for ethernet, it is an I/O driver parm tuning exercise.

Posted by Valdis on October 24, 2007 at 02:54 PM CEST #

Is this applicable to only qlogic, what if I have an emulex card or is it bus specific and only on the qlc.conf file as "somewhere"

Posted by Veltror on November 06, 2007 at 02:17 PM CET #

We have a Sun T5220 with Sun branded Qlogic pci-e hba. What should one use for the settings below?

1: Loop Reset Delay (seconds)
2: Enable HBA Hard Loop ID
3: Enable FCP-2 Error Recovery
4: Login Retry Count
5: Port Down Retry Count
6: Link Down Timeout (seconds)

Posted by AA on February 20, 2008 at 06:05 AM CET #

For the 6140, do you know what the queue depth is per port? I'm surprised this isn't in Sun's spec sheets.

Posted by Merill Ronquillo on August 31, 2008 at 10:03 PM CEST #
