Using Solaris Resource Management Utilities to Improve Application Performance
The SPECjAppServer2004 benchmark is a very complex benchmark produced by the Standard Performance Evaluation Corporation (SPEC). Servers measured with this benchmark exercise all major Java 2 Enterprise Edition (J2EE) technologies, including transaction management, database connectivity, web containers, and Enterprise JavaBeans. The benchmark places a heavy load on the hardware, software, and network, as Systems Under Test (SUTs) are driven to hundreds or thousands of JOPS (jAppServer Operations Per Second).
This article introduces some of the Solaris resource management utilities that are used for this benchmark. These utilities may be useful to system managers who are responsible for complex servers. The author has applied these features to improve performance when using multiple instances of J2EE application server software with the SPECjAppServer2004 benchmark.
In SPECjAppServer2004 benchmark results submitted by Sun Microsystems, you can find references to Solaris Resource Management features such as Containers, Zones, Processor sets, and Scheduling classes. The recently published results for the Sun Fire T5440 and the Sun Fire T5140 servers use many of these features.
Solaris Resource Management utilities are used to provide isolation of applications and better management of system resources. There are a number of publications which describe many of the features and benefits. The Sun Solaris Container Administration Guide and Sun Zones Blueprint are two of many sources of good information.
Solaris Containers
Looking at the first benchmark result mentioned above, the Sun Fire T5440 server was configured with 8 Solaris Containers. Each container, or zone, was set up to host a single application server instance. When an application server instance is hosted in a container, the memory and network resources used by that instance are virtually isolated from the memory and network resources used by other instances running in separate containers.
While running the application software in a zone does not directly increase performance, using Containers with this benchmark workload makes it easier to manage multiple J2EE instances. When combined with the techniques below, Solaris Containers can provide an effective environment for improving application performance.
Many Solaris performance utilities can also monitor and report process information for the configured zones. For example, prstat with the -Z option reports on both processes and zones.
Processor Sets
The System Administration Guide for Solaris Containers discusses the use of Resource Pools to partition machine resources. A resource pool is a configuration mechanism that implements a processor set, optionally combined with a scheduling class, and can be associated with a zone. When configuring a resource pool, the administrator specifies the minimum and maximum CPU resources for the pool, and the system creates the processor set from this information. The resource pool can then be associated with a specific zone using the zonecfg(1M) utility. However, in some scenarios the processor IDs selected for the resource pool may span multiple CPU chips. Such a pool may not make the most efficient use of caches or of access to local memory.
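For comparison, the resource pool route can be sketched as a shell function. It prints the pooladm(1M), poolcfg(1M), and zonecfg(1M) commands for one zone rather than running them, so they can be reviewed first. This is a sketch under assumptions. The pset-ZONE and pool-ZONE names are hypothetical, and setting pset.min equal to pset.max is just one way to request a fixed-size set. The poolcfg syntax should also be checked against the man pages for your Solaris release before anything is run as root.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands that create a fixed-size processor
# set, wrap it in a resource pool, and assign that pool to a zone.
# The pset-ZONE / pool-ZONE naming scheme is a hypothetical convention.
pool_cmds() {
    zone=$1    # zone name (hypothetical, e.g. as1)
    ncpu=$2    # number of virtual processors for the set (min = max)
    pset="pset-$zone"
    pool="pool-$zone"
    echo "pooladm -e"    # enable the resource pools facility
    echo "pooladm -s"    # save the running configuration to /etc/pooladm.conf
    echo "poolcfg -c 'create pset $pset (uint pset.min = $ncpu; uint pset.max = $ncpu)'"
    echo "poolcfg -c 'create pool $pool'"
    echo "poolcfg -c 'associate pool $pool (pset $pset)'"
    echo "pooladm -c"    # commit the edited configuration
    echo "zonecfg -z $zone 'set pool=$pool'"    # zone uses the pool when it boots
}
```

For example, `pool_cmds as1 32` prints the seven commands for a hypothetical zone named as1 with a 32-strand set. The pooladm -e and pooladm -s steps only need to be run once per system. Nothing in these commands chooses which processor IDs end up in the set, which is the limitation described above.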
For the configurations in the published results, each Solaris Container was bound to a unique processor set, and each processor set was composed of 4 UltraSPARC T2 Plus cores. Since each UltraSPARC T2 Plus core consists of 8 hardware strands, each CPU chip was partitioned into two processor sets of 32 processor IDs. The processor sets were created by specifying the 32 processor IDs as an argument to the psrset(1M) command, as shown in the following example:
% psrset -c 32-63
The command above instructs Solaris to create a processor set using virtual processor numbers 32 through 63, which belong to 4 cores of one UltraSPARC T2 Plus CPU chip. With a total of four UltraSPARC T2 Plus CPU chips, the Sun Fire T5440 system was configured to use 7 processor sets of 4 cores each. The remaining 4 cores (virtual processor numbers 0-31) stayed in the default processor set, because the default set must contain at least 1 virtual processor ID.
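The arithmetic behind this layout can be sketched as a small shell function that prints the psrset -c command for each half-chip set. The ranges above suggest that virtual processor IDs are numbered contiguously, 8 per core and 64 per chip. The function assumes that numbering, which should be confirmed with psrinfo -pv on the actual system before any of the printed commands are run.

```shell
#!/bin/sh
# Sketch: print psrset -c commands that carve a Sun Fire T5440 into
# half-chip processor sets of 4 cores (32 strands) each, leaving IDs 0-31
# in the default set.
# ASSUMPTION: virtual processor IDs are contiguous per core (8 strands) and
# per chip (64 IDs); confirm the real layout with `psrinfo -pv` first.
STRANDS_PER_CORE=8
CORES_PER_PSET=4
CORES_PER_CHIP=8
CHIPS=4

pset_cmds() {
    size=$((STRANDS_PER_CORE * CORES_PER_PSET))             # 32 IDs per set
    total=$((CHIPS * CORES_PER_CHIP * STRANDS_PER_CORE))    # 256 IDs in all
    first=$size                                             # skip 0-31 (default set)
    while [ "$first" -lt "$total" ]; do
        last=$((first + size - 1))
        echo "psrset -c $first-$last"
        first=$((first + size))
    done
}
```

Running pset_cmds prints seven commands, from psrset -c 32-63 through psrset -c 224-255. Each psrset -c invocation reports the ID of the set it creates, and that ID is the PSET_ID used when binding processes.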
According to the Sun Fire T5440 System Architecture, each UltraSPARC T2 Plus CPU chip has 4 MB of L2 cache shared by all 8 cores in the chip. Each UltraSPARC T2 Plus CPU also has direct links to 16 DIMM slots of local memory. It reaches the remaining (remote) memory DIMMs through an External Coherency Hub. Data references to local memory are generally slightly faster, because any data access through an External Coherency Hub incurs a small added latency, as Denis indicates. Solaris treats this combination of CPU hardware and physically local memory as a Locality Group.
Solaris attempts to allocate physical memory pages from the same locality group as the CPU executing the application process or thread. To help reduce data access latency for an application, processor sets are a simple and effective way to keep its data accesses within a single L2 cache and Locality Group boundary.
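This check uses the same numbering assumption of 64 contiguous virtual processor IDs per chip. It confirms that a proposed processor set stays within one L2 cache and one Locality Group by comparing the chip index of its first and last processor IDs. The helper names below are hypothetical. On a real system, chip membership should come from psrinfo -pv rather than from this arithmetic.

```shell
#!/bin/sh
# Sketch: check that a proposed processor ID range stays inside one
# UltraSPARC T2 Plus chip, i.e. one shared L2 cache and one locality group.
# ASSUMPTION: 64 contiguous virtual processor IDs per chip (8 cores x 8
# strands); verify the real mapping with `psrinfo -pv`.
IDS_PER_CHIP=64

# chip_of ID: print the index of the chip that owns virtual processor ID
chip_of() {
    echo $(( $1 / IDS_PER_CHIP ))
}

# within_one_chip FIRST LAST: exit status 0 if both IDs are on the same chip
within_one_chip() {
    [ "$(chip_of "$1")" -eq "$(chip_of "$2")" ]
}
```

For example, within_one_chip 32 63 succeeds. By contrast, within_one_chip 48 79 fails because ID 48 is on chip 0 and ID 79 is on chip 1. Only the endpoints are compared, so the check applies to contiguous ranges like the ones passed to psrset -c.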
Using a Container with a specific processor set requires binding the processes running in the Container to that processor set. This can be done with the pgrep and psrset commands. First, use pgrep -z ZONENAME to obtain the list of process IDs currently running in the specified zone. Then use psrset -b PSET_ID PID to bind each of those process IDs to the specified processor set, as shown in the following example:
% for PID in `pgrep -z ZONENAME`; do psrset -b PSET_ID $PID; done
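The loop above can be wrapped in a small function that also handles an empty zone. Because psrset -b accepts a list of process IDs, the function issues a single psrset call. The function name and the DRY_RUN switch are hypothetical conveniences. With DRY_RUN=1 the command is printed instead of run.

```shell
#!/bin/sh
# Sketch: bind every process that pgrep currently reports for a zone to a
# processor set with one psrset -b call.
# bind_zone_to_pset and DRY_RUN are hypothetical names for illustration.
bind_zone_to_pset() {
    zone=$1    # zone name, as accepted by pgrep -z
    pset=$2    # processor set ID reported by psrset -c
    pids=$(pgrep -z "$zone")
    if [ -z "$pids" ]; then
        echo "no processes found in zone $zone" >&2
        return 1
    fi
    # $pids is unquoted on purpose: each PID becomes a separate argument.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo psrset -b "$pset" $pids
    else
        psrset -b "$pset" $pids
    fi
}
```

For example, `bind_zone_to_pset as1 1` binds the processes that pgrep reports for a hypothetical zone named as1 to processor set 1.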
Scheduling Class
Solaris offers a number of process scheduling classes for executing user processes, which are administered with the dispadmin(1M) and priocntl(1M) utilities. The default is the Time Sharing (TS) scheduling class. However, many benchmark results have made use of the Fixed Priority (FX) scheduling class. The dispadmin command can list the classes supported on the system, along with their associated priority and time quantum parameters. Processes that normally run in the TS class can be run in the FX class by using the priocntl command in either of the following ways:
% priocntl -e -c FX <COMMAND> <ARGS>
or
% priocntl -s -c FX -i pid <PID>
The first case executes a command starting in the FX class and the second case changes the scheduling class of a running process using the process ID.
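The second form can be applied to a whole zone in the same way as the processor set binding loop. The output of pgrep -z is fed to priocntl, which accepts a list of IDs after -i pid. The function name and the DRY_RUN switch are hypothetical and only meant to illustrate the pattern.

```shell
#!/bin/sh
# Sketch: move every process that pgrep currently reports for a zone into
# the FX (fixed priority) scheduling class with a single priocntl -s call.
# zone_to_fx and DRY_RUN are hypothetical names used for illustration.
zone_to_fx() {
    zone=$1    # zone name, as accepted by pgrep -z
    pids=$(pgrep -z "$zone")
    # Guard against an empty list so priocntl is never called without PIDs.
    if [ -z "$pids" ]; then
        echo "no processes found in zone $zone" >&2
        return 1
    fi
    # $pids is unquoted on purpose: each PID becomes a separate argument.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo priocntl -s -c FX -i pid $pids
    else
        priocntl -s -c FX -i pid $pids
    fi
}
```

For example, `DRY_RUN=1 zone_to_fx as1` shows the priocntl command that would move the processes of a hypothetical zone named as1 into the FX class.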
The article FX for Databases discusses this subject for database applications in some detail, and similar considerations apply to J2EE application software. Running the application server instances in the FX scheduling class has been shown to reduce the number of context switches and help improve overall throughput.
Additional Sources:
Solaris™ Internals: Solaris 10 and OpenSolaris Kernel Architecture Second Edition by Richard McDougall and Jim Mauro
Disclosure:
SPEC and SPECjAppServer are registered trademarks of the Standard Performance Evaluation Corporation. Results from www.spec.org as of 6/10/09.
