Ravindra Talashikar's weblog
Monday Nov 19, 2007
Corestat for UltraSPARC T2/T2+
With the launch of UltraSPARC T2+ processor based servers, corestat needed an upgrade. An updated version of corestat is now available from the link in this blog. Note that the same version (V1.2.3) should also work on T5220 and T5240 servers.

Understanding processor utilization is important for performance analysis and capacity planning. With the launch of UltraSPARC T2 based servers I would like to revisit the topic of core utilization. As we have seen earlier, for a Chip Multi Threaded (CMT) processor like UltraSPARC T1, CPU utilization reported by conventional tools like mpstat/vmstat and core utilization reported using the processor's hardware performance counters are different metrics, and both are equally important in performance analysis and tuning.

Before discussing core utilization on UltraSPARC T2 and the details of corestat, let us take a quick look at what a core on UltraSPARC T2 looks like. UltraSPARC T2 extends the CMT architecture of T1. It consists of eight cores, and each core has eight hardware threads. The hardware threads within a core are grouped into two sets of four threads each. There are two integer pipelines within a core, and each set of four threads shares one integer pipeline. In this sense, the integer compute resources within a core are double those of UltraSPARC T1. It is worth understanding that threads within a core do not switch pipelines: the assignment of threads to a pipeline is fixed and hardwired.

One more important addition to the compute resources within a core is a Floating Point Unit (FPU). Each T2 core includes an FPU shared by all eight threads of that core. Other shared resources within a core include the Level-1 Instruction (I) and Data (D) caches and the Translation Lookaside Buffers (TLBs), namely the I-TLB and D-TLB. All cores share a 4 MB Level-2 (L2) cache.
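The thread-to-pipeline grouping described above can be sketched in a few lines of Python. This is only an illustration: it assumes virtual CPU IDs are numbered contiguously, core by core, with the first four threads of each core on pipeline 0 and the last four on pipeline 1. That numbering is an assumption, not something stated in this post, and the function names are hypothetical.

```python
# Topology as described in the text: 8 cores, 8 hardware threads per core,
# 2 integer pipelines per core, 4 threads hardwired to each pipeline.
CORES = 8
THREADS_PER_CORE = 8
THREADS_PER_PIPELINE = 4


def locate_thread(cpu_id):
    """Return (core, pipeline) for a virtual CPU id.

    ASSUMPTION: ids run 0..63 contiguously, core by core, and threads 0-3
    of a core sit on pipeline 0 while threads 4-7 sit on pipeline 1.
    """
    if not 0 <= cpu_id < CORES * THREADS_PER_CORE:
        raise ValueError("cpu_id out of range for an 8-core, 64-thread chip")
    core, thread_in_core = divmod(cpu_id, THREADS_PER_CORE)
    pipeline = thread_in_core // THREADS_PER_PIPELINE
    return core, pipeline


def pipeline_siblings(cpu_id):
    """Return the virtual CPU ids that share an integer pipeline with cpu_id."""
    first = (cpu_id // THREADS_PER_PIPELINE) * THREADS_PER_PIPELINE
    return list(range(first, first + THREADS_PER_PIPELINE))
```

Under these assumptions, virtual CPUs 12 to 15 would all compete for pipeline 1 of core 1, while every thread from 8 to 15 would share that core's single FPU.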
These are among the key reasons why both single-thread and multi-thread performance of UltraSPARC T2 is better than that of T1. A quick look at the UltraSPARC T2 architecture shows the following enhancements which benefit single-thread performance:
Considering these differences, corestat for UltraSPARC T2/T2+ has been enhanced and can be downloaded from here. The main enhancements are:
While the usage remains the same, corestat for UltraSPARC T2 can be used in two modes:
cpustat -n -c pic0=Instr_cnt,pic1=Instr_FGU_arithmetic -c pic0=Instr_cnt,pic1=Instr_FGU_arithmetic,nouser,sys 1

$ corestat
corestat : Permission denied. Needs root privilege...
Frequency = 1050 MHz
Usage : corestat [-g] [-v] [[-f <infile>] [-i <interval>] [-r <freq>]]

Default mode : Report Integer Pipeline Utilization
-g           : Report FPU usage
-v           : Report version number
-f infile    : Filename containing sampled cpustat data
-i interval  : Reporting interval in sec (default = 10 sec)
-r freq      : Processor frequency in MHz (default = 1417 MHz)

# corestat -g

Core Utilization for Integer pipeline

Core,Int-pipe    %Usr    %Sys   %Usr+Sys
-------------   -----   -----   --------
0,0              0.00    0.19       0.20
0,1              0.00    0.01       0.01
1,0              0.00    0.03       0.03
1,1              0.00    0.01       0.01
2,0              1.15    0.02       1.16
2,1              0.00    0.01       0.01
3,0              0.02    0.02       0.04
3,1              0.00    0.01       0.01
4,0              0.00    0.02       0.03
4,1              0.00    0.01       0.01
5,0              0.02    0.01       0.03
5,1              0.00    0.01       0.01
6,0              0.05    0.03       0.08
6,1              0.00    0.01       0.01
7,0              0.00    0.03       0.03
7,1              0.00    0.01       0.01
-------------   -----   -----     ------
Avg              0.08    0.03       0.10

FPU Utilization

Core             %Usr    %Sys   %Usr+Sys
-------------   -----   -----   --------
0                0.02    0.01       0.03
1                0.02    0.01       0.03
2                0.01    0.01       0.03
3                0.01    0.01       0.03
4                0.02    0.01       0.04
5                0.02    0.02       0.04
6                0.02    0.02       0.04
7                0.02    0.02       0.04
-------------   -----   -----     ------
Avg              0.02    0.02       0.04

As far as interpretation of corestat data is concerned, all the points mentioned in an earlier blog with respect to T1,
hold good. Since core saturation (measured using corestat) and virtual
CPU saturation (measured using vmstat/mpstat) are two different
aspects, we need to monitor both simultaneously in order to determine
whether an application is likely to saturate the core by using fewer
application threads. In such cases, increasing workload (e.g. by
increasing the number of threads) may not yield any more performance.
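To make the core-saturation figure concrete, here is a minimal Python sketch of how an integer-pipeline utilization percentage could be derived from an instruction count (such as the Instr_cnt counter), the clock frequency, and the sampling interval. It assumes each integer pipeline can issue at most one instruction per cycle; the function names are hypothetical, and this is not corestat's actual code.

```python
def pipeline_utilization(instr_count, freq_mhz, interval_sec):
    """Percent of a pipeline's issue slots used during one sampling interval.

    ASSUMPTION: a pipeline issues at most one instruction per clock cycle,
    so the cycles available in the interval are the upper bound on instr_count.
    """
    if freq_mhz <= 0 or interval_sec <= 0:
        raise ValueError("frequency and interval must be positive")
    cycles_available = freq_mhz * 1e6 * interval_sec
    return 100.0 * instr_count / cycles_available


def pipeline_utilization_from_threads(per_thread_counts, freq_mhz, interval_sec):
    """Sum the counts of the threads sharing one pipeline, then convert to percent."""
    return pipeline_utilization(sum(per_thread_counts), freq_mhz, interval_sec)
```

For example, four threads that together retire 1,417,000,000 instructions in a 10 second interval on a 1417 MHz part would use about 10% of that pipeline under this model, even if mpstat showed those four virtual CPUs as busy.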
On the other hand, most often we will see applications that have a high
Cycles Per Instruction (CPI) and therefore reach 100% CPU utilization
before they can saturate the cores fully.

While I make this new version of corestat available here, we are
already looking at a number of RFEs received as comments on my earlier
blog and via e-mail. Some of these points are under consideration. Stay
tuned!

Posted at 03:58PM Nov 19, 2007 by travi in Sun