Ravindra Talashikar's weblog
Wednesday Dec 07, 2005
Database scaling on Sun Fire T2000
With the launch of the Sun Fire T2000 server, I would like to share with you three aspects of database performance on these CoolThreads servers. First, let us see what researchers have found about the CPU utilization of DBMSs; then we will see what we observed in our tests; and finally I will share my analysis.
1. Where does time go?

Researchers from the University of Wisconsin, in their analysis of DBMS performance on modern processors, looked at how a DBMS uses the CPU by asking where the time goes, that is, how processor cycles get spent. Their conclusion is clear: for OLTP workloads, 60% to 80% of the time is spent in memory-related stalls. The breakdown of memory stalls shows that data and instruction stalls at the L2 cache level dominate. Because of this high proportion of memory stalls, OLTP workloads exhibit a high CPI (cycles per instruction). As stalls increase, processor core utilization drops, which lowers overall efficiency.

The UltraSPARC T1 processor with CoolThreads technology is fundamentally designed to take advantage of the stall component of the workload. It hides memory stalls in one thread by allowing other threads on the same core to use the pipeline. Where a thread on a conventional processor would stall and still occupy the pipeline, the UltraSPARC T1 has hardware threads that can continue to execute even if one or more threads are stalled. This greatly improves core efficiency.

2. What did we observe?

Soon after the arrival of early prototypes of Sun Fire servers based on the UltraSPARC T1 processor, we were curious to know how CMT works for databases, how the shared L2 cache behaves for OLTP, and how commercial databases benefit from all the large page performance projects in Solaris. So we configured a database of about 1.5 TB using a commercial DBMS on a Sun Fire T2000 with 32 GB of memory and ran a number of performance tests. Let us see what we found.

Scaling characteristics: Initially, by sizing the database scale, we controlled the amount of I/O activity, simply to understand the scaling with increasing hardware threads. We noticed excellent scaling:
[Scaling chart omitted; its footnote read: "* We observed ~10% idle time for this config."]

As shown above, this commercial DBMS could scale quite well in both dimensions. Due to the high amount of I/O, we saw idle time at 32 threads.
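
To connect this scaling with the stall figures from section 1, here is a rough back-of-the-envelope model. It is only a sketch under strong assumptions that are not from our measurements: each hardware thread is stalled for a fixed fraction of cycles, threads stall independently of each other, and the core pipeline is busy whenever at least one of its threads is ready. The function name `core_utilization` is made up for this illustration.

```python
def core_utilization(stall_fraction: float, threads_per_core: int) -> float:
    """Estimate the fraction of cycles in which a core's pipeline does work.

    Assumption (illustrative only): every thread is stalled for
    `stall_fraction` of the cycles, independently of the other threads,
    so the pipeline is idle only when all threads are stalled at once.
    """
    if not 0.0 <= stall_fraction <= 1.0:
        raise ValueError("stall_fraction must be between 0 and 1")
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    # Probability that all threads are stalled in the same cycle.
    all_stalled = stall_fraction ** threads_per_core
    return 1.0 - all_stalled


if __name__ == "__main__":
    # Stall fractions in the 60% to 80% range reported for OLTP workloads.
    for stall in (0.6, 0.7, 0.8):
        cells = ", ".join(
            f"{n} thread(s): {core_utilization(stall, n):.2f}"
            for n in (1, 2, 3, 4)
        )
        print(f"stall fraction {stall:.1f} -> {cells}")
```

Real workloads do not stall independently (threads contend for the shared L2 cache and memory bandwidth), so treat the numbers as an intuition aid rather than a prediction.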
There are two ways in which we can select hardware threads from the cores of the UltraSPARC T1. For example, if we want to use 8 hardware threads, we can choose 4 threads in each of 2 cores, or 1 thread in each of the 8 cores.
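
The two layouts can be sketched as lists of virtual CPU IDs. This is a minimal illustration, not the procedure used in our tests: it assumes that Solaris numbers the 32 virtual CPUs consecutively, with IDs 0-3 on the first core, 4-7 on the second, and so on (worth checking on your own system), and the helper names `select_cpus` and `psrset_command` are hypothetical. The generated string uses `psrset -c`, the Solaris command that creates a processor set from a list of processor IDs.

```python
THREADS_PER_CORE = 4  # hardware threads per UltraSPARC T1 core
CORES = 8             # cores in the Sun Fire T2000 configuration discussed here


def select_cpus(cores_used: int, threads_per_core_used: int) -> list[int]:
    """Return virtual CPU IDs for the requested cores x threads layout.

    Assumes CPU ID = core * THREADS_PER_CORE + thread (an assumption
    about the numbering, not something stated in the post).
    """
    if not 1 <= cores_used <= CORES:
        raise ValueError("cores_used must be between 1 and 8")
    if not 1 <= threads_per_core_used <= THREADS_PER_CORE:
        raise ValueError("threads_per_core_used must be between 1 and 4")
    return [
        core * THREADS_PER_CORE + thread
        for core in range(cores_used)
        for thread in range(threads_per_core_used)
    ]


def psrset_command(cpu_ids: list[int]) -> str:
    """Build a `psrset -c` command line that would create a processor set."""
    return "psrset -c " + " ".join(str(cpu) for cpu in cpu_ids)


if __name__ == "__main__":
    # 8 hardware threads packed into 2 cores.
    print(psrset_command(select_cpus(cores_used=2, threads_per_core_used=4)))
    # 8 hardware threads spread across all 8 cores.
    print(psrset_command(select_cpus(cores_used=8, threads_per_core_used=1)))
```

Both calls select 8 hardware threads; only the spread across cores differs, which is exactly the variable compared in the results that follow.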
A comparison of the throughput results shows that, for the same number of hardware threads, it is beneficial to use more cores. The performance gap closes as we increase the number of threads per core.
This shows that for the same number of hardware threads, DBMS performance benefits from using more cores. Certain resources, such as the Level 1 caches and TLBs, are provided per core, so using more cores gives the software access to more of these resources. However, at around 24 threads, the difference between using all 8 cores and using only 6 cores almost vanishes.

DBMS and large page support in Solaris: We also characterized the large page size selection features in Solaris 10 that were developed specifically for the UltraSPARC T1. The UltraSPARC T1 processor has a 64-entry instruction TLB and a 64-entry data TLB per core, supporting 8 KB, 64 KB, 4 MB and 256 MB page sizes. The Solaris 10 kernel on the Sun Fire T2000 has been optimized to make use of large pages for various segments in the address space of a process. Solaris provides an optimal page size selection algorithm out of the box and requires no special tuning.
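
To see why page size matters so much with a 64-entry TLB, it helps to work out the "TLB reach": the amount of memory one TLB can map at once if every entry holds a page of the same size. The short sketch below is plain arithmetic on the figures above; the names `PAGE_SIZES`, `TLB_ENTRIES` and `tlb_reach` are made up for the illustration.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

TLB_ENTRIES = 64  # entries in each per-core TLB on the UltraSPARC T1

# Page sizes supported by the UltraSPARC T1 TLBs, in bytes.
PAGE_SIZES = {
    "8 KB": 8 * KB,
    "64 KB": 64 * KB,
    "4 MB": 4 * MB,
    "256 MB": 256 * MB,
}


def tlb_reach(page_bytes: int, entries: int = TLB_ENTRIES) -> int:
    """Bytes of memory covered when every TLB entry maps one page."""
    return page_bytes * entries


if __name__ == "__main__":
    for label, size in PAGE_SIZES.items():
        reach_mb = tlb_reach(size) / MB
        print(f"{label:>6} pages: reach = {reach_mb:g} MB")
```

With 8 KB pages a full TLB covers only 512 KB, while with 256 MB pages it covers 16 GB, which is why a large database cache benefits from the largest page size.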
Individual feature tests showed the following results:

We have seen OLTP performance improvements of up to 30% due to the combined effects of all the large page projects in Solaris. While running a commercial DBMS on the Sun Fire T2000, we see most of the database cache being allocated using 256 MB pages and text being allocated on 4 MB pages, with heap, stack and anonymous memory segments being allocated using 64 KB pages.
All of this works out of the box!

Other observations:
A number of factors contribute to the overall good performance and scaling. Basically, CoolThreads technology is working really well. [We have validated this by analyzing hardware performance counter data collected using cpustat.]
Posted at 11:15AM Dec 07, 2005 by travi in Sun | Comments[19] |
Posted by Igor on December 07, 2005 at 07:48 PM IST #
Posted by Bernd Eckenfels on December 11, 2005 at 09:11 AM IST #
Posted by Ravindra Talashikar on December 12, 2005 at 07:41 PM IST #
Posted by Ravindra Talashikar on December 12, 2005 at 07:45 PM IST #
Posted by Mike on December 14, 2005 at 05:31 AM IST #
Posted by Heinz-Josef Wrobel on January 19, 2006 at 10:19 PM IST #
Posted by Tom Zurita on May 23, 2006 at 11:37 PM IST #
Posted by Ravindra Talashikar on June 09, 2006 at 04:57 PM IST #
Posted by Christian on June 16, 2006 at 02:43 PM IST #
Posted by Deepak Jaisingh on August 26, 2006 at 05:36 AM IST #
Posted by Bernd Eckenfels on August 26, 2006 at 11:02 PM IST #
Posted by Alan Wilson on September 08, 2006 at 09:39 PM IST #
Posted by 85.54.137.26 on February 18, 2007 at 05:07 PM IST #
Posted by dean leong on March 15, 2007 at 07:48 AM IST #
Posted by Gowrishankar on May 10, 2007 at 05:03 PM IST #
Posted by yumianfeilong on June 20, 2007 at 02:17 PM IST #
We're trying to port an Oracle9i-based OLTP database from a V1280 (12 single-core CPUs) to a T2000 and are seeing terrible results.
Suggestions to turn parallelism on haven't helped, since this is not a DW or DSS architecture. In fact, the majority of the app runs slower.
Are there any references for successful ports to the T2000?
Posted by Ramon Martinez on October 20, 2007 at 05:29 AM IST #
For Oracle 9i or 10g R2 on the Sun T2000, does
set consistent_coloring=2
need to be set in the kernel to take advantage of the high-speed L2 cache? Your help is much appreciated.
Posted by Raghu on November 09, 2007 at 11:34 PM IST #
I too am seeing horrible performance: 2 x T2000 (8 cores) with a Sun 3510, running Oracle 10gR2. The same setup with 2 x V240 is 4 times faster for the exact same database. Does anyone have any tricks for speeding up the T2000?
Posted by Todd Helfter on April 03, 2008 at 12:42 AM IST #