I thought I would share some details from a performance escalation I handled a little while back. It has since done the rounds on a few internal mail aliases, and I think it illustrates some useful points about the CoolThreads servers (specifically the T2000, though the points apply to the others too). This case revolves around comparative performance between a Sun T2000 server and the customer's existing V490 test system, but I have handled and assisted on several similar cases involving comparisons with both Sun and non-Sun servers (such as Xeon-based boxes).
The problem:
The customer was preparing to deploy a large in-house developed Java application onto a set of 12 x T2000 servers running BEA WebLogic, and was starting to run some load tests.
What they found from their testing was a total transaction time of around 1.5s on the T2000, compared to 0.5s on their initial testbed V490 server. As components were added into the application layers, this transaction time went up to an average of 6.6s - which the business considered a show-stopper.
Some initial analysis and discussion with the application developers revealed that the testing was being done with a single-threaded load, which was not representative of the end solution. They believed, however, that if the server could not cope with a sequential load then scaling it up would only make things worse - which initially seems like a very reasonable conclusion (and was pushing the customer into a distinct panic mode).
After some further discussions, the customer agreed to run some tests for me to prove the point I was trying to make: these servers are designed for parallel scalability rather than single-threaded horsepower. Their developers quickly put together some simple code that ran on the same software stack and created 100 million Java objects, using either 1, 10 or 100 threads to handle the work. Here are the timing results from running this test on the older V490 platform and a newer T2000 server.
Task Description                                    V490 Time   T2000 Time
Create 100 million objects sequentially             400s        663s
Create 10 million objects/thread with 10 threads    406s        87s
Create 1 million objects/thread with 100 threads    404s        41s
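The customer's actual test code is not reproduced here, so the following is only a minimal sketch of what such a test might look like. The names ObjectCreationBench, Payload, createObjects and runTest are my own inventions, and the sketch assumes Java 8 or later and a total object count that divides evenly by the thread count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reconstruction: not the customer's code, just the same idea.
class ObjectCreationBench {

    // Small throwaway object standing in for whatever the real test allocated.
    static final class Payload {
        final long value;
        Payload(long value) { this.value = value; }
    }

    // Running total of objects created by all worker threads.
    static final AtomicLong created = new AtomicLong();

    // Allocates `count` objects and returns how many were made.
    static long createObjects(long count) {
        long made = 0;
        for (long i = 0; i < count; i++) {
            Payload p = new Payload(1);
            made += p.value; // read the object so the loop has a visible result
        }
        return made;
    }

    // Splits totalObjects evenly across `threads` workers and returns the
    // elapsed wall-clock time in milliseconds.
    // Assumes totalObjects is divisible by threads.
    static long runTest(int threads, long totalObjects) {
        final long perThread = totalObjects / threads;
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> created.addAndGet(createObjects(perThread)));
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait for every worker before stopping the clock
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // The customer used 100 million objects; the default here is smaller
        // so that a trial run finishes quickly.
        long total = args.length > 0 ? Long.parseLong(args[0]) : 1_000_000L;
        for (int threads : new int[] {1, 10, 100}) {
            System.out.printf("%d thread(s): %d ms%n", threads, runTest(threads, total));
        }
    }
}
```

Passing 100000000 as the first argument would match the object count used in the customer's test. Bear in mind that a modern JIT compiler may optimise away allocations this trivial, so absolute timings from this sketch should not be compared with the figures in the table.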
As you can see from these simple results, a single-threaded / sequential load will indeed perform somewhat slower on a T2000 server than on a similar non-CoolThreads system. For a single thread (in this test) the T2000 was about 1.7x slower (663s against 400s), but for 100 threads (and the same overall amount of work) the T2000 was roughly 10x faster (41s against 404s)!
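For anyone who wants to check the ratios, here is a small sketch that derives them from the timings quoted above. The class and method names (SpeedupTable, speedup) are just illustrative; the only inputs are the six figures from the table.

```java
// Illustrative helper: computes how many times faster the T2000 was than
// the V490 for each row of the timing table.
class SpeedupTable {

    // Values above 1 mean the T2000 finished sooner; below 1 means it was slower.
    static double speedup(double v490Seconds, double t2000Seconds) {
        return v490Seconds / t2000Seconds;
    }

    public static void main(String[] args) {
        int[] threads = {1, 10, 100};
        double[] v490 = {400, 406, 404};   // seconds, from the table
        double[] t2000 = {663, 87, 41};    // seconds, from the table
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%3d thread(s): T2000 ran at %.2fx the V490 speed%n",
                    threads[i], speedup(v490[i], t2000[i]));
        }
    }
}
```

This works out to about 0.60x for one thread (so the T2000 took around 1.66 times as long), 4.67x for 10 threads and 9.85x for 100 threads.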
We often get calls from customers or partners saying that they are seeing slower performance from a particular program on a CoolThreads (T1/T2-equipped) server, especially when compared to something like a competing Xeon-based solution. For some limited single-threaded applications this is indeed the case, but try running a hundred (or a thousand!) copies of it at the same time and see what happens... the Xeon server will top out after the first few, but the T1/T2 just keeps on going.
