Good Scaling on Sun SPARC Enterprise T5220
Web-based applications, in particular Java web applications,
should scale well across multiple processors and machines. If they
don't, the service cannot grow. And if they do, they
should scale well on Sun's CMT servers.
A recent example I have seen is Trivnet's multi-service mobile payments platform, TRIV Platform™ version 3.1.1, which was recently benchmarked on an 8-core (64 virtual processors, 1.2GHz, 32GB RAM) Sun SPARC Enterprise T5220 server. This power-efficient, single-CPU machine handled 520 transactions per second (TPS), which was well above expectations, but the main point is the way it scales.
Good scaling, with linear scalability as the ultimate target, means that handling more load requires a proportional increase in the utilization of the computing resources. A 1:1 proportion is perfect linear scalability. The T5220 has 64 virtual CPUs, so the key to high throughput is good parallelism between application threads (or processes). Good parallelism translates directly into being able to utilize all of the virtual processors.
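To make the 1:1 proportion concrete, here is a minimal Java sketch that compares how much the load grew with how much the CPU utilization grew between two measurement points. The class name, the method name and the sample numbers are made up for illustration; they are not measurements from the TRIV benchmark.

```java
public final class ScalingRatio {
    private ScalingRatio() {}

    /**
     * Relative load growth divided by relative CPU-utilization growth
     * between two measurement points. 1.0 means perfectly linear scaling;
     * a value below 1.0 means each extra transaction costs more CPU than before.
     */
    public static double scalingRatio(double tps1, double cpu1, double tps2, double cpu2) {
        if (tps1 <= 0 || cpu1 <= 0 || tps2 <= 0 || cpu2 <= 0) {
            throw new IllegalArgumentException("all measurements must be positive");
        }
        double loadGrowth = tps2 / tps1; // e.g. 2.0 when the load doubles
        double cpuGrowth = cpu2 / cpu1;  // e.g. 2.0 when CPU utilization doubles
        return loadGrowth / cpuGrowth;
    }

    public static void main(String[] args) {
        // Hypothetical sample points (TPS, CPU %), chosen only to show the arithmetic.
        System.out.println(scalingRatio(100, 15, 200, 30)); // prints 1.0: perfectly linear
        System.out.println(scalingRatio(100, 15, 200, 40)); // below 1.0: sub-linear
    }
}
```

A ratio that stays close to 1.0 as the load goes up is what "not far from linear" means in practice.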
TRIV 3.1.1 was deployed on two application server (WebLogic 10.3) instances running on the T5220 machine. Java 1.6.0_11 was used on Solaris 10 u5. The most significant tuning was using libumem as the allocation library, which allows thread-parallel (native, not Java) memory allocation; the throughput gain was around 15%. The other tuning was increasing the young generation of the Java heap (NewRatio=2), since most allocations are very short-lived and full garbage collection is rarely needed. Java 1.6.0 also brings a significant improvement in GC performance (and parallelism).
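When tuning like this, it is easy to lose track of which settings a given JVM instance actually picked up. The following is a small sketch, using only the standard java.lang.management API, that reports whether NewRatio was passed explicitly and lists the heap memory pools. The class and helper names are hypothetical. Note that libumem is a native library (commonly preloaded through the LD_PRELOAD environment variable on Solaris; the benchmark write-up does not say how it was enabled), so it is not visible from this report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.List;

public final class JvmTuningReport {
    private JvmTuningReport() {}

    /** Returns true if any JVM argument starts with the given prefix. */
    public static boolean hasFlag(List<String> jvmArgs, String flagPrefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(flagPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, e.g. -XX:NewRatio=2
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("NewRatio set explicitly: " + hasFlag(jvmArgs, "-XX:NewRatio="));

        // Heap pools (young and old generation); names depend on the collector in use.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long max = (usage == null) ? -1 : usage.getMax(); // -1 means "undefined"
            System.out.println(pool.getName() + " max bytes: " + max);
        }
    }
}
```

Running it as `java -XX:NewRatio=2 JvmTuningReport` should report the flag as set and show the relative sizes of the young and old generation pools.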
See the graph below. As we keep increasing the load, more CPU is consumed, and the growth is not far from linear. This is what we want to see when examining application scalability. It is not trivial to achieve, as internal lock contention may increase with load, but again, web applications should scale.

A typical load distribution across the virtual processors looks like the following:

We stopped at 520 TPS, when the load-generating machines became the bottleneck. Remember also that, if needed (it was not needed here), we can always split the machine into Solaris Containers or LDoms when horizontal scaling is preferred.