Friday October 09, 2009
New SunStudio Screencast on Improving Performance of Parallel Codes
Cool new video from Darryl just showed up on
Sun's HPC site.
In this video, Sun Studio expert,
Darryl Gove, shows how you can use Sun Studio Performance Analyzer to improve performance of a parallel application. Darryl uses the Mandelbrot set application to highlight the features. This screencast is also one of the demos we will run at Oracle OpenWorld that I mentioned in my
previous blog.
Take about 15 mins to view it. You will learn something about OpenMP, parallelization and even Mandelbrot sets.
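If you want a feel for the kind of code involved before you watch, here is a minimal C sketch of a Mandelbrot computation with the row loop parallelized by OpenMP. To be clear, this is not the code from Darryl's screencast: the function names, the grid bounds and the `MANDEL_DEMO_MAIN` macro are all my own inventions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Iterate z = z*z + c starting from z = 0. Returns the number of steps
   taken before |z| exceeds 2, or max_iter if the point never escapes. */
int escape_iterations(double cre, double cim, int max_iter)
{
    double zre = 0.0, zim = 0.0;
    int n = 0;
    while (n < max_iter && zre * zre + zim * zim <= 4.0) {
        double t = zre * zre - zim * zim + cre;
        zim = 2.0 * zre * zim + cim;
        zre = t;
        n++;
    }
    return n;
}

/* Count the grid points in [-2,1] x [-1.5,1.5] that stay bounded for
   max_iter steps. Rows are independent, so the outer loop is shared
   among threads; the reduction clause sums the per-thread counts. */
long count_inside(int width, int height, int max_iter)
{
    long inside = 0;
    assert(width > 1 && height > 1);
    #pragma omp parallel for schedule(dynamic) reduction(+:inside)
    for (int y = 0; y < height; y++) {
        double cim = -1.5 + 3.0 * y / (height - 1);
        for (int x = 0; x < width; x++) {
            double cre = -2.0 + 3.0 * x / (width - 1);
            if (escape_iterations(cre, cim, max_iter) == max_iter)
                inside++;
        }
    }
    return inside;
}

#ifdef MANDEL_DEMO_MAIN   /* hypothetical macro: define it to get a driver */
int main(void)
{
    printf("points inside: %ld\n", count_inside(600, 600, 1000));
    return 0;
}
#endif
```

Built with an OpenMP flag (`-xopenmp` with Sun Studio, `-fopenmp` with GCC) the row loop runs on multiple threads; without the flag the pragma is ignored and the same code runs serially. I picked `schedule(dynamic)` because rows that cross the set take many more iterations than rows far from it, so an even static split of rows tends to leave some threads idle.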
What I saw at IDF09
After a short break, I'm back in blogland. In the meantime, my team and I
have moved back from Sun's Cloud Computing Engineering organization to
Sun Studio (Compilers and Tools). It was a wonderful ride and I learned
many things that I intend to build on in the coming months. Of course,
my group is still involved in the same Cloud-related tools project:
making the HW, SW stack and tools more easily accessible to developers who
don't have OpenSolaris (or Solaris) on their desktop and may not have
access to a SPARC machine in their development group. More on that in
the coming weeks, but right now I'll turn to Sun Studio related
activities.
This was the week for Intel
Developer Forum 09 (Sept 22-24, Moscone West, San Francisco).
Last year, the emphasis seemed to be on Nehalem, AVX, Graphics and
Parallelism.
This year, the emphasis seems to be around Mobility, some followup on
Parallelism and Cloud Computing. Intel is totally on top of the world
with the Nehalem chip: a well-balanced, high performance chip with a
great feature set that the company can build their entire roadmap on.
They are on a high, and know they have a winner in Nehalem.
This year again, we were invited to have a booth and a Chalk Talk at the conference.
Booth duty was interesting: you get into some deep-dive conversations with
the interesting folks who walk by, and we got our share this year as well,
which makes it all worthwhile and stimulating. It's an ideal time to listen
to what other developers have to say about our products (both good and bad,
and we heard both sides) and to share views on where the environment is
headed. If you remember, I gave a Chalk Talk last year as well. This year's
talk was in our own booth, so it was more lightly attended, but it was fun
(and chaotic) as well. My focus was on compiler performance and the new
World Records we have created since the launch of Nehalem systems (get
details here: http://www.sun.com/benchmarks/software/index.jsp
and look for the Sun Studio logo); on new features (OpenMP 3.0, SSSE3,
SSE4.1, SSE4.2); on new parallelization assistance tools (DBXtool, MPI
analyzer, DTrace-style profiling with D-light and the DTrace GUI); on
ease of development with a fully integrated IDE (based on NetBeans 6.5
with considerably enhanced C/C++ support); and on continuous ongoing
improvements (lots on the performance side, with a better vectorizer,
register allocator, instruction scheduler, etc., plus an improved
Performance Analyzer and Thread Analyzer with support for new HW
counters, and too many others to describe here in detail). Look here for more details.
Intel itself builds IDF as a showcase for the next, next, next generation of
technologies. What was truly interesting was how much focus there was
on Cloud Computing. They had two dedicated 3-day tracks on this (one for
Public Cloud and another for Enterprise Cloud), but more than that, it
was interspersed in many of the other talks as well. The emphasis was
clearly on educating attendees about the technologies Intel provides to
enhance Datacenters:
How does Sun Studio stack up against GCC on Nehalem
This is one of the FAQs on the compiler front that I constantly get at TechDays, at customer meets, etc. I often point to the various benchmarks that Sun Studio has won. A World Record means that this compiler beats every other compiler in the business, and that a system configured this way (with the specified HW, OS and compiler levels) gives the best performance you can currently get.
I have devoted a few blogs to that effect as well in the past.
Two members of the Sun Studio organization, John Henning (our SPEC rep, really) and Karsten, have now written a paper comparing Sun Studio and GCC on Nehalem systems. It's a must-read if you've struggled with this issue in the past.
You can find it here and you can post your comments or ask questions at the page as well.
Of course, as has been variously argued in the past (see this thread, e.g.), SPEC doesn't always give the full picture. However, IMO, it's a good standby for what you can get out of a compiler. The suite is a broad set of industry-accepted applications that represent significant market segments in themselves. Tuning and extracting good performance isn't just a matter of turning some compiler switches on. You can get good gains by analyzing applications carefully and using compiler tunings to improve their performance, which is generally what happens in the case of SPEC applications.
Posted by tatkar
( Jun 30 2009, 10:16:22 AM PDT )
Dozen new World Records with Sun Studio + OpenSolaris at Nehalem Launch
Sun today
launched x64 systems based on the Intel Nehalem chip (aka Xeon Processor
5500 series) in grand style! Sun is calling these Open Network
Systems to emphasize that it's about much more than just a chip upgrade.
In particular, the message is around system design and
innovation that encompasses "the convergence
of open compute, storage, networking and software to deliver best
application performance, simplicity and savings".
Application performance (and benchmarking performance) is best measured
by what Sun highlighted today. Closer to (my) home, it's a hit out of
the park. Consider what the combination of Sun Studio 12 update 1
and OpenSolaris has cooked
up:
3 new Sun Studio World Records with AMD quad-core Sun x64 servers
Sun announced today
enhancements to the Sun Fire(TM) x64 servers and Sun Blade(TM) systems
lineup.
The new lineup includes a newer rev of the quad-core Opteron chips, the so-called Shanghai processors.
The new world records are in the areas of SPECint_rate, SPECompL2001 and
SPECompM2001.
Sun Studio Express 11/08 + Shanghai server module = new SPEComp World Record
Sun's newest blade server module, the Sun Blade X6440, powered by the latest Quad-Core AMD Opteron processors code-named "Shanghai", posted the best x86 16-thread result on the prominent HPC SPECompM2001 benchmark with the newly released
Sun Studio Express 11/08 compilers on
OpenSolaris 2008.05. SPEComp is often used as a barometer of performance for shared memory compute-intensive scientific applications, so this is a great announcement during the ongoing Supercomputing 2008 conference.
The Blade server module was announced as part of Sun's Technology Demos highlighting a new Open Petascale computing environment called
the Sun Constellation.
Required Disclosure: SPEC and SPEComp are registered trademarks of the Standard Performance Evaluation Corporation.
Results from http://www.spec.org and this announcement as of 11/16/2008.
Sun Blade X6440 (4 x AMD Opteron 8384 chips, 16 cores, 4 cores/chip, 16 threads) SPECompM2001 - 35,896.
Posted by tatkar
( Nov 18 2008, 05:33:11 PM PST )
SunStudio World Records on new Sun T5440 Server
Sun today announced a new revision to the CMT (Chip MultiThreading)
lineup: Sun SPARC Enterprise T5440 servers.
These provide an astonishing 256 threads in a 4U box. A short while
back, 256 HW threads would have been a BIG datacenter machine; these
are now in a rackable 7-inch (4 Rack Unit) container! In this world of
increasing parallelism, this is Sun's next and to-date most impressive
step. All the while, as I talk about how much attention we need to pay
to parallelism as a way of getting applications to exploit the full
capabilities of HW, the systems guys have been flooding the market with
chips of 4 or more cores.
Eight
new World Records were announced with these new servers. Of these,
three were with Sun Studio 12:
IDF Chalktalk and two new World Records with Sun Studio
IDF Chalktalk went nicely yesterday. There was some decent interest in Sun Studio. This being my first chalktalk, I was a bit wary of what the expectations around it were. I needn't have worried: I had plenty of support from well-wishers and colleagues who dropped in for moral support.
It helped, of course, that I could start the talk by announcing
two new World Records with Sun Studio compilers.
Sun SPARC Enterprise Server M9000 and Sun Studio Break 2TFlop barrier
Sun's
recently announced SPARC Enterprise Server series of machines has
run off with a string of World Record Benchmarks (see SPECfp
2006 records here, SAP-SD
2-tier records here and outrunning
Power6 comparable systems here). They also excel on the SPECompM2001 benchmark with
new World Records (here).
One of the most important benchmarks in the HPC marketplace is the Linpack
benchmark (which solves a dense
system of linear equations, allowing the user to scale
the size of the problem and to optimize the software in order to
achieve the best performance on a given machine). Linpack
results are carefully tabulated (here) and used
in buying decisions. Sun has achieved 2+ TeraFLOPS
performance, handily beating the nearest rivals, including IBM's Power6
systems and HP's Itanium-based Integrity Superdomes, by 2x and 3x
margins. Sun's
results are announced here. It's great to see Sun back in this
market with a strong SPARC processor presence.
This benchmark is a highly parallel test designed to measure how fast a
computer system can solve linear equations, a common task in
engineering and scientific applications. Consequently, Sun Studio's ability to
optimize is absolutely critical to this benchmark and to the HPC market.
It's a great vindication of the tireless efforts of many engineers to
achieve this end result. And it is a testimony to the excellent relationship
between Sun and Fujitsu that Sun could pull this feat off based
on the latest Fujitsu SPARC64 VII quad-core processors!
Way to go! May this be a key entry point for Sun's increasing presence
in the HPC market.
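For anyone wondering what "solving a dense system of linear equations" looks like in code, here is a toy C sketch using Gaussian elimination with partial pivoting. It is only an illustration of the underlying problem, not the benchmark code and not what Sun ran: real Linpack runs use heavily tuned, blocked and parallel libraries. The name `solve_dense`, the row-major layout and the `LINSOLVE_DEMO_MAIN` macro are my own assumptions.

```c
#include <assert.h>
#include <stdio.h>

static double absd(double v) { return v < 0.0 ? -v : v; }

/* Solve A x = b for a dense n x n matrix A stored row-major in a[].
   Both a[] and b[] are overwritten; on success b[] holds x.
   Returns 0 on success, -1 if a zero pivot shows A is singular. */
int solve_dense(int n, double *a, double *b)
{
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        /* Partial pivoting: use the row with the largest entry in column k. */
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (absd(a[i * n + k]) > absd(a[p * n + k]))
                p = i;
        if (a[p * n + k] == 0.0)
            return -1;
        if (p != k) {
            for (int j = 0; j < n; j++) {
                double t = a[k * n + j];
                a[k * n + j] = a[p * n + j];
                a[p * n + j] = t;
            }
            double t = b[k];
            b[k] = b[p];
            b[p] = t;
        }
        /* Eliminate column k from every row below the pivot row. */
        for (int i = k + 1; i < n; i++) {
            double f = a[i * n + k] / a[k * n + k];
            for (int j = k; j < n; j++)
                a[i * n + j] -= f * a[k * n + j];
            b[i] -= f * b[k];
        }
    }
    /* Back substitution on the upper-triangular system. */
    for (int i = n - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < n; j++)
            s -= a[i * n + j] * b[j];
        b[i] = s / a[i * n + i];
    }
    return 0;
}

#ifdef LINSOLVE_DEMO_MAIN   /* hypothetical macro: define it to get a driver */
int main(void)
{
    double a[4] = { 2.0, 1.0, 1.0, 3.0 };
    double b[2] = { 3.0, 5.0 };
    if (solve_dense(2, a, b) == 0)
        printf("x = %g, y = %g\n", b[0], b[1]);
    return 0;
}
#endif
```

The triple-nested elimination loop is where nearly all the time goes as n grows, which is why the quality of the compiler's (and the math library's) loop optimization matters so much for this kind of workload.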
Posted by tatkar
( Jul 16 2008, 05:21:27 PM PDT )
Sun Studio preso at annual OSDevCon
The annual OpenSolaris Developer Conference (OSDevCon) was held
this year in the heart of the Czech Republic, in Prague, on June 25-27,
2008. The first day of tutorials included a free tutorial by
our own Roman Shaposhnick on "OpenSolaris - an Ultimate Development Platform?"
Roman talked about Sun Studio, of course, and how Sun Studio and
OpenSolaris work together to form a stable development platform. Here's a
video of his tutorial/presentation (a complete 1hr+ preso).
Roman had two guest presenters: Adrian De Groot of KDE and Dennis
Chernaivanov from Docarema. KDE uses Sun Studio for development and is
very happy with the state of the tool (I
had blogged about it earlier here). Docarema uses both Linux and
Solaris servers as their backend.
Dave
Stewart, our Intel partner guy, was also there and gives his take
on Roman's presentation in his blog here. Dave provides an
interesting summary of events related to Roman's tutorial and the views
of the KDE and t-Bricks developers on the strengths of, and concerns
about, using Sun Studio for development.
Seems like it was a successful conference all around!
Posted by tatkar
( Jun 30 2008, 01:30:08 PM PDT )
Sun HPC ClusterTools 8.0 EA2 now available (now on Linux too)
Sun
HPC ClusterTools 8 Early Access 2 is a pre-release version of OpenMPI 1.3 with the latest bug
fixes and some new features.
CT8 EA2 is the first release to support Linux.
New features in CT8 include:
Sun unleashes Quad-core Barcelona systems
And not a day too soon, either!
To quote Sun's press release (I don't think I'm capable of writing such long, flowery and yet wonderfully descriptive sentences! :-)
Sun Microsystems, Inc. (NASDAQ: JAVA) today announced the availability of its first Sun Fire and Sun Blade systems powered by Quad-Core AMD Opteron processors, bringing new capabilities, increased performance and expanded scalability to customers that purchase or upgrade to these quad-core systems. The Sun Fire X4140, Sun Fire X4240 and Sun Fire X4440 servers, the newest systems to join Sun's extensive x64 (x86, 64-bit) server line, give customers industry-leading energy efficiency, density and scalability powered by Quad-Core AMD Opteron processors and a choice of operating systems, including the Solaris 10 Operating System (OS), OpenSolaris operating system, Linux, Windows and VMware.
And as a footnote, Sun Studio 12 (with patches) is fully optimized for it (use the -xtarget=barcelona switch in addition to the usual switches to get better instruction selection, esp. for FP-style code). So far, feedback on this mode of code generation has been very positive. I had described these changes in a much earlier blog (as blog timelines go!)
here, with over 30% improvement on SPECfp and smaller changes on SPECint programs. The volume deployment on systems using Barcelona has been ...er long awaited. It's great to see AMD back in the game; I'm sure this will take the quad-core performance battle with Intel to the next level.
Posted by tatkar
( May 13 2008, 10:28:26 AM PDT )
Try NetBeans 6.1
NetBeans 6.1 has been out for about 2 weeks now, tho I have been remiss
in not mentioning it here.
NetBeans 6.1 is a minor-version update, but it contains some significant
changes, esp. for C/C++ users. Among them:
JVM now compiles with Sun Studio on Linux
Yep, you heard it right.
The OpenJDK team has pulled off yet another fantastic feat here:
http://hg.openjdk.java.net/jdk7/jdk7/hotspot/rev/485d403e94e1.
Serguei Spitsyn recently integrated this changeset.
So, should we get excited about
compiling OpenJDK/HotSpot? Doesn't Sun Studio on Linux already compile a
bunch of industrial-scale applications?
Indeed we should. Some reasons why:
Sun Releases Niagara 2 (UltraSPARC T2) with throughput record performance
Sun has
introduced server and blade systems based on the much-awaited Niagara2 chip (aka N2,
also officially called the UltraSPARC T2).
There were three new models announced: the Sun SPARC
Enterprise T5120 and Sun SPARC
Enterprise T5220 servers and the Sun Blade T6320 server
module.
These contain up to 8 cores (available in 4-core and 6-core versions as well) and
8 threads/core, for up to 64 threads per chip (wasn't that like a full-fledged datacenter
machine, not a long time back?). N2 makes up for some
deficiencies in the original Niagara design with the addition of one FP
unit and one Crypto unit per core.
Best of all,
(IMO), the performance numbers on SPECint_rate2006 and
SPECfp_rate2006 leave the competition in the dust!
Check out this
performance reference from Sun's official announcement. It clearly
makes this chip the king of computing capacity!
All of this is of course enabled by Sun Studio 12 compilers,
which played a key role in making the World Records a reality.
Cool, isn't it? I think so...
Posted by tatkar
( Oct 10 2007, 11:20:09 AM PDT )