Talend's new data processing engine on Sun Blade X6270
Having the chance to test the brand new Sun Blade X6270 server based on the Intel Xeon X5500 series processors, I asked one of our ISV partners, Talend, an open source ETL (Extract, Transform & Load) solution provider, if they were willing to do some benchmarking with me.
The timing was perfect, since Talend has just rewritten some parts of their ETL engine to make better use of the multi-threading capabilities of modern CPUs. These changes will be included in the upcoming version.
During development they had benchmarked their application on a two-socket Xeon 5320, and were very interested in seeing how the new Intel Xeon 5500 would perform.
Test descriptions
We used DBGEN v2.8.0, a database population program that generates files to be loaded into database tables. In our case we generate moderately large to very large files and process them directly as simple flat files, without any database system. We also use only the file called “lineitem.tbl”, which represents a list of order item lines with the following structure:

For each benchmark run we perform three tests, each applying a different type of processing on the file:
- Sort: sort the entire file by date, on the 11th column (L_SHIPDATE).
- Count: count the number of order lines by shipment mode (L_SHIPMODE) and by the year of the shipment date (L_SHIPDATE).
- Average: compute the average discount (L_DISCOUNT) for each item (L_PARTKEY).
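To make the three operations concrete, here is a minimal in-memory sketch in Python. It is not Talend's engine and would not scale to the file sizes used in this benchmark. It only illustrates what each test computes. It assumes the standard '|'-delimited lineitem layout, and the sample rows and function names are hypothetical.

```python
import csv
from collections import Counter, defaultdict
from io import StringIO

# 0-based positions of the columns used here (L_SHIPDATE is the 11th column).
L_PARTKEY, L_DISCOUNT, L_SHIPDATE, L_SHIPMODE = 1, 6, 10, 14

# Hypothetical sample rows in the lineitem.tbl layout (not benchmark data).
SAMPLE = "\n".join([
    "1|155190|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|comment a|",
    "1|67310|7311|2|36|45983.16|0.09|0.06|N|O|1996-04-12|1996-02-28|1996-04-20|TAKE BACK RETURN|MAIL|comment b|",
    "2|155190|1191|1|38|44694.46|0.00|0.05|N|O|1997-01-28|1997-01-14|1997-02-02|TAKE BACK RETURN|RAIL|comment c|",
    "3|67310|1798|1|45|54058.05|0.06|0.00|R|F|1994-02-02|1994-01-04|1994-02-23|NONE|MAIL|comment d|",
])

def read_rows(handle):
    """Return an iterator over the '|'-delimited lines as lists of fields."""
    return csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)

def sort_by_shipdate(rows):
    """Sort test: order lines by L_SHIPDATE (ISO dates sort as plain strings)."""
    return sorted(rows, key=lambda r: r[L_SHIPDATE])

def count_by_mode_and_year(rows):
    """Count test: number of lines per (shipment mode, shipment year)."""
    counts = Counter()
    for r in rows:
        counts[(r[L_SHIPMODE], r[L_SHIPDATE][:4])] += 1
    return counts

def average_discount_by_part(rows):
    """Average test: mean L_DISCOUNT for each L_PARTKEY."""
    totals, seen = defaultdict(float), Counter()
    for r in rows:
        totals[r[L_PARTKEY]] += float(r[L_DISCOUNT])
        seen[r[L_PARTKEY]] += 1
    return {part: totals[part] / seen[part] for part in totals}

rows = list(read_rows(StringIO(SAMPLE)))
by_date = sort_by_shipdate(rows)
counts = count_by_mode_and_year(rows)
averages = average_discount_by_part(rows)
```

Holding every line in memory is only workable for the smallest scales. For the larger files the data has to be processed in chunks with temporary files on disk, which is why the IO subsystem matters so much in the results.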
DBGEN uses a scaling factor representing the total size of all the tables generated. For this test we only use the file named «lineitem.tbl». The table below shows the size and number of lines of the «lineitem.tbl» file for each scaling factor.
As you can see, we start quite small, by processing a file with 6 million lines (only!), and go all the way to 3.3 billion lines in a single file.
| Scale | Number of entries | Size |
| 1 | 6 Million | 740 MB |
| 10 | 60 Million | 7.4 GB |
| 100 | 600 Million | 74 GB |
| 300 | 1.8 Billion | 225 GB |
| 550 | 3.3 Billion | 415 GB |
Hardware Configurations
The following table shows the hardware configuration used for the tests (referred to as X6270), and also the vanilla Xeon-based box used by Talend (referred to as Bi-Xeon).
| Server | X6270 | Bi-Xeon |
| CPU | 2 x Xeon 5520 quad core with HyperThreading & Turbo mode on (2.26 GHz) | 2 x Intel Xeon 5320 quad core (1.86 GHz) |
| RAM | 24 GB DDR3 | 4 GB DDR2 |
| Internal storage | 1 x 136 GB 15K rpm | 3 x 250 GB and 2 x 320 GB Seagate 7200 rpm (all on ext3) |
| External storage | Sun StorageTek 2540 connected by Fiber Channel: | None |
| Operating System | Solaris 10 update 6 (aka 10/08) | Debian GNU/Linux Etch with Linux 2.6.18 (i686) |
The X6270 configuration is obviously much more powerful, not only in terms of CPU but also given the amount of RAM and the external storage. However, the tests proved to be more CPU and IO bound than memory bound. So even though the amount of memory obviously makes a difference, the tests still give us some indication of the extra performance brought by the Xeon 5500.
To get closer to the Bi-Xeon configuration, we also ran two sets of tests on the X6270: with the external storage (referred to as X6270-Ext) and without it (referred to as X6270-Int).
In the second case, we are even in a less favorable position than the Bi-Xeon, which uses 3 disks vs. a single disk for the X6270.
Results
The table below presents the final results of the tests done on the three configurations. It's interesting to note a couple of things:
- Processing a file requires at least three times its size in disk space. For this reason, we could only process a 7.4 GB file on the X6270-Int (a single internal 136 GB disk in the server).
- Given the much higher processing time needed on the Bi-Xeon, we didn't even try going further than 74 GB.
- We pushed the X6270-Ext up to processing a 415 GB file, and could reasonably have gone all the way to 1 TB if we had not been limited by disk space.
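The disk space constraint in the first point can be checked with a few lines of Python. This is only a sketch: the factor of three comes from the rule of thumb above, the file sizes come from the scaling table, and the function names are hypothetical. It also assumes the whole disk is available, which is optimistic since the operating system uses part of it.

```python
# File size in GB for each DBGEN scale factor, taken from the table above.
FILE_SIZES_GB = {1: 0.74, 10: 7.4, 100: 74, 300: 225, 550: 415}

def fits_on_disk(file_gb, disk_gb, factor=3):
    """True if `factor` times the file size fits in the given disk space."""
    return factor * file_gb <= disk_gb

def largest_scale(disk_gb):
    """Largest tested scale factor whose file can be processed on the disk."""
    usable = [scale for scale, size in FILE_SIZES_GB.items()
              if fits_on_disk(size, disk_gb)]
    return max(usable) if usable else None

# The single 136 GB internal disk of the X6270-Int configuration.
internal_limit = largest_scale(136)
```

With 136 GB, the 7.4 GB file needs about 22 GB and fits, while the 74 GB file would need 222 GB, so scale 10 is the largest test that can run on the internal disk.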
Conclusions
On the CPU-bound test (the Average test) we can clearly see a 32% to 60% performance boost on the new Intel Xeon 5500 compared to the older generation, depending on the size of the file.
Of course the processor matters, and we saw that it has a great impact on the more CPU-bound processing. But what we can also see, and that's not new, is that data-hungry processors need to be fed with data, good and fast. In that respect the speed of the IO subsystem is very important. Obviously, working with files over 400 GB puts a lot of pressure on the IO, and plugging in a professional external storage device makes a huge difference (in our case anyway).
As you can see on the Sort test (scale 10), we get a 290% boost with the Intel Xeon 5500. Once we use the external storage, that figure skyrockets to 1075% (more than 10x the performance)!
We could of course go on for a long time analyzing all the figures with different file sizes, but even without pushing the analysis very far, it's plain to see the performance gain we get from this new processor alone, not to mention what we get if we also take care of the IO subsystem.
The Intel Xeon 5500 based Sun servers, such as the Sun Blade X6270 we just tested, enhanced with an external storage device such as the Sun StorageTek 2540, seem to be a killer combination for large data processing.

Thanks for the information. I have read it, very good!
Posted by cheap ed hardy on November 02, 2009 at 01:28 AM CET #
Just wanna say thank you for the information you have shared. Just continue writing this kind of post. I will be your loyal reader.
Posted by ed hardy on December 01, 2009 at 06:34 AM CET #
You're most welcome! I'm always happy to share experiences with the community, and to see that they are helpful. I'll definitely keep posting any new valuable information.
I also post on another blog, if you'd like to check it out: http://blogs.sun.com/openomics
Regards
Amir
Posted by Amir Javanshir on December 01, 2009 at 04:03 PM CET #
Thanks for sharing.
Posted by Galatasaray on December 10, 2009 at 10:57 PM CET #
Your presentation with Talend was impressive yesterday. You have done an excellent job.
Could you just explain what makes you think the external storage may improve the performance? The result has convinced me, but I don't understand why.
Thanks for emailing me the answer!
Posted by Haitang on December 17, 2009 at 04:32 PM CET #
Hi Haitang, and thank you for your very kind comment. It's nice to see that the webinar we did with Talend was such a success.
As for your question about external storage and performance:
When you work on a small set of data, it can all be loaded into RAM and processed there. However, when dealing with very large data, as is the case here, the speed at which data can be reached becomes very important. If you have a lousy IO subsystem, the CPU will generally spend most of its time waiting for data to process. For example, in the test case where we only used a single internal drive, you can imagine that while a process is writing to a temporary file, you can't read any data from the disk, so the CPU is just sitting there waiting.
What's more, when reading, a single disk has only a very limited transfer rate. This is where RAID 0 (or striping) usually comes in: if you combine several disks with RAID 0, you multiply the transfer rate and feed your process much more effectively.
If you take the example of the external storage we used in this case, we had 12 hard drives (which is quite a small storage system, by the way) that we divided into 3 pools of 4 striped drives: one for the input file (only reads), one for the output files (only writes) and the last pool for the temporary files (read/write).
When reading the data, you now have 4 magnetic heads reading data instead of 1! Same for the writes. So you can easily see how this can speed up the operations greatly.
On top of that (without going far into the details), a storage array is more than just a bunch of disks: it comes with RAID controllers and fast memory that acts as a cache between the disks and the attached server.
If you are interested in the topic of IO bottlenecks and how to monitor your system to detect them, you might like to read a post I wrote on another blog about this issue:
http://blogs.sun.com/openomics/entry/squidsolutions_performance_bottleneck_database_io
If you need any more details, I would be glad to provide them.
Posted by Amir Javanshir on December 18, 2009 at 11:19 AM CET #
Hi Amir, thanks a lot for the answer and the link to another example.
You have confirmed to me that it's not the additional storage space that influenced the performance. It's, somehow, the parallel execution of 4 hard drives (in the reading case) that makes the difference. This implies the problem was linked to the I/O bottleneck.
I'm really interested in these articles and I'd love to have more details.
By the way, I wrote you an email describing the question I face, and I believe that you can surely give me a hand someday.
Thank you again.
Posted by Haitang on December 18, 2009 at 05:21 PM CET #