Thursday May 14, 2009

Using and customizing the db_STRESS tool to stress test a MySQL database.

Friday Apr 24, 2009

Having the chance to test the brand new Sun Blade X6270 server, based on the Intel Xeon X5500 series processors, I asked one of our ISV partners, Talend, an open source ETL (Extract, Transform & Load) solution provider, if they were willing to do some benchmarking with me.

The timing was perfect, since Talend had just rewritten parts of their ETL engine, to be included in the upcoming version, in order to make better use of the multithreading capabilities of modern CPUs.

During development they had benchmarked their application on a two-socket Xeon 5320, and were very interested in seeing how the new Intel Xeon 5500 would perform.

Test descriptions

We used DBGEN v2.8.0, a database population program that generates files to be loaded into database tables. In our case we generate moderately to very large files and process them directly as simple flat files (no database system involved). Also, we only use the file called “lineitem.tbl”, which represents a list of order item lines having the following structure:



For each benchmark run we perform three tests, each applying a different type of processing on the file:

  • Sort:
    We will sort the entire file by date, on the 11th column (L_SHIPDATE).

  • Count:
    Count the number of order lines by shipment mode (L_SHIPMODE) and by the year of the shipment date (L_SHIPDATE).

  • Average:
    Average discount (L_DISCOUNT) for each item (L_PARTKEY)
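To make the three tests concrete, here is a minimal Python sketch of each of them. It is not Talend's implementation: it assumes the standard TPC-H lineitem layout ('|'-delimited, with L_PARTKEY, L_DISCOUNT, L_SHIPDATE and L_SHIPMODE in the 2nd, 7th, 11th and 15th columns), it works entirely in memory, and the sample lines are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Zero-based column positions, assuming the standard TPC-H lineitem layout.
L_PARTKEY, L_DISCOUNT, L_SHIPDATE, L_SHIPMODE = 1, 6, 10, 14


def read_rows(stream):
    """Yield each '|'-delimited line as a list of fields."""
    return csv.reader(stream, delimiter="|", quoting=csv.QUOTE_NONE)


def sort_by_shipdate(rows):
    """Sort test: order lines by ship date (YYYY-MM-DD sorts as text)."""
    return sorted(rows, key=lambda r: r[L_SHIPDATE])


def count_by_mode_and_year(rows):
    """Count test: number of lines per (ship mode, ship year)."""
    counts = defaultdict(int)
    for r in rows:
        counts[(r[L_SHIPMODE], r[L_SHIPDATE][:4])] += 1
    return dict(counts)


def average_discount_by_part(rows):
    """Average test: mean discount per part key."""
    sums = defaultdict(lambda: [0.0, 0])  # part -> [total discount, line count]
    for r in rows:
        acc = sums[r[L_PARTKEY]]
        acc[0] += float(r[L_DISCOUNT])
        acc[1] += 1
    return {part: total / n for part, (total, n) in sums.items()}


# Three made-up lines in the assumed layout (not real DBGEN output).
SAMPLE = (
    "2|155|4|1|38|44694.46|0.00|0.05|N|O|1997-01-28|1997-01-14|1997-02-02|TAKE BACK RETURN|TRUCK|comment c|\n"
    "1|155|7|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|comment a|\n"
    "1|67|7|2|36|45983.16|0.09|0.06|N|O|1996-04-12|1996-02-28|1996-04-20|TAKE BACK RETURN|MAIL|comment b|\n"
)

rows = list(read_rows(io.StringIO(SAMPLE)))
print([r[L_SHIPDATE] for r in sort_by_shipdate(rows)])
print(count_by_mode_and_year(rows))
print(average_discount_by_part(rows))
```

With files of tens or hundreds of GB, nothing like this fits in RAM: the sort in particular has to spill to disk, which is one reason the tests put so much pressure on the IO.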

DBGEN uses a scaling factor representing the total size of all the tables generated. For this test we only use the file named «lineitem.tbl». The table below shows the size and number of lines of the «lineitem.tbl» file for each scaling factor.

As you can see, we start quite small, by processing a file with 6 million lines (only !), and go all the way up to 3.3 billion lines in a single file.



Scale   Number of entries   Size
1       6 million           740 MB
10      60 million          7.4 GB
100     600 million         74 GB
300     1.8 billion         225 GB
550     3.3 billion         415 GB


Hardware Configurations

The following table shows the hardware configuration used for the tests (referred to as X6270), alongside the vanilla Xeon-based box used by Talend (referred to as Bi-Xeon).

Server             X6270                                     Bi-Xeon
CPU                2 x Xeon 5520 quad core (2.26 GHz),       2 x Intel Xeon 5320 quad core (1.86 GHz)
                   HyperThreading & Turbomode on
RAM                24 GB DDR3                                4 GB DDR2
Internal storage   1 x 136 GB, 15K rpm                       3 x 250 GB and 2 x 320 GB Seagate, 7200 rpm (all on ext3)
                                                               • 1 x 250 GB for system and temporary files
                                                               • 1 x 320 GB for input files
                                                               • 1 x 320 GB for output files
External storage     • 3 volumes of 4 disks using RAID 0     None
                       (striping), 544 GB each
                     • A ZFS pool for each group
Operating system   Solaris 10 update 6 (aka 10/08)           Debian GNU/Linux Etch with Linux 2.6.18 (i686)


The X6270 configuration is obviously much more powerful, not only in terms of CPU but especially given the amount of RAM and the external storage. However, the tests proved to be more CPU and IO bound than memory bound. Even though the amount of memory obviously makes a difference, the tests still give us some indication of the extra performance brought by the Xeon 5500.

In order to get closer to the Bi-Xeon configuration, we also ran two sets of tests on the X6270: with the external storage (referred to as X6270-Ext) and without it (referred to as X6270-Int).

In the second case we are even in a less favorable position than the Bi-Xeon, which uses 3 disks vs. a single disk for the X6270.

Results

The table below presents the final results of the tests done on the three configurations. It's interesting to note a couple of things:

  • Processing a file requires at least three times its size in disk space. For this reason, we could only process files up to 7.4 GB on the X6270-Int (a single internal 136 GB disk in the server).

  • Given the much higher processing time needed on the Bi-Xeon, we didn't even try going further than 74 GB.

  • We pushed the X6270-Ext up to processing a 415 GB file, and could reasonably have gone all the way to 1 TB had we not been limited by disk space.
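The "three times the disk space" rule can be checked against the file sizes and disk capacities quoted in this post. The short Python sketch below is only back-of-the-envelope arithmetic: it assumes the whole capacity of each storage is usable for the test, which is optimistic (the internal disk also holds the system).

```python
# File sizes per scale factor, in GB, taken from the table in this post.
FILE_SIZE_GB = {1: 0.74, 10: 7.4, 100: 74, 300: 225, 550: 415}

# Rule of thumb quoted in the post: about 3x the input size in disk space.
SPACE_FACTOR = 3


def largest_scale_that_fits(capacity_gb):
    """Return the largest scale factor whose working set fits, or None."""
    fitting = [scale for scale, size in FILE_SIZE_GB.items()
               if size * SPACE_FACTOR <= capacity_gb]
    return max(fitting) if fitting else None


INTERNAL_GB = 136       # single internal disk (X6270-Int)
EXTERNAL_GB = 3 * 544   # three RAID 0 volumes (X6270-Ext), assumed fully usable

print(largest_scale_that_fits(INTERNAL_GB))   # 10
print(largest_scale_that_fits(EXTERNAL_GB))   # 550
```

Under these assumptions a 74 GB file would need about 222 GB, more than the internal disk offers, and a 1 TB file would need about 3 TB, more than the 1632 GB of external storage.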

Results table

Conclusions

On the CPU-bound test (the Average test) we can clearly see a 32% to 60% performance boost on the new Intel Xeon 5500 compared to the older generation, depending on the size of the file.

Of course the processor matters, and we saw that it has a great impact on the more CPU-bound processing. But what we can also see, and that's not new, is that data-hungry processors need to be fed with data, good and fast. In that respect the speed of the IO subsystem is very important. Obviously, working with files over 400 GB puts a lot of pressure on the IO, and plugging in a professional external storage device makes a huge difference (in our case anyway).

As you can see on the Sort test (scale 10), we get a 290% boost with the Intel Xeon 5500. Once we use the external storage, that figure skyrockets to 1075% (more than 10x the performance) !
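A percentage like this can be read in two ways: as the ratio between the two results, or as the increase over the baseline. The small Python sketch below converts the two figures quoted above under both readings; the function names are made up for this illustration, and the post does not say which reading applies.

```python
def speedup_if_ratio(percent):
    """Read the percentage as new performance / old performance."""
    return percent / 100.0


def speedup_if_increase(percent):
    """Read the percentage as the gain on top of the old performance."""
    return 1.0 + percent / 100.0


for label, percent in (("internal disk", 290), ("external storage", 1075)):
    print(label, speedup_if_ratio(percent), speedup_if_increase(percent))
```

Either way, the external storage figure works out to more than 10x.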

We could of course go on for a long time analyzing all the figures for the different file sizes, but even without pushing the analysis very far, it's plain to see the performance gain we get with this new processor alone, not to mention what we get if we also take care of the IO subsystem.

The Intel Xeon 5500 based Sun servers, such as the Sun Blade X6270 we just tested, enhanced with an external storage device such as the Sun StorageTek 2540, seem to be a killer combination for large data processing.

Thursday Apr 09, 2009

I'm rather happy living in Paris, feeling that I'm in a big city where you can find something interesting to do every night. That said, I hardly even know what's going on in town, stuck as I am between my job, family life, and my musical night life. I might as well live in the countryside...

But thanks to the giant billboards in the metro I found out that in the following weeks, the late 70's and early 80's are striking back, and hard !

Steely Dan, Foreigner, Whitesnake, Roger Hodgson (ex-Supertramp), Lynyrd Skynyrd and Marillion are all on tour, and coming to the so-called “city of lights”. Gee... I feel like I'm 10 again.

Let's start in order: Steely Dan

What can I say ? Steely Dan is for me the best you can get ! These guys are the quintessential blend of rock and jazz, all served with the best lyrics I've ever heard (and never understood !). That's part of the charm: everyone has their own interpretation of Steely Dan's lyrics. By the way, Donald Fagen and Walter Becker never “explain” their lyrics. Why should they ? Do magicians give away their tricks ?

Although a long-time “fan” (I hate that word, but can't find a better one right away), I have only managed to see them in concert once, in 2007 at “Le Grand Rex” in Paris, and boy was that concert magic. Needless to say, I was not disappointed. I have rarely seen a band play so tight and with such a killer groove. Check this one:

For many, Steely Dan's sound is too slick and too “clean”; some would say overproduced. I would say perfectly produced.

Ok, if you're looking for a more “roots” sound, the Dan is not for you. But if you're looking for gems, where songwriting, lyrics, performing and recording go beyond perfection, then run and grab your ticket for their next concert. I already have mine, and just can't wait until July the 2nd.

By the way, look at the URL of this blog: aja. It's a contraction of Amir Javanshir, but of course it's also one of Steely Dan's best albums.

Foreigner

Boy. Foreigner. Here again I hear a lot of criticism, saying that Foreigner was just a commercial hard rock band, and that the real thing was Led Zeppelin. Maybe. But I can't help it: I never really appreciated Led Zep much, but fell in love with Foreigner. I was 10 at the time, so blame it on my older brother, who didn't have Led Zep in his record collection.

It's somehow like a Pavlovian reflex: when I hear the first measures of “Juke Box Hero”, although it starts real soft and cool, I already know that in 30 seconds, when the distorted guitars and heavy drums kick in, I'll be jumping all around the place like crazy, playing air guitar, just like I did when I was 10. Any cure for this ?


However, I'm not sure I'll go to see them in concert: today's Foreigner looks more like a cover band than the real thing. Ok, you still have Mick Jones in the band, but that's pretty much all. And as you have guessed, what is really missing is Lou Gramm's golden voice.

Several years ago, I came across a guy on a German TV show, singing (or trying to sing) a Foreigner song. I said to myself: boy, the guy is just killing the song, he can't sing ! Who is this guy anyway ? It was a real shock for me to see at the end of the “performance” that it was Lou Gramm himself. I hadn't recognized him, because he had put on a lot of weight, and the voice... it was simply not Lou's voice. I then found out that Lou had been seriously ill, having been diagnosed with a type of brain tumor. Somehow the medical treatment had ruined his voice, and I'm not sure if it will ever recover. This was really heartbreaking news.

All this to say, I'm not sure I'm prepared to go and see Foreigner on stage, without that mythical voice. I'd rather listen to my old records.

Whitesnake

I must admit I really prefer the pre-hair metal period, even though I like to indulge myself watching their mid 80's video clips once in a while (thanks VH1 Classics).

Remember, the big (huge) hair, the flashy guitars, the white Jaguar and the sexy super model purring like a cat on the hood of the same Jaguar ?

All this was way too outrageous, but boy was that fun to watch, and to listen to as well. At the end of the day, say what you want, Dave Coverdale can sing.
I'm running to get my ticket for this show.

From the pre-hair metal period:


Last but not least: Roger Hodgson

Supertramp was, is, and will always be one of my favorite bands ever (if not my favorite). If I were to go to a desert island, “Crime of the Century” would surely be going with me.


For some reason I have never managed to see Roger in concert ! Each time something goes wrong. The last time, he was playing at the “Fête de l'Humanité” outside Paris, and I never managed to find the freaking place (Ok, I got off the train one stop early !). Another time he was playing in Cannes the same day I was leaving Cannes for Paris. And this time around it looks as if I won't be able to go and see him either. Looks like it just can't be. Right ? It's bloody well right !

Monday Apr 06, 2009

Wow ! Two blog posts in the same day. Am I going crazy or what ? Springtime sure has a strange effect on me, or could it be that I'm just taking a break from typing the various reports for my ongoing projects ? (Like many IT guys, I love doing the projects, but hate typing the reports...)

Anyway just a note to say that I finally gave myself a kick in the back and set up my own web site dedicated to my music. What you should know is that after my regular day job, at night when everyone else is sleeping, a second day starts for me from 11 PM up to... well until I drop !

I have set up a home recording studio, where I produce my own songs. Not to mention the songwriting process itself (music and lyrics), you wouldn't believe the time it takes to get a final song: record each instrument one after another, program what needs to be programmed (usually the drums and various synths, while trying to make it sound as if... it was not programmed), sing (re-sing because the first take was crap, re-re-sing because you missed the high note, re-re-re-sing because....), find all those little instruments and arrangements that could lift up the song, etc.

But once every piece is in the box, the real trouble begins: mixing the whole thing. For those who are not familiar with this process, all I can say is that it's pretty much like hell ! No wonder sound engineering is a real job.

Not having high-end equipment, nor the right acoustic environment (after all, it's all done in a corner of the living room !), nor, most importantly, the skills and ears of a real sound engineer is sometimes frustrating. However, it's really a lot of fun to do all this, and when you end up with an entire song that sounds half decent, well, it's a big relief.

All that said, you are all welcome to go to my web site and listen online or download the songs. Any feedback would be most appreciated, even if you don't like it. Because at the end of the day, even if the songs are not perfect, the sound is not perfect and the performance is not perfect, a song that is not shared is pretty much useless.

Often the simplest explanations are the best.

Not long ago one of the ISVs (Independent Software Vendors) I work with called me about a problem one of their customers had encountered on Solaris. The ISV's software was complaining that it couldn't find a shared library called libpool.so. This library is part of the SUNWpool package.

Now let the Solaris administration festival begin !

First thing to do: Check if the package is there or not using pkginfo -l SUNWpool

I checked on my personal workstation, running the exact same version of Solaris 10 as the customer, and found the library. But the customer didn't.

There are only two possibilities here: either the package was never installed, or it was removed at some point. Of course I got the classic answer: “Solaris was installed in a normal way so why isn't this package there ?”

Well, the answer is quite simple: when installing Solaris 10, you must choose which software cluster to install. A software cluster is a set of packages to be installed. Depending on what your system needs to do, you can install more or less software. During the installation process, the list of software clusters is presented to the user, who must choose one before going further.

Ok, but how do you know what software clusters are available in Solaris after the installation ?

Either by going through the extremely useful online documentation or, more easily, by looking at the “METACLUSTER” lines of the following file:

# grep METACLUSTER /var/sadm/system/admin/.clustertoc
METACLUSTER=SUNWCXall
METACLUSTER=SUNWCall
METACLUSTER=SUNWCprog
METACLUSTER=SUNWCuser
METACLUSTER=SUNWCreq
METACLUSTER=SUNWCrnet
METACLUSTER=SUNWCmreq

You can see the seven available clusters; from bottom to top, you get more and more software. To be on the safe side, I usually install my systems using SUNWCall (SUNWCXall will also install some third-party software).
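If you need this list in a script rather than on screen, the same lines are easy to parse. Here is a minimal Python sketch that works on the text of the grep output shown above; the function name is made up for this example, and only the METACLUSTER=... line format comes from the listing.

```python
def parse_metaclusters(toc_text):
    """Return the metacluster names in the order they appear in the text."""
    names = []
    for line in toc_text.splitlines():
        line = line.strip()
        if line.startswith("METACLUSTER="):
            names.append(line.split("=", 1)[1])
    return names


# The grep output from this post, pasted in as sample input.
SAMPLE_TOC = """\
METACLUSTER=SUNWCXall
METACLUSTER=SUNWCall
METACLUSTER=SUNWCprog
METACLUSTER=SUNWCuser
METACLUSTER=SUNWCreq
METACLUSTER=SUNWCrnet
METACLUSTER=SUNWCmreq
"""

clusters = parse_metaclusters(SAMPLE_TOC)
# Reversed, the list runs from the smallest cluster to the largest.
smallest_first = list(reversed(clusters))
print(smallest_first)
```

On a real system you would read /var/sadm/system/admin/.clustertoc instead of the sample string.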

The next step is to find out which software cluster a given system was installed with. This information is stored in the following file:

# cat /var/sadm/system/admin/CLUSTER
CLUSTER=SUNWCall

See I wasn't lying: I do install my systems with SUNWCall !

To get back to our missing package: SUNWpool is installed with the SUNWCuser software cluster and above. This means that any cluster below it will NOT install that package.

And of course... the customer, after checking his system, found that it had been installed using SUNWCreq. It is therefore perfectly normal not to find the package. Upgrading the system to the software cluster just above solved the problem.
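The whole diagnosis boils down to comparing two positions in the cluster list. The Python sketch below shows that check; it is an illustration only, assuming the bottom-to-top ordering described in this post and taking SUNWCuser as the smallest cluster that brings in SUNWpool. The function and constant names are made up for the example.

```python
# Metaclusters from smallest to largest, i.e. the grep output read bottom to top.
CLUSTER_ORDER = [
    "SUNWCmreq", "SUNWCrnet", "SUNWCreq", "SUNWCuser",
    "SUNWCprog", "SUNWCall", "SUNWCXall",
]

# Smallest cluster that installs SUNWpool, according to this post.
SUNWPOOL_MIN_CLUSTER = "SUNWCuser"


def read_installed_cluster(cluster_file_text):
    """Extract the name from the 'CLUSTER=...' line, or None if absent."""
    for line in cluster_file_text.splitlines():
        line = line.strip()
        if line.startswith("CLUSTER="):
            return line.split("=", 1)[1]
    return None


def cluster_includes(installed, minimum):
    """True if 'installed' is at least as large as 'minimum'."""
    return CLUSTER_ORDER.index(installed) >= CLUSTER_ORDER.index(minimum)


mine = read_installed_cluster("CLUSTER=SUNWCall\n")
customer = read_installed_cluster("CLUSTER=SUNWCreq\n")
print(mine, cluster_includes(mine, SUNWPOOL_MIN_CLUSTER))
print(customer, cluster_includes(customer, SUNWPOOL_MIN_CLUSTER))
```

On a real system the text would come from /var/sadm/system/admin/CLUSTER.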

Wednesday Feb 18, 2009

Whether you lost your way to this place, or you came here by intention, welcome to you stranger !

My name is Amir, I am 36 and work at Sun Microsystems in a team called ISV Engineering (ISV standing for Independent Software Vendor). To make it simple: our team brings expertise in Sun's technologies to our partners. We help them port their applications to Solaris, for example, or do optimizations, put together sizing studies, benchmarks, and so on.

I work a lot in the Web world, especially around the LAMP or SAMP stack (Solaris+Apache+MySQL+PHP), and if you are interested in this subject, I blog more specifically about it here: blogs.sun.com/web

Yes, I do blog already elsewhere, so why would I want to blog here ? Could it be because I'm a crazy blogger ? Hmm, not really.

So is my ego so big that I need to talk about myself over and over again ? I don't think so either.

I'm rather new to all this blogging stuff, and never really felt the need for it. Maybe because I'm a little too old to be part of the so-called Generation Y, born between 1974 and 1980, and raised with a computer keyboard in their hands.

The truth is that I need to blog for my job, which is somehow a cultural revolution for me. My posts on my “official” blog will be more serious, and less spontaneous, because when posting a technical text I need to feel it has some meat, thinking twice (at least twice) before posting and when done always having the feeling that I missed something, or said something stupid or inaccurate. When I post, I'm always expecting 1 trillion comments pointing out the errors I have made (who just said I'm paranoid ?)

This brings me to the (real) question I'm asking myself: to what extent should a post on a blog be spontaneous ? What's the right length for a post to be interesting and informative but not boring ?

To what extent should what we say be checked and double checked ? In other words: to what extent do we have the right to be wrong ?

Any input from you, dear reader, would be appreciated.

I therefore declare this place my freedom blog, where I will have the right to sound stupid, be wrong, and write uninteresting posts, just because I felt like doing so.

Think about this place as the “Mr. Hyde” of my official “Dr. Jekyll” blog. I will try to post here things that either are not related to my job, or things that don't really fit the web technologies. Think of this as a potpourri !

So once again, welcome stranger.

This blog copyright 2009 by Amir Javanshir