Data Processing
Valdis's Weblog
Thursday May 22, 2008
Secret of blog writing

What I do is write my blogs when I get an inspiration, and most often that is not while I am in the office. Most of it happens while I am travelling in trains, planes and automobiles. So 3 blogs today; however, they had been stored for weeks and months after I wrote the drafts in some airplane or remote location.

3 blogs in a day: I am catching up on my lost quota of 1 blog per week. I still have 10 saved up over the months.

Posted at 07:33PM May 22, 2008 by Valdis Filks in Business  |  Comments[0]

What is a mainframe (a big Unix server)

The marketplace and Hollywood have very restrictive views and impressions of the mainframe, and many believe that IBM is the only one that can supply such a thing. Well, in reality the mainframe as a unique class of machine does not exist, except in the minds of marketing and fiction, as many suppliers nowadays have systems equivalent or superior to IBM's mainframes.

Everyone who has not worked with mainframes has this impression: they are large, expensive, fast, complex, special, adaptable, reliable.

Everyone who has worked with mainframes knows the following: they are large; they are expensive, although the hardware will be discounted to make them look competitive while software charges and maintenance stay expensive; they are not fast anymore, or at least no faster than any other large system; you can only run z/OS (MVS); and they are not complex, just totally proprietary. They are not really adaptable: everything needs to run on the z/OS instruction set, so you need hypervisors (read overhead) to do conversions to run other interesting OSes like Linux. They are really good for the old stuff like CICS and old batch programs that no-one understands, as the people who wrote them have retired. They are special in that a whole load of emotional baggage goes with them: companies cannot migrate from them as no-one understands the apps, so companies are held hostage, and costs are often hidden in various ways (s/w, maint, service) so you can never get to the bottom of the TCO. As for reliability, they are no different from any other major large system where you have good change control. Mainframes have good reliability records because no-one is allowed to change them. If you have other high end systems that you do not change, they will be as reliable as a "mainframe". The mainframe hardware is not special compared to other high end systems. Historically they were better, about 10 years ago, but this myth continues.

So we either stop calling the z10 server a mainframe or we call every high-end server a mainframe. Sun and IBM high end Unix servers (not sure about HP, SGI etc) have features better than or equivalent to those of the z10 servers.

Mainframe acid test: ask IBM what the best platform is for every application that you want to run, and what the difference is between their Linux (actually Red Hat), AIX and z/OS offerings. Is AIX inferior to z/OS (aka MVS)? Is Red Hat not as good as AIX? Is Red Hat better than z/OS? When should we use what?

Can I run Red Hat on a p-series?
Can I run AIX on a z10 mainframe server? (Apparently it uses the same CPUs.)
Can I run z/OS on a p-series?

I get really confused by the positioning of AIX, Red Hat and z/OS. When do I do what with what? Or do I just do everything with everything and make a real mess? Good fun for the techies.

HP recently came up with some logical partitioning, something called Dynamic LPARs. Well, that is old established technology, nothing new here; it has been available from others for a long time. However, HP engineering was always good; HP was an engineering company initially. So their high end UNIX servers can also be looked upon as mainframes.

Now Sun M-series servers have instruction retry, whole memory DIMM mirroring, crossbar ECC, more dynamic replacement of CPUs and RAM while the system is running than any other supplier, and more. So you cannot get more reliable than that, and mainframes may not even have the reliability of some high end Unix servers.

The mainframe has been around since the 1960s, so the name is familiar but misused. As with many things in life, understanding, education and knowledge help dispel these myths about things that once upon a time, long ago, were something special. Now I like the mainframe, I love to tune the assembler code and to patch and zap modules, and it is good for your old apps that no-one understands and that your company is too scared to migrate to a new lower cost system. But as with everything, technology moves forward, and other high end systems, especially large Unix servers, are just as good or better and can be understood by more people.

Posted at 07:31PM May 22, 2008 by Valdis Filks in Business  |  Comments[2]

Tuning for filesystem performance, specifically QFS

The holy grail of storage performance, here goes (this question comes up every week).

To make I/O performance perfect, a block of data needs to be transferred, unhindered and unaltered, with as few disassemblies and assemblies as possible as it travels from the CPU to the physical disks. I have explained this many times and tuned this for over 20 years, and the basic rules do not change, strange thing. Neither do Moore's Law or Amdahl's Law, but they do get misquoted.

So if your application writes in 16K blocks, make sure that all components in the I/O path for this application work in 16K units or larger. But not too much larger, as you will be wasting resources.

-- Excerpt from a discussion a couple of weeks ago, when an app was writing data in 128KB blocks and we were using a shared HPC SAN filesystem called QFS; may be useful to someone --

Suppliers (array manufacturers), industry etc mix up segment size and stripe width. This is what I do:

Understand your disk arrays and how they transfer data.

Segment size is the size of the block written to an individual disk (in your case 128KB).
Stripe width is the amount that the array controller writes to a RAID vol/grp/unit; this is the number of disks x segment size (in your case 128KB x 4 = 512KB, as the person was using 4 disks).

Now the DAU (Disk Allocation Unit) that QFS uses to write a block of data should, by most best practices, match this stripe width to avoid read/write misses: what we want is that one QFS read or write causes only one "RAID group" read/write. But you can specify the DAU to be whatever you want, within reason.

Your application is writing data in blocks of 0.5MB, so yes, your DAU should be 512KB.

So you can have 4 disks of 128KB seg size, or 8 disks of 64KB seg size, etc. 8 disks will give more performance than 4, and if you have an 8D+1P RAID 5 group this just happens to fit nicely. NB: 1 disk is for RAID parity, so you need to add it to the 8 disks for data.
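The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the "number of data disks x segment size" rule from this post; the function names are my own and nothing here talks to a real array or to QFS.

```python
def stripe_width_kb(segment_size_kb: int, data_disks: int) -> int:
    """Stripe width = number of data disks x segment size (parity disks excluded)."""
    return segment_size_kb * data_disks


def layouts_for_dau(dau_kb: int, segment_sizes_kb=(8, 16, 32, 64, 128, 256)):
    """List (data_disks, segment_kb) pairs whose stripe width equals the DAU exactly.

    The segment sizes are the typical fixed array values mentioned in this post;
    treat them as an assumption for your own array.
    """
    return [(dau_kb // seg, seg) for seg in segment_sizes_kb if dau_kb % seg == 0]


print(stripe_width_kb(128, 4))   # 4 data disks x 128KB
print(stripe_width_kb(64, 8))    # 8 data disks x 64KB (the 8D+1P case)
print(layouts_for_dau(512))
```

Both layouts give a 512KB stripe width, which is why either can sit behind a 512KB DAU; the 8-disk one simply has more spindles working on each stripe.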

Remember, no matter how good a disk array's cache system is, with the sizes of databases etc that we have nowadays the cache can get overwhelmed very quickly if you do not have enough spindles, or disks as we call them. In the end, performance is determined by the number of IOPS (I/Os per second) of the backend disks. Try a database load or an import/export of a table and watch your disk array performance deteriorate as the cache just cannot keep up.

Now IOPS: a very approximate rule is that the faster the disk spins, the higher its performance will be. However, if you can get the average seek time and rotational latency from a disk manufacturer's data sheet, then you can work out IOPS using the following formula:

IOPS = 1000ms/(average seek time + rotational latency)

Now QFS stripe options can also help here, but that is an even bigger story. QFS can do round robin writes and stripe across many disk array RAID groups/sets.
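As a worked example of the formula, here is a small Python sketch. The 4 ms seek time is a made-up illustrative figure, not taken from any real data sheet, and the helper that derives rotational latency from RPM uses the common approximation of half a revolution on average.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency, approximated as the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def iops(avg_seek_ms: float, rotational_latency_ms: float) -> float:
    """IOPS = 1000ms / (average seek time + rotational latency), both in ms."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


latency = avg_rotational_latency_ms(15_000)   # 15K rpm disk
per_disk = iops(4.0, latency)                 # hypothetical 4 ms average seek
print(f"rotational latency: {latency:.1f} ms")
print(f"approx IOPS per disk: {per_disk:.0f}")
```

Multiplying the per-disk figure by the number of spindles gives a rough ceiling for what the backend can sustain once the cache is no longer helping.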

The trick is that the DAU is (most of the time; I am sure there will be exceptions) the same blocksize (currency) that the app uses, e.g. if the app writes 950KB the DAU should be 1024KB. Most apps behave in the normal powers of 2 KB way (8, 16, 32, 64, 128, 256, 512, 1024, 2048), so you should have a close match, as DAUs can be the same size.
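Rounding an odd application blocksize up to the next power of 2, as in the 950KB to 1024KB example, might be sketched like this. The function name is hypothetical, and whether a given DAU value is actually accepted by your QFS release is something to check in its documentation.

```python
def dau_for_block_kb(app_block_kb: int) -> int:
    """Smallest power of 2 (in KB) that is >= the application blocksize."""
    if app_block_kb < 1:
        raise ValueError("blocksize must be at least 1KB")
    return 1 << (app_block_kb - 1).bit_length()


for block in (16, 128, 950):
    print(block, "->", dau_for_block_kb(block))
```

A blocksize that is already a power of 2 maps to itself, so the usual 16KB or 128KB writers get a DAU of exactly their own size.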

What we try to do most of the time is to configure the system so that all the "gates" from the app to the disk RAID group are the same size. The "truck", i.e. the block of data, fits all the way from the app to the QFS filesystem to the RAID group without having to do 2 or more writes/reads for one block requested by the application. The nightmare scenario is that for an app writing one block the array does many writes, e.g. the app writes 128KB and the stripe width = 32KB, thus every time the app does a single read or write the controller has to ask (read/write) 4 times. This is serious I/O performance overhead, and it is what I can make lots of money fixing.

Make sure that your block is not disassembled or assembled on its journey from the app to the disk. OK, the PCI bridges and HBAs may do this, but we cannot change that. PCI lanes are getting into deep heavy tech stuff.
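The nightmare scenario is easy to put into numbers. The sketch below assumes the application I/O starts on a stripe boundary, which is a simplification; a misaligned block can touch one stripe more.

```python
import math


def array_ios_per_app_io(app_block_kb: int, stripe_width_kb: int) -> int:
    """How many stripe-sized operations one application I/O is split into.

    Assumes the application block is aligned to the start of a stripe.
    """
    return math.ceil(app_block_kb / stripe_width_kb)


print(array_ios_per_app_io(128, 32))    # the nightmare case from the text
print(array_ios_per_app_io(128, 128))   # gates all the same size
print(array_ios_per_app_io(128, 512))   # gate bigger than the truck
```

Anything above 1 means the truck is being unloaded and reloaded somewhere on the road.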

So I normally work this way. Find the app blocksize, then make the DAU the same, then make the stripe width the same as the DAU, then decide how many disks we want to use to get the IOPS, and then divide the stripe width derived above by the number of spindles in the physical array's RAID group to get the segment size. Now the segment sizes are mainly fixed on the arrays that we use, at 8, 16, 32, 64, 128 or 256KB. So we sometimes do not get a round number that matches the app blocksize, DAU etc. However, I always make this "magic gate number for the blocks/trucks" larger than the DAU, so as to avoid 2 physical reads/writes for each application write/read, which is the crux of all application and I/O tuning.

Storage heaven is where we have full stripe writes and reads, which is implied by the application block size and DAU fitting the stripe width accordingly.
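The procedure can be written down as a small planning function. This is a sketch of my reading of the steps above, not a QFS or array tool: the function name, the returned dictionary and the default list of segment sizes are assumptions for illustration.

```python
SEGMENT_SIZES_KB = (8, 16, 32, 64, 128, 256)  # typical fixed array values (assumed)


def plan_layout(app_block_kb: int, data_disks: int,
                segment_sizes_kb=SEGMENT_SIZES_KB) -> dict:
    """App blocksize -> DAU -> stripe width -> segment size per data disk."""
    dau_kb = app_block_kb                      # step 1: DAU matches the app blocksize
    ideal_segment_kb = dau_kb / data_disks     # step 2: stripe width (= DAU) / spindles
    # Step 3: pick the smallest supported segment size that is not below the ideal,
    # so the resulting stripe width is never smaller than the DAU.
    candidates = [s for s in sorted(segment_sizes_kb) if s >= ideal_segment_kb]
    if not candidates:
        raise ValueError("no supported segment size is large enough; use more disks")
    segment_kb = candidates[0]
    return {
        "dau_kb": dau_kb,
        "segment_kb": segment_kb,
        "stripe_width_kb": segment_kb * data_disks,
    }


print(plan_layout(512, 4))   # round number: stripe width equals the DAU
print(plan_layout(512, 6))   # no round number: stripe width ends up above the DAU
```

With 6 data disks the ideal segment would be about 85KB, which no array offers, so the sketch rounds up to 128KB and accepts a stripe width larger than the DAU rather than splitting each block in two.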

You can check this with various tools, for example by using vdbench (storage performance saviour) to do 10 writes or reads of a specific blocksize; if the array does not do the same number of I/Os (e.g. 10 writes/reads), then you are not hitting the G spot (array Group Size). So if you do 10 writes and the array did 20, your seg size quite likely is half of your DAU or app blocksize. Remember, filesystems do strange things to application writes and can mutilate them in more ways than we can dream up, so we have to know and understand filesystems. A good old Unix test is the "dd" command: if you have an array with a certain number of disks in a RAID group, run dd to the actual raw LUN to see what it can do. The filesystem layout which you use later should, if correctly tuned, get close to this number. If you get more, then you are a candidate for a Nobel prize. If it is widely different, then something between the app and the disk is messing things up. No chance of a Nobel prize, maybe a Darwin prize.

Think of a truck going down a road where all the tunnels and lanes are the exact size or bigger, so that the truck's journey is never hindered and the driver does not have to unload/disassemble and load/assemble the truck (block of data) to get it through the tunnels, lanes and toll gates.

Now can the QFS community guys check this, as I have been known to write faster than I think. But I have got close to max specified speeds on 6140 and 6540 arrays using this technique, plus some old heritage and legacy arrays.

Now, have I put the whole storage consultancy business out of a job? Not really; take this example. A woman calls a mechanic (call him Jerry) to fix her car as the engine does not work. Jerry takes a look at the engine and gets a hammer, he hits a specific part of the engine, and the engine starts to work. Jerry says, "That will cost you $500." The woman says, "You must be joking, you just hit it with a $10 hammer." "Well," says Jerry, "the bill is $10 for the hammer and $490 for the knowledge of where to hit the engine." You pay for the knowledge, not the muscle.
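Reading the result of such a test is simple division. The sketch below does not run vdbench or dd; it only interprets counts that you have read off the load generator and the array statistics yourself, and the function name is my own.

```python
def diagnose(app_ios: int, array_ios: int, app_block_kb: float):
    """Return (array I/Os per app I/O, implied size in KB of each array I/O)."""
    ratio = array_ios / app_ios
    implied_kb = app_block_kb / ratio
    return ratio, implied_kb


ratio, implied = diagnose(app_ios=10, array_ios=20, app_block_kb=128)
print(f"{ratio:.1f} array I/Os per app I/O, about {implied:.0f}KB each")
```

A ratio of 1.0 is the target; a ratio of 2.0 with a 128KB blocksize suggests that something in the path is working in 64KB units.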

Posted at 07:05PM May 22, 2008 by Valdis Filks in Technical  |  Comments[2]

Exploiting DTrace to tune and qualify applications to Solaris 10

We had a customer using an application on a customer platform; the supplier wanted funding from the customer to re-write the application for another Unix. Sun recommended using DTrace to find the areas that could be improved and to help the ISV move from Solaris 8 to Solaris 10. However, the application supplier said that this would take too much work/money etc, and that it would cost less to re-write for another Unix.

However, the OEM wanted the binary compatibility value offered by Sun and wanted to exploit Sun's multi-core chip technologies. So when the customer officially requested that the ISV work with Sun to determine if the application could be moved to Solaris 10, we discovered some interesting hidden agendas. Sun, together with the supplier, very quickly found areas in the application code for improvement, at a fraction of the time and cost of a port to another platform. Application performance was improved by an order of magnitude (10x faster), and qualification onto Solaris 10 was not a problem. The costs of taking the Solaris 8 binary compatible code, tuning it with DTrace and moving to Solaris 10 were a fraction of those of re-writing for another platform. So beware: people may want funding to move applications to other platforms, then ask for more money later to support a myriad of kernels. However, with DTrace they can improve performance dramatically and shorten the time to qualify Solaris 8 & 9 apps onto Solaris 10. Buyer beware: the reason for not moving to the latest release of Solaris may not be technical.

By the way, we also consolidated several servers running this application onto one M5000, using Solaris containers to run many virtualised instances of the application. We ended up with nearly 100's of applications running on one server. The customer could now reduce the number of racks that they needed.

Crazy world: moving applications from Solaris 8 to Solaris 10 is easy, and you can consolidate your servers at the same time. Madness, where will it all end? One big computer.

Make sure that all ISVs and application providers know the benefits of a faster time to market and reduced development costs that Solaris and DTrace can bring them. Costs too much to qualify on Solaris 10? What a red herring. Make your application faster with minimum investment: use those DTrace features. It is the developer's best friend; I wish I had had such a tool when I was a programmer.

The moral of the story is: all end users, ask your application suppliers or Sun to validate application qualification to the latest release of Solaris with DTrace.

Why does this remind me of some builders when you are renovating your house? When they say "that will cost you", in reality it is a much simpler job. Maybe that is why the DIY business is so profitable.

Posted at 06:18PM May 22, 2008 by Valdis Filks in Business  |  Comments[1]