Henk Vandenbergh

Wednesday Aug 05, 2009

Error: "trying to put 8192 bytes into a 512-byte buffer"

Two days after going GA, a major bug was found in Data Validation and Journaling. Vdbench lost track of the data buffer size it needed. After the journal recovery Vdbench reads each block that has ever been written to make sure the data is valid. Because of an incorrect buffer size you get the message “trying to put 8192 bytes into a 512 byte buffer”, where 8192 is the data transfer size used in the run that created the journal file.

For expedience I simply replaced the vdbench501 distribution files. If you have this problem with 5.01 (and likely also with 5.00) downloaded before August 5 2009, 1:41 pm MDT, please download a fresh copy.

Monday Aug 03, 2009

First problem in Vdbench 5.01

It did not take too long.

java.lang.NullPointerException
 at Vdb.InfoFromHost.getInfoForMaster(InfoFromHost.java:304)

When specifying sd=sd1,lun=cxtxdxsx Vdbench will abort because it can not find a parent directory entry for this 'lun'. Make sure you specify lun=/dev/rdsk/cxtxdxsx
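
A quick pre-check for this can be sketched in shell. This is only an illustration and not part of Vdbench: the function name find_bare_luns is made up here, and the sketch assumes that each sd= definition sits on one line with comma-separated keyword=value pairs and no embedded blanks.

```shell
# Hypothetical helper (not part of Vdbench): print every lun= value in a
# parameter file that has no directory part, e.g. lun=c0t0d0s0.
# Assumes comma-separated keyword=value pairs without embedded blanks.
find_bare_luns() {
  # Put each keyword=value pair on its own line, keep only the lun= values,
  # then keep only the values that do not contain a '/'.
  tr ',' '\n' < "$1" | sed -n 's/^lun=//p' | grep -v '/'
}
```

Calling find_bare_luns with the name of your parameter file prints the offending names, one per line. No output means every lun= value already carries a path such as /dev/rdsk/.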

Henk.

Vdbench 5.01 now GA

Today I placed Vdbench 5.01 on vdbench.org.

You can find information about the changes contained in this release here.

If you have questions and/or problems, you may contact me at vdbench@sun.com

Henk

Friday Jul 31, 2009

Vdbench, VMware, and bash

Today I had a user who tried to run Vdbench on a custom-made Linux version that does not contain csh, only bash. This has happened before when running Vdbench on VMware, and below you'll find a replacement for the vdbench script that fixes this.

First copy /bin/bash to /bin/csh, cloning bash. Then replace the ./vdbench script with the script below or, for the soon-to-be-released Vdbench 5.01, replace it with ./vdbench.bash. The cloning of bash is needed because Vdbench internally also calls csh.

 Henk.

#
# This script was written specifically for running Vdbench on native VMware.
# It turns out that VMware does NOT have the C shell.
# This also works for some brand-x version of Linux that did not have csh.
#
# Instructions:
# - cp /bin/bash /bin/csh   ===> This creates a clone of bash, naming it csh.
# - cp vdbench.bash vdbench ===> vdbench will now use THIS script instead of ./vdbench
#
#


# Directory where script was started from:
dir=$(dirname "$0")

# If the first parameter equals -SlaveJvm then this means that
# the script must start vdbench with more memory.
# Since all the real work is done in a slave, vdbench itself can be
# started with just a little bit of memory, while the slaves must
# have enough memory to handle large amount of threads and buffers.

# Set classpath.
# $dir                 - parent of $dir/solaris/solx86/linux/aix/hp/mac subdirectory
# $dir/../classes      - for development overrides
# $dir/vdbench.jar     - everything, including vdbench.class
cp=$dir/:$dir/classes:$dir/vdbench.jar

# Proper path for java:
java=java


# When out of memory, modify the first set of memory parameters. See above.
# '-client' is an option for Sun's Java. Remove if not needed.
# Note: $status is csh syntax; bash keeps the last exit code in $?.
if [ "$1" = "SlaveJvm" ]; then
  $java -client -Xmx1024m -Xms128m -cp "$cp" Vdb.SlaveJvm "$@"
  exit $?
else
  $java -client -Xmx512m  -Xms64m  -cp "$cp" Vdb.Vdbmain "$@"
  exit $?
fi

Tuesday Jul 14, 2009

Vdbench: Sun StorageTek Vdbench, a storage I/O workload generator.

This is a copy of the blog entry I just created on Sun's BestPerf blog: http://blogs.sun.com/BestPerf


Vdbench is written in Java (and a little C) and runs on Solaris Sparc and X86, Windows, AIX, Linux, zLinux, HP/UX, and OS/X.

I wrote the SPC1 and SPC2 workload generator using the Vdbench base code for the Storage Performance Council: http://www.storageperformance.org

Vdbench is a disk and tape I/O workload generator, allowing detailed control over numerous workload parameters.

Options:

  • For raw disk (and tape) and large disk files:
      o Read vs. write
      o Random vs. sequential or skip-sequential
      o I/O rate
      o Data transfer size
      o Cache hit rates
      o I/O queue depth control
      o Unlimited number of concurrent devices and workloads
      o Compression (tape)
  • For file systems:
      o Number of directories and files
      o File sizes
      o Read vs. write
      o Data transfer size
      o Directory create/delete, file create/delete
      o Unlimited number of concurrent file systems and workloads

Single host or Multi-host:

All work is centrally controlled, running either on a single host or on multiple hosts concurrently.

Reporting:

Centralized reporting, reporting and reporting using the simple idea that you can't understand performance of a workload unless you can see the detail. If you just look at run totals you'll miss the fact that for some reason the storage configuration was idle for several seconds or even minutes!

  • Second-by-second detail of the performance statistics accumulated by Vdbench, for the total workload and for each individual logical device used by Vdbench.
  • For Solaris Sparc and X86: second by second detail of Kstat statistics for total workload and for each physical lun or NFS mounted device used.
  • All Vdbench reports are HTML files. Just point your browser to the summary.html file in your Vdbench output directory and all the reports link together.
  • Swat (another of my tools) allows you to display performance charts of the data created by Vdbench: just start SPM, then 'File' 'Import Vdbench data'.
  • Vdbench will (optionally) automatically call Swat to create JPG files of your performance charts.
  • Vdbench has a GUI that will allow you to compare the results of two different Vdbench workload executions. It shows the differences between the two runs in different grades of green, yellow and red. Green is good, red is bad.

Data Validation:

Data Validation is a highly sophisticated methodology to assure data integrity by always writing unique data contents to each block and then doing a compare after the next read or before the next write. The history tables containing information about what is written where are maintained in memory and optionally in journal files. Journaling allows data to be written to disk in one execution of Vdbench with Data Validation and then continued in a future Vdbench execution to make sure that after a system shutdown all data is still there. Great for testing mirrors: write some data using journaling, break the mirror, and have Vdbench validate the contents of the mirror.

I/O Replay

A disk I/O workload traced using Swat (another of my tools) can be replayed using Vdbench on any test system to any type of storage. This allows you to trace a production I/O workload, bring the trace data to your lab, and then replay your I/O workload on whatever storage you want. Want to see how the storage performs when the I/O rate doubles? Vdbench Replay will show you. With this you can test your production workload without the hassle of having to get your database software and licenses, your application software, or even your production data on your test system.

For more detailed information about Vdbench go to http://vdbench.org where you can download the documentation or the latest GA version of Vdbench.

You can find continuing updates about Swat and Vdbench on my blog: http://blogs.sun.com/henk/

Henk Vandenbergh

PS: If you're wondering where the name Vdbench came from :  Henk Vandenbergh benchmarking.

Storage performance and workload analysis using Swat.

This is a copy of the blog entry I just created on Sun's BestPerf blog:  http://blogs.sun.com/BestPerf

Swat (Sun StorageTek Workload Analysis Tool) is a host-based, storage-centric Java application that thoroughly captures, summarizes, and analyzes storage workloads for both Solaris and Windows environments.

This tool was written to help Sun’s engineering, sales and service organizations and Sun’s customers understand storage I/O workloads.


 Sample screenshot:



Swat can be used for, among many other things:

  • Problem analysis
  • Configuration sizing (just buying x GB of storage just won't do anymore)
  • Trend analysis: is my workload growing, and can I identify/resolve problems before they happen?

Swat is storage agnostic, so it does not matter what type or brand of storage you are trying to report on. Swat reports the host's view of the storage performance and workload, using the same Kstat (Solaris) data that iostat uses.

Swat consists of several different major functions:

  • Swat Performance Monitor (SPM)
  • Swat Trace Facility (STF)
  • Swat Trace Monitor (STM)
  • Swat Real Time Monitor
  • Swat Local Real Time Monitor
  • Swat Reporter

Swat Performance Monitor (SPM):

Works on Solaris and Windows. An attempt has been made in the current Swat 3.02 to also collect data on AIX and Linux. Swat 3.02 also reports Network Adapter statistics on Solaris, Windows, and Linux. A Swat Data Collector (agent) runs on some or all of your servers/hosts, collecting I/O performance statistics every 5, 10, or 15 minutes and writing the data to a disk file, one new file every day, automatically switched at midnight.

The data then can be analyzed using the Swat Reporter.

Swat Trace Facility (STF):

For Solaris and Windows. STF collects detailed I/O trace information. This data then goes through a data Extraction and Analysis phase that generates hundreds or thousands of second-by-second statistics counters. That data can then be analyzed using the Swat Reporter. You typically run this trace for 30 to 60 minutes, for instance at a time when you know you will have a performance problem.

A disk I/O workload traced using Swat can be replayed on any test system to any type of storage using Vdbench (another of my tools, available at http://vdbench.org). This allows you to trace a production I/O workload, bring the trace data to your lab, and then replay that I/O workload on whatever storage you want. Want to see how the storage performs when the I/O rate doubles or triples? Vdbench Replay will show you. With this you can test your production workload without the hassle of having to get your database software and licenses, your application software and licenses, or even your production data.

Note: STF is currently limited to the collection of about 20,000 IOPS. Some development effort is required to handle the current increase in IOPS made possible by Solid State Devices (SSDs).

Note: Collecting the trace data with STF is the only Swat function that requires root access. This functionality is all handled by one single KSH script which can be run independently. (The script uses TNF and ADB.)

Swat Trace Monitor (STM):

With STF you need to know when the performance problem will occur so that you can schedule the trace data to be collected. Not every performance problem, however, is predictable. STM runs an in-memory trace and monitors the overall storage performance. Once a certain threshold is reached, for instance a response time greater than 100 milliseconds, the in-memory trace buffer is dumped to disk and the trace then continues collecting data for a number of seconds before terminating.

Swat Real Time Monitor:

When a Data Collector is active on your current or any network-connected host, Swat Real Time Monitor will open a Java socket connection with that host, allowing you to actively monitor the current storage performance either from your local or any of your remote hosts.

Swat Local Real Time Monitor:

Local Real Time Monitor is the quickest way to start using Swat. Just enter './swat -l' and Swat will start a private Data Collector for your local system and then will show you exactly what is happening to your current storage workload. No more fiddling trying to get some useful data out of a pile of iostat output.

Swat Reporter:

The Swat Reporter ties everything together. All data collected by the above Swat functions can be displayed using this powerful GUI reporting and charting function. You can generate hundreds of different performance charts or tabulated reports giving you intimate understanding of your storage workload and performance. Swat will even create JPG files for you that then can be included in documents and/or presentations. There is even a batch utility (Swat Batch Reporter) that will automate the JPG generation for you. If you want, Swat will even create a script for this batch utility for you.

Some of the many available charts:

  • Response time per controller or device
  • I/O rate per controller or device
  • Read percentage
  • Data transfer size
  • Queue depth
  • Random vs. sequential (STF only)
  • CPU usage
  • Device skew
  • Etc. etc.

Swat has been written in Java. This means that once your data has been collected on its originating system, it can be displayed and analyzed using the Swat Reporter on ANY Java-enabled system, including any type of laptop.

For more detailed information go to (long URL), where you can download the latest release, Swat 3.02.

You can find continuing updates about Swat and Vdbench on my blog: http://blogs.sun.com/henk/

Henk Vandenbergh

Thursday Jul 09, 2009

Swat Analyze function and Java heap problems, continued

If you read my earlier blog entry about Java heap problems with Analyze, here is another change that you can make, but it has some risks.

Swat Analyze by default keeps the last 180 seconds worth of i/o detail in memory. This is done so that Swat can identify any i/o that takes up to 180 seconds (180 sounds high, but I have seen them). If you know for sure that you do not have any i/o taking longer than for instance 30 seconds, you can change this 180 second value.

When using the GUI, select the 'Settings' tab, click on the 'batch_prm' line, add '-a30' (don't add the quotes), click 'Save', and restart Analyze.

If you can't use the GUI, go to file 'options.sUSERID.ini' and manually change the value after 'batch_prm' to '-a30' and restart Analyze. 

Note though that then any i/o lasting longer than 30 seconds will NOT be recognized by Swat.

Swat Analyze function and Java heap problems (OutOfMemoryError)

The default Java heap size given to Swat Analyze is -Xmx1024m (this is hardcoded, and is not to be confused with the heap size specified in the swat and swat.bat scripts).

For long traces and/or when you have lots of luns, 1024m may not always be enough. Could I increase the default? Technically, yes. However, for users that do not have much memory or swap space that could mean that they then can not even run the Swat Analyze at all. 1024m therefore is a decent starting value.

When your Analyze fails with 'java.lang.OutOfMemoryError: Java heap space', ignore the suggestion in the message 'Rerun as 'java -Xmx512m...'. That suggestion is technically correct but not complete enough. Changing the swat/swat.bat scripts will not solve this either.

When using the GUI, select the 'Settings' tab, click on the 'java_prm' line, increase the -Xmx value (the highest I have been able to do is -Xmx2560m), click 'Save', and restart Analyze.

If you can't use the GUI, go to file 'options.sUSERID.ini' and manually change the value after 'java_prm' and restart Analyze. 


Tuesday Jul 07, 2009

Data Validation history table and Java heap space.

Data Validation stores information about what data pattern is written where in a table that requires one byte per data block. With very large luns that can become a problem. I just saw a run where xfersize=4096 was used against a total amount of storage of 5.7 terabytes. That requires about 1.5GB worth of table space. The default Java heap given to Vdbench is 1024m so that's not good.
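
The arithmetic behind that estimate can be sketched in shell. This is a rough illustration only: the function name dv_table_mb is made up here, a terabyte is assumed to be 2^40 bytes, and the only rule taken from the text is one byte of table per data block.

```shell
# Hypothetical helper: estimate the Data Validation history table size.
# Rule from the text: one byte of table per data block.
# Arguments: total storage in bytes, xfersize in bytes.
# Prints the table size in MB (1 MB = 1048576 bytes), rounded up.
dv_table_mb() {
  blocks=$(( ($1 + $2 - 1) / $2 ))
  echo $(( (blocks + 1048575) / 1048576 ))
}

# 5.7 terabytes (assumed here to be 5.7 * 2^40 bytes) with xfersize=4096
dv_table_mb $(( 57 * 1099511627776 / 10 )) 4096
```

Under these assumptions the result is 1460 MB, which is in line with the "about 1.5GB" above and clearly more than a 1024m heap can hold. Doubling the xfersize to 8192 halves the estimate to 730 MB.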

Vdbench 5.00 also does not abort nicely when it runs out of memory for Data Validation, and it gets into a "Waiting for slave synchronization:" hang. Vdbench 5.01 will fix this.

To avoid this problem:

  • Change the vdbench or vdbench.bat script, replacing -Xmx1024m with a higher value.
  • Use a larger xfersize=
  • Use fewer devices
  • Use either the size= or range= parameter

Henk.

Monday Jul 06, 2009

External mailing addresses for Swat and Vdbench

I just received an email from sourceforge where vdbench resides, but my reply email bounced, 'unknown user'. Don't know why, but here is a better way to communicate with me:

Vdbench: vdbench@sun.com, and Swat: swat@sun.com. I'll do my best to respond ASAP. Of course, Sun employees know where to find me at my internal email address, and they will get priority.

Henk.

Tuesday Jun 23, 2009

Running Vdbench 5.00 on Itanium HP/UX

Jon-Seo Kim this week was unable to run on his HP Itanium system; the shared library that is included with 5.00 failed to load. He did a compile for me, and his problem is resolved. (Thank you Jon-Seo). I assume now that the original compile was done on a PA-RISC system and therefore failed to load on Itanium. If anyone has successfully run on a PA-RISC system, can you confirm this for me at vdbench@sun.com?

If you need a copy of this Itanium shared library, please contact me at vdbench@sun.com

Henk

Friday Jun 19, 2009

Continuous running of Vdbench

Q: "I am new to vdbench and I have just begun using vdbench 407. Is there an option from the command line or value I can set in the parameter file to have vdbench run until the user decided to stop vdbench"

A: I would prefer you use vdbench 500 from vdbench.org. That gives you a new option to do what I think you're asking.

- If you just have one Run Definition (RD), you can just specify 'elapsed=100h' for 100 hours, or any high value that's less than 31 bits worth of seconds. That's already in any version of Vdbench.

- You can override the elapsed time also from the command line: -ennn (value in seconds)


- Vdbench500: if you have one or more RDs, then add the '-l' execution parameter (loop), e.g. ./vdbench -f parmfile -l
This causes Vdbench to run all RDs and then, when done, start at the first RD again, etc.
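
To get a feel for how high the elapsed value can go, here is a small shell sketch. It reads "31 bits worth of seconds" as 2^31 - 1 seconds, which is my interpretation and not a documented Vdbench limit. The function names are made up for this example.

```shell
# Assumption: "31 bits worth of seconds" is read as 2^31 - 1 seconds.
max_elapsed_seconds=$(( (1 << 31) - 1 ))

# Convert hours to seconds, e.g. for elapsed=100h.
hours_to_seconds() {
  echo $(( $1 * 3600 ))
}

# Succeeds when the given number of seconds stays within the limit.
fits_elapsed() {
  [ "$1" -le "$max_elapsed_seconds" ]
}

echo "limit: $max_elapsed_seconds seconds, about $(( max_elapsed_seconds / 3600 )) hours"
echo "elapsed=100h is $(hours_to_seconds 100) seconds"
```

With this reading, 100 hours is 360000 seconds, far below the limit of 2147483647 seconds (roughly 596523 hours). The same 100 hours given as a command line override would be -e360000.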

Henk.

Monday Jun 15, 2009

Unable to execute command: /usr/bin/ls -rlL /dev/rdsk

The /dev/rdsk/ directory frequently shows entries of devices that either don't exist, or existed a long long time ago. This has caused problems before, see http://blogs.sun.com/henk/entry/vdbench_and_waiting_for_configuration.

This month however I all of a sudden saw two instances where the garbage caused the 'ls' command to generate error messages, and with that, returned a non-zero return code to Vdbench.  That causes Vdbench to abort. Just go ahead and run 'devfsadm -C' to clean the /dev/rdsk/ directory.
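
If you want to check for this condition before starting a run, something along these lines could work. It is only a sketch: check_listing is a made-up name, and the sketch simply runs the listing command from the error message and looks at its return code.

```shell
# Hypothetical pre-flight check (not part of Vdbench): run the listing command
# named in the error message and report whether it exits with a non-zero code.
# The directory defaults to /dev/rdsk when no argument is given.
check_listing() {
  dir=${1:-/dev/rdsk}
  if ls -rlL "$dir" > /dev/null 2>&1; then
    echo "ok"
  else
    echo "failed"
  fi
}
```

If check_listing prints "failed" on a Solaris host, run 'devfsadm -C' as described above and check again.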


Wednesday Apr 22, 2009

“java.lang.NullPointerException” using Swat 3.02 Batch Reporter

Swat (or Vdbench) fails with “java.lang.NullPointerException” when using the Swat 3.02 Batch Reporter, either directly or in a Vdbench run to create JPG files of performance charts. Please install Swat 3.01 from here.

For Sun internal use: you can find a corrected Swat 3.02 tar/zip file on my Sun internal website.

Henk.

Wednesday Apr 08, 2009

Vdbench: Slave aborting: Shutdown took more than three minutes

At the end of 'elapsed=' seconds, Vdbench tells its i/o threads to no longer start new i/o.
Vdbench however gives its i/o threads a maximum of three minutes to allow the i/o's that are still outstanding to complete. If after these three minutes the i/o still has not completed, Vdbench will abort with "Slave aborting: Shutdown took more than three minutes".

The usual cause of this problem is a device and/or storage controller that is no longer responding to i/o.


Henk.
