One of the necessary checkpoints before launching a product is to be
able to assess its performance. With Sun Storage 7xxx we had a
challenge in that the only NFS benchmark of note was SPEC
SFS. Now this benchmark will have its supporters and some customers
might be attached to it, but it's important to understand what a
benchmark actually says.
The SFS benchmark is largely about "cache busting" the server: this is
interesting, but at Sun we think that caches are actually helpful in
real scenarios. Data goes in cycles in which it becomes hot at times.
Retaining that data in cache layers allows much lower latency access
and much better human interaction with storage engines. Being a
cache-busting benchmark, SFS numbers end up as a measure of the number
of disk rotations attached to the NAS server. So a good SFS result
requires hundreds or thousands of expensive, energy-hungry 15K RPM
spindles. Yet for good IOPS in practice, layers of caching matter more
to the end-user experience and to the cost efficiency of the solution.
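
To make the "number of disk rotations" point concrete, here is a rough back-of-the-envelope sketch. It assumes every operation misses all caches and costs one average seek plus half a rotation. The 3.0 ms seek time and the 100,000 ops/sec target are illustrative assumptions, not figures from our tests or from SFS.

```python
import math


def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full rotation, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def spindles_needed(target_iops: float, rpm: float, avg_seek_ms: float) -> int:
    """Drives needed if every operation is a cache miss served by one disk access."""
    service_ms = avg_seek_ms + rotational_latency_ms(rpm)
    iops_per_spindle = 1000.0 / service_ms
    return math.ceil(target_iops / iops_per_spindle)


# ASSUMPTIONS for illustration only: 3.0 ms average seek, 100,000 ops/sec target.
print(rotational_latency_ms(15_000))                       # 2.0
print(spindles_needed(100_000, 15_000, avg_seek_ms=3.0))   # 500
```

With those assumed numbers each 15K RPM drive delivers about 200 cache-miss operations per second, which is why a pure cache-busting score scales with spindle count rather than with anything the caching layers do.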
So we needed another way to talk about performance. Benchmarks tend to
test the system in peculiar ways that do not necessarily reflect the
workloads each customer is actually facing. There are very many
workload generators for I/O, but one interesting one that is open
source and extensible is Filebench, available in source form.
So we used filebench to gather basic performance information about our
system with the hope that customers will then use filebench to
generate profiles that map to their own workloads. That way, different
storage options can be tested on hopefully more meaningful tests than
benchmarks.
Another challenge is that a NAS server interacts with client systems
that themselves keep a cache of the data. Given that we wanted to
understand the back-end storage, we had to set up the tests to avoid
client-side caching as much as possible. So for instance, between the
phase of file creation and the phase of actually running the tests, we
needed to clear the client caches and at times the server caches as
well. These possibilities are not readily accessible with the
simplest load generators, and we had to do this in a rather ad hoc
fashion. One validation of our runs was to ensure that the amount of
data transferred over the wire, as observed with Analytics, was
consistent with the aggregate throughput measured at the clients.
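
The wire-versus-client validation can be sketched as a simple check. This is a minimal illustration, not the tool we used: the function name, the MB/s units and the 10% tolerance are all assumptions chosen for the example.

```python
def throughput_consistent(client_mb_per_s, wire_mb_per_s, tolerance=0.10):
    """Return True when the summed client throughput matches the rate seen
    on the server's network interfaces, within a relative tolerance.

    A client-side sum well above the wire rate suggests that reads were
    served from client caches and never reached the server.
    """
    if wire_mb_per_s <= 0:
        return False
    aggregate = sum(client_mb_per_s)
    return abs(aggregate - wire_mb_per_s) / wire_mb_per_s <= tolerance


# Hypothetical numbers: three clients each reporting 100 MB/s.
print(throughput_consistent([100.0, 100.0, 100.0], 290.0))  # True
print(throughput_consistent([100.0, 100.0, 100.0], 110.0))  # False
```

In the second call the clients claim 300 MB/s while only 110 MB/s crossed the wire, which is the signature of a run that mostly measured client memory.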
Still another challenge was that we needed to test a storage system
designed to interact with a large number of clients. Again, load
generators are not readily set up to coordinate multiple clients and
gather global metrics. During the course of the effort, filebench did
come up with a clustered mode of operation, but we were already too
far along our path to take advantage of it.
This coordination of clients is important because the performance
information we want to report is the one that is actually delivered to
the clients. Now each client will report its own value for a given
test and our tool will sum up the numbers; but such a sum is only
valid inasmuch as the tests ran on the clients in the same time frame.
The possibility of skew between tests is something that needs to be
monitored by the person running the investigation.
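
One way to guard the sum against skew is to refuse to aggregate when the clients' timed windows do not overlap enough. The sketch below is hypothetical and is not our actual tool: the `ClientResult` fields, the 90% overlap threshold and the assumption that client clocks are synchronized are all choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClientResult:
    name: str
    start: float      # seconds on a clock assumed to be shared by all clients
    end: float
    ops_per_s: float


def overlap_fraction(results):
    """Fraction of the longest run covered by the window common to all clients."""
    shared = min(r.end for r in results) - max(r.start for r in results)
    longest = max(r.end - r.start for r in results)
    return max(shared, 0.0) / longest


def aggregate(results, min_overlap=0.9):
    """Sum per-client rates, but only if the runs were close to simultaneous."""
    frac = overlap_fraction(results)
    if frac < min_overlap:
        raise ValueError(f"runs overlap only {frac:.0%}; the sum is not meaningful")
    return sum(r.ops_per_s for r in results)


# Hypothetical 30-second runs started one second apart.
runs = [ClientResult("c1", 0.0, 30.0, 1000.0), ClientResult("c2", 1.0, 31.0, 1200.0)]
print(aggregate(runs))  # 2200.0
```

A check like this does not replace a person watching the run, but it catches the obvious case where one client started its test long after the others had finished.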
One way that we improved this coordination was to
divide our tests into 2 categories: those that required precreated
files, and those that created files during the timed portion of the
runs. If not handled properly, file creation would actually cause
significant skew in the results. The option we pursued here was to have
a file pre-creation phase that was done once. From that point, our
full set of metrics could then be run and repeated many times with
much less human monitoring, leading to better reproducibility of
results.
Another goal of this effort was to be able to run our
standard set of metrics in a relatively short time, say less than 1
hour. In the end we got that down to about 30 minutes per run to gather
10 metrics. Having a short run time is important because there
are lots of possible ways that such tests can be misrun. Having someone
watch over the runs is critical to the value of the output and to its
reproducibility. So after having run the pre-creation of files
offline, one could run many repeated instances of the tests, validating
the runs with Analytics and, through general observation of the system,
gaining some insight into the meaning of the output.
At this point we were ready to define our metrics.
Obviously we needed streaming reads and writes. We needed random reads.
We needed small synchronous writes, important to database workloads and
to the NFS protocol. Finally, small file creation and stat operations
completed the mix. For random reading we also needed to distinguish
between operating from disks and from storage-side caches, an
important aspect of our architecture.
Now another thing that was on my mind was that this is not a
benchmark. That means we would not be trying to fine-tune the metrics
in order to find out exactly what number of
threads and request size leads to the best possible performance from
the server. This is not the way your workload is set up. Your number of
client threads running is not elastic at will. Your workload is what
it is (threading included); the question is how fast it is being
serviced.
So we defined precise per-client workloads with a preset number
of threads running the operations. We came up with this set just as an
illustration of what could be representative loads:
1- 1 thread streaming reads from 20G uncached set, 30 sec.
2- 1 thread streaming reads from same set, 30 sec.
3- 20 threads streaming reads from 20G uncached set, 30 sec.
4- 10 threads streaming reads from same set, 30 sec.
5- 20 threads 8K random read from 20G uncached set, 30 sec.
6- 128 threads 8K random read from same set, 30 sec.
7- 1 thread streaming write, 120 sec
8- 20 threads streaming write, 120 sec
9- 128 threads 8K synchronous writes to 20G set, 120 sec
10- 20 threads metadata (fstat) IOPS from pool of 400k files, 120 sec
11- 8 threads 8K file create IOPS, 120 sec.
For each of the 11 metrics, we could propose a mapping to relevant industries:
1- Backups, Database restoration (source), Data Mining, HPC
2- Financial/Risk Analysis, Video editing, HPC
3- Media Streaming, HPC
4- Video Editing
5- DB consolidation, Mailserver, generic fileserving, Software development.
6- DB consolidation, Mailserver, generic fileserving, Software development.
7- User data Restore (destination)
8- Financial/Risk Analysis, backup server
9- Database/OLTP
10- Web 2.0, Mailserver/Mailstore, Software Development
11- Web 2.0, Mailserver/Mailstore, Software Development
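
Because the thread counts and durations are fixed in advance rather than tuned, the 11 definitions can be written down as plain data. The sketch below simply transcribes the list of tests; the `Metric` type and the `timed_seconds` helper are names invented for this illustration and are not part of filebench.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    test_id: int
    threads: int      # threads per client, fixed in advance
    operation: str
    seconds: int      # length of the timed portion


METRICS = [
    Metric(1, 1, "streaming read, 20G uncached set", 30),
    Metric(2, 1, "streaming read, same set", 30),
    Metric(3, 20, "streaming read, 20G uncached set", 30),
    Metric(4, 10, "streaming read, same set", 30),
    Metric(5, 20, "8K random read, 20G uncached set", 30),
    Metric(6, 128, "8K random read, same set", 30),
    Metric(7, 1, "streaming write", 120),
    Metric(8, 20, "streaming write", 120),
    Metric(9, 128, "8K synchronous write, 20G set", 120),
    Metric(10, 20, "metadata (fstat), pool of 400k files", 120),
    Metric(11, 8, "8K file create", 120),
]


def timed_seconds(metrics):
    """Total length of the timed portions, excluding setup and cache clearing."""
    return sum(m.seconds for m in metrics)


print(timed_seconds(METRICS))  # 780
```

The timed portions alone add up to 780 seconds, or 13 minutes, which fits inside the roughly 30 minutes per run mentioned earlier.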
We managed to get all these tests running except the fstat (test 10)
due to a technicality in filebench. Filebench insisted on creating
the files up front and this test required thousands of them; moreover
filebench used a method that ended up single threaded to do so and in
the end, the stat information was mostly cached on the client. While
we could have plowed through some of the issues, the combination of
all these made us put the fstat test aside for now.
Concerning thread counts, we figured that a single-stream read test was
at times critical (for administrative purposes) and an interesting
measure of latency. Tests 1 and 2 were defined this way, with test
1 starting with cold client and server caches and test 2 continuing
the runs after having cleared the client cache (but not the server),
thus showing the boost from server-side caching. Tests 3 and 4 are
similarly defined with more threads involved, for instance to mimic a
media server. Tests 5 and 6 did random reads, again with test 5
starting with a cold server cache and test 6 continuing with some of
the data precached from test 5. Here, we did have to deal with client
caches, trying to ensure that we didn't hit in the client cache too
much as the run progressed. Tests 7 and 8 showcased streaming writes
for single and 20 streams (per client). Reproducibility of tests 7 and
8 is more difficult, we believe, because of a client-side fsflush
issue. We found that we could get more stable results by tuning
fsflush on the clients. Test 9 is the all-important synchronous write
case (for instance a database). This test truly showcases the benefit
of our write-side SSD and also shows why tuning the recordsize to match
ZFS records with DB accesses is important. Test 10 was inoperative, as
mentioned above, and test 11, file create, completes the set.
Given that these were predefined test definitions, we're very happy to
see that our numbers actually came out really well with these tests,
particularly for the mirrored configs with write-optimized SSDs.
See for instance the results obtained by Amitabha Banerjee.
I should add that these can now be used to give a ballpark estimate of
the capability of the servers. They were not designed to deliver the
topmost numbers from any one config. The variability of the runs is
at times greater than we'd wish, and so your mileage will
vary. Using Analytics to observe the running system can be quite
informative and a nice way to actually demo that capability. So use
the output with caution and use your own judgment when it comes to
performance issues.
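
Since run-to-run variability matters here, one simple habit is to repeat a metric several times and look at the spread before quoting a number. This is a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration and are not results from our systems.

```python
import statistics


def run_variability(samples):
    """Return (mean, coefficient of variation) for repeated runs of one metric."""
    mean = statistics.mean(samples)
    cv = statistics.stdev(samples) / mean
    return mean, cv


# Hypothetical MB/s figures from three repeated runs of the same test.
mean, cv = run_variability([100.0, 110.0, 90.0])
print(f"mean={mean:.1f} MB/s, run-to-run variation={cv:.0%}")
```

A variation of several percent or more between identical runs is a hint to look at the system with Analytics before trusting any single figure.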