With the 2009.Q3 release of fishworks, along with a new iSCSI
implementation, we're coming up with a very significant new feature for
managing the performance of Oracle databases: the new dataset
Synchronous write bias property, or logbias for short. In a nutshell,
this property takes the default value of Latency, signifying that the
storage should handle synchronous writes with urgency, the historical
default handling. See Brendan's comprehensive blog entry on the
Separate Intent Log and synchronous writes.
However, for datasets holding Oracle datafiles, the logbias property
can be set to Throughput, signifying that the storage should avoid
using log device acceleration and instead try to optimize the
workload's throughput and efficiency. We definitely expect to see a
good boost to Oracle performance from this feature for many types of
workloads and configs, notably workloads that generate 10s of MB/sec of
DB writer traffic on systems with no more than 1 logzilla per tray/JBOD.
The property is set in the Share Properties, just above database
recordsize. You might need to unset the Inherit from project checkbox
in order to modify the settings on a particular share.

The logbias property addresses a peculiar aspect of Oracle workloads:
DB writers issue a large number of concurrent synchronous writes to
Oracle datafiles, writes which individually are not particularly
urgent. In contrast to other types of synchronous write workloads, the
important metric for DB writers is not individual write latency. What
matters is that the storage keeps up with the throughput demand so that
database buffers are always available for recycling. This is unlike
redo log writes, which are critically sensitive to latency as they hold
up individual transactions and thus users.
ZFS and the ZIL
A little background: with ZFS, synchronous writes are managed by the
ZFS Intent Log (ZIL). Because synchronous writes typically hold up
applications, it's important to handle those writes with some level of
urgency, and the ZIL does an admirable job at that.
In the Openstorage hybrid storage pool, the ZIL itself is sped up using
low latency, write-optimized SSD devices: the logzillas. Those devices
are used to commit a copy of the in-memory ZIL transaction and retain
the data until an upcoming transaction group commits the in-memory
state to the on-disk pooled storage (see Dynamics of ZFS and The New
ZFS write throttle).
So while the ZIL speeds up synchronous writes, logzillas speed up the
ZIL. Now, SSDs can serve IOPS at a blazing 100μs but also have their
own throughput limits: currently around 110MB/sec per device. At that
throughput, committing, for example, 40K of data will take at least
about 360μs. The more data we can divert away from log devices, the
lower the latency response of those devices will be.
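The 360μs figure above is simple arithmetic, and a minimal Python sketch makes it easy to redo for other write sizes. The function name is made up for illustration, and the sketch assumes decimal units (1MB = 1,000,000 bytes, 1K = 1,000 bytes) and counts only the time to stream the data, not the device's base service time.

```python
def min_commit_time_us(payload_bytes: int, throughput_mb_per_s: float = 110.0) -> float:
    """Lower bound, in microseconds, on streaming payload_bytes to one log device.

    Assumes decimal megabytes and ignores the device's per-I/O service time.
    """
    bytes_per_second = throughput_mb_per_s * 1_000_000
    return payload_bytes / bytes_per_second * 1_000_000


# Print the transfer-time floor for a few illustrative write sizes.
for size_kb in (8, 40, 128):
    print(f"{size_kb}K -> {min_commit_time_us(size_kb * 1000):.0f} us")
```

For 40K this comes out at roughly 360μs, several times the 100μs it takes to service a small I/O, which is why keeping bulk data off the log devices helps the latency of everything else that uses them.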
It's interesting to note that other types of RAID controllers are
hostage to their NVRAM and require, for consistency, that data be
committed through some form of acceleration in order to avoid the RAID
Write Hole (see Bonwick on Raid-Z). ZFS, however, does not require that
data pass through its SSD commit accelerator; it can manage
consistency of commits using either disks or SSDs.
Synchronous write bias : Throughput
With this newfound ability of storage administrators to signal to ZFS
that some datasets will be subject to highly threaded synchronous
writes for which global throughput is more critical than individual
write latency, we can enable a different handling mode. By setting
logbias=Throughput, ZFS is able to divert writes away from the
logzillas, which are then preserved for servicing latency-sensitive
operations (e.g. redo log operations).
- A setting of Synchronous write bias : Throughput for a dataset allows synchronous
writes to files in other datasets to have lower latency
access to SSD log devices.
But that's not all. Data flowing through a logbias=Throughput dataset
is still served by the ZIL. It turns out that the ZIL has different
internal options for committing transactions, one of which is tagged
WR_INDIRECT. WR_INDIRECT commits issue an I/O for the modified file
record and record a pointer to it in the ZIL chain (see WR_INDIRECT in
zil.c, zvol.c, zfs_log.c).
A ZIL transaction of type WR_INDIRECT might use more disk I/Os and
incur slightly higher latency immediately, but needs fewer I/Os and
fewer total bytes during the upcoming transaction group update. Up to
this point, the heuristics that lead to using WR_INDIRECT transactions
were not triggered by DB writer workloads. But armed with the knowledge
that comes with the new logbias property, we're now less concerned
about the slight latency increase that WR_INDIRECT can cause. So, for
efficiency reasons, logbias=Throughput datasets are now set to use this
mode, leading to a more level latency distribution for transactions.
- Synchronous write bias : Throughput is a dataset mode that reduces the number of
I/Os that need to be issued on behalf of this dataset during the regular transaction
group updates, leading to more level response times.
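To make the choice of commit mode concrete, here is a toy Python model of the decision. It is a sketch, not the logic in zfs_log.c. WR_INDIRECT is the tag discussed above and WR_COPIED is the ZIL's tag for a record that carries the data inline. The size threshold, the function name and the argument names are assumptions made for illustration.

```python
from enum import Enum


class WriteState(Enum):
    WR_COPIED = "data copied into the log record"
    WR_INDIRECT = "data written in place, log record holds a pointer"


# Hypothetical cutoff above which a write is treated as "large";
# the value used by the real heuristics may differ.
LARGE_WRITE_BYTES = 32 * 1024


def choose_write_state(logbias: str, write_bytes: int) -> WriteState:
    """Toy model: pick a ZIL commit mode for one synchronous write."""
    if logbias == "throughput":
        # The dataset has declared that throughput matters more than
        # per-write latency, so always take the indirect path.
        return WriteState.WR_INDIRECT
    if write_bytes >= LARGE_WRITE_BYTES:
        # Assumed pre-existing heuristic: large writes go indirect.
        return WriteState.WR_INDIRECT
    return WriteState.WR_COPIED


print(choose_write_state("latency", 8 * 1024).name)
print(choose_write_state("throughput", 8 * 1024).name)
```

In this model a small 8K DB writer I/O only takes the indirect path once the dataset is marked logbias=Throughput, which mirrors the point that the earlier heuristics were not triggered by DB writer workloads.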
A reminder that this kind of improvement can sometimes go unnoticed
in sustained benchmarks if the downstream transaction group destage is
not given enough resources. Make sure you have enough spindles (or
total disk KRPM) to sustain the level of performance you need. A
pool with 2 logzillas and a single JBOD might have enough SSD
throughput to absorb DB writer workloads without adversely affecting
redo log latency, and so would not benefit from the special logbias
setting; with 1 logzilla per JBOD, however, the situation might be
reversed.
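A back-of-envelope way to judge which side of that line a configuration falls on is to compare DB writer traffic with the aggregate log device throughput. The sketch below assumes each logzilla sustains the 110MB/sec quoted earlier and that log devices are striped rather than mirrored. The function name and the 55MB/sec workload are hypothetical.

```python
def log_device_utilization(db_writer_mb_per_s: float,
                           logzillas: int,
                           per_device_mb_per_s: float = 110.0) -> float:
    """Fraction of aggregate log device throughput consumed by DB writer
    traffic, assuming striped log devices."""
    if logzillas < 1:
        raise ValueError("need at least one log device")
    return db_writer_mb_per_s / (logzillas * per_device_mb_per_s)


# Hypothetical workload: 55 MB/sec of DB writer traffic.
for n in (1, 2):
    share = log_device_utilization(55.0, n)
    print(f"{n} logzilla(s): {share:.0%} of log throughput used by DB writers")
```

The higher that fraction, the more redo log writes have to queue behind datafile writes on the same SSDs, and the more there is to gain from logbias=Throughput on the datafile dataset.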
While the DB Record Size property is inherited by files in a dataset and
is then immutable, the logbias setting is totally dynamic and can be
toggled on the fly during operations. For instance, during database
creation or some lightly threaded write operations to datafiles,
logbias=Latency is expected to perform better.
Logbias deployments for Oracle
As of the 2009.Q3 release of fishworks, the current wisdom around
deploying Oracle DB on an Openstorage system with SSD acceleration is
to segregate Oracle datafiles, index files and redo log files at the
filesystem/dataset level, but within a single storage pool. Having
each type of file in a different dataset allows better observability
into each one using the great analytics tool. But also, each dataset
can then be tuned independently to deliver the most stable performance
characteristics. The most important parameter to consider is the ZFS
internal recordsize used to manage the files. For Oracle datafiles, the
established ZFS Best Practice is to match the recordsize to the DB
block size.
For redo log files, using the default 128K records means that fewer
file updates will straddle multiple filesystem records. With 128K
records we expect fewer transactions to wait for redo log input I/Os,
leading to a more level latency distribution for transactions. As for
index files, using smaller blocks of 8K offers better cacheability in
both the primary and secondary caches (only cache what is used from
indexes), but using larger blocks offers better index-scan performance.
Experimenting is in order, depending on your use case, but an
intermediate block size of maybe 32K might also be considered for mixed
usage scenarios.
For Oracle datafiles specifically, using the new setting of
Synchronous write bias : Throughput has the potential to deliver
more stable performance in general and higher performance for redo log
sensitive workloads.
| Dataset | Recordsize | Logbias |
| Datafiles | 8K | Throughput |
| Redo Logs | 128K (default) | Latency (default) |
| Index | 8K-32K? | Latency (default) |
Following these guidelines yielded a 40% boost in our transaction
processing testing, in which we had 1 logzilla for a 40-disk pool.
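On the appliance these values are set through the Share Properties screen. For readers working on a plain ZFS command line, the same layout can be sketched as a small Python table that prints the corresponding zfs set commands for the recordsize and logbias properties. The dataset names are hypothetical, and 32K for the index dataset is just one point in the 8K-32K range above, not a recommendation.

```python
# Hypothetical dataset names; property values follow the table above.
LAYOUT = {
    "pool/oracle/datafiles": {"recordsize": "8K", "logbias": "throughput"},
    "pool/oracle/redo": {"recordsize": "128K", "logbias": "latency"},
    "pool/oracle/index": {"recordsize": "32K", "logbias": "latency"},
}


def zfs_set_commands(layout: dict) -> list:
    """Build one 'zfs set property=value dataset' command per property."""
    commands = []
    for dataset, props in layout.items():
        for name, value in props.items():
            commands.append(f"zfs set {name}={value} {dataset}")
    return commands


if __name__ == "__main__":
    # Only prints the commands; nothing is executed against a pool.
    for command in zfs_set_commands(LAYOUT):
        print(command)
```

Since recordsize has to be in place before the files are created while logbias can be flipped at any time, it makes sense to apply the recordsize settings before loading data.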
Commenting on my own post: it appears that MySQL over InnoDB is not likely to benefit from this property, because the DB writer workload is often handled by the regular threads handling transactions, and it's done prior to returning transaction info to the end user. So for MySQL, DB writer latency is just as critical to transaction latency as the log writes are.
Posted by Roch Bourbonnais on September 21, 2009 at 05:37 PM MEST #
Wouldn't you want to set the logbias to latency for index files as well, given that they also incur synchronous writes as per the data files when an indexed table is updated ?
Posted by Jeroen on September 23, 2009 at 06:17 AM MEST #
Sorry, I meant set logbias to throughput for index files, not latency.
Posted by JEROEN WILMS on September 23, 2009 at 06:18 AM MEST #
Silly question: Is this something that can be turned on and off at any time, so profiling of data can be done without interruption?
It seems to me that being able to tune this on the fly and watch the analytics as it happens would provide answers to the 'what impact will this have' questions.
My other question is: Would it make sense to profile the daily workload, and schedule changes to this property when / if the workload was known to change? Or - would it more likely be a set and forget?
Cheers!
Posted by Nathan on September 24, 2009 at 02:44 AM MEST #
Nathan: It's dynamically settable, but I would hope for it to be set and forget for Oracle datafiles.
Posted by Roch Bourbonnais on October 14, 2009 at 02:47 AM MEST #