EASIER, FASTER, SAFER...for Beginners and Experts on Sun ZFS, Java, Solaris, VirtualBox for ISV and Partners

The new ZFS write throttle, which was integrated in Nevada build 87, specifically addresses write-intensive workloads. Today we take a closer look at the write throttle in action. Our test system is a Sun Fire X4500 running Nevada build 94, with a single ZFS pool of 42 striped disks.

blog@x4500> zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
h     19.0T   620K  19.0T     0%  ONLINE  -

We use the zfs_write_throttle.d DTrace script to observe the write throttle. In the first test, we generate write I/O load with a couple of “dd if=/dev/zero of=/h/<file> bs=1024k” commands. Here is an extract of the script output:

--- 2008 Jul 28 14:04:17
                                                      Sync rate (/s)
  h                                                                1

                                                                MB/s
  h                                                             1540

                                                            Delays/s
  h                                                               47


  h                                                   Sync time (ms)
           value  ------------- Distribution ------------- count
              80 |                                         0
             100 |@@@@@@@@@@@                              3
             120 |@@@@                                     1
...snip...
             260 |@@@@                                     1
...snip...
             580 |@@@@                                     1
...snip...
             780 |@@@@                                     1
...snip...
            1320 |@@@@@@@                                  2
            1340 |                                         0
            1360 |@@@@                                     1
...snip...
            1520 |@@@@                                     1
            1540 |                                         0

  h                                                   Written (MB)
           value  ------------- Distribution ------------- count
           < 200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        9
...snip...
            3000 |@@@@                                     1
...snip...
         >= 4000 |@@@@                                     1

  h                                                   Write limit (MB)
           value  ------------- Distribution ------------- count
            7750 |                                         0
         >= 8000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 11

The output has been shortened for clarity. With the default settings in place, the time needed to synchronize data to disk varies widely [range 100 ms to 1540 ms], and four of the eleven syncs recorded take longer than a second (see the Sync time distribution).

In the second test, we reduce the target time for synchronizing data to disk from five seconds (the default) to one second, using the zfs_txg_synctime variable. Here again is an extract of the script output:

--- 2008 Jul 28 14:08:27
                                                      Sync rate (/s)
  h                                                                1

                                                                MB/s
  h                                                             1681

                                                            Delays/s
  h                                                               56


  h                                                   Sync time (ms)
           value  ------------- Distribution ------------- count
             340 |                                         0
             360 |@@@                                      1
...snip...
             460 |@@@                                      1
             480 |                                         0
             500 |@@@                                      1
...snip...
             600 |@@@                                      1
...snip...
             660 |@@@                                      1
...snip...
             740 |@@@                                      1
             760 |@@@                                      1
             780 |@@@                                      1
             800 |                                         0
             820 |@@@                                      1
             840 |@@@@@@                                   2
             860 |@@@@@@                                   2
...snip...
            1040 |@@@                                      1
            1060 |                                         0

  h                                                   Written (MB)
           value  ------------- Distribution ------------- count
           < 200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            10
...snip...
            2400 |@@@                                      1
            2600 |                                         0
            2800 |@@@                                      1
...snip...
         >= 4000 |@@@@@@                                   2

  h                                                   Write limit (MB)
           value  ------------- Distribution ------------- count
            2500 |                                         0
            2750 |@@@@@@                                   2
...snip...
            4750 |@@@                                      1
            5000 |@@@@@@                                   2
            5250 |                                         0
            5500 |@@@@@@@@@@@                              4
...snip...
            6500 |@@@                                      1
            6750 |                                         0
            7000 |@@@@@@@@@                                3
...snip...
         >= 8000 |@@@                                      1


Compared with the first test, two things stand out:

a) the worst-case time for synchronizing data to disk has gone down from about 1.5 s to about 1 s, and the spread has narrowed [range 360 ms to 1060 ms, versus 100 ms to 1540 ms before].

b) the pool “write limit” now moves around over time (see the Write limit distribution: from about 2750 MB up to 8000 MB and above, whereas in the first test it stayed at 8000 MB and above), dynamically throttling the incoming application write rate to the available I/O bandwidth.

More parameters are available for tuning (please see the source code), but as usual, use them with caution. To wrap up, here is one last output extract, taken with the parameter zfs_write_limit_override set to 800 MB. Setting this parameter forces the write limit to the specified value instead of the dynamically computed one. This can benefit applications that generate a continuous, well-paced write stream but are sensitive to write delays.

--- 2008 Jul 28 14:54:49
                                                      Sync rate (/s)
  h                                                                4

                                                                MB/s
  h                                                              677

                                                            Delays/s
  h                                                                1


  h                                                   Sync time (ms)
           value  ------------- Distribution ------------- count
             120 |                                         0
             140 |@@@@@@                                   6
             160 |@@@@@@@@@@@@@@@                          15
             180 |@@@@@@@@@@@@@@@                          15
             200 |@@@@@                                    5
             220 |                                         0

  h                                                   Written (MB)
           value  ------------- Distribution ------------- count
           < 200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           31
             200 |                                         0
             400 |@@@@@                                    5
             600 |                                         0
             800 |@@@@@                                    5
            1000 |                                         0

  h                                                   Write limit (MB)
           value  ------------- Distribution ------------- count
            1250 |                                         0
            1500 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 41
            1750 |                                         0

Hopefully, you have enjoyed these little observations!


 
    

