Tuesday Jul 17, 2007
slog blog (or blogging on slogging)
Using chained logs (clogs?) can also lead to pool fragmentation. This is because log blocks are allocated and then freed as soon as the pool transaction group has committed, so we get a Swiss-cheese effect: small holes scattered among the longer-lived blocks.
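To make the effect concrete, here is a toy model (not ZFS code). It assumes blocks are handed out by a cursor that only moves forward and that log and data allocations interleave one-for-one; both are simplifications, and the function names are made up for this sketch.

```python
def simulate_chained_log(txgs=3, writes_per_txg=4):
    """Return a map of a toy pool after `txgs` transaction groups.

    'D' = long-lived data block, '.' = hole left by a freed log block.
    Assumption: allocation is a forward-moving cursor, and each synchronous
    write allocates one log block next to one data block.
    """
    pool = []
    for _ in range(txgs):
        log_blocks = []
        for _ in range(writes_per_txg):
            log_blocks.append(len(pool))
            pool.append("L")  # intent log block for a synchronous write
            pool.append("D")  # data block that stays allocated
        # The transaction group has committed: its log blocks are freed.
        for idx in log_blocks:
            pool[idx] = "."
    return "".join(pool)


def count_holes(pool_map):
    """Count maximal runs of free blocks in a pool map."""
    return sum(1 for run in pool_map.split("D") if run)


layout = simulate_chained_log()
print(layout)
print("free extents:", count_holes(layout))
```

In this model every freed log block ends up as its own one-block hole, which is the fragmentation a separate log device avoids by keeping log blocks out of the main pool.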
Interface
zpool create <pool> <pool devices> log <log devices>
Creates a pool with separate intent log device(s). If more than one log device is specified then writes are load-balanced between devices. It's also possible to mirror log devices. For example, a log consisting of two two-way mirrors could be created thus:
zpool create whirl <pool devices> \
log mirror c1t8d0 c1t9d0 \
mirror c1t10d0 c1t11d0
zpool add <pool> log <log devices>
Creates a log device if it doesn't exist, or adds extra log devices if it does.
zpool replace <pool> <old device> <new device>
Replaces the old log device with the new log device.
zpool attach <pool> <log device> <new log device>
Attaches a new log device to an existing log device. If the existing device is not a mirror then a two-way mirror is created. If the existing device is part of a two-way log mirror, attaching the new device creates a three-way log mirror, and so on.
zpool detach <pool> <log device>
Detaches a log device from a mirror.
zpool status
Additionally displays the log devices.
zpool iostat
Additionally shows IO statistics for log devices.
When a slog is full, or if a non-mirrored log device fails, ZFS will start using chained logs within the main pool.
Performance
The performance of databases and NFS is dictated by the latency of making data stable. They need to be assured that their transactions are not lost on power or system failure, so they are heavily dependent on the speed of the intent log devices.
Here are some database performance test results:
- The test program creates 32 threads, each doing 8K O_DSYNC writes at random offsets in a 400MB file.
- Test hardware was a Sun X4500 (aka Thumper) with 48 x 500GB disks.
- The NVRAM is the battery-backed PCI Micro Memory pci1332,5425 card.
- Table values are in MB/s.
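The original test program is not shown here, so the following is only a rough sketch of the workload described above: several threads issuing 8K O_DSYNC writes at random offsets in one file. The function name and its parameters are invented for illustration, and it assumes a Unix-like system where Python exposes `os.O_DSYNC` and `os.pwrite`.

```python
import os
import random
import threading
import time

BLOCK = 8 * 1024  # 8K writes, as in the test description


def run_sync_write_test(path, file_size=400 * 1024 * 1024, threads=32,
                        writes_per_thread=1000):
    """Return throughput in MB/s for random O_DSYNC writes to `path`."""
    # O_DSYNC: each write returns only once the data is on stable storage.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_DSYNC, 0o600)
    try:
        os.ftruncate(fd, file_size)
        blocks = file_size // BLOCK
        payload = b"\xa5" * BLOCK

        def worker(seed):
            rng = random.Random(seed)
            for _ in range(writes_per_thread):
                # pwrite takes an explicit offset, so threads can share one fd.
                os.pwrite(fd, payload, rng.randrange(blocks) * BLOCK)

        pool = [threading.Thread(target=worker, args=(i,))
                for i in range(threads)]
        start = time.monotonic()
        for t in pool:
            t.start()
        for t in pool:
            t.join()
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.close(fd)
    total_mb = threads * writes_per_thread * BLOCK / (1024 * 1024)
    return total_mb / elapsed
```

Calling `run_sync_write_test("/pool/fs/testfile")` on a file in the pool under test (the path is a placeholder) would give one figure comparable in spirit to a table cell. The number of writes per thread is an arbitrary choice here, not something taken from the original test.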
| Slogs \ Main pool disks | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| 0 slogs | 11 | 14 | 17 | 15 | 16 | 13 |
| 1 slog | 12 | 12 | 12 | 12 | 12 | 11 |
| 2 slogs | 17 | 17 | 17 | 19 | 19 | 16 |
| 4 slogs | 17 | 16 | 15 | 15 | 16 | 16 |
| 8 slogs | 18 | 19 | 20 | 18 | 16 | 18 |
| NVRAM | 221 | 221 | 218 | 217 | 215 | 217 |
I also ran the same tests with disk write cache flushing disabled
(echo zfs_nocacheflush/W 1 | mdb -kw)
Note, this should not be done on a real system unless the device cache is non-volatile.
| Slogs \ Main pool disks | 1 | 2 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|---|
| 0 slogs | 33 | 83 | 123 | 136 | 142 | 143 |
| 1 slog | 45 | 46 | 44 | 45 | 45 | 46 |
| 2 slogs | 97 | 99 | 90 | 94 | 94 | 95 |
| 4 slogs | 124 | 125 | 127 | 124 | 127 | 127 |
| 8 slogs | 135 | 137 | 134 | 138 | 138 | 138 |
| NVRAM | 225 | 220 | 226 | 226 | 226 | 227 |
Note, these tables can be a bit misleading. If you had 2 disks you'd have a choice of 2 main pool devices or 1 slog and 1 main pool device. So, taking the second table (cache flushing disabled), you should compare the following entries:
- 2 main pool: 83MB/s
- 1 slog, 1 main pool: 45MB/s
Perf summary
For this micro-benchmark, and from limited other perf testing, it makes sense to only use fast devices for the slog. However, there may be some cases where using regular disks as slog disks is faster than putting the same disks in the main pool.
Status/Bugs
This support was recently putback into Solaris Nevada build snv_68. Here's a list of slog bugs, fixed and to be fixed:
6574298 "slog still uses main pool for dmu_sync()" - now fixed in snv_69
6574286 "removing a slog doesn't work"
6575965 "panic/thread=2a1016b5ca0: BAD TRAP: type=9 ...:" - panic when no main pool devices present - now fixed in snv_83
Posted at 03:23PM Jul 17, 2007 by perrin in ZFS