ZFS and the uberblock
Tuesday Jan 06, 2009
Inspired by Constantin's comment about USB sticks wearing out, left on Matthias's blog entry about an eco-friendly home server, I tried to find out more about how, and how often, the ZFS uberblock is written.
Using DTrace, it's not that difficult:
We start by finding out which DTrace probes exist for the uberblock:
$ dtrace -l | grep -i uberblock
31726 fbt zfs vdev_uberblock_compare entry
31727 fbt zfs vdev_uberblock_compare return
31728 fbt zfs vdev_uberblock_load_done entry
31729 fbt zfs vdev_uberblock_load_done return
31730 fbt zfs vdev_uberblock_sync_done entry
31731 fbt zfs vdev_uberblock_sync_done return
31732 fbt zfs vdev_uberblock_sync entry
31733 fbt zfs vdev_uberblock_sync return
34304 fbt zfs vdev_uberblock_sync_list entry
34305 fbt zfs vdev_uberblock_sync_list return
34404 fbt zfs uberblock_update entry
34405 fbt zfs uberblock_update return
34408 fbt zfs uberblock_verify entry
34409 fbt zfs uberblock_verify return
34416 fbt zfs vdev_uberblock_load entry
34417 fbt zfs vdev_uberblock_load return
So there are two probes on uberblock_update: fbt:zfs:uberblock_update:entry and fbt:zfs:uberblock_update:return!
Now we can find out more about it by searching the OpenSolaris sources: when searching for the definition of uberblock_update in project onnv, we find one hit for line 49 in file uberblock.c, and when clicking on it, we see:
Now, when searching again for the definitions of the types of the first two arguments (args[0] and args[1]) of uberblock_update (which are uberblock and vdev), we get:
For uberblock, the following hits are shown:

When clicking on the link on the definition of struct uberblock (around line 53 in file uberblock_impl.h), we get:
For the members of struct vdev, it's not that easy. First, we get a long hit list when searching for the definition of vdev in the source browser. But if we search for "struct vdev" in that list, using the browser's search function, we get:

When clicking on the definition of struct vdev (around line 108 in file vdev_impl.h), we can see all the members of this structure.
Here are all the links again in one place, plus one more for struct blkptr (a member of struct uberblock):
- Line 49 of file uberblock.c (definition of function uberblock_update())
- Line 50 of file uberblock_impl.h (definition and description of struct uberblock)
- Line 176 of file spa.h (definition and description of struct blkptr)
- Line 108 of file vdev_impl.h (definition and description of struct vdev)
Now we are prepared to access the data via DTrace, by printing the arguments and members as in the following example:
printf ("%d %d %d", args[0]->ub_timestamp, args[1]->vdev_id, args[2]);
Here is a sample final DTrace script that prints as much information as we can whenever uberblock_update() is called, along with any relevant I/O, in the hope that seeing both together shows where and how often the uberblocks are written:
/* Log every physical I/O: timestamp, process name, block number, byte count. */
io:genunix:default_physio:start,
io:genunix:bdev_strategy:start,
io:genunix:biodone:done
{
        printf ("%d %s %d %d", timestamp, execname,
            args[0]->b_blkno, args[0]->b_bcount);
}

/* Log every call to uberblock_update(), including its third argument (args[2]). */
fbt:zfs:uberblock_update:entry
{
        printf ("%d %s, %d, %d, %d, %d", timestamp, execname,
            pid, args[0]->ub_rootbp.blk_prop, args[1]->vdev_asize, args[2]);
}
The lines for showing the I/O are derived from DTrace scripts for I/O analysis in the DTrace Toolkit.
I was unable to print out members of struct vdev (the second argument to uberblock_update()) with the fbt:zfs:uberblock_update:entry probe; I also tried fbt:zfs:uberblock_update:return but had other problems with that one. Even so, the results of running the script are quite interesting. I ran it with:
$ dtrace -s zfs-uberblock-report-02.d
Here's an extract (long lines shortened):
0 33280 uberblock_update:entry 102523281435514 sched, 0, 922..345, 0, 21005
0 5510 bdev_strategy:start 102523490757174 sched 282 1024
0 5510 bdev_strategy:start 102523490840779 sched 794 1024
0 5510 bdev_strategy:start 102523490873844 sched 18493722 1024
0 5510 bdev_strategy:start 102523490903928 sched 18494234 1024
0 5498 biodone:done 102523491215729 sched 282 1024
0 5498 biodone:done 102523491576878 sched 794 1024
0 5498 biodone:done 102523491873015 sched 18493722 1024
0 5498 biodone:done 102523492232464 sched 18494234 1024
...
0 33280 uberblock_update:entry 102553280316974 sched, 0, 922..345, 0, 21006
0 5510 bdev_strategy:start 102553910907205 sched 284 1024
0 5510 bdev_strategy:start 102553910989248 sched 796 1024
0 5510 bdev_strategy:start 102553911022603 sched 18493724 1024
0 5510 bdev_strategy:start 102553911052733 sched 18494236 1024
0 5498 biodone:done 102553911344640 sched 284 1024
0 5498 biodone:done 102553911623733 sched 796 1024
0 5498 biodone:done 102553911981236 sched 18493724 1024
0 5498 biodone:done 102553912250614 sched 18494236 1024
...
0 33280 uberblock_update:entry 102583279275573 sched, 0, 922..345, 0, 21007
0 5510 bdev_strategy:start 102583540376459 sched 286 1024
0 5510 bdev_strategy:start 102583540459265 sched 798 1024
0 5510 bdev_strategy:start 102583540492968 sched 18493726 1024
0 5510 bdev_strategy:start 102583540522840 sched 18494238 1024
0 5498 biodone:done 102583540814677 sched 286 1024
0 5498 biodone:done 102583541091636 sched 798 1024
0 5498 biodone:done 102583541406962 sched 18493726 1024
0 5498 biodone:done 102583541743494 sched 18494238 1024
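The timestamps in this extract already hint at the "how often" part of the question. As a rough sketch, assuming the captured output is whitespace-separated with the DTrace timestamp (in nanoseconds) in the fourth column, as in the extract, the following shell function prints the gap in seconds between consecutive uberblock_update entries. The function name update_intervals is made up for this illustration, and on Solaris AWK would need to be set to nawk.

```shell
# Hypothetical helper: print the seconds elapsed between consecutive
# uberblock_update entries. Assumes field 4 is the DTrace timestamp in
# nanoseconds, as in the captured extract.
update_intervals() {
    ${AWK:-awk} '
        /uberblock_update:entry/ {
            if (seen) printf "%.3f\n", ($4 - prev) / 1e9
            prev = $4
            seen = 1
        }' "$@"
}
```

Fed the three uberblock_update entries from the extract, it reports 29.999 twice, that is, one update roughly every 30 seconds on this otherwise idle system.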
Using the following (n)awk one-liners:
$ nawk '/uberblock/{print}' zfs-ub-report-02.d.out
$ nawk '/uberblock/{a=0}{a++;if ((a==2)){print}}' zfs-ub-report-02.d.out
$ nawk '/uberblock/{a=0}{a++;if ((a>=1)&&(a<=5)){print}}' zfs-ub-report-02.d.out
we can print:
- only the uberblock_update lines, or
- just the line that follows each uberblock_update entry, or
- each uberblock_update entry together with the 4 lines that follow it.
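For the analysis that follows, it helps to pair each update with the first block written after it. The function below is a minimal sketch of that idea, not part of the original scripts: it assumes the column layout of the extract (block number in the sixth field of the I/O lines, and the value of args[2] as the last field of the uberblock_update lines). The name first_block_after_update is hypothetical, and on Solaris AWK would need to be set to nawk.

```shell
# Hypothetical helper: for each uberblock_update entry, print its last field
# (args[2] in the DTrace script) followed by the block number of the first
# bdev_strategy:start line after it. Assumes the extract's column layout.
first_block_after_update() {
    ${AWK:-awk} '
        /uberblock_update:entry/ { tag = $NF; want = 1; next }
        want && /bdev_strategy:start/ { print tag, $6; want = 0 }
    ' "$@"
}
```

Run as first_block_after_update zfs-ub-report-02.d.out, it prints one line per update; for the first group in the extract that line is "21005 282".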
By running the script for a while and capturing its output, we can later analyze which block number is written first after each uberblock_update(). The numbers are always even; the lowest is 256 and the highest is 510, and each write is 1024 bytes long. The block numbers advance 256, 258, 260, and so forth until they reach 510, then start again at 256. So the first block is overwritten again on every (510-256)/2+1 = 128th iteration (the +1 is needed because subtracting the first element from the last leaves the first element itself uncounted). The same is true for blocks 768...1022, 18493696...18493950 and 18494208...18494462 (the third and fourth block ranges should differ for different zpool sizes).
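This arithmetic can be checked against the extract. The sketch below rests on two assumptions that the trace itself does not prove: that the last value printed for each uberblock_update (args[2]) is a counter that increases by one per update, and that the slot written is that counter modulo the slot count. Both are at least consistent with the three samples shown (21005, 21006 and 21007 map to blocks 282, 284 and 286). The function names are hypothetical.

```shell
# Number of slots between the first and last observed block numbers,
# given that block numbers advance by 2 per write.
slot_count() {
    echo $(( ($2 - $1) / 2 + 1 ))
}

# Assumed mapping (not confirmed by the trace alone):
#   block = first + (counter mod slots) * 2
# Arguments: first block, last block, counter value (args[2]).
expected_block() {
    slots=$(slot_count "$1" "$2")
    echo $(( $1 + ($3 % slots) * 2 ))
}
```

With these definitions, slot_count 256 510 prints 128, expected_block 256 510 21005 prints 282, and expected_block 768 1022 21005 prints 794, all of which agree with the first group in the extract.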
Now that we understand how and in which order the uberblocks are written, we are prepared to examine after how many days the uberblock area of a USB stick without wear leveling would probably be worn out. More on that and how we can use zdb for that, in my next blog entry.
Some more links on this topic:
- Matthew Ahrens' blog entry describes how the uberblock is updated
- Richard Elling's blog entries show static and dynamic visualization of ZFS I/O characteristics.
- Eric Schrock's blog entry on how the ZFS import works
- In this email thread, Nathan Hand posted a great guide on invalidating uberblocks to access old data referenced by previous uberblocks
- The ZFS On-Disk implementation paper contains a lot of details on this topic










