Alan Hargreaves' Weblog
The ramblings of an Australian SaND TSC* Principal Field Technologist
* Solaris and Network Domain Technology Support Centre - The group I work for
Friday Jun 10, 2005
How do Solaris Filesystems Update Statistics Without Intimate Knowledge of the cpu Structure?
Well, OpenSolaris is now available. One of the nice things about this is that there are a lot more things that we can freely talk about.
There are a number of kstats in the cpu structure that people who use filesystems simply assume will be kept up to date. Unfortunately, it appears that we never really advertised a method of doing this. Consequently, some of the third-party filesystems update them directly (for which we cannot really fault them).
You may remember that some time ago we had a problem with various third-party filesystems (and a few other things) breaking as a result of installing patch 108528-29 on Solaris 8. The root cause of that problem was that a new element was added into the middle of the cpu structure.
That is, the kstats changed their offset within the structure, so the packages were doing their updates to the wrong structure elements, as they had been compiled with the old definitions.
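To make the failure mode concrete, here is a minimal user-space sketch of what happens when a member is inserted into the middle of a structure. The structures and the function are hypothetical stand-ins invented for illustration; they are not the real Solaris cpu_t or any real filesystem code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout that an old module was compiled against. */
struct cpu_old {
	int		cpu_id;
	unsigned int	lread;
	unsigned int	lwrite;
};

/* Hypothetical layout after a patch adds a member in the middle. */
struct cpu_new {
	int		cpu_id;
	unsigned int	new_member;	/* the newly inserted element */
	unsigned int	lread;
	unsigned int	lwrite;
};

/*
 * What a module built with the old header effectively does: it bumps
 * whatever lives at the OLD byte offset of lread, regardless of what
 * the running kernel keeps there now.
 */
static void
stale_module_count_lread(void *cpu)
{
	unsigned int *slot;

	assert(cpu != NULL);
	slot = (unsigned int *)((char *)cpu +
	    offsetof(struct cpu_old, lread));
	(*slot)++;
}
```

Calling stale_module_count_lread() on a struct cpu_new leaves lread untouched and increments new_member instead, which is the kind of silent corruption described above.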
For my own interest, and as a result of that problem, I constructed the following write-up of how ufs does it. I hope folks find it useful; if nothing else, it will also give an introduction to navigating the OpenSolaris Source Browser and the bug tracking interface.
Now that the OpenSolaris code has been released under the CDDL licence, I can not only talk about this issue, but also link into the source tree.
Kstats that filesystems use
The kstats that filesystems are expected to update form a part of the cpu_t structure. They are:
    cpu_stat.cpu_sysinfo.bread      /* physical block reads */
    cpu_stat.cpu_sysinfo.bwrite     /* physical block writes (sync + async) */
    cpu_stat.cpu_sysinfo.lread      /* logical block reads */
    cpu_stat.cpu_sysinfo.lwrite     /* logical block writes */
    cpu_stat.cpu_sysinfo.bawrite    /* physical block writes (async) */
    cpu_stat.cpu_vminfo.pgin        /* pageins */
    cpu_stat.cpu_vminfo.pgpgin      /* pages paged in */
    cpu_stat.cpu_vminfo.anonpgin    /* anon pages paged in */
    cpu_stat.cpu_vminfo.execpgin    /* executable pages paged in */
    cpu_stat.cpu_vminfo.fspgin      /* fs pages paged in */
    cpu_stat.cpu_vminfo.maj_fault   /* major page faults */
and can be found in usr/src/uts/common/sys/sysinfo.h
Although ufs forms a part of the ON (O/S and Network) consolidation, it does not directly update the stats in the cpu structure. Instead, the updates are performed within a number of the common routines that are used to do the I/O. The reason for this is that the cpu structure is considered a Contract/Private interface. This basically means that if a project wants to use the interface, a contract must exist with the interface owner. In this way, if the interface changes, we know which other modules are affected. For more information on interface stability, see attributes(5).
All of the cpu_vminfo statistics are updated from pageio_setup().
    /*
     * Allocate and initialize a buf struct for use with pageio.
     */
    struct buf *
    pageio_setup(struct page *pp, size_t len, struct vnode *vp, int flags)
In the case of (flags & B_READ), this routine will update all of the above values in cpu_vminfo as appropriate.
pgin will be incremented with each call.
pgpgin will be incremented by the number of pages required to page in len bytes.
anonpgin, execpgin and fspgin will be incremented similarly to pgpgin, based upon information found in pp->p_vnode.
maj_fault will be incremented in the case of a synchronous read (i.e. (flags & B_ASYNC) == 0).
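The rules above can be sketched as a small user-space model. This is not the kernel code: the MODEL_ names, the flag values, the 4096-byte page size and the enum standing in for the information found in pp->p_vnode are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define	MODEL_PAGESIZE	4096UL	/* assumed page size */
#define	MODEL_B_READ	0x1	/* hypothetical flag values */
#define	MODEL_B_ASYNC	0x2

/* Hypothetical summary of what pp->p_vnode says about the page. */
enum model_vnode_kind {
	MODEL_VN_NONE,
	MODEL_VN_ANON,
	MODEL_VN_EXEC,
	MODEL_VN_FS
};

struct model_vminfo {
	unsigned long pgin, pgpgin, anonpgin, execpgin, fspgin, maj_fault;
};

/* Number of pages needed to hold len bytes, rounding up. */
static unsigned long
model_btopr(size_t len)
{
	return ((len + MODEL_PAGESIZE - 1) / MODEL_PAGESIZE);
}

/* Apply the accounting rules described in the text for one call. */
static void
model_pageio_account(struct model_vminfo *vm, size_t len,
    enum model_vnode_kind kind, int flags)
{
	unsigned long npages;

	assert(vm != NULL);
	if ((flags & MODEL_B_READ) == 0)
		return;			/* only reads are counted */

	npages = model_btopr(len);
	vm->pgin++;			/* one pagein per call */
	vm->pgpgin += npages;		/* pages needed for len bytes */
	if ((flags & MODEL_B_ASYNC) == 0)
		vm->maj_fault++;	/* synchronous read */

	switch (kind) {
	case MODEL_VN_ANON:
		vm->anonpgin += npages;
		break;
	case MODEL_VN_EXEC:
		vm->execpgin += npages;
		break;
	case MODEL_VN_FS:
		vm->fspgin += npages;
		break;
	case MODEL_VN_NONE:
		break;
	}
}
```

For example, a synchronous read of 8193 bytes from a regular file would, under these assumptions, add 1 to pgin and maj_fault and 3 to both pgpgin and fspgin.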
cpu_sysinfo.bread and cpu_sysinfo.lread
lread is updated on every call to bread_common(). If we actually go to disk then bread is also updated.
    /*
     * Common code for reading a buffer with various options
     *
     * Read in (if necessary) the block and return a buffer pointer.
     */
    struct buf *
    bread_common(void *arg, dev_t dev, daddr_t blkno, long bsize)
breada() is similar to bread_common() except that it also triggers a read-ahead on the next block.
    /*
     * Read in the block, like bread, but also start I/O on the
     * read-ahead block (which is not allocated to the caller).
     */
    struct buf *
    breada(dev_t dev, daddr_t blkno, daddr_t rablkno, long bsize)
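The difference between lread and bread can be illustrated with a toy user-space model. The tiny direct-mapped cache below is purely hypothetical and far simpler than the real buffer cache; it only shows that every request counts as a logical read, while only a miss that has to go to disk counts as a physical read.

```c
#include <assert.h>
#include <stddef.h>

#define	MODEL_NBUF	4	/* hypothetical number of cache slots */

struct model_bufcache {
	long		blkno[MODEL_NBUF];	/* cached block, -1 if empty */
	unsigned long	lread;			/* logical block reads */
	unsigned long	bread;			/* physical block reads */
};

static void
model_cache_init(struct model_bufcache *bc)
{
	size_t i;

	assert(bc != NULL);
	for (i = 0; i < MODEL_NBUF; i++)
		bc->blkno[i] = -1;
	bc->lread = 0;
	bc->bread = 0;
}

/* Model one read request for block blkno. */
static void
model_bread(struct model_bufcache *bc, long blkno)
{
	size_t slot;

	assert(bc != NULL && blkno >= 0);
	slot = (size_t)blkno % MODEL_NBUF;

	bc->lread++;			/* counted on every call */
	if (bc->blkno[slot] != blkno) {
		bc->bread++;		/* miss: we "go to disk" */
		bc->blkno[slot] = blkno;
	}
}
```

Reading the same block twice in a row gives lread == 2 but bread == 1, since the second request is satisfied from the cache.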
cpu_sysinfo.bwrite, cpu_sysinfo.lwrite and cpu_sysinfo.bawrite
    /*
     * Common code for writing a buffer with various options.
     *
     * force_wait  - wait for write completion regardless of B_ASYNC flag
     * do_relse    - release the buffer when we are done
     * clear_flags - flags to clear from the buffer
     */
    void
    bwrite_common(void *arg, struct buf *bp, int force_wait,
        int do_relse, int clear_flags)
Each call to bwrite_common() increments both bwrite and lwrite. If the write is asynchronous, that is (flag & B_ASYNC) is set and force_wait does not override it, then bawrite is also incremented.
    /*
     * Release the buffer, marking it so that if it is grabbed
     * for another purpose it will be written out before being
     * given up (e.g. when writing a partial block where it is
     * assumed that another write for the same block will soon follow).
     * Also save the time that the block is first marked as delayed
     * so that it will be written in a reasonable time.
     */
    void
    bdwrite(struct buf *bp)
bdwrite() also increments lwrite each time it is called.
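The write-side accounting can be sketched in user space in the same way. The names and the flag value are hypothetical, and the sketch assumes, following the force_wait comment quoted above, that a write only counts as asynchronous when B_ASYNC is set and force_wait is not.

```c
#include <assert.h>
#include <stddef.h>

#define	MODEL_B_ASYNC	0x2	/* hypothetical flag value */

struct model_wstats {
	unsigned long bwrite;	/* physical block writes (sync + async) */
	unsigned long lwrite;	/* logical block writes */
	unsigned long bawrite;	/* physical block writes (async) */
};

/*
 * Model the counters touched by one immediate write.
 * Returns non-zero if the caller would wait for completion.
 */
static int
model_bwrite_account(struct model_wstats *ws, int buf_flags, int force_wait)
{
	int do_wait;

	assert(ws != NULL);
	ws->lwrite++;
	ws->bwrite++;

	/* Assumption: force_wait turns an async request into a sync one. */
	do_wait = ((buf_flags & MODEL_B_ASYNC) == 0 || force_wait);
	if (!do_wait)
		ws->bawrite++;
	return (do_wait);
}

/* Model a delayed write: only the logical write is counted for now. */
static void
model_bdwrite_account(struct model_wstats *ws)
{
	assert(ws != NULL);
	ws->lwrite++;
}
```

In this model a delayed write shows up in lwrite immediately, and in bwrite only later, when the buffer is actually written out.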
The future?
In November I logged RFE 6199092, which requests that these kstats be removed from the cpu structure and made a part of the DDI. This would fit quite nicely with some suggestions that we are hearing about creating filesystem statistics on a per-zone basis as well as on a per-cpu basis. We'll see how this RFE progresses.

