Wednesday January 04, 2006 | Saurabh Mishra's Weblog |
VFS/Vnode Layer in Solaris

In the past I have mostly written about dispatcher locks (thread locks), the scheduler, signals, and procfs. This is the first time I'm writing about filesystems. I hope it helps build awareness of how filesystems work, so that developing filesystem-specific code on Solaris becomes easier.
f_bsize // block size
f_frsize // fundamental block (fragment) size. UFS has a fragment size to accommodate small files.
f_blocks // total number of blocks in the filesystem
f_bfree // free blocks
f_files = (fsfilcnt64_t)-1;
f_ffree = (fsfilcnt64_t)-1;
f_favail = (fsfilcnt64_t)-1;
f_fsid // filesystem id
(void) strcpy(sp->f_basetype, vfssw[vfsp->vfs_fstype].vsw_name); // name
f_flag = vf_to_stf(vfsp->vfs_flag); // flag
f_namemax // maximum filename length

(d) sync operation : For a read-only filesystem, we don't need to implement sync. Otherwise, it is used to flush the filesystem's dirty pages.

(e) root operation : Used by filesystem lookups to determine the root (the mount point). We must place a hold on the vnode we return.

The vnode layer exports the following operations. We will focus on the operations required to support reads on the filesystem. Write support is much trickier, because you need to implement a host of other operations as well as filesystem locking.

(a) read : This operation is invoked whenever read(2) is called. In this routine, we use segmap to read the file's data. We force-fault the pages using segmap_getmapflt(segkmap, vp, (off + mapon), ...), and then call uiomove() to copy the data back to userland. Once uiomove() is done, we release the smp (segmap entry) using segmap_release(). Please note that segmap works in windows of MAXBSIZE (8192) bytes, so you must manage the window offset (off) and the offset within the window (mapon) accordingly. They are calculated as:

off = uoff & (offset_t)MAXBMASK;
mapon = (u_offset_t)(uoff & (offset_t)MAXBOFFSET);

(b) getattr : In this operation, we need to fill in and return a 'vattr' structure. 'ls -l' reads this structure. The following members are relevant here:

va_type // type of vnode
va_mode // mode
va_uid // uid
va_gid // gid
va_atime.tv_sec // access time
va_mtime.tv_sec // modification time
va_ctime.tv_sec // status change time
va_size // size
va_nlink // link count
va_blksize // block size
va_nblocks // number of blocks

(c) lookup : This is the heart of any filesystem.
We must provide lookup in the filesystem before we can read files or search a directory. This routine understands the on-disk structure of the filesystem. In this operation, you can also use the DNLC (directory name lookup cache) to speed up lookups. The vnode and its name are cached, so we don't have to go to the disk every time we search for a file or directory. dnlc_enter() puts an entry in the DNLC, and dnlc_lookup() checks whether a vnode can be found in the DNLC for a given name. Both routines increment v_count using VN_HOLD().

(d) getpage_miss/getpage : This routine reads a block of a file given its offset. Here we need to set up the page using page_create_va() and prepare to read the block data using pageio_setup(). To issue the I/O, we call the following in order: bdev_strategy(), biowait(), and then pageio_done(). To support read-ahead, we can use the pvn_read_kluster() routine. The filesystem-specific getpage() routine calls getpage_miss() to read the block. In getpage(), we also call page_lookup() first, which saves a trip to the disk if the page is already in memory.

(e) readdir : This operation is used to read directory entries. The uio_offset field of the uio structure passed to us is the key here. If uio_offset equals the size of the directory, we have already read all of its entries. Otherwise, we read directory entries starting from the offset passed in uio_offset. At the end, we must store the new offset in uio_offset, so that the next readdir() call can continue from where this one stopped.

There are a host of other functions that are required when the filesystem also supports writes, for instance putpage and write. To support mmap(), we need to use the segvn segment driver instead of segmap.

(2006-01-04 19:30:00.0)