Rajendra's Blog

Thursday Mar 13, 2008

Zip File System Provider Implementation details

As mentioned in my previous blog entry, I will go into the implementation details of the Zip file system provider. This should give you a clear insight into developing a file system using the new Java file system API (NIO.2). A Zip provider allows the contents of a ZIP or JAR file to be viewed as a file system. It implements all the interfaces and abstract classes of the new file system API and enables access to archive files such as ZIP or JAR as if they were directories. It is a read-only file system.

Let me begin with a description of NIO.2. The new file system API has an extension facility (a service provider interface) with which you can develop a file system of your choice by providing a concrete implementation of the abstract class java.nio.file.spi.FileSystemProvider. Package all the implementation classes in a Java Archive (JAR) file and install it, either by making the JAR file available on the Java class path or by copying it into the JAVA_HOME/jre/lib/ext directory. That's it: your file system is ready, and you can use the JSR-203 file system API to access it. The new API has many utility methods, such as java.nio.file.Files.walkFileTree(FileRef start, FileVisitor visitor), which you can use to traverse your file system's file tree.
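To make the tree traversal concrete, here is a minimal sketch. It is written against the java.nio.file API as it finally shipped in Java 7, where Path takes the place of FileRef, and it relies on the zip provider bundled with the JDK rather than the provider described in this post. The class name, method names and entry names are made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipTreeWalk {

    // Builds a small throwaway archive so the example is self-contained.
    public static Path createSampleZip() throws IOException {
        Path zip = Files.createTempFile("sample", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (String name : new String[] {"docs/readme.txt", "src/Main.java"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return zip;
    }

    // Opens the archive as a file system and collects every regular file in it.
    public static List<String> listFiles(Path zip) throws IOException {
        final List<String> found = new ArrayList<>();
        // Passing a null class loader asks for the installed providers only.
        try (FileSystem fs = FileSystems.newFileSystem(zip, (ClassLoader) null)) {
            Files.walkFileTree(fs.getPath("/"), new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    found.add(file.toString());
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path zip = createSampleZip();
        try {
            System.out.println(listFiles(zip));
        } finally {
            Files.delete(zip);
        }
    }
}
```

The visitor never needs to know it is walking an archive, which is the point of the provider model: the same traversal code works on any installed file system.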

Abstract overview of FileSystemProvider:

A file system provider creates file systems using one of its factory methods. A provider can create any number of file systems. For example, since each archive file is viewed as a file system, the Zip provider returns a separate file system instance for each Zip/Jar file. In the case of native file systems (Windows, Solaris, etc.) the provider creates a single file system instance. A file system provides access to file system objects. A file system object can be a file store, a file or a directory. A file store is simply a volume or partition in which files are stored. For example, on Windows the drives c: and d: are file stores, and on Solaris / (root) and mounted directories such as /home/user are file stores.

Now let's go over the details of the read-only Zip provider:

Let me begin with a brief overview of a file system. A file system is formally defined as the methods and data structures that an operating system uses to keep track of files on disk. The data structures are maintained in memory and on disk, and the operating system provides access methods for the file system objects. Any container that holds organized, homogeneous elements can be treated as a file system, provided it has access methods that read these data structures and present the information in a form meaningful to the user. A Zip/Jar file is one such container. To access a zip file, all we need is to define access methods that retrieve or update the properties of its file system objects. JSR-203 provides the necessary interfaces and abstract classes, i.e. a framework for building any kind of file system. We just have to provide implementations of these interfaces and abstract classes. NIO.2 comes with a default provider that accesses the file system of the host operating system.

File systems are constructed using one of the factory methods in the java.nio.file.FileSystems class. When a factory method is invoked, it iterates over the installed file system providers. If FileSystems#newFileSystem(URI uri, ...) is used, it calls getScheme() on each provider until one returns a scheme that matches the scheme of the given uri, and then calls newFileSystem(URI uri, Map<String, ?> env) on that provider. If FileSystems#newFileSystem(FileRef file, Map<String, ?> env, ClassLoader loader) is used, it calls newFileSystem(FileRef file, Map<String, ?> env) on each provider until one returns a file system. If no provider qualifies, ProviderNotFoundException is thrown.
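The scheme-matching loop can be sketched in a few lines. This is only an illustration of the lookup idea, not the code inside FileSystems; the class and method names are hypothetical, while FileSystemProvider.installedProviders() and getScheme() are part of the released API.

```java
import java.nio.file.ProviderNotFoundException;
import java.nio.file.spi.FileSystemProvider;

public class ProviderLookup {

    // Returns the first installed provider whose scheme matches, ignoring case.
    public static FileSystemProvider findByScheme(String scheme) {
        for (FileSystemProvider provider : FileSystemProvider.installedProviders()) {
            if (provider.getScheme().equalsIgnoreCase(scheme)) {
                return provider;
            }
        }
        throw new ProviderNotFoundException("No provider for scheme: " + scheme);
    }

    public static void main(String[] args) {
        // The default provider is always installed and uses the "file" scheme.
        System.out.println(findByScheme("FILE").getClass().getName());
    }
}
```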

Keeping these details in mind, let's start the implementation of FileSystemProvider. Name the subclass of FileSystemProvider ZipFileSystemProvider. The JSR-203 implementation uses java.util.ServiceLoader to load providers and create instances of them, so the provider must have a zero-argument constructor. So, whenever you subclass FileSystemProvider, don't forget to provide a no-argument constructor. FileSystemProvider has two factory methods for creating file systems.

>> #newFileSystem(URI uri, Map<String, ?> env) is the abstract method that has to be implemented; it creates a file system identified by the uri.

>> The other method is #newFileSystem(FileRef fref, Map<String, ?> env). This method creates a file system for an existing file, where the file is a container of one or more files. Since a Zip file is a container that holds many files, we have to provide an implementation of this method in ZipFileSystemProvider.

The Zip file system is represented by a URI of the form ZIP:///path#pathInZip. The scheme “ZIP” is case insensitive, 'path' locates the zip file in the underlying file system, and 'pathInZip' locates the file inside the Zip/Jar file. For example, in the uri “ZIP:///c:/java/src.zip#/java/nio/file”, “/java/nio/file” is the path to the entry inside the zip “c:/java/src.zip”. All the paths stored in a zip file are relative, so we can treat the Zip file system as a single hierarchical file system in which all paths start with the root “/”. For example, in foo.jar the paths META-INF/MANIFEST.MF and /META-INF/MANIFEST.MF both locate the file MANIFEST.MF. All relative paths are resolved against the root “/”. It is a case sensitive file system, meaning that file paths such as /a/a.file and /A/A.file are treated as different files.

For this provider to be recognized and loaded when one of the factory methods in java.nio.file.FileSystems is called, we must create a file named “java.nio.file.spi.FileSystemProvider” in the META-INF/services directory. This file should contain the fully qualified name of the class that implements the abstract class FileSystemProvider. In this case it is “com.sun.nio.zipfs.ZipFileSystemProvider”.

Since this Zip provider creates many ZipFileSystems, let us create a table Map<URI, ZipFileSystem> to cache all the created file systems. The URI is the key, and it must not have a fragment component. So a URI of the form “ZIP:///path” is the key that identifies a zip file system.

ZipFileSystemProvider#getScheme() returns “ZIP”, since a URI that identifies a Zip file system must have the scheme “ZIP”.

ZipFileSystemProvider#getPath(URI uri) searches the cache for the zip file system identified by the given uri. If the scheme of the given uri is not “ZIP”, it throws an IllegalArgumentException. If the file system corresponding to the given uri is not in the cache, it throws FileSystemNotFoundException. If the file system is found in the cache, getPath(uri.getFragment()) is called on that file system.

ZipFileSystemProvider#newFileSystem(FileRef fref, Map<String, ?> env) is a convenience method for creating a file system from an existing file. It throws an UnsupportedOperationException if the argument 'fref' does not refer to a path created by the default provider or if the given file is not a Zip/Jar file. If the file does not exist, it throws NoSuchFileException. Otherwise it returns an instance of ZipFileSystem for the given Zip/Jar file. I will discuss how to implement the java.nio.file.FileSystem abstract class later in this post. One important point to remember here is that file systems created by this method are not cached.

ZipFileSystemProvider#newFileSystem(URI uri, Map<String, ?> env) is the other method, which creates a file system from a uri. The scheme of the given uri must be “ZIP”, otherwise an IllegalArgumentException is thrown. The 'path' component of the uri locates a Zip file in the underlying file system; if the file does not exist, NoSuchFileException is thrown. If the file name does not end with the “.zip” or “.jar” extension, an InvalidPathException is thrown. The file system can be configured using the 'env' argument. The user can set the default directory by creating a map with the key “default.dir” and a string value naming the home or default directory, so that all relative paths are resolved against this directory. The default directory must be an absolute path. If the user does not specify a default directory, all relative paths are resolved against the root “/”. This method caches the created file system for later reference.

ZipFileSystemProvider#getFileSystem(URI uri) gets the Zip file system instance identified by the given uri from the cache. The key must not contain a fragment. If the given URI contains a fragment component, the fragment is discarded and a new uri is created from the scheme and path. If there is no file system in the cache corresponding to this uri, FileSystemNotFoundException is thrown. If the scheme of the given uri is not “ZIP”, IllegalArgumentException is thrown.
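The cache and its fragment-free keys can be sketched as follows. This is a simplified stand-in, not the provider's actual code: the class FileSystemCache and its methods are hypothetical, the value type is left generic so the sketch compiles without ZipFileSystem, and normalizing the scheme to lower case is an assumption made here to honor the case-insensitive scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.FileSystemNotFoundException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FileSystemCache<V> {

    private final Map<URI, V> cache = new ConcurrentHashMap<>();

    // Validates the scheme and rebuilds the URI without its fragment.
    public static URI toKey(URI uri) {
        if (!"zip".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("URI scheme is not 'zip': " + uri);
        }
        try {
            // A null fragment drops the "#pathInZip" part of the URI.
            return new URI("zip", uri.getSchemeSpecificPart(), null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Called when a file system is created from a URI.
    public void register(URI uri, V fileSystem) {
        cache.put(toKey(uri), fileSystem);
    }

    // Mirrors getFileSystem(URI): a cache miss is reported as an exception.
    public V lookup(URI uri) {
        V fileSystem = cache.get(toKey(uri));
        if (fileSystem == null) {
            throw new FileSystemNotFoundException(uri.toString());
        }
        return fileSystem;
    }

    // Called when a file system is closed.
    public void remove(URI uri) {
        cache.remove(toKey(uri));
    }
}
```

With this in place, a lookup for “ZIP:///c:/java/src.zip#/java/nio/file” finds the entry registered under “zip:///c:/java/src.zip”, because both reduce to the same key.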

ZipFileSystemProvider#newFileChannel(Path path, Set<? extends OpenOption> options, Attribute<?>... attrs) returns a readable file channel for the given file. The method simply converts the given path to a string, creates a path from the default provider, and invokes newFileChannel() on that path. This method must be implemented because FileChannel.open() internally calls the provider's implementation.

Now we will look at the implementation of the abstract class java.nio.file.FileSystem, naming the subclass ZipFileSystem. The Zip file system is a read-only, case sensitive and closeable file system. Like Unix, it has a single hierarchical directory structure. Each path in the zip file system starts with the root “/”, and the Zip file separator is a forward slash “/”. When you create a Zip or Jar file, all the root components (/ in Unix; c:, d: in Windows, etc.) in the paths are discarded. For example, if we create a zip file from all the files in the folder C:\Downloads\*, then all the names in the zip file are relative paths like Downloads\*. So the zip file system resolves all relative paths against the root “/” unless the environment property “default.dir” was set when the file system was created. If this property is set, all relative paths are resolved against the default directory.
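The resolution rule is easy to show with plain strings. This is a rough sketch of the rule only, assuming paths are handled as “/”-separated strings; the class ZipPathResolver and its resolve method are hypothetical and do not come from the provider.

```java
import java.util.Collections;
import java.util.Map;

public class ZipPathResolver {

    // Resolves a path against "default.dir" if it is set, otherwise against "/".
    public static String resolve(String path, Map<String, ?> env) {
        if (path.startsWith("/")) {
            return path;                       // already absolute
        }
        Object configured = env.get("default.dir");
        String base = (configured == null) ? "/" : configured.toString();
        if (!base.startsWith("/")) {
            throw new IllegalArgumentException(
                    "default.dir must be an absolute path: " + base);
        }
        return base.endsWith("/") ? base + path : base + "/" + path;
    }

    public static void main(String[] args) {
        Map<String, String> noEnv = Collections.emptyMap();
        Map<String, String> env = Collections.singletonMap("default.dir", "/java/nio");
        System.out.println(resolve("META-INF/MANIFEST.MF", noEnv));
        System.out.println(resolve("file/Path.java", env));
    }
}
```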

This implementation overrides all the methods with reasonable behavior except the optional FileSystem#newWatchService() method, which throws UnsupportedOperationException. In other words, the Zip file system does not support file system notification events.

The Zip file system is open on creation. When it is closed, it closes all the streams and channels that were opened through it and removes itself from the cache maintained by the provider. Any attempt to access a closed file system throws ClosedFileSystemException.

Any operation that accesses the zip file system acquires the read lock and checks whether the file system is open; if it is closed, the operation throws ClosedFileSystemException. At the end of the operation the read lock is released. The close method acquires the exclusive write lock and releases all open resources.
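The locking discipline can be sketched with java.util.concurrent.locks.ReentrantReadWriteLock. The class GuardedFileSystem and its read helper are hypothetical names used only for illustration; the actual provider may organize this differently.

```java
import java.nio.file.ClosedFileSystemException;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

public class GuardedFileSystem {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean open = true;   // only read or written while holding the lock

    // Runs an operation under the read lock after checking the open state.
    public <T> T read(Supplier<T> operation) {
        lock.readLock().lock();
        try {
            if (!open) {
                throw new ClosedFileSystemException();
            }
            return operation.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Takes the exclusive write lock, so it waits for in-flight reads to finish.
    public void close() {
        lock.writeLock().lock();
        try {
            if (!open) {
                return;            // closing twice is harmless
            }
            open = false;
            // A real file system would close its open streams and channels here.
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Many readers can hold the read lock at once, so ordinary operations do not block each other; only close() has to wait, and once it has run every later operation fails fast.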

Zip file system provides views to read the file system object properties and methods to get file system objects. One such method is getPath().

ZipFileSystem#getPath() converts the given string to a ZipFilePath object. It throws an InvalidPathException if the given string is empty or contains any character that is not valid for naming a file in the underlying file system. If the string contains any backslashes, they are replaced by the Zip file separator, i.e. “/”.

The Zip file system supports three attribute views. An attribute view provides a view into the properties of file system objects. You can get the view you are interested in by passing the class object of that attribute view to the method FileSystem#newFileAttributeView(Class<?> viewType). The returned attribute view is unbound. You can then bind it to the file object whose properties need to be viewed. The attribute views supported by ZipFileSystem are ZipFileBasicAttributeView, ZipFileAttributeView and JarFileAttributeView. The attribute views are described later in this post.

ZipFileSystem#newFileAttributeView(Class<?> viewType) returns the requested attribute view. If the requested view type is BasicFileAttributeView, ZipFileAttributeView or JarFileAttributeView, then an instance of ZipFileBasicAttributeView, ZipFileAttributeView or JarFileAttributeView is returned respectively.

ZipFileSystem#getFileNameMatcher(String syntaxAndPattern) behaves as defined in the specification, but in the case of glob filters it matches the file name against the given glob case-sensitively.

Now we will look at the implementation of java.nio.file.FileStore. A file store represents the storage for files. Since the Zip file system has a single hierarchical directory structure rooted at /, and this root is an abstraction of the ZIP or JAR file, the method ZipFileSystem#getFileStores() returns only one file store, which gives the details of this ZIP or JAR file. ZipFileStore#name() returns the complete path of the ZIP or JAR file, ZipFileStore#type() returns “zipfs”, and ZipFileStore#isReadOnly() always returns true since this is a read-only file system.

ZipFileStore#getFileStoreAttributeViews() returns a list that contains one element, a ZipFileStoreAttributeView, which implements java.nio.file.attribute.FileStoreSpaceAttributeView, so this attribute view returns disk space information for the file store. Since the file store is the ZIP or JAR file, FileStoreSpaceAttributes#totalSpace() returns the size of the archive file. The other two methods, FileStoreSpaceAttributes#{usableSpace, unallocatedSpace}(), return zero since they are not relevant to ZipFileSystem. Another point to note about ZipFileStoreAttributeView is that, because it implements FileStoreSpaceAttributeView, it must implement the method readAttributes(), which returns a subclass of FileStoreSpaceAttributes.

Implementation for java.nio.file.Path:

ZipFilePath extends the abstract class java.nio.file.Path and locates a file or directory in a ZIP or JAR. A path in a Zip can be relative or absolute. A relative path is resolved against the root if “default.dir” was not provided in the environment configuration map when the Zip file system was created; otherwise it is resolved against the default directory. A path in a zip is represented by a byte array, and ZipFilePath overrides all the methods that handle the components of a path. The methods ZipFilePath#{getName,getNameCount,getFileName,subPath,startsWith,endsWith,getParent}() behave as described in the specification of java.nio.file.Path. In addition to these methods, ZipFilePath has the methods getEntryName(), subEntryPath(), getParentEntry() and getEntryNameCount(), which deal with the entry components of a ZipFilePath. An important point to note here is that a zip file path can extend into nested zips or jars in the file's path name. For example, /home/userA/zipfile.zip/DirA/dirB/jarFile.jar/META-INF/MANIFEST.MF accesses the jar file “jarFile.jar” inside the Zip file “/home/userA/zipfile.zip”. For this path, getEntryName(0) returns “/home/userA/zipfile.zip”, getEntryName(1) returns “DirA/dirB/jarFile.jar”, getParentEntry() returns “/home/userA/zipfile.zip/DirA/dirB/jarFile.jar” and subEntryPath(0,1) returns “home/userA/zipfile.zip”.

Since ZipFileSystem is a read-only file system, the Path methods copyTo(..), moveTo(..), createDirectory(..), newOutputStream(..) and delete() throw ReadOnlyFileSystemException, and createSymbolicLink(), createLink() and readSymbolicLink() throw UnsupportedOperationException.

Implementation of java.nio.file.DirectoryStream:

ZipFileStream implements java.nio.file.DirectoryStream, so it has an iterator that iterates over the entries of a directory in a Zip/Jar. ZipFileEntry represents each entry and implements java.nio.file.DirectoryEntry. When a DirectoryStream is opened using the method ZipFilePath#newDirectoryStream(String syntaxAndPattern), it checks for the existence of the file. If the file does not exist, it throws NoSuchFileException, and if the file is not a directory, it throws NotDirectoryException. The remove() method of the iterator associated with ZipDirectoryStream throws UnsupportedOperationException with a cause of ReadOnlyFileSystemException. When there are no entries, it simply returns an empty iterator; otherwise it returns all the entries in the directory that pass the given glob or regex filter. This is similar to iterating over the entries of a directory in a native file system.
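For comparison, here is how a filtered directory listing looks from the caller's side. The sketch uses the API as released in Java 7, where the call is Files.newDirectoryStream(Path, String) and takes a glob, together with the JDK's bundled zip provider; it is not the ZipFileStream class described above, and the class name and entry names are invented for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipDirectoryListing {

    // Writes a temporary archive containing the given (empty) entries.
    public static Path createZip(String... entryNames) throws IOException {
        Path zip = Files.createTempFile("listing", ".zip");
        try (OutputStream out = Files.newOutputStream(zip);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (String name : entryNames) {
                zos.putNextEntry(new ZipEntry(name));
                zos.closeEntry();
            }
        }
        return zip;
    }

    // Lists the direct children of 'dir' inside the archive that match 'glob'.
    public static List<String> list(Path zip, String dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (FileSystem fs = FileSystems.newFileSystem(zip, (ClassLoader) null);
             DirectoryStream<Path> stream = Files.newDirectoryStream(fs.getPath(dir), glob)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path zip = createZip("a.txt", "b.java", "c.txt", "sub/d.txt");
        try {
            System.out.println(list(zip, "/", "*.txt"));
        } finally {
            Files.delete(zip);
        }
    }
}
```

Only direct children of the directory are returned, so sub/d.txt is not part of the listing of “/” even though its name matches the glob.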

AttributeViews and Attributes:

Now we will discuss the attribute views and attributes supported by the Zip file system. ZipFileSystem supports ZipFileBasicAttributeView, ZipFileAttributeView and JarFileAttributeView. These attribute views provide read access to the properties of zip file system objects. A zip file system object can be a file or a directory in a zip/jar. ZipFileBasicAttributeView.readAttributes() returns a ZipFileBasicAttributes object, which encapsulates attributes such as size(), modifiedTime() etc.

ZipFileBasicAttributeView implements java.nio.file.attribute.BasicFileAttributeView and provides implementations of the methods bind(FileRef fref), bind(FileRef fref, boolean followLinks) and readAttributes().

ZipFileBasicAttributeView#bind(FileRef fref) binds the given FileRef object to this view, and bind(FileRef fref, boolean followLinks) does the same, ignoring the followLinks parameter.

The binding facility makes it easier to read the attributes of file system objects. You can get one attribute view, bind it to any file object, and retrieve the attributes of that object using the readAttributes() method. ZipFileSystem#newFileAttributeView(Class<?> viewType) returns an unbound attribute view that can be bound to any ZipFilePath object. Rebinding releases the reference to the object to which the view was previously bound. The attribute view checks whether the file system is open; if it is closed, ClosedFileSystemException is thrown.

ZipFileBasicAttributes implements java.nio.file.attribute.BasicFileAttributes, which encapsulates the properties of a zip file system object.

ZipFileBasicAttributeView#setTimes(Long lastModifiedTime, Long lastAccessTime, Long createTime, TimeUnit unit) throws ReadOnlyFileSystemException since the ZipFileSystem is read-only.

ZipFileAttributeView extends ZipFileBasicAttributeView and provides a view of zip-specific attributes such as comment(), compressSize(), crc(), versionMadeBy() etc.

JarFileAttributeView allows the user to read Manifest attributes and jar entry attributes.

Let's go over the details of how the Zip provider reads the contents and attributes of zip file entries.

A simple zip file has the following format.


[local file header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[file data n]
[data descriptor n]
[central directory]
[end of central directory record]

The local file header, central directory and end of central directory record sections are identified by the signatures 0x04034b50, 0x02014b50 and 0x06054b50 respectively. The total number of entries and the central directory offset are available in the “end of central directory record” at offsets 10 and 16, and are 2 and 4 bytes long respectively. The byte order of a zip file is little-endian, so you need to set the byte order to LITTLE_ENDIAN when reading zip file data into a ByteBuffer. Once we have the number of entries in the zip file, we can read all the file names and the attributes of the corresponding files, which are available in the “central directory” section. Create an object of com.sun.nio.zipfs.ZipEntryInfo or com.sun.nio.zipfs.JarEntryInfo, depending on the archive type, and set all the attributes. Cache all these entries in a table. The java.util.zip APIs are used only for reading the contents of an entry in a zip or Jar file, and the java.util.jar package methods are used for reading the manifest and entry-level attributes.
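The reading procedure can be sketched as below. This is a simplified illustration and not the provider's code: it builds a small archive in memory, ignores Zip64 and multi-disk archives, and assumes entry names are UTF-8. The offsets 10 and 16 come from the paragraph above; the central directory header offsets (name length at 28, extra length at 30, comment length at 32, name at 46) are taken from the ZIP format specification. The class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class CentralDirectoryReader {

    static final int END_SIG = 0x06054b50;      // end of central directory record
    static final int CENTRAL_SIG = 0x02014b50;  // central directory file header

    // Creates an in-memory archive with one small entry per name.
    public static byte[] sampleZip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Reads the entry names straight from the central directory.
    public static List<String> entryNames(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // The end record is at least 22 bytes; scan backwards for its signature.
        int end = zip.length - 22;
        while (end >= 0 && buf.getInt(end) != END_SIG) {
            end--;
        }
        if (end < 0) {
            throw new IllegalArgumentException("end of central directory record not found");
        }

        int total = buf.getShort(end + 10) & 0xFFFF;       // 2 bytes at offset 10
        long cdOffset = buf.getInt(end + 16) & 0xFFFFFFFFL; // 4 bytes at offset 16

        List<String> names = new ArrayList<>();
        int pos = (int) cdOffset;
        for (int i = 0; i < total; i++) {
            if (buf.getInt(pos) != CENTRAL_SIG) {
                throw new IllegalArgumentException("bad central directory header at " + pos);
            }
            int nameLen = buf.getShort(pos + 28) & 0xFFFF;
            int extraLen = buf.getShort(pos + 30) & 0xFFFF;
            int commentLen = buf.getShort(pos + 32) & 0xFFFF;
            names.add(new String(zip, pos + 46, nameLen, StandardCharsets.UTF_8));
            pos += 46 + nameLen + extraLen + commentLen;    // jump to the next header
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(entryNames(sampleZip("a.txt", "dir/b.txt")));
    }
}
```

Reading the central directory this way lists every entry without touching the file data, which is why it is a natural source for the cached entry table.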

Comments:

I've been thinking of something similar since about 2005.

http://www.geocities.com/mik3hall/javasrc.html#trz

Looking forward to being at 1.7 with this in place. Interesting to me, thanks.

Posted by Michael Hall on April 23, 2009 at 05:22 AM IST #
