Friday Sep 18, 2009
Monday Sep 14, 2009
Ever written a test-suite and wanted an ad-hoc app-server deployment (such as tomcat) that you could create (and use) on the fly, only to get rid of it once the test suite completed? Ever written applications and wanted to create an embedded database instance (such as derby) or message-queue (such as your private ActiveMQ instance) for internal use?
See rest of this blog entry here.
Saturday Sep 12, 2009
I have been thinking about an interesting problem for some time: how to generate code-distributions that are both self-contained and minimal. By "code-distribution" I mean a Java application distributed in the form of one or more JAR files containing classes and resources, of which one class is a designated “main class” (that is, it contains a “public static void main(String[])” method). By "self-contained" I mean that the code-distribution contains all the classes and resources needed by the main class to run. By "minimal" I mean that the code-distribution contains only classes that are used, directly or indirectly, by the main class.You can read the complete posting here.
Tuesday May 19, 2009
Dimitris Andreou suggested that I try the Boyer Moore string search algorithm (instead of String.search) to see how it would impact the fast grep described earlier. It turns out that while Boyer Moore searching takes much less time than String.contains, the bottleneck in the fast grep is not in the searching step. But still, considering that BoyerMoore is faster, I have incorporated it in the fast grep.
First, here is the source code of a class that performs BoyerMoore searching (initially copied from here):
package com.subhajit.util.grep; |
The original source code has been changed in the following ways:
1. The cost of initializing a "BoyerMoore" object has been moved to the constructor.
2. Several key data members have been declared final with a view to making the "match" method thread safe.
3. A new method "isPresentIn" has been added that returns whether the pattern being sought is present in the given text or not, without returning all the positions where it occurs (as does the "match" method).
4. Some variables have been renamed for easier understanding.
The Grep method has been changed to use "BoyerMoore.isPresentIn(...)" instead of String.contains(...):
// Setup the output queue containing FileContent objects |
As I said, this makes a marginal improvement in the overall execution of grep on my setup. Your mileage might vary.
Sunday May 17, 2009
Introduction
In this post, I present a fast Java "grep" application. The application presented here is not a full featured replacement for "grep": rather, it presents a stripped down "grep" application built on top of parts you can extend and configure to implement additional features.
What does the grep application presented here do? It allows you to specify a base directory (containing files and sub-directories) under which to search, a "file name" term indicating the files (lying under this base directory) to search, and a search term. Thus for example, if you wanted to search for all JAVA files lying in the base directory "c:\var\projects" containing the literal "class", you would invoke this grep as follows:
java -jar jgrep.jar -d c:\var\projects -f .java class
Note that the "-f" argument specifies a literal string with which a file name must end in order for it to be included in the search.
Functional elements
If "grepping" may be defined as the act of building a set of files located under a given directory (and sub-directories thereof), for files conforming to a given naming pattern and containing a given search term, then grepping consists of the following independent functional elements:
1. Locating all files and sub-directories under a directory.
2. Filtering out directories and files that conform to a the given file name pattern from the above list, and reading the contents of those files.
3. Searching for the search term within the files obtained in the previous step.
4. Presenting files found in the previous step to the user.
Diagrammatically, we can represent the above steps as follows:
Thinking about the above decomposition, it is apparent that these operations may overlap. In other words, a certain step does not have to wait to begin for its previous steps to complete: each step may be seen as a "producer" that produces work for its successor thread, and, at the same time, a "consumer" for work produced by its predecessor. So, it is possible to redraw the diagram as follows:
Building on the "producer consumer" terminology introduced in the previous paragraph, we can flesh out this diagram with three queues:
In the diagram shown above, the little trapeziods represent queues, with the base (the longer parallel side) representing the producer and the top representing the consumer. Thus, the left most trapeziod represents a queue in which items are produced by the directory scanner and consumed by the file reader. Similarly, the second trapezoid represents a queue in which items are produced by the file reader and consumed by the content finder. The right most trapezoid, similarly, represents a queue in which items are produced by the content finder and consumed by the presentation component.
Code
The code presented below consists of four classes:
1. DirectoryScannerProducer scans a directory using the "fast directory scanning" technique presented in a previous posting, filling a blocking queue with directory and file entries.
2. The IProcessor class is a generic class that declares a generic "process" method that accepts an input of a given generic type, and returning a value of another given generic type. Instances of this class must be "stateless" (that is, they must not store state in instance variables).
3. QueueProcessor is a generic class that consumes items from a blocking queue containing items of a given type, processing these using a pre-defined IProcessor, and filling another blocking queue with the result. The QueueProcessor fills only non-null values returned by the IProcessor into the output queue. Any exceptions thrown by the "process" method of the IProcessor are ignored. QueueProcessor has a "waitFor" method that indicates that the QueueProcessor must stop after the given number of inputs have been processed.
4. Grep provides a main method that declares and sets up all the components and the queues, parses command line arguments, and launches all the components.
IProcessor
package com.subhajit.util.grep;
import java.lang.reflect.InvocationTargetException;
public interface IProcessor<I,O> {
O process(I input) throws InterruptedException, InvocationTargetException;
}
DirectoryScannerProducer
package com.subhajit.util.grep;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
/**
* Scans a directory and puts all files (not directories) into a given
* {@link BlockingQueue}.
*
* @author sdasgupta
*
*/
public class DirectoryScannerProducer {
private final Semaphore sem = new Semaphore(1);
private final AtomicInteger waitingCount = new AtomicInteger(0);
private final List<ThreadWithList> threads = new ArrayList<ThreadWithList>();
private final int installmentSize;
private final AtomicInteger producedCount = new AtomicInteger(0);
private static class DirInfo {
private final File dir;
private final File[] listing;
private int index;
public DirInfo(File dir) {
super();
this.dir = dir;
this.listing = dir.listFiles();
this.index = 0;
}
public int getIndex() {
return index;
}
public void setIndex(int index) {
this.index = index;
}
public File getDir() {
return dir;
}
public File[] getListing() {
return listing;
}
}
private final BlockingQueue<DirInfo> workingQueue;
private final BlockingQueue<File> producedQueue;
private final class ThreadWithList extends Thread {
private int useCount = 0;
public ThreadWithList() {
super();
}
public void incrementUseCount() {
useCount++;
}
public int getUseCount() {
return useCount;
}
public void run() {
while (true) {
if (DirectoryScannerProducer.this.scan0()) {
break;
}
}
}
}
public DirectoryScannerProducer(BlockingQueue<File> producedQueue,
int threads, int installmentSize) {
super();
this.producedQueue = producedQueue;
this.installmentSize = installmentSize;
workingQueue = new LinkedBlockingQueue<DirInfo>();
for (int i = 0; i < threads; i++) {
ThreadWithList t = new ThreadWithList();
this.threads.add(t);
}
}
public int scan(final File dir)
throws InterruptedException, ExecutionException {
sem.acquire();
workingQueue.add(new DirInfo(dir));
for (ThreadWithList t : threads) {
t.start();
}
sem.acquire();
return producedCount.get();
}
private boolean scan0() {
waitingCount.incrementAndGet();
// Remove the next item from the queue.
ThreadWithList thread = ((ThreadWithList) Thread.currentThread());
DirInfo dirInfo = null;
try {
if (waitingCount.get() == threads.size() && workingQueue.isEmpty()) {
sem.release();
return true;
}
dirInfo = workingQueue.take();
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return true;
} finally {
waitingCount.decrementAndGet();
}
int index = dirInfo.getIndex();
File[] listing = dirInfo.getListing();
int upperBound = Math.min(index + installmentSize, listing.length);
for (int i = index; i < upperBound; i++) {
if (listing[i].isFile()) {
try {
producedQueue.put(listing[i]);
producedCount.incrementAndGet();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return true;
}
}
if (listing[i].isDirectory()) {
DirInfo subdirInfo = new DirInfo(listing[i]);
try {
workingQueue.put(subdirInfo);
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return true;
}
}
}
if (upperBound != listing.length) {
dirInfo.setIndex(upperBound);
workingQueue.add(dirInfo);
}
thread.useCount += (upperBound - index);
return false;
}
public void close() {
for (ThreadWithList t : threads) {
if (t.isAlive()) {
t.interrupt();
}
}
for (ThreadWithList t : threads) {
try {
t.join();
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return;
}
}
workingQueue.clear();
}
}
QueueProcessor
package com.subhajit.util.grep;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
/**
* Abstract class models a set of threads consuming inputs of type <tt>I</tt>
* from an input queue, and writing outputs of type <tt>O</tt> to an output
* queue.
*
* @author sdasgupta
*
* @param <I>
* @param <O>
*/
public class QueueProcessor<I, O> {
/**
* Private class polls for and extracts messages from
* {@link QueueProcessor#inputQueue}, processes each message using
* {@link QueueProcessor#process(Object)} and places the resulting object
* (if it is not <tt>null</tt>) into {@link QueueProcessor#outputQueue}.
*
* @author sdasgupta
*/
private final class ConsumerRunnable implements Runnable {
public void run() {
while (true) {
try {
final I input = QueueProcessor.this.inputQueue.poll(100,
TimeUnit.MILLISECONDS);
if (input != null) {
pushService.submit(new Runnable() {
public void run() {
try {
O result = processor.process(input);
if (result != null) {
QueueProcessor.this.outputQueue
.put(result);
producedMessageCount.incrementAndGet();
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return;
} catch (InvocationTargetException exc) {
return;
}
}
});
consumedMessageCount.incrementAndGet();
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return;
}
}
}
}
/**
* The number of threads consuming messages from
* {@link QueueProcessor#inputQueue}.
*/
private final int threadCount;
/**
* The threads consuming messages from {@link QueueProcessor#inputQueue}.
*/
private final List<Thread> consumerThreads;
/**
* The queue from which input messages are read.
*/
private final BlockingQueue<I> inputQueue;
/**
* The queue to which output messages are written.
*/
private final BlockingQueue<O> outputQueue;
/**
* Processes inputs (pulled from the <tt>inputQueue</tt>) and pushes the
* results (obtained by calling {@link #process(Object)}) to the
* <tt>outputQueue</tt>.
*/
private final ExecutorService pushService;
/**
* Counts the total number of consumed messages.
*/
private final AtomicInteger consumedMessageCount = new AtomicInteger(0);
/**
* Counts the total number of produced messages.
*/
private final AtomicInteger producedMessageCount = new AtomicInteger(0);
private final IProcessor<I, O> processor;
/**
* Public constructor.
*
* @param inputQueue
* @param outputQueue
* @param threadCount
*/
public QueueProcessor(BlockingQueue<I> inputQueue,
BlockingQueue<O> outputQueue, int threadCount,
IProcessor<I, O> processor) {
super();
this.inputQueue = inputQueue;
this.outputQueue = outputQueue;
this.threadCount = threadCount;
this.processor = processor;
pushService = Executors.newFixedThreadPool(10);
consumerThreads = new ArrayList<Thread>();
}
// protected abstract O process(I input) throws InterruptedException,
// InvocationTargetException;
public void startup() {
// Create the consumer threads.
for (int i = 0; i < this.threadCount; i++) {
consumerThreads.add(new Thread(new ConsumerRunnable()));
}
for (Thread thread : consumerThreads) {
thread.start();
}
}
/**
* Waits for <tt>inputMessageCount</tt> messages to be processed, invokes
* {@link QueueProcessor#shutdown()}, and returns
* {@link QueueProcessor#producedMessageCount}.
*
* @param inputMessageCount
* @return
* @throws InterruptedException
*/
public int waitFor(int inputMessageCount) throws InterruptedException {
while (true) {
if (consumedMessageCount.get() >= inputMessageCount) {
break;
}
Thread.sleep(100);
}
shutdown();
return producedMessageCount.get();
}
/**
* Issues a shutdown request to the {@link QueueProcessor#pushService},
* waits for that service to shut down, interrupts the
* {@link QueueProcessor#consumerThreads}, waits for those threads to shut
* down, then returns.
*
* <p>
* All messages which have already been consumed are processed.
* </p>
*
* @throws InterruptedException
*/
public void shutdown() throws InterruptedException {
pushService.shutdown();
while (true) {
if (pushService.isTerminated()) {
break;
}
Thread.sleep(100);
}
for (Thread thread : consumerThreads) {
thread.interrupt();
}
for (Thread thread : consumerThreads) {
thread.join();
}
}
}
Grep
package com.subhajit.util.grep;
import java.io.File;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import com.subhajit.argutils.ArgumentUtils;
import com.sun.idm.svc.util.common.CommonUtils;
import com.sun.idm.svc.util.streams.FileUtils;
public class Grep {
private static class FileContent {
private final File file;
private final String content;
public FileContent(File file, String content) {
super();
this.file = file;
this.content = content;
}
public File getFile() {
return file;
}
public String getContent() {
return content;
}
}
private static enum Argument {
text, dir, pattern
}
private static Map<Argument, ?> parseCommandLineArguments(String[] args)
throws IOException {
final String argumentSpec = "d:string:true:Base directory under which to search|"
+ "f:string:true:Comma separated file name patterns to search (eg. .java,.properties,.xml)|"
+ "i:boolean:false:Ignore case|"
+ "names:boolean:false:Show file names only if \"true\", else show verbose output";
if (args.length == 0) {
System.out
.println("Usage:\nOption\t\tType\tRequired\tDescription\n"
+ ArgumentUtils.getUsage(argumentSpec));
System.exit(1);
}
Map<String, String> map = ArgumentUtils.parseArgs(args, argumentSpec);
if (!map.containsKey(CommonUtils.UNBOUND_ARGUMENT)) {
throw new IllegalArgumentException(
"Cannot continue, since search terms have not been specified.");
}
final String text = map.get(CommonUtils.UNBOUND_ARGUMENT);
final File dir = new File(map.get("d")).getCanonicalFile();
final String pattern = map.get("f");
Map<Argument, Object> ret = new HashMap<Argument, Object>();
ret.put(Argument.dir, dir);
ret.put(Argument.pattern, pattern);
ret.put(Argument.text, text);
return ret;
}
public static void main(String[] args) {
int status = 0;
long t0 = System.nanoTime();
try {
// Parse the command line arguments.
Map<Argument, ?> commandLineArguments = parseCommandLineArguments(args);
final String text = (String) commandLineArguments
.get(Argument.text);
final File dir = (File) commandLineArguments.get(Argument.dir);
final String fileNameInput = (String) commandLineArguments
.get(Argument.pattern);
// Setup the queue of files. This queue is populated by the
// directory scanner with all files and directories found under the
// base directory we are scanning.
BlockingQueue<File> fileQueue = new ArrayBlockingQueue<File>(5000);
final DirectoryScannerProducer scanner = new DirectoryScannerProducer(
fileQueue, 5, 20);
// Setup the content queue. This queue contains FileContent objects
// embodying the content of files read from the fileQueue that match
// the file name pattern we are interested in. This queue is
// populated by the fileReaders object.
final BlockingQueue<FileContent> contentQueue = new LinkedBlockingQueue<FileContent>();
final QueueProcessor<File, FileContent> fileReaders = new QueueProcessor<File, FileContent>(
fileQueue, contentQueue, 5,
new IProcessor<File, FileContent>() {
public FileContent process(File input)
throws InterruptedException,
InvocationTargetException {
try {
if (input.isDirectory()) {
return null;
}
if (!input.getName().endsWith(fileNameInput)) {
return null;
}
return new FileContent(input, new String(
FileUtils.loadFile(input)));
} catch (IOException exc) {
throw new InvocationTargetException(exc);
}
}
});
// Setup the output queue containing FileContent objects
// representing matches. The output queue is populated by the file
// finders object.
final BlockingQueue<FileContent> outputQueue = new LinkedBlockingQueue<FileContent>();
final QueueProcessor<FileContent, FileContent> fileFinders = new QueueProcessor<FileContent, FileContent>(
contentQueue, outputQueue, 5,
new IProcessor<FileContent, FileContent>() {
public FileContent process(FileContent input)
throws InterruptedException,
InvocationTargetException {
if (input.getContent().contains(text)) {
return input;
} else {
return null;
}
}
});
// Start the file readers.
fileReaders.startup();
// Start the file finders.
fileFinders.startup();
// Start the thread that dumps the matching output to the console.
final AtomicInteger dumpedCount = new AtomicInteger(0);
final ExecutorService dumpService = Executors
.newFixedThreadPool(10);
Thread dumpingThread = new Thread(new Runnable() {
public void run() {
try {
while (true) {
final FileContent matchingFileInfo = outputQueue
.take();
dumpService.submit(new Runnable() {
public void run() {
System.out.println(matchingFileInfo
.getFile());
}
});
dumpedCount.incrementAndGet();
}
} catch (InterruptedException exc) {
System.out.println("Done (" + dumpedCount.get() + ")");
Thread.currentThread().interrupt();
return;
}
}
});
dumpingThread.start();
// Start the directory scanner.
int workItemCount0 = scanner.scan(dir);
// Wait for the file readers to process its inputs.
int workItemCount1 = fileReaders.waitFor(workItemCount0);
// Wait for the file finders object to process its inputs.
int workItemCount2 = fileFinders.waitFor(workItemCount1);
// Interrupt the dumping thread once it has printed all output.
while (true) {
if (dumpedCount.get() == workItemCount2) {
dumpingThread.interrupt();
break;
}
Thread.sleep(100);
}
dumpingThread.join();
dumpService.shutdownNow();
} catch (Throwable exc) {
status = 1;
exc.printStackTrace();
} finally {
t0 = System.nanoTime() - t0;
System.out.println((t0 / 1000000) + " ms.");
System.exit(status);
}
}
}
What do you think?
|
Wednesday May 13, 2009
Dimitris Andreou commented on my previous posting on fast directory scanning by suggesting that the individual threads doing the work of listing directories were not evenly loaded. In other words, the solution presented did not represent an even partitioning of the work between the threads. The uneven distribution results from the files and directories each directory contains: the more this number, the more “work” the thread that processes the directory has to do.
To understand how to didvide the work between the threads, it is important to note that these threads do two things: obtain directory listings (via File.listFiles()) and examine each entry in the array of files returned by that method. The first method (listFiles()) is a library method which must be used as it is (which is another way to say that it cannot be subdivided into constituent parts at the application level). The other task performed by each thread, that is the examination of each directory entry, is the only candidate for partitioning.
One obvious way to perform this partitioning is to have each thread examine a fixed number of directory listing entries, instead of all of these. This fixed number, called an “installment”, is an input parameter to the directory scan. Accordingly, I wrote the following class to “fairly” load the threads. Note that the FairScanner class presented below still does not guarantee that each thread examines the same number of directory elements: all it does is increase the likelihood that this happens at runtime.
Without further ado, here is the code:
public class FairScanner {
private final Semaphore sem = new Semaphore(1);
private final AtomicInteger waitingCount = new AtomicInteger(0);
private final List<ThreadWithList> threads = new ArrayList<ThreadWithList>();
private final int installmentSize;
private static class DirInfo {
private final File dir;
private final File[] listing;
private int index;
public DirInfo(File dir) {
super();
this.dir = dir;
this.listing = dir.listFiles();
this.index = 0;
}
public int getIndex() {
return index;
}
public void setIndex(int index) {
this.index = index;
}
public File getDir() {
return dir;
}
public File[] getListing() {
return listing;
}
}
private final BlockingQueue<DirInfo> queue;
private final class ThreadWithList extends Thread {
private final List<File> files;
private int useCount = 0;
public ThreadWithList() {
super();
files = new ArrayList<File>();
}
public List<File> files() {
return this.files;
}
public void incrementUseCount() {
useCount++;
}
public int getUseCount() {
return useCount;
}
public void run() {
while (true) {
if (FairScanner.this.scan0()) {
break;
}
}
}
}
private FairScanner(int threads, int installmentSize) {
super();
this.installmentSize = installmentSize;
queue = new LinkedBlockingQueue<DirInfo>();
for (int i = 0; i < threads; i++) {
ThreadWithList t = new ThreadWithList();
this.threads.add(t);
}
}
public Collection<File> scan(final File dir) throws InterruptedException,
ExecutionException {
sem.acquire();
queue.add(new DirInfo(dir));
for (ThreadWithList t : threads) {
t.start();
}
sem.acquire();
List<File> ret = new ArrayList<File>();
for (ThreadWithList t : threads) {
ret.addAll(t.files);
}
return ret;
}
private boolean scan0() {
waitingCount.incrementAndGet();
// Remove the next item from the queue.
ThreadWithList thread = ((ThreadWithList) Thread.currentThread());
DirInfo dirInfo = null;
try {
if (waitingCount.get() == threads.size() && queue.isEmpty()) {
sem.release();
return true;
}
dirInfo = queue.take();
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return true;
} finally {
waitingCount.decrementAndGet();
}
int index = dirInfo.getIndex();
File[] listing = dirInfo.getListing();
int upperBound = Math.min(index + installmentSize, listing.length);
for (int i = index; i < upperBound; i++) {
thread.files.add(listing[i]);
if (listing[i].isDirectory()) {
DirInfo subdirInfo = new DirInfo(listing[i]);
try {
queue.put(subdirInfo);
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return true;
}
}
}
if (upperBound != listing.length) {
dirInfo.setIndex(upperBound);
queue.add(dirInfo);
}
thread.useCount += (upperBound - index);
return false;
}
public void close() {
for (ThreadWithList t : threads) {
if (t.isAlive()) {
t.interrupt();
}
}
for (ThreadWithList t : threads) {
try {
t.files.clear();
t.join();
System.out.print(t.getUseCount() + " ");
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return;
}
}
System.out.println();
queue.clear();
}
private static final List<FairScanner> scanners = new ArrayList<FairScanner>();
public static Collection<File> listAllContentsUnder(File dir, int threads,
int installmentSize) throws InterruptedException,
ExecutionException {
FairScanner scanner = new FairScanner(threads, installmentSize);
scanners.add(scanner);
return scanner.scan(dir);
}
public static void flush() {
for (FairScanner scanner : scanners) {
scanner.close();
}
}
}
Note the following items about this class:
1. There is a bounded queue (named “queue”) that holds “DirInfo” objects. These objects contain information about a directory such as a reference to the directory, an array of files representing the directory listing, and an “index” into this array representing the next item to be examined.
2. There is a custom Thread (called “ThreadWithList”) that endlessly loops and calls “scan0”.
3. The “scan0” method (which is endlessly called by instances of ThreadWithList), takes a DirInfo object from the queue, examines the next so many elements of the list of files, adds entries to the list of files, and adds new DirInfo objects to the queue for those entries that happen to be directories.
4. The scan is deemed to be complete when all the threads are waiting to take an element from the queue, and the queue is empty.
5. There are two arguments that can be used to fine tune the behavior of the FairScanner: the “threads” argument specifies the number of threads to use, and the “installmentSize” argument specifies the number of directory listing elements each thread examines at a time.
I have changed the “main” method used to test these classes as follows:
public static void main(String[] args) {
int status = 0;
try {
Set<File> files0 = new HashSet<File>();
Collection<File> files1 = null;
for (int threads = 5; threads < 20; threads++) {
for (int installmentSize = 15; installmentSize < 20; installmentSize++) {
int iterations = 5;
long t0 = System.nanoTime();
for (int i = 0; i < iterations; i++) {
files0 = SerialDirScanner
.listAllContentsUnder(new File(args[0]));
}
t0 = System.nanoTime() - t0;
long ticks0 = (t0 / 10);
System.out.println((t0 / 10) + " ticks.");
t0 = 0;
for (int i = 0; i < iterations; i++) {
long t1 = System.nanoTime();
files1 = FairScanner.listAllContentsUnder(new File(
args[0]), threads, installmentSize);
t1 = System.nanoTime() - t1;
t0 += t1;
FairScanner.flush();
}
System.out.println("Threads = " + threads
+ ",InstallmentSize = " + installmentSize);
long ticks1 = (t0 / 10);
System.out.println((t0 / 10) + " ticks.");
System.out.println(files0.size() + "," + files1.size());
System.out.println(((ticks1 * 100) / ticks0) + " %");
System.out.println();
}
}
} catch (Throwable exc) {
status = 1;
exc.printStackTrace();
} finally {
System.exit(status);
}
}
Observations
Some general observations made so far:
1. Increasing the installment size to a size of fifteen to twenty results in the best performance.
2. A thread count of about ten seems to result in the best performance.
I will post detailed data once I collect it.
Finally, note that this is a work in progress, so please do suggest improvements, shortcomings, etc.. Thanks to Josef and Dimitris for suggesting improvements.
Tuesday May 12, 2009
After posting the "DirScanner" class for fast directory scanning this morning (see my earlier post), some possible improvements to DirScanner occured to me.
One "low hanging fruit" was the common concurrent map shared between all the threads in DirScanner, and its potential to serve as a "point of contention". Note that since all threads write to this map, it is (internally, and partially) locked each time it is updated. The concurrent hash map uses lock striping to alleviate contention, but does not eliminate it altogether.
In the improved version shown below, each thread (of the fixed thread pool) is a custom thread (ThreadWithList) containing a list of files. (Josef, a reader, also mentioned this in his comment to my earlier posting). The "scan0" method, which is called within a thread, can thus safely add to the particular thread's list of files without any synchronization.
There is an instance method named "listOfLists", and each ThreadWithList adds a reference to its instance member "list" to listOfLists.
Once all the work is done, the "scan" method simply iterates through the list and accumulates and returns all of the entries in each list.
Ok, so here is the source code of the DirScanner3 class:
public class DirScanner3 {
private final ExecutorService executor;
private final Semaphore sem = new Semaphore(1);
private final AtomicInteger count = new AtomicInteger();
private final List<List<File>> listOfLists = new ArrayList<List<File>>();
/**
* Custom {@link Thread} returned by the {@link ThreadFactory} used by the
* {@link ExecutorService} data member <tt>executor</tt>.
*/
private final class ThreadWithList extends Thread {
private final List<File> files;
public ThreadWithList(Runnable r) {
super(r);
files = new ArrayList<File>();
DirScanner3.this.listOfLists.add(files);
}
public List<File> files() {
return this.files;
}
}
private DirScanner3(int threads) {
super();
executor = Executors.newFixedThreadPool(threads, new ThreadFactory() {
public Thread newThread(Runnable r) {
return new ThreadWithList(r);
}
});
}
public Collection<File> scan(final File dir) throws InterruptedException,
ExecutionException {
sem.acquire();
executor.submit(new Runnable() {
public void run() {
scan0(dir);
}
});
sem.acquire();
List<File> ret = new ArrayList<File>();
for (List<File> files : listOfLists) {
ret.addAll(files);
}
return ret;
}
private void scan0(File dir) {
for (final File file : dir.listFiles()) {
((ThreadWithList) Thread.currentThread()).files.add(file);
if (file.isDirectory()) {
count.incrementAndGet();
executor.submit(new Runnable() {
public void run() {
DirScanner3.this.scan0(file);
}
});
}
}
if (count.decrementAndGet() < 0) {
sem.release();
}
}
public void close() {
executor.shutdown();
}
public static Collection<File> listAllContentsUnder(File dir, int threads)
throws InterruptedException, ExecutionException {
DirScanner3 scanner = new DirScanner3(threads);
try {
return scanner.scan(dir);
} finally {
scanner.close();
}
}
}
It is important to note that DirScanner3 cannot be instantiated. Instead, it must be used via the listAllContentsUnder static method. I am doing this to save time clearing each element of listOfLists, and resetting the counter to zero, in the “scan” method. Perhaps this is a bad idea, and I will have to think about this some more.
In the meantime, please enjoy and keep the comments coming.
In this post, I present two approaches to directory scanning, one serial and one parallel. Both approaches return a collection of all files and directories contained by a given directory, including the contents (recursively) of each sub-directory thereof. For example, scanning the following directory tree:
…/dir/
…/dir/picture0.jpg
…/dir/picture1.jpg
…/dir/vacation-pictures/
…dir/vacation-pictures/grand-canyon1.jpg
returns a collection containing four entries, viz. …/dir/picture0.jpg, …/dir/picture1.jpg, …/dir/vacation-pictures/ and …/dir/vacation-pictures/grand-canyon1.jpg.
Serial Scanning
In serial scanning, a recursive method lists all files in a given directory, adds these to a list, and, if the file being added is a directory, invokes itself: The SerialDirScanner class, shown below, performs this task:
public class SerialDirScanner {
private final ConcurrentMap<File, Boolean> files = new ConcurrentHashMap<File, Boolean>();
public SerialDirScanner() {
super();
}
public Set<File> scan(File dir) throws InterruptedException,
ExecutionException {
scan0(dir);
return files.keySet();
}
private void scan0(File dir) {
for (final File file : dir.listFiles()) {
files.putIfAbsent(file, Boolean.TRUE);
if (file.isDirectory()) {
scan0(file);
}
}
}
public static Set<File> listAllContentsUnder(File dir)
throws InterruptedException, ExecutionException {
SerialDirScanner scanner = new SerialDirScanner();
return scanner.scan(dir);
}
}
The “scan” method invokes the recursive “scan0” method, which does the real work of scanning directories. That method fills up a concurrent map with files and directories as it works. Finally, the “scan” method returns all keys of the concurrent map.
Parallel scanning
Looking at the “scan0” method of the serial scanner shown above, many opportunities for parallelization become apparent. For example, the listing of different sibling directories (that is, directories that both have the same parent) can proceed in parallel, as can the listing of directories that are unrelated to each other (that is, one directory is not an ancestor of the other). The “DirScan” class shown below performs parallel scanning:
public class DirScanner {
private final ConcurrentMap<File, Boolean> files = new ConcurrentHashMap<File, Boolean>();
private final ExecutorService executor;
private final Semaphore sem = new Semaphore(1);
private final AtomicInteger count = new AtomicInteger();
public DirScanner(int threads) {
super();
executor = Executors.newFixedThreadPool(threads);
}
public Set<File> scan(File dir) throws InterruptedException,
ExecutionException {
sem.acquire();
scan0(dir);
sem.acquire();
return files.keySet();
}
private void scan0(File dir) {
for (final File file : dir.listFiles()) {
files.putIfAbsent(file, Boolean.TRUE);
if (file.isDirectory()) {
count.incrementAndGet();
executor.submit(new Runnable() {
public void run() {
DirScanner.this.scan0(file);
}
});
}
}
if (count.decrementAndGet() < 0) {
sem.release();
}
}
public void close(){
executor.shutdown();
}
public static Set<File> listAllContentsUnder(File dir, int threads)
throws InterruptedException, ExecutionException {
DirScanner scanner = new DirScanner(threads);
return scanner.scan(dir);
}
}
In common with the “SerialDirScanner” class, the “DirScanner” class has a public method named “scan” that takes a directory as an argument and returns a Set of File objects. The “scan” method acquires a semaphore of count 1, calls the “scan0” method, and waits to re-acquire the same semaphore. The “scan0” method performs the actual work of scanning directory contents. The “DirScanner” class contains an ExecutorService object named “executor” that is used to schedule scanning tasks.
Note that the “DirScanner” class accepts an integer constructor parameter named “threads”. This parameter is used to initialize the ExecutorService with a fixed thread pool of this size.
The “scan0” method submits a new task to the executor whenever it finds a sub-directory while performing a directory listing. The new “task”, in this case, happens to be a recursive invocation of itself. The “scan0” method increments a counter (an AtomicInteger instance object named “counter”) before the task is actually submitted. Each time the “scan0” method completes listing a directory, it decrements the counter, releasing the semaphore when the counter’s value falls below zero. This allows the “scan” method, which has been waiting to acquire this semaphore, to “wake up” and continue, effectively returning a Set of File objects from the key set of the concurrent map “files”.
Performance comparison and thread counts
To test the performance improvement, if any, of the parallelized directory scanner vis-a-vis the serial scanner, I wrote the following “main” method:
public static void main(String[] args) {
int status = 0;
try {
Set<File> files0 = null;
Set<File> files1 = null;
long t0 = System.nanoTime();
files0 = SerialDirScanner
.listAllContentsUnder(new File(args[0]));
t0 = System.nanoTime() - t0;
System.out.println(t0 + " ticks.");
t0 = System.nanoTime();
files1 = DirScanner.listAllContentsUnder(new File(args[0]), 5);
t0 = System.nanoTime() - t0;
System.out.println(t0 + " ticks.");
System.out.println(files0.size() + "," + files1.size());
System.out.println(files0.size() == files1.size());
} catch (Throwable exc) {
status = 1;
exc.printStackTrace();
} finally {
System.exit(status);
}
}
The argument I pass in to the main method is a directory that is well populated and quite deep. Running this on my development laptop (with the specs Core 2 Duo 2.26 MHz, 4GB, running Windows 7 Evaluation build No. 7100), I get:
1. Initially, before the directory structure is cached by the operating system:
3666120093 ticks.
2375654123 ticks.
39592,39592
true
2. Subsequently, after the directory structure has been cached:
3551936784 ticks.
2340802880 ticks.
39592,39592
true
So, there is a performance gain of roughly 33% using the parallel scanner using five threads (note that the “listAllContentsUnder” method accepts a thread count as the second argument). Using three threads, I get:
3594733052 ticks.
2335042793 ticks.
39592,39592
true
which is not much different from the readings obtained from using five threads. Using nine threads, I get:
3563738170 ticks.
2368012499 ticks.
39592,39592
true
which shows that the performance has actually been reduced by increasing the number of threads. The explanation for this curious observation is that the concurrent map, into which all threads write data, faces increased contention as the number of threads goes up.
Another interesting observation is that the parallel scanner is actually slower at scanning shallow directories. Consider the following output while scanning an example:
2602454 ticks.
10136304 ticks.
17,17
true
Conclusion
Using the parallel directory scanner speeds up the scanning of deep (and well populated) directories by roughly 33%. Shallow directories are better scanned with the serial directory scanner.
Monday Apr 20, 2009
I improved URLClassLoaderX (see post) using the ideas presented in a later post about supporting custom URL’s) . The improved (and much shortened source code) is given here.
Here is a list of the improvements:
1. After removing the entire bunch of classes involved with managing path urls, there are just five classes left.
2. I renamed IURLManagementStrategy to ResourceManager. The implementation class is renamed to ResourceManagerImpl.
3. Two new classes, namely DirectoryEntryContent and JarEntryContent, to deal with directories and jar files,respectively, have been added.
Again, the updated source code may be downloaded here.
Introduction
In a prior post, I described an alternative to URLClassLoader (see that post) named URLClassLoaderX. While implementing URLClassLoaderX, I had to build in support for URL's with a custom protocol (“path”). After finishing that implementation, I was not completely happy with the way I had to build ad-hoc support for the “path” protocol. Since that post was not primarily concerned with building support for custom URL protocols, I did not spend too much time improving the code supporting this feature in that post. In the following post, I present a set of classes illustrating how to rapidly build support for generic URL's.
First of all, what exactly do I mean by “custom” or “generic” URL's? Taking a look at URL support ready built into Java, one finds that certain URL protocols (such as “file:” and “http:”) are supported out of the box. So, one can directly construct an URL for a file and an HTTP site as follows:
URL fileURL = new URL(“file:/C:/var/tmp”);
URL httpURL = new URL(“http://www.cnn.com”);
But try constructing an URL using “new URL(“mem:foo”)”, and you get a MalformedURLException. It turns out that some extra work is required to support such “custom” URL's (see the documentation for “java.net.URLStreamHandlerFactory” for more information). Basically, one has to create a class implementing the URLStreamHandlerFactory class, and create another class that extends the URLConnection class. In the code presented in this post, I illustrate a scheme that minimizes the number of classes that must be implemented to build this support.
There are two central classes in the codebase presented below, namely an interface named “com.subhajit.url.Content” and a class named “com.subhajit.url.GenericURLStreamHandlerFactory” that implements URLStreamHandlerFactory.
The Content interface, shown below, describes objects that are capable of accessing some data from somewhere:
public interface Content {
/**
* Returns an {@link InputStream} from which the content of this object can
* be read.
*
* @return {@link InputStream} to the content of this object.
* @throws IOException
*/
InputStream openStream() throws IOException;
/**
* Returns a reference to the content of this object that is of a type equal
* to the first match of the given {@link Class}s, or <tt>null</tt> if the
* content cannot be returned as any of the given {@link Class}s.
*
* @param klass
* @return
* @throws IOException
*/
Object getContent(Class<?>... klass) throws IOException;
/**
* Returns the {@link URL} representing this object.
*
* @return
* @throws MalformedURLException
*/
URL toURL() throws MalformedURLException;
/**
* Establishes a connection to the content of this object.
*
* @throws IOException
*/
void connect() throws IOException;
}
The Content interface has a default abstract implementation named “AbstractContent”, which defines some abstract methods that sub classes must implement, and some methods that subclasses may optionally override:
public abstract class AbstractContent implements Content {
protected abstract String getProtocol();
protected abstract String getUrlFile();
protected URL url;
protected String getHost() {
return null;
}
protected int getPort() {
return 0;
}
public void connect() throws IOException {
}
public final URL toURL() throws MalformedURLException {
if (url == null) {
url = new URL(getProtocol(), getHost(), getPort(),
getUrlFile(),
new GenericURLStreamHandlerFactory()
.createURLStreamHandler(
getProtocol()));
new URLMapperFactory().getURLMapper().putData(
url.toString(), this);
}
return url;
}
}
The GenericURLStreamHandlerFactory class, which is another piece of the puzzle, is shown below:
public class GenericURLStreamHandlerFactory implements URLStreamHandlerFactory {
public URLStreamHandler createURLStreamHandler(String protocol) {
return new URLStreamHandler() {
@Override
protected URLConnection openConnection(URL url) throws IOException {
final Content data = new URLMapperFactory().getURLMapper()
.getData(url);
return new URLConnection(url) {
@Override
public void connect() throws IOException {
data.connect();
}
@Override
public Object getContent() throws IOException {
return data.getContent();
}
@SuppressWarnings("unchecked")
@Override
public Object getContent(Class[] classes)
throws IOException {
return data.getContent(classes);
}
@Override
public InputStream getInputStream() throws IOException {
return data.openStream();
}
};
}
};
}
}
Note how the overridden methods of this class delegate invocations to the enclosed “data” object, and how the “data” object is obtained from the URL via an “URLMapperFactory” class.
The URLMapperFactory class creates URLMapper classes, which map URL's to Content objects. URLMapper is actually an interface:
public interface URLMapper {
Content getData(URL url);
public void putData(String urlStr, Content data);
}
The default implementation of this interface is GenericURLMapper, which is a glorified wrapper around a concurrent map:
class GenericURLMapper implements URLMapper {
private static final GenericURLMapper instance = new GenericURLMapper();
private final ConcurrentMap<String, Content> map = new ConcurrentHashMap<String, Content>();
public Content getData(URL url) {
return map.get(url.toString());
}
public void putData(String urlStr, Content data) {
map.put(urlStr, data);
}
public static URLMapper getInstance() {
return instance;
}
}
Application
So, given these classes, how do you go about implementing a custom URL? It turns out that all you have to do is implement the Content interface, as shown below. But first, here is an example of how you might create a custom URL with the "jarentry" protocol:
URL jarEntryUrl = new JarEntryContent( new File( "pathToJarFile" ),"/my/company/classes/Class1.class").toURL();
Note that custom URL's are obtained indirectly from their Content objects, not directly constructed.As an example of implementing the Content interface, let us assume that you wish to implement URL's with a “mem” protocol. These URL's access content stored in memory. The “MemoryContent” class shown below shows you how to do this:
public class MemoryContent extends AbstractContent implements Content {
private final byte[] bytes;
public MemoryContent(byte[] bytes) {
super();
this.bytes = bytes;
}
public Object getContent(Class<?>... klasses) throws IOException {
if (klasses.length == 0) {
return openStream();
}
for (Class<?> klass : klasses) {
if (klass == byte[].class) {
byte[] copy = new byte[bytes.length];
System.arraycopy(bytes, 0, copy, 0, bytes.length);
return copy;
} else if (klass == InputStream.class) {
return openStream();
}
}
return null;
}
public InputStream openStream() throws IOException {
return new ByteArrayInputStream(bytes);
}
@Override
protected String getProtocol() {
return "mem";
}
@Override
protected String getUrlFile() {
return this.toString();
}
}
As another example, the “JarEntryContent” class, shown below, provides access to the bytes of a named entry in a specified JAR file.
public class JarEntryContent extends AbstractContent implements Content {
private final File jarFile;
private final String entry;
private final List<byte[]> holder;
public JarEntryContent(File jarFile, String entry) {
super();
this.jarFile = jarFile;
this.entry = entry;
this.holder = new ArrayList<byte[]>();
}
private byte[] getBytes() throws ZipException, IOException {
if (holder.isEmpty()) {
ZipFile zipFile = null;
try {
zipFile = new ZipFile(jarFile);
InputStream in = zipFile.getInputStream(new ZipEntry(entry));
if (in == null) {
throw new IOException("Zip file contains no such entry - "
+ jarFile.getAbsolutePath() + "\t" + entry);
}
try {
holder.add(StreamUtils.readFully(in));
} finally {
in.close();
}
} finally {
if (zipFile != null) {
zipFile.close();
}
}
}
return holder.get(0);
}
public Object getContent(Class<?>... klasses) throws IOException {
if (klasses.length == 0) {
return getContent(byte[].class);
}
for (Class<?> klass : klasses) {
if (klass == byte[].class) {
byte[] copy = new byte[getBytes().length];
System.arraycopy(getBytes(), 0, copy, 0, copy.length);
return copy;
} else if (klass == InputStream.class) {
return new ByteArrayInputStream(getBytes());
}
}
return null;
}
public InputStream openStream() throws IOException {
return (InputStream) getContent(InputStream.class);
}
@Override
protected String getProtocol() {
return "jarentry";
}
@Override
protected String getUrlFile() {
return jarFile.getName() + "#" + entry;
}
}
Here are some code snippets showing how to use JarEntryContent:
URL jarEntryUrl = new JarEntryContent( new File( "pathToJarFile" ),"/my/company/classes/Class1.class").toURL();
Note that custom URL's are obtained indirectly from their Content objects, not directly constructed.
Source Code
Complete source code for these classes can be found here. To use, download the zip file into a temporary directory and unzip it. This should create the following files:
a) generic-url.jar, which contains a built version of these classes,
b) src.zip, which contains the source code, and
c) util-lite.jar, which is a small utility library that I use. I will be happy to post complete source code for util-lite.jar upon request.
Sunday Apr 19, 2009
Introduction
In this post, I present an alternative to java.net.URLCLassLoader that has some desirable properties that the stock URLClassLoader does not have. The class presented here, com.subhajit.urlclassloader.URLClassLoaderX, extends URLClassLoader, and can therefore be used as a replacement for URLClassLoader in your source code.
URLClassLoaderX has certain desirable traits that URLClassLoader does not have. URLClassLoader holds locks to its URL's during its lifetime, which makes it impossible to update these URL's on operating systems like Windows. (This drawback has been recognized, and is due to be fixed in JDK 7 via the introduction of a new “close” method). URLClassLoaderX, on the other hand, does not behave in this manner. It briefly inspects its URL's upon initialization, caching information about the elements present therein, during which time it does lock the URL's. Immediately following initialization, however, URLClassLoaderX releases locks on the URL's, so that these can be updated on the file system if desired.
The ability to update or delete URL's is important in applications where you generate source code in your applications, compile it, and then load the resulting classes. In this situation, you would typically save the source code in temporary files, and load the compiled byte code from another temporary directory. The trouble with URLClassLoader is that once you have loaded the temporary byte codes, you cannot get rid of the temporary directory from which the byte code was read as long as the URLClassLoader instance is alive (on operating systems like Windows which maintains mandatory locks on files). There might be other situations where you might need to delete temporary directories or files from which you have loaded classes, and once again, using URLClassLoader leads to “locked” files and directories.
As mentioned earlier, URLClassLoaderX does not lock files or directories from which it loads classes and resources for its entire lifetime. Instead, it creates an internal cache of the URL's and the paths of the resources they contain, during initialization. For example, if a directory URL contains two resources named “com/subhajit/test/A.class” and “com/subhajit/test/picture.gif”, URLClassLoaderX initializes its internal cache such that there is an entry for the URL containing a set of strings populated with “com/subhajit/test/A.class” and “com/subhajit/test/picture.gif”. When resources and classes must be loaded, it consults these entries to determine which URL's to load these from. A resource is loaded from the first URL (in the list of URL's used to initialize the URLClassLoaderX instance) in which it is found.
URLClassLoaderX uses a “strategy” instance to which it delegates most calls. Let us first look at the strategy class.
IURLManagementStrategy
The “strategy” instance must implement the “com.subhajit.urlclassloader.IURLManagementStrategy” interface:
public interface IURLManagementStrategy {
/**
* Finds the resource <tt>name</tt> by iterating through the list of
* {@link URL}s being managed, and returns a "path" URL encoding the URL in
* which the resource was found, and a path to the resource within that URL.
*
* @param name
* @return An encoded "path" {@link URL} to the resource if the resource is
* found in one of the {@link URL}s being managed, or null if the
* resource <tt>name</tt> is not found.
* @throws MalformedURLException
*/
URL findResource(String name) throws MalformedURLException;
/**
* Find the resource named <tt>name</tt> in all the {@link URL}s being
* managed.
*
* @param name
* @return
* @throws MalformedURLException
*/
Enumeration<URL> findResources(String name) throws MalformedURLException;
/**
* Resets this object so that it can be reused.
*/
void reset();
}
The source code provided in this post contains a default implementation of IURLManagerStrategy named (somewhat unimaginatively) DefaultURLManagementStrategy.
DefaultURLManagementStrategy contains a ConcurrentMap<URL,Set<String>> named “urlContentNames”, which is initialized when the object is constructed. The initialization code uses a java.util.concurrent.ExecutorService to concurrently open and parse the specified URL's.
The “findResource” method iterates through the URL's looking for the first entry in urlContentNames for which the corresponding Set<String> contains an entry for the resource name being sought. If such an entry is found, it constructs and returns a special URL called a “path” URL that points to the resource.
Path URL's are special URL's that have a protocol named “path”, and contain two pieces of information, viz. the URL in which the resource was found, and the location of the resource within that URL. An example of a path URL (in String form) is “path:file:/C:/var/clearcase/lite-all/tools-lite/url-class-loader-test/./test/temp/#com/subhajit/jgrep/Location.class”. Note the following characteristics of this URL:
-
It has a protocol named “path”.
-
The URL containing the resource may be constructed by stripping away the “path:” prefix, parsing the resulting string into tokens separated by “#”, and using the first resulting token (which, in this case, is “file:/C:/var/clearcase/lite-all/tools-lite/url-class-loader-test/./test/temp/”).
-
The location of the resource is obtained by parsing the string into tokens separated by “#” and using the second token.
As with all custom URL protocols, the “path” protocol requires a set of helper classes in order to be used. These classes are present in the source code provided in the “com.subhajit.tools.urlclassloader” package (see “PathURLConnection”, “PathURLStreamHandler” and “PathURLStreamHandlerFactory”). As a refinement, the findResource method caches URL's generated for resources found in previous calls, for the benefit of subsequent calls to find these same resources.
Note that findResource does not actually open, load or cache resources: it just creates (path) URL's that may be used to refer to these resources.
URLClassLoaderX
Having described the strategy class, let us move on to the URLClassLoaderX class itself. This class implements two types of methods, viz. Those that are relevant to class and resource loading, and other methods such as “equals”, “hasCode”, etc.
URLClassLoaderX implements the following methods relating to class and resource loading:
-
findResource
-
findResources
Both methods directly delegate the call to the strategy object as shown below:
@Override
public URL findResource(String name) {
try {
return strategy.findResource(name);
} catch (MalformedURLException e) {
e.printStackTrace();
throw new RuntimeException(e);
}
}
@Override
public Enumeration<URL> findResources(String name) throws IOException {
return strategy.findResources(name);
}
Remaining work
At this time, URLClassLoaderX is functionally complete. The next step is to improve its performance. Some ideas for enhancement that come to find are better URL management strategies (perhaps those that lazily defer work as much as possible), and a way to allow users to specify different types of URL's (and their handler classes) in an open and flexible manner (currently, only “file” URL's are supported).
Using the source code
The source code is provided as a “zip” file named “url-class-loader.zip”. To use it, download it to a temporary directory and unzip it. This results in three files, viz. “src.zip” (which contains the source code), “url-class-loader.jar” (which contains a pre-built version of the source code, compiled for Java 1.5) and “util-lite.jar”, which is a lightweight utility library that I have written. I will be happy to post complete source code for the “util-lite.jar” file if requested. The source code is available here.
Saturday Feb 21, 2009
Correction: The files you need to download to try out the profiler are:
The main body of the article contains incorrect links to these files. Sorry for the confusion caused.
In this post, I present a Java agent that calls back a custom listener class (which you provide) whenever methods in a set of classes (which you specify via configuration) are invoked. The only requirement on the listener class is that it contain a public static method with a specified signature:
public static void someMethod(int operation, String methodSignature, Object targetObject, Object…args), where:
| operation | “0” (start of method execution), “1” (normal end of method execution) or “2” (end of method execution with an uncaught exception) |
| methodSignature | signature of the invoked method |
| targetObject | object on which the method was invoked. |
| args | variable number of arguments indicating the arguments the method was invoked with. |
It is trivially simple writing listener classes. Here, in its entirety, is a “logging” listener:
public class SimpleListener{
public static void callme(int op, String signature, Object obj,
Object... args) {
if ( op == 0 ){
System.out.println( signature );
}
}
}
Later, I provide a more involved (and useful) listener class in the code examples that allows the profiling of method executions.
What I really like about this project is that the agent is implemented using AspectJ, but AspectJ is fully embedded (“hidden” from view) from the user’s viewpoint. In other words, the user does not provide any resources, additional JAR’s or configuration that AspectJ requires, nor even “needs to know” that AspectJ is being used under the covers. Look at the “NotifierMain” and “PrepareAjWeaverInputs” and “Transformer” classes for details on how I did this. I do not go into this further in this post since this is irrelevant to the ideas I wish to discuss here.
Using the agent
To use the agent, you need the file “ajweaver-agent.jar” which can be downloaded from here. Next, you must provide some command line arguments to the command that launches your application, to wit “-javaagent:{absolute path to the ajweaver-agent.jar file}. Finally, you must provide a set of system properties that tell the agent what classes must have their method invocations reported, what your custom listener class is called, and what the name of the method is (in that class) that must be called back by the agent. For the example listener shown above (SimpleListener), these system properties would be defined as:
| instrument-packages | com.justthispackage.*,com.thispackageandallsubpackages..*,com.yet.another.package.* (Note that the values are comma separated). |
| callback-class | SimpleListener |
| callback-method | callme |
The command line looks like this:
| java …… –javaagent:${home}/ajweaver-agent.jar –Dinstrument-packages=com.sun.example.*,com.sun.allexamples..* –Dcallback-class=SimpleListener –Dcallback-method=callme …… |
With these instructions out of way, let us jump right in and try out the agent (and a cool profiler listener class).
Profiling method calls
To get started with the agent, let us look at a custom listener class (com.subhajit.ajprofiler.Profiler) that provides basic profiling of method invocations by tracking the number of times individual methods are called, and the minimum, average and maximum execution time of the invocations. These classes can be configured to optionally setup JMX functionality from which the profile information can be extracted remotely.
As with all agent callback classes, the Profiler class has a public, static method with the signature: public static void {methodName}(int,String,Object,Object..). In Profiler, the method is named “profilerCallback”:
|
public static void profilerCallback(int op, String methodSignature, case 1: default: |
The profilerCallback method treats the two cases when the first parameter is “0” (execution start) and “1” or “2” (execution ended normally or execution ended with uncaught exception) separately. The first case is executed as method bodies are entered, and the other case is executed when method bodies are exited. These cases invoke the “begin” and “end” methods of a default instance of Profiler, respectively.
The “begin” method creates a “CflowKey” object to represent the program flow at this point using the target object (whose instance method body has started executing), the current thread, the method signature and the current timestamp. This object is next appended to a list of such objects dedicated to this method instance. This effectively records the start of execution of this method’s body in the program flow:
|
public void begin(Object obj, String methodSignature, long timestamp) { |
The “end” method creates a new CflowKey instance using the target object, the current thread and the method signature, and then iterates through the list of stored CflowKey objects for this method signature looking for the last element that matches this CflowKey (with respect to the target object, method signature and current thread) and for which the stored timestamp is less than the timestamp of this CflowKey object. What we are effectively doing in this step is finding the last recorded “start of execution” of this method in this program flow. Given that one object can be executing only one method in the same thread at any time, the last time we recorded the start of this method for the same object and the same thread must correspond to the start of this selfsame flow of execution. Having found this CflowKey item, we record its timestamp, and then remove it from the list of stored CflowKey items.
|
public void endHelper(Object obj, String methodSignature, long timestamp) { if (executionTimeHasBeenSet) { |
The second part of this method (starting with the line “if (executionTimeHasBeenSet)”), updates a “MethodExecutionInfo” object with the time taken by this execution of the method. The “update” method of the MethodExecutionInfo class updates its internal state using the execution time:
|
public void update(long executionTime) { |
Trying it out
To try out this agent and the profiler, download the two files ajweaver-agent.jar and ajweaver-agent-profiler.jar (see the top of this article under "Correction" for the download links) and place them in a temporary directory (say “C:\tmp”). Next, modify the launch script for your application by introducing the following terms (prior to the specification of the class to be executed):
java….-Djavaagent:c:\tmp\ajweaver-agent.jar –Dinstrument-packages={list of packages} –Dcallback-class=com.subhajit.ajprofiler.Profiler –Dcallback-method=profilerCallback …
If this is a long running application (such as a server), also add the term: –Dprofiler-port={a port number}, and you will be able to access profiling information via JConsole. For example, if you entered port “1234”, start JConsole, choose “Remote Process”, enter “localhost:1234” and click on the “Connect” button. Once connected, go to the “MBeans” tab, locate the tree node for “com.subhajit.ajprofiler” and open it out fully. Selecting “Operations”, followed by clicking on the “getData” button, fetches a Map of accumulated profile information from the Profiler class.
To try this with Apache tomcat, make the following changes:
1. Make a copy of the “bin\catalina.bat” file and call it “bin\cataprofiler.bat”.
2. Add the following lines to “bin\cataprofiler.bat”:
set JAVA_OPTS=-Dinstrument-packages=org.apache.catalina..*,org.apache.coyote..*,org.apache.naming..*,org.apache.tomcat..* -javaagent:C:\var\tmp\ajweaver-agent.jar -Dcallback-method=profilerCallback -Dcallback-class=com.subhajit.ajprofiler.Profiler -Xms1024m -Xmx1256m -Dprofiler-port=1234
set CLASSPATH=C:\tmp\ajweaver-agent-profiler\dist\ajweaver-agent-profiler.jar;%CLASSPATH%
changing the package names in the first line to whatever packages you are interested in.
3. In the “bin\setclasspath.bat” file, locate the following stanza and modify it as shown:
rem Set standard CLASSPATH
rem Note that there are no quotes as we do not want to introduce random
rem quotes into the CLASSPATH
if not exist "%JAVA_HOME%\lib\tools.jar" goto noJavac
set CLASSPATH=%JAVA_HOME%\lib\tools.jar;C:\tmp\ajweaver-agent-profiler\dist\ajweaver-agent-profiler.jar
4. Start tomcat using the command: “bin\cataprofiler.bat run”.
5. After tomcat starts up, start JConsole using the command: “jconsole”, which assumes that you have the JDK’s “bin” directory in your path. You should see:
Select “Remote Process”, enter “localhost:1234” as shown and click on the “Connect” button. This takes you to:
Select the “MBeans” tab:
Clicking on the “getData” button pops up a dialog showing accumulated profile data:
To use a custom GUI program, download ajweaver-agent-profiler-gui.jar. To use it, run “java –jar ajweaver-agent-profiler-gui.jar”. This brings up:
Click the “Refresh” button to bring up a dialog:
Enter the hostname and port where the profiler is available. If the profiled application is running on the same host as the GUI, you could enter “localhost” for the host. For the port, enter the value of the “profiler-port” system property that was specified when the application was started. Clicking the “Ok” button brings up:
Clicking the “Refresh” button refreshes the display.
Other applications
Some applications of this facility that immediately come to mind are method tracing, method logging, heap tracing (dynamically tracing the amount of heap space used by an application as it runs), application health monitoring (trends of method invocations resulting in uncaught exceptions vs. “normal” execution), etc.. At development time, this framework can provide lightweight profiling capabilities.
Saturday Feb 14, 2009
Introduction
I recently completed an experimental project implementing “bound beans”, or a collection of beans which are wired up at run time so that some attribute changes on some of these beans result in other attribute changes on the same or a subset of the other beans. So, assume that had two instances of a hypothetical “Point” class with a couple of integer attributes named “x” and “y”, called “p0” and “p1”. Next, assume that these beans were bound to each other in such a way that setting some value to “p0.x” (by invoking “p0.setX(…)”) resulted in the same value being set into “p1.y”. Symbolically, you could express this binding relationship as “p0.x—>p1.y”. Now, you could throw an instance of a “Line” class into the mix (named “line”) , where the “Line” class contains two “Point” attributes named “start” and “end” (to represent the start and end points of a line, say). Further, you could specify an additional binding rule “p1.y—>line.start.x”. Then, setting “p0.x” results in the new value of this attribute cascading first to “p1.y”, and then to “line.start.x”.
The question is: what could you do with such a system?
Before tackling this question, however, let us look at an approach to implementing bean binding. The inputs to the the binding process are:
1. The bean instances which must be bound.
2. The binding relations (which I call “propagation rules”). The “source” of a propagation rule is the bean instance and its attribute that cause changes to occur in the same or other beans. For example, the “source” of the propagation rule “p0.x—>p1.y” is “p0.x”.
Pilot Implementation
My pilot implementation utilizes a Java agent to instrument some bean classes at load time in such a way that their “setXXX” methods are enhanced. The enhancement consists of invoking a static method call on a named “listener” class, passing the object (on which the “setXXX” method has been called) as a parameter. So, if the “setX” method of the “Point” class was originally:
public void setX(int x){
this.x = x;
}
its enhanced form is:
public void setX(int x){
this.x = x; // This is from the original version
com.subhajit.boundbeans.impl.Listener.attributeChanged(this);
}
The “Listener” class’s “attributeChanged” method knows which bean it is that has just been affected (from its argument, which is the “this” of the bean passed in via the call to “attributeChanged”). But how does it know which attribute of the bean has been affected? It simply creates a dummy RuntimeException, fills in its stack trace, and starts walking the stack trace elements down one by one until it reaches an element that was contributed by the class of the bean. From this element, it extracts the name of the method that was last called on the class of the bean (which, for a “Point” instance is going to be “setXXX”), from which, ultimately extracts the name of the bean attribute that has just been set).
Having determined both the bean instance that was affected and the particular attribute in the affected bean, the Listener class examines the propagation rules, looking for all rules in which the “source” (or the action that causes the rule to fire) is a change of this attribute on this bean. For all such rules, it determines the bean attributes (of the same or other) beans that are bound to the change in this attribute for this bean, and commences to set those attributes one by one. Of course, setting these attributes for the affected beans potentially triggers more updates (if target attribute being set on a target bean happens to be the “source” of another propagation rule).
Needless to say, I provide a facility to check for “cycles” that might intentionally or inadvertently arise in a set of propagation rules. The presence of cycles can potentially lead to “endless” property propagation resulting in an eventual StackOverflowError. For example, if the propagation rules specified are “p0.x—>p1.y” and “p1.y—>p0.x”, then changes to “p0.x” are propagated to “p1.y”, which are, in turn, propagated back to “p0.x”, to start the propagation cycle once again.
Applicability
To be honest, I cannot find myself coming up with a meaningful way to apply this “bean binding” facility. I can think of MVC GUI’s where setting a bean attribute in a model might propagate the value to a view (and vice versa) as one example of use. Can you help me with others?
Friday Feb 13, 2009
This is a test post from Windows Live Writer. Ok, just checked and the test worked. So, let me describe exactly what I had to do to get this working.
First, obviously, download and install Windows Live Writer. Next, fire it up and start configuring it. As for myself, positing to my blogs at Sun required that I first go to my blogs and change my password from its (then) current setting to a new one. (The “default” password that works for Sun employees is apparently their LDAP password. It is a good idea to choose a different password to access your blogs, since authoring tools might choose not to secure it when they pass it along to the server). Next, I went to the “Preferences” page (in my blog), changed to the “Maintenance” sub-tab, and cleared my page load cache.
Continue filling up the entries in the rest of the configuration dialog. That is it! Enjoy!
Friday Jan 30, 2009
As a follow up to my recent blog posting on long running tasks, here is an example of using the mini-framework described in that post to copy files from one directory to another:
The code above refers to two methods of a "FileUtils" class. The "listAllFilesUnder" method, which returns a List<File> containing all files (not directories) under a given directory and its sub-directories, is shown below:

This method, in turn, refers to a method named "listAllContentsUnder" in FileUtils:

The "FileUtils.copyFile" method, which uses Java NIO, is shown below:
And here is the (obligatory) screenshot showing the progress bar in action:
Enjoy!
I recently dusted off some code I had had written a while back to serve as a mini-framework (Swing based) for long background tasks that updated progress bars as they executed. The code is compact and easy to to use, but lacked a facility to allow the user to cancel the task. I updated the code to provide this facility today, and wanted to share it for review, comment and possible re-use.
The primary design criterion that drove the design of this mini-framework is ease of use. I feel that the end result is indeed easy to use, requiring, in most cases, a one line invocation and an anonymous implementation of an interface (see below).
To set the stage for this discussion, let us review some common characteristics of long running background tasks.
As the name suggests, long running background tasks are "long running": they are slow enough to seriously impact "responsiveness" in applications providing user interactivity. These tasks usually involve some heavy duty local or remote processing, and never involve direct user interaction. Examples of such tasks are searching all files under a given sub directory for occurences of a textual token (which is what "grep -R" does), bulk loading data into a database, or performing an intensive mathematical calculation.
Often, long running background tasks involve a number of iterations performing a simpler task, and the iteration count is often known in advance. For example, the number of files to search, or the number of records to insert, or the number of times a square root must be calculated are often known at the outset of the task. Sometimes, however, an iteration count is not known in advance, perhaps due to the nature of the problem, or perhaps due to the nature of the algorithm (which might be recursive, not iterative).
Having described the type of problem we are trying to solve, here is a quick look at an application that performs a background task (counting up from 0 to 100 in a "for" loop, updating a progress bar as it works):

The calling thread is blocked while the "exec" method runs. A progress bar is shown to the user to indicate progress. The long running task calls "progress.increment()" regularly to update this progress bar. Another thing that the long running task does is check the "interrupted" status of its thread: if the interrupted status indicates that the thread has been interrupted, the long running task interprets this to mean that the user has cancelled the task, and it returns immediately.
In this example, we have a SecureRandom object named "random", and we use it repeatedly to generate a random "sleep time" during each iteration. This is done to simulate the "long running"-ness of this task. Running this in a "main" program gives:

All that the user must do is implement the "ISwingRunnableWithProgress" interface, which has just one method:

The classes used to implement the mini-framework are provided in the supplied source code (see below). The TaskRunner class is the lynchpin of this min-framework. Its "exec" method (see the code sample above) launches the background task and positions the progress bar dialog in the center of the screen.

The TaskRunner class implements the ICancellable interface:

The TaskRunner class adds the current instance as a listener for cancel events to the progress dialog in its constructor:

and it interrupts the background thread running the long running task if its "cancelled" method is invoked (by the progress dialog, if the user presses the Cancel button):

To try out this mini-framework, download gui-progress.jar to a temporary directory and run "java -jar gui-progress.jar". The source code for the mini-framework is available in gui-progress-src.zip. This file contains source code for a number of other, unrelated classes as well, which you are free to ignore.
Saturday Jan 24, 2009
A question about servlets occurred to me just now. As we know them in general use, servlets are HTTP servlets. Put another way, the javax.servlet.Servlet interface has been traditionally used via its javax.servlet.http.HttpServlet implementation. The question that arises is: what about an implementation of java.servlet.Servlet that uses a raw, socket based TCP implementation instead?
Before going into the above question, it is perhaps to pause and ponder possible uses, if any, of such as TCP implementation. After all, in what way would such an implementatoin be useful? More specifically, what would be some compelling benefits of a TCP implementation compared to the traditional HTTP implementation?
Please share your thoughts...
Thursday Jan 22, 2009
Introduction
The serializer is a Java agent that instruments classes as the JVM loads them by marking some of them as serializable. Marking a class serializable involves two steps:
- Adding the "java.io.Serializable" interface to the list of interfaces the class implements.
- Adding a private static final long serialVersionUID variable to the class, and initializing it appropriately.
The classes to be thus marked are specified via a set of system properties. At class load time, the serializer performs the following steps:
- If the class belongs to a handful of "untouchable" packages (such as those with names starting with "java.", "javax.", "com.sun.", "sun.misc.", etc.) it is not instrumented.
- If the class belongs to a set of packages that the user wants instrumented (as specified by the "makeser" system property), and also to a list of classes or sub-packages that the user does not wish to instrument (specified by the "ignoreser" system property), it is not instrumented.
- If the class does not belong to the set of packages that the user wants instrumented, it is not instrumented.
- If the class is an interface, it is not instrumented.
- If the class is abstract, it is not instrumented.
- If all the above checks "pass", the class is instrumented.
Defining the agent
Several things must be done to setup a Java agent (see references). Specifically, an entrypoint class must be created and declared in a manifest file entry (see com.subhajit.serializer.SerializerMain and the META-INF/MANIFEST.MF file provided in the source code). The SerializerMain class:

contains a method named "premain" which acts as the entry point into the agent. In our case, the "premain" method adds a "ClassFileTransformer" instance of the "SerializerTransformer" class (included in the provided source code). The SerializerTransformer class is where the meat of the instrumentation occurs.
The SerializerTransformer class uses two system properties, named "makeser" and "ignoreser" to read comma separated lists of packages which must be, and which must be instrumented, respectively. An example value for the "makeser" system property is "com.mycompany.project1.impl,com.mycompany.project1". The "ignoreser" system property must be declared similarly. If either of these system properties is missing, the SerializerTransformer class treats them as if they were empty strings. (This can have strange effects. If, for example, the "makeser" system property is not defined, no class is instrumented).
SerializerTransformer transformer first checks the class name (against untouchable packages and those that have been declared via the "makeser" and "ignoreser" system properties). It next checks the class bytes (by creating a (BCEL) JavaClass object and getting information about the class via this object). Finally, it instruments class bytes for those classes that it determines should be instrumented. The actual code that performs the instrumentation is surprisingly simple:

Using the serializer
To use the supplied source code, you must first build it using the supplied (ant) build script. The output produced by the build are two files named "dist/serializer.jar" and "dist/serializer.zip". Copy the zip file to a location where you want your applications to find it (eg. "/opt/serializer/" or "c:\ser\serializer\"), and unzip it therein to create three files, namely "serializer.jar", "becl-5.2.jar" and "src.zip". Include the "bcel-5.2.jar" in your application's class path.
Next, modify the launch script of your application in the following manner:
- Add the JVM option: -javaagent:{locationOfSerializerJars}/serializer.jar
- Add the "makeser" system property by setting its value to a comma-separated list of package names that must be instrumented (eg. -Dmakeser=com.mycompany.myproject1,com.mycompany.commonlib).
- Add the "ignoreser" system property by setting its value to a comma separated list of package and class names that must not be instrumentd.
That is it. When your application start up, you would see a number of messages from the serializer (in your standard output) indicating classes it has decided to instrument.
Source code and references
You can download the source code for the serializer here.
References for Java agents:
- 1. http://javahowto.blogspot.com/2006/07/javaagent-option.html
- 2.http://java.sun.com/javase/6/docs/api/java/lang/instrument/Instrumentation.html
Friday Jan 02, 2009
Introduction
In this article, I describe a simple Java expression evaluator thaty evaluates algebraic expressions including a limited number of trigonometric, logarithmic, transcendental and exponential functions. The expression evaluator described here functions by generating and caching custom Java expression evaluation classes per expression. The custom classes implement the following interface:

Expression evaluation classes are generated by a generator class (MathEvalGenerator). The "generate" method of this class accepts a String expression representing the expression to be evaluated, and returns an IMathEval object that evaluates the expression:
The "generate" method has to perform several steps in order to create an IMathEval object for the expression.
Generating evaluators for expressions
The "generate" method of MathEvalGenerator seeks to generate the source code of a class that implements IMathEval, compiling the generated source code, loading the bytes of the resulting CLASS file into an anonymous class loader, and, finally, returning an instance of this class. Let us walk through these steps using an example expression: "sin(x)+cos(x)".
The generator class caches the names of all public methods of the "java.lang.Math" class during (static) initialization. Given an expression (sin(x)+cos(x) in our example), the generater class first replaces all occurences of method names in its cache of appearing in the expression, with the expression "Math."+methodname. For our sample expression (six(x)+cos(x)), the resulting expression becomes "Math.sin(x)+Math.cos(x)".
Next, for performance reasons, the generater class checks an internal cache of classes it has already generated to check if the class for this expression is present. If so, meaning that the generater class has already generated the class for this expression before, it directly instantiates the cached class and returns the evaluation object without further ado.
If the generater class does not find a cahced evaluation class for this expression, it proceeds to generating the source code of the evaluation class. For our example, the source code of the evaluation class looks like this:

Here, note that the class name "Eval0" is automatically generated to ensure uniqueness. Also, note that invoking "new Eval0().eval(0.25,0.5)" returns the value of the expression sin(0.25) + cos(0.5).
Having generated the source code, the generater class compiles it using the new compiler API provided by Java 1.6. The following steps are used to accomplish this:
- Create a temporary directory under the directory returned by "System.getProperty("java.io.tmpdir")".
- Save the generated source code to this directory.
- Export the source code of the IMathEval interface to this directory.
- Invoke the compiler on these two source files.
- Read the contents of the generated CLASS file (for the implementation class).
- Load these into an anonymous class loader.
- Instantiate the class and return the instance.
Critique
While the technique shown here to evaluate expressions is simple, it is probably "too simple". For example, it sometimes requires an "ugly" expression syntax: (pow(sin(x),2)+pow(cos(x),2)) to evaluate "sin^2(x) + cos^2(x)". For another, it is limited to supporting only those expressions that the java.lang.Math class supports out of the box.
It is possible to improve the generator class by allowing it to save (and reload) generated classes between process lifetimes. Thus, the generater class could be extended so that it stored all generated class bytes in a repository (such as a ZIP file), and reloaded these during initialization.
Complete source code for this project is available here.
Introduction
Object serialization and de-serialization is pretty straightforward in Java. One serialiazes objects by creating an ObjectOutputStream and then writing objects to it. One deserializes objects by opening an ObjectInputStream to read from the serialized bytes of the object, followed by reading objects from the ObjectInputStream. All this is well and good, until one encounters the problem, during deserialization, that the objects to be desrialized belong to classes that are known only to a custom class loader. In this case, attempting to deserialize objects results in ClassNotFoundExceptions.
This article shows a technique that allows desrialization of objects belonging to classes defined in custom class loaders. But first, let us take a look at the problem.
The problem
Let us assume that we create a "Point" class to model two dimensional points. The "Point" class is a java bean class containing two "int" fields named "x" and "y" (representing the "x" and "Y" co-ordinates of a two dimensional point, respectively).
We create the 'Point" class dynamically using the technique describe in an earlier blog positing about dynamic java beans. Next, we instantiate a couple of Point objects:
Next, we serialize the point objects p1 and p2 to a byte array:

Having obtained the byte array bytes, we can demonstrate the problem we are trying to solve by attempting to deserialize the objects contained therein using the following code snippet:

To our chagrin, this throws a ClassNotFoundException trying to read the first object. It should be pretty obvious why this exception is thrown: essentially, the objects being read belong to a class ("Point") about which the application class loader knows nothing. This is so because the "Point" class has been defined within the context of a custom class loader.
The solution
What we need during the deserialization process is a way to tell the ObjectInputStream to use a custom class loader which "knows" about the "Point" class. In other words, if we had a class loader with us at the point we invoked the deserialization code above that was able to "find" the "Point" class, we need some way to tell the ObjectInputStream to use this class loader as it attempted to read the objects from the stream. Looking at the deserialization API, however, there does not appear to be an obvious way to do this.
This article uses the following technique to accomplish the solution:
Create a CustomObjectInputStream by extending ObjectInputStream. CustomObjectInputStream accepts an array of custom ClassLoaders to use during deserialization.
Override the "resolveClass" method of ObjectInputStream in CustomObjectInputStream.
The "resolveClass" method accepts an ObjectStreamClass object and returns a Class object corresponding to this object (see the Java doc of ObjectInputStream). CustomObjectStream performs attempts to first use the application class loader top load the class corresponding to the name of the ObjectStreamClass. If this fails, it attempts to use each of its custom class loaders one by one to load the class. If even this fails, the over ridden resolveClass throws a ClassNotFoundException:

Putting it all together
The following code snippet puts all of the above together. Note that we define the functionality of the CustomClassLoader anonymously in the code snippet below:

Source code for this article is provided in the form a JUnit test. It is available for download here.
This blog copyright 2009 by Subhajit Dasgupta