Friday Sep 18, 2009
Monday Sep 14, 2009
Ever written a test-suite and wanted an ad-hoc app-server deployment (such as tomcat) that you could create (and use) on the fly, only to get rid of it once the test suite completed? Ever written applications and wanted to create an embedded database instance (such as derby) or message-queue (such as your private ActiveMQ instance) for internal use?
See rest of this blog entry here.
Saturday Sep 12, 2009
I have been thinking about an interesting problem for some time: how to generate code-distributions that are both self-contained and minimal. By "code-distribution" I mean a Java application distributed in the form of one or more JAR files containing classes and resources, of which one class is a designated “main class” (that is, it contains a “public static void main(String[])” method). By "self-contained" I mean that the code-distribution contains all the classes and resources needed by the main class to run. By "minimal" I mean that the code-distribution contains only classes that are used, directly or indirectly, by the main class. You can read the complete posting here.
Sunday May 17, 2009
Introduction
In this post, I present a fast Java "grep" application. The application presented here is not a full featured replacement for "grep": rather, it presents a stripped down "grep" application built on top of parts you can extend and configure to implement additional features.
What does the grep application presented here do? It allows you to specify a base directory (containing files and sub-directories) under which to search, a "file name" term indicating the files (lying under this base directory) to search, and a search term. Thus for example, if you wanted to search for all JAVA files lying in the base directory "c:\var\projects" containing the literal "class", you would invoke this grep as follows:
java -jar jgrep.jar -d c:\var\projects -f .java class
Note that the "-f" argument specifies a literal string with which a file name must end in order for it to be included in the search.
Functional elements
If "grepping" is defined as building the set of files that lie under a given directory (and its sub-directories), conform to a given naming pattern, and contain a given search term, then grepping consists of the following independent functional elements:
1. Locating all files and sub-directories under a directory.
2. Keeping only the files (not directories) from the above list whose names conform to the given file name pattern, and reading the contents of those files.
3. Searching for the search term within the files obtained in the previous step.
4. Presenting files found in the previous step to the user.
Diagrammatically, we can represent the above steps as follows:
Thinking about the above decomposition, it is apparent that these operations may overlap. In other words, a step does not have to wait for its predecessors to complete before it begins: each step may be seen as a "producer" that produces work for its successor and, at the same time, as a "consumer" of work produced by its predecessor. So, it is possible to redraw the diagram as follows:
Building on the "producer consumer" terminology introduced in the previous paragraph, we can flesh out this diagram with three queues:
In the diagram shown above, the little trapezoids represent queues, with the base (the longer parallel side) representing the producer and the top representing the consumer. Thus, the leftmost trapezoid represents a queue in which items are produced by the directory scanner and consumed by the file reader. Similarly, the second trapezoid represents a queue in which items are produced by the file reader and consumed by the content finder. The rightmost trapezoid, similarly, represents a queue in which items are produced by the content finder and consumed by the presentation component.
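The staged hand-off described above can be sketched with two blocking queues and a sentinel value that marks the end of input. This is only a minimal illustration, not the code used by the application presented below: the class name PipelineSketch and the POISON sentinel are assumptions made for the example, and the "files" are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical pipeline: names -> names matching a suffix -> collected output.
final class PipelineSketch {
    // Unique sentinel object; compared by identity, never by content.
    private static final String POISON = new String("POISON");

    static List<String> run(List<String> names, final String suffix)
            throws InterruptedException {
        final BlockingQueue<String> scanned = new LinkedBlockingQueue<String>();
        final BlockingQueue<String> matched = new LinkedBlockingQueue<String>();

        // Middle stage: consumes from "scanned" and produces into "matched".
        Thread filter = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        String name = scanned.take();
                        if (name == POISON) {
                            matched.put(POISON); // pass the end marker on
                            return;
                        }
                        if (name.endsWith(suffix)) {
                            matched.put(name);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        filter.start();

        // First stage (this thread) produces while the filter already consumes.
        for (String name : names) {
            scanned.put(name);
        }
        scanned.put(POISON);

        // Last stage drains results until the end marker arrives.
        List<String> out = new ArrayList<String>();
        while (true) {
            String name = matched.take();
            if (name == POISON) {
                break;
            }
            out.add(name);
        }
        filter.join();
        return out;
    }
}
```

The application below does not use a sentinel; it counts the items each stage has produced and tells the next stage how many to wait for.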
Code
The code presented below consists of four classes:
1. DirectoryScannerProducer scans a directory using the "fast directory scanning" technique presented in a previous posting, filling a blocking queue with directory and file entries.
2. IProcessor is a generic interface that declares a single "process" method, which accepts an input of one generic type and returns a value of another generic type. Implementations must be "stateless" (that is, they must not store state in instance variables), since one instance is shared by several threads.
3. QueueProcessor is a generic class that consumes items from a blocking queue containing items of a given type, processing these using a pre-defined IProcessor, and filling another blocking queue with the result. The QueueProcessor fills only non-null values returned by the IProcessor into the output queue. Any exceptions thrown by the "process" method of the IProcessor are ignored. QueueProcessor has a "waitFor" method that indicates that the QueueProcessor must stop after the given number of inputs have been processed.
4. Grep provides a main method that declares and sets up all the components and the queues, parses command line arguments, and launches all the components.
IProcessor
package com.subhajit.util.grep;
import java.lang.reflect.InvocationTargetException;
public interface IProcessor<I,O> {
O process(I input) throws InterruptedException, InvocationTargetException;
}
DirectoryScannerProducer
package com.subhajit.util.grep;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
/**
* Scans a directory and puts all files (not directories) into a given
* {@link BlockingQueue}.
*
* @author sdasgupta
*
*/
public class DirectoryScannerProducer {
private final Semaphore sem = new Semaphore(1);
private final AtomicInteger waitingCount = new AtomicInteger(0);
private final List<ThreadWithList> threads = new ArrayList<ThreadWithList>();
private final int installmentSize;
private final AtomicInteger producedCount = new AtomicInteger(0);
private static class DirInfo {
private final File dir;
private final File[] listing;
private int index;
public DirInfo(File dir) {
super();
this.dir = dir;
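// listFiles() returns null on I/O errors or if dir is not a directory.
File[] files = dir.listFiles();
this.listing = (files != null) ? files : new File[0];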
this.index = 0;
}
public int getIndex() {
return index;
}
public void setIndex(int index) {
this.index = index;
}
public File getDir() {
return dir;
}
public File[] getListing() {
return listing;
}
}
private final BlockingQueue<DirInfo> workingQueue;
private final BlockingQueue<File> producedQueue;
private final class ThreadWithList extends Thread {
private int useCount = 0;
public ThreadWithList() {
super();
}
public void incrementUseCount() {
useCount++;
}
public int getUseCount() {
return useCount;
}
public void run() {
while (true) {
if (DirectoryScannerProducer.this.scan0()) {
break;
}
}
}
}
public DirectoryScannerProducer(BlockingQueue<File> producedQueue,
int threads, int installmentSize) {
super();
this.producedQueue = producedQueue;
this.installmentSize = installmentSize;
workingQueue = new LinkedBlockingQueue<DirInfo>();
for (int i = 0; i < threads; i++) {
ThreadWithList t = new ThreadWithList();
this.threads.add(t);
}
}
public int scan(final File dir)
throws InterruptedException, ExecutionException {
sem.acquire();
workingQueue.add(new DirInfo(dir));
for (ThreadWithList t : threads) {
t.start();
}
sem.acquire();
return producedCount.get();
}
private boolean scan0() {
waitingCount.incrementAndGet();
// Remove the next item from the queue.
ThreadWithList thread = ((ThreadWithList) Thread.currentThread());
DirInfo dirInfo = null;
try {
if (waitingCount.get() == threads.size() && workingQueue.isEmpty()) {
sem.release();
return true;
}
dirInfo = workingQueue.take();
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return true;
} finally {
waitingCount.decrementAndGet();
}
int index = dirInfo.getIndex();
File[] listing = dirInfo.getListing();
int upperBound = Math.min(index + installmentSize, listing.length);
for (int i = index; i < upperBound; i++) {
if (listing[i].isFile()) {
try {
producedQueue.put(listing[i]);
producedCount.incrementAndGet();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return true;
}
}
if (listing[i].isDirectory()) {
DirInfo subdirInfo = new DirInfo(listing[i]);
try {
workingQueue.put(subdirInfo);
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return true;
}
}
}
if (upperBound != listing.length) {
dirInfo.setIndex(upperBound);
workingQueue.add(dirInfo);
}
thread.useCount += (upperBound - index);
return false;
}
public void close() {
for (ThreadWithList t : threads) {
if (t.isAlive()) {
t.interrupt();
}
}
for (ThreadWithList t : threads) {
try {
t.join();
} catch (InterruptedException exc) {
Thread.currentThread().interrupt();
return;
}
}
workingQueue.clear();
}
}
QueueProcessor
package com.subhajit.util.grep;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
/**
* Models a set of threads consuming inputs of type <tt>I</tt>
* from an input queue, and writing outputs of type <tt>O</tt> to an output
* queue.
*
* @author sdasgupta
*
* @param <I>
* @param <O>
*/
public class QueueProcessor<I, O> {
/**
* Private class polls for and extracts messages from
* {@link QueueProcessor#inputQueue}, processes each message using
* {@link IProcessor#process(Object)} and places the resulting object
* (if it is not <tt>null</tt>) into {@link QueueProcessor#outputQueue}.
*
* @author sdasgupta
*/
private final class ConsumerRunnable implements Runnable {
public void run() {
while (true) {
try {
final I input = QueueProcessor.this.inputQueue.poll(100,
TimeUnit.MILLISECONDS);
if (input != null) {
pushService.submit(new Runnable() {
public void run() {
try {
O result = processor.process(input);
if (result != null) {
QueueProcessor.this.outputQueue
.put(result);
producedMessageCount.incrementAndGet();
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return;
} catch (InvocationTargetException exc) {
// Processing failures are deliberately ignored: the input is skipped.
return;
}
}
});
consumedMessageCount.incrementAndGet();
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return;
}
}
}
}
/**
* The number of threads consuming messages from
* {@link QueueProcessor#inputQueue}.
*/
private final int threadCount;
/**
* The threads consuming messages from {@link QueueProcessor#inputQueue}.
*/
private final List<Thread> consumerThreads;
/**
* The queue from which input messages are read.
*/
private final BlockingQueue<I> inputQueue;
/**
* The queue to which output messages are written.
*/
private final BlockingQueue<O> outputQueue;
/**
* Processes inputs (pulled from the <tt>inputQueue</tt>) and pushes the
* results (obtained by calling {@link IProcessor#process(Object)}) to the
* <tt>outputQueue</tt>.
*/
private final ExecutorService pushService;
/**
* Counts the total number of consumed messages.
*/
private final AtomicInteger consumedMessageCount = new AtomicInteger(0);
/**
* Counts the total number of produced messages.
*/
private final AtomicInteger producedMessageCount = new AtomicInteger(0);
private final IProcessor<I, O> processor;
/**
* Public constructor.
*
* @param inputQueue
* @param outputQueue
* @param threadCount
*/
public QueueProcessor(BlockingQueue<I> inputQueue,
BlockingQueue<O> outputQueue, int threadCount,
IProcessor<I, O> processor) {
super();
this.inputQueue = inputQueue;
this.outputQueue = outputQueue;
this.threadCount = threadCount;
this.processor = processor;
pushService = Executors.newFixedThreadPool(10);
consumerThreads = new ArrayList<Thread>();
}
// protected abstract O process(I input) throws InterruptedException,
// InvocationTargetException;
public void startup() {
// Create the consumer threads.
for (int i = 0; i < this.threadCount; i++) {
consumerThreads.add(new Thread(new ConsumerRunnable()));
}
for (Thread thread : consumerThreads) {
thread.start();
}
}
/**
* Waits for <tt>inputMessageCount</tt> messages to be processed, invokes
* {@link QueueProcessor#shutdown()}, and returns
* {@link QueueProcessor#producedMessageCount}.
*
* @param inputMessageCount
* @return
* @throws InterruptedException
*/
public int waitFor(int inputMessageCount) throws InterruptedException {
while (true) {
if (consumedMessageCount.get() >= inputMessageCount) {
break;
}
Thread.sleep(100);
}
shutdown();
return producedMessageCount.get();
}
/**
* Issues a shutdown request to the {@link QueueProcessor#pushService},
* waits for that service to shut down, interrupts the
* {@link QueueProcessor#consumerThreads}, waits for those threads to shut
* down, then returns.
*
* <p>
* All messages which have already been consumed are processed.
* </p>
*
* @throws InterruptedException
*/
public void shutdown() throws InterruptedException {
pushService.shutdown();
while (true) {
if (pushService.isTerminated()) {
break;
}
Thread.sleep(100);
}
for (Thread thread : consumerThreads) {
thread.interrupt();
}
for (Thread thread : consumerThreads) {
thread.join();
}
}
}
Grep
package com.subhajit.util.grep;
import java.io.File;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import com.subhajit.argutils.ArgumentUtils;
import com.sun.idm.svc.util.common.CommonUtils;
import com.sun.idm.svc.util.streams.FileUtils;
public class Grep {
private static class FileContent {
private final File file;
private final String content;
public FileContent(File file, String content) {
super();
this.file = file;
this.content = content;
}
public File getFile() {
return file;
}
public String getContent() {
return content;
}
}
private static enum Argument {
text, dir, pattern
}
private static Map<Argument, ?> parseCommandLineArguments(String[] args)
throws IOException {
final String argumentSpec = "d:string:true:Base directory under which to search|"
+ "f:string:true:Comma separated file name patterns to search (eg. .java,.properties,.xml)|"
+ "i:boolean:false:Ignore case|"
+ "names:boolean:false:Show file names only if \"true\", else show verbose output";
if (args.length == 0) {
System.out
.println("Usage:\nOption\t\tType\tRequired\tDescription\n"
+ ArgumentUtils.getUsage(argumentSpec));
System.exit(1);
}
Map<String, String> map = ArgumentUtils.parseArgs(args, argumentSpec);
if (!map.containsKey(CommonUtils.UNBOUND_ARGUMENT)) {
throw new IllegalArgumentException(
"Cannot continue, since search terms have not been specified.");
}
final String text = map.get(CommonUtils.UNBOUND_ARGUMENT);
final File dir = new File(map.get("d")).getCanonicalFile();
final String pattern = map.get("f");
Map<Argument, Object> ret = new HashMap<Argument, Object>();
ret.put(Argument.dir, dir);
ret.put(Argument.pattern, pattern);
ret.put(Argument.text, text);
return ret;
}
public static void main(String[] args) {
int status = 0;
long t0 = System.nanoTime();
try {
// Parse the command line arguments.
Map<Argument, ?> commandLineArguments = parseCommandLineArguments(args);
final String text = (String) commandLineArguments
.get(Argument.text);
final File dir = (File) commandLineArguments.get(Argument.dir);
final String fileNameInput = (String) commandLineArguments
.get(Argument.pattern);
// Setup the queue of files. This queue is populated by the
// directory scanner with all files and directories found under the
// base directory we are scanning.
BlockingQueue<File> fileQueue = new ArrayBlockingQueue<File>(5000);
final DirectoryScannerProducer scanner = new DirectoryScannerProducer(
fileQueue, 5, 20);
// Setup the content queue. This queue contains FileContent objects
// embodying the content of files read from the fileQueue that match
// the file name pattern we are interested in. This queue is
// populated by the fileReaders object.
final BlockingQueue<FileContent> contentQueue = new LinkedBlockingQueue<FileContent>();
final QueueProcessor<File, FileContent> fileReaders = new QueueProcessor<File, FileContent>(
fileQueue, contentQueue, 5,
new IProcessor<File, FileContent>() {
public FileContent process(File input)
throws InterruptedException,
InvocationTargetException {
try {
if (input.isDirectory()) {
return null;
}
if (!input.getName().endsWith(fileNameInput)) {
return null;
}
return new FileContent(input, new String(
FileUtils.loadFile(input)));
} catch (IOException exc) {
throw new InvocationTargetException(exc);
}
}
});
// Setup the output queue containing FileContent objects
// representing matches. The output queue is populated by the file
// finders object.
final BlockingQueue<FileContent> outputQueue = new LinkedBlockingQueue<FileContent>();
final QueueProcessor<FileContent, FileContent> fileFinders = new QueueProcessor<FileContent, FileContent>(
contentQueue, outputQueue, 5,
new IProcessor<FileContent, FileContent>() {
public FileContent process(FileContent input)
throws InterruptedException,
InvocationTargetException {
if (input.getContent().contains(text)) {
return input;
} else {
return null;
}
}
});
// Start the file readers.
fileReaders.startup();
// Start the file finders.
fileFinders.startup();
// Start the thread that dumps the matching output to the console.
final AtomicInteger dumpedCount = new AtomicInteger(0);
final ExecutorService dumpService = Executors
.newFixedThreadPool(10);
Thread dumpingThread = new Thread(new Runnable() {
public void run() {
try {
while (true) {
final FileContent matchingFileInfo = outputQueue
.take();
dumpService.submit(new Runnable() {
public void run() {
System.out.println(matchingFileInfo
.getFile());
}
});
dumpedCount.incrementAndGet();
}
} catch (InterruptedException exc) {
System.out.println("Done (" + dumpedCount.get() + ")");
Thread.currentThread().interrupt();
return;
}
}
});
dumpingThread.start();
// Start the directory scanner.
int workItemCount0 = scanner.scan(dir);
// Wait for the file readers to process its inputs.
int workItemCount1 = fileReaders.waitFor(workItemCount0);
// Wait for the file finders object to process its inputs.
int workItemCount2 = fileFinders.waitFor(workItemCount1);
// Interrupt the dumping thread once it has printed all output.
while (true) {
if (dumpedCount.get() == workItemCount2) {
dumpingThread.interrupt();
break;
}
Thread.sleep(100);
}
dumpingThread.join();
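// Let queued print tasks finish; shutdownNow() would discard them.
dumpService.shutdown();
dumpService.awaitTermination(10, java.util.concurrent.TimeUnit.SECONDS);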
} catch (Throwable exc) {
status = 1;
exc.printStackTrace();
} finally {
t0 = System.nanoTime() - t0;
System.out.println((t0 / 1000000) + " ms.");
System.exit(status);
}
}
}
What do you think?
Friday Jan 30, 2009
As a follow up to my recent blog posting on long running tasks, here is an example of using the mini-framework described in that post to copy files from one directory to another:
The code above refers to two methods of a "FileUtils" class. The "listAllFilesUnder" method, which returns a List<File> containing all files (not directories) under a given directory and its sub-directories, is shown below:
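A method with this contract can be sketched as a filter over a recursive directory walk. The version below is an assumption about the behaviour described here, not the actual FileUtils code: the class name FileListingSketch is hypothetical, and the helper is assumed to return both files and directories.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class FileListingSketch {
    /** Returns all regular files (not directories) under dir, recursively. */
    static List<File> listAllFilesUnder(File dir) {
        List<File> files = new ArrayList<File>();
        for (File entry : listAllContentsUnder(dir)) {
            if (entry.isFile()) {
                files.add(entry);
            }
        }
        return files;
    }

    /** Returns every file and directory under dir (assumed behaviour). */
    static List<File> listAllContentsUnder(File dir) {
        List<File> contents = new ArrayList<File>();
        collect(dir, contents);
        return contents;
    }

    private static void collect(File dir, List<File> contents) {
        File[] listing = dir.listFiles();
        if (listing == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File entry : listing) {
            contents.add(entry);
            if (entry.isDirectory()) {
                collect(entry, contents);
            }
        }
    }
}
```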

This method, in turn, refers to a method named "listAllContentsUnder" in FileUtils:

The "FileUtils.copyFile" method, which uses Java NIO, is shown below:
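A channel-based copy along these lines might look like the following. This is a sketch of the general NIO technique and not the actual FileUtils.copyFile implementation; the class name CopySketch is made up for the example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

final class CopySketch {
    /** Copies source to target, overwriting target if it exists. */
    static void copyFile(File source, File target) throws IOException {
        try (FileInputStream in = new FileInputStream(source);
             FileOutputStream out = new FileOutputStream(target);
             FileChannel src = in.getChannel();
             FileChannel dst = out.getChannel()) {
            long position = 0;
            long size = src.size();
            // transferTo may move fewer bytes than requested, so loop.
            while (position < size) {
                position += src.transferTo(position, size - position, dst);
            }
        }
    }
}
```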
And here is the (obligatory) screenshot showing the progress bar in action:
Enjoy!
I recently dusted off some code I had written a while back to serve as a mini-framework (Swing based) for long background tasks that updated progress bars as they executed. The code is compact and easy to use, but lacked a facility to allow the user to cancel the task. I updated the code to provide this facility today, and wanted to share it for review, comment and possible re-use.
The primary design criterion that drove the design of this mini-framework is ease of use. I feel that the end result is indeed easy to use, requiring, in most cases, a one line invocation and an anonymous implementation of an interface (see below).
To set the stage for this discussion, let us review some common characteristics of long running background tasks.
As the name suggests, long running background tasks are "long running": they are slow enough to seriously impact "responsiveness" in applications providing user interactivity. These tasks usually involve some heavy duty local or remote processing, and never involve direct user interaction. Examples of such tasks are searching all files under a given sub directory for occurrences of a textual token (which is what "grep -R" does), bulk loading data into a database, or performing an intensive mathematical calculation.
Often, long running background tasks involve a number of iterations performing a simpler task, and the iteration count is often known in advance. For example, the number of files to search, or the number of records to insert, or the number of times a square root must be calculated are often known at the outset of the task. Sometimes, however, an iteration count is not known in advance, perhaps due to the nature of the problem, or perhaps due to the nature of the algorithm (which might be recursive, not iterative).
Having described the type of problem we are trying to solve, here is a quick look at an application that performs a background task (counting up from 0 to 100 in a "for" loop, updating a progress bar as it works):

The calling thread is blocked while the "exec" method runs. A progress bar is shown to the user to indicate progress. The long running task calls "progress.increment()" regularly to update this progress bar. Another thing that the long running task does is check the "interrupted" status of its thread: if the interrupted status indicates that the thread has been interrupted, the long running task interprets this to mean that the user has cancelled the task, and it returns immediately.
In this example, we have a SecureRandom object named "random", and we use it repeatedly to generate a random "sleep time" during each iteration. This is done to simulate the "long running"-ness of this task. Running this in a "main" program gives:

All that the user must do is implement the "ISwingRunnableWithProgress" interface, which has just one method:

The classes used to implement the mini-framework are provided in the supplied source code (see below). The TaskRunner class is the lynchpin of this mini-framework. Its "exec" method (see the code sample above) launches the background task and positions the progress bar dialog in the center of the screen.

The TaskRunner class implements the ICancellable interface:

The TaskRunner class adds the current instance as a listener for cancel events to the progress dialog in its constructor:

and it interrupts the background thread running the long running task if its "cancelled" method is invoked (by the progress dialog, if the user presses the Cancel button):
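The cancellation mechanism described here (the task polls its thread's interrupted status, and a cancel request interrupts that thread) can be sketched without any Swing code. Everything named below is hypothetical: CancelSketch, its Progress interface and its "cancelled" method merely stand in for the framework's own types, whose exact signatures are not reproduced here.

```java
final class CancelSketch {
    /** Hypothetical stand-in for the framework's progress handle. */
    interface Progress {
        void increment();
    }

    /** Starts a task of the given number of steps on a background thread. */
    static Thread start(final int steps, final Progress progress) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < steps; i++) {
                    // An interrupt is interpreted as "the user cancelled".
                    if (Thread.currentThread().isInterrupted()) {
                        return;
                    }
                    progress.increment();
                    try {
                        Thread.sleep(5); // simulate a slow unit of work
                    } catch (InterruptedException e) {
                        return; // cancelled while sleeping
                    }
                }
            }
        });
        worker.start();
        return worker;
    }

    /** What a Cancel button handler would do. */
    static void cancelled(Thread worker) {
        worker.interrupt();
    }
}
```

Because the task checks the flag once per iteration and also treats an InterruptedException as a cancel request, it stops within one step of the request.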

To try out this mini-framework, download gui-progress.jar to a temporary directory and run "java -jar gui-progress.jar". The source code for the mini-framework is available in gui-progress-src.zip. This file contains source code for a number of other, unrelated classes as well, which you are free to ignore.
Saturday Jan 24, 2009
A question about servlets occurred to me just now. As we know them in general use, servlets are HTTP servlets. Put another way, the javax.servlet.Servlet interface has been traditionally used via its javax.servlet.http.HttpServlet implementation. The question that arises is: what about an implementation of javax.servlet.Servlet that uses a raw, socket based TCP implementation instead?
Before going into the above question, it is perhaps worth pausing to ponder possible uses, if any, of such a TCP implementation. After all, in what way would such an implementation be useful? More specifically, what would be some compelling benefits of a TCP implementation compared to the traditional HTTP implementation?
Please share your thoughts...
Thursday Jan 22, 2009
Introduction
The serializer is a Java agent that instruments classes as the JVM loads them by marking some of them as serializable. Marking a class serializable involves two steps:
- Adding the "java.io.Serializable" interface to the list of interfaces the class implements.
- Adding a private static final long serialVersionUID variable to the class, and initializing it appropriately.
The classes to be thus marked are specified via a set of system properties. At class load time, the serializer performs the following steps:
- If the class belongs to a handful of "untouchable" packages (such as those with names starting with "java.", "javax.", "com.sun.", "sun.misc.", etc.) it is not instrumented.
- If the class belongs to a set of packages that the user wants instrumented (as specified by the "makeser" system property), and also to a list of classes or sub-packages that the user does not wish to instrument (specified by the "ignoreser" system property), it is not instrumented.
- If the class does not belong to the set of packages that the user wants instrumented, it is not instrumented.
- If the class is an interface, it is not instrumented.
- If the class is abstract, it is not instrumented.
- If all the above checks "pass", the class is instrumented.
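The checks listed above amount to a small predicate. The sketch below is an illustration under stated assumptions rather than the serializer's actual code: the class and method names are hypothetical, the untouchable list contains only the prefixes named above, and entries in "makeser" and "ignoreser" are assumed to match a class, a package, or any of its sub-packages by prefix.

```java
import java.util.Arrays;
import java.util.List;

final class InstrumentationFilterSketch {
    // Only the prefixes named in the text; the real list may be longer.
    private static final List<String> UNTOUCHABLE =
            Arrays.asList("java.", "javax.", "com.sun.", "sun.misc.");

    static boolean shouldInstrument(String className, String makeser,
            String ignoreser, boolean isInterface, boolean isAbstract) {
        for (String prefix : UNTOUCHABLE) {
            if (className.startsWith(prefix)) {
                return false;
            }
        }
        if (!matchesAny(className, makeser)) {
            return false; // not in a package the user wants instrumented
        }
        if (matchesAny(className, ignoreser)) {
            return false; // explicitly excluded by the user
        }
        return !isInterface && !isAbstract;
    }

    // A null or empty property matches nothing.
    private static boolean matchesAny(String className, String commaSeparated) {
        if (commaSeparated == null) {
            return false;
        }
        for (String entry : commaSeparated.split(",")) {
            String name = entry.trim();
            if (name.isEmpty()) {
                continue;
            }
            if (className.equals(name) || className.startsWith(name + ".")) {
                return true;
            }
        }
        return false;
    }
}
```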
Defining the agent
Several things must be done to set up a Java agent (see references). Specifically, an entry point class must be created and declared in a manifest file entry (see com.subhajit.serializer.SerializerMain and the META-INF/MANIFEST.MF file provided in the source code). The SerializerMain class:

contains a method named "premain" which acts as the entry point into the agent. In our case, the "premain" method adds a "ClassFileTransformer" instance of the "SerializerTransformer" class (included in the provided source code). The SerializerTransformer class is where the meat of the instrumentation occurs.
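For orientation, the general shape of such an entry point, using the standard java.lang.instrument API, is sketched below. The names AgentSketch and PassThroughTransformer are hypothetical and the transformer deliberately does nothing; this is not the SerializerMain or SerializerTransformer source. A real agent class would normally be declared public and named in the Premain-Class attribute of the agent JAR's manifest.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

final class AgentSketch {
    /** Invoked by the JVM before main() when started with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        // From now on, every class being loaded is offered to the transformer.
        inst.addTransformer(new PassThroughTransformer());
    }

    static final class PassThroughTransformer implements ClassFileTransformer {
        public byte[] transform(ClassLoader loader, String className,
                Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
                byte[] classfileBuffer) {
            // className uses slashes, for example "java/lang/String".
            // Returning null tells the JVM to keep the original class bytes.
            return null;
        }
    }
}
```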
The SerializerTransformer class uses two system properties, named "makeser" and "ignoreser", to read comma separated lists of packages which must be, and which must not be, instrumented, respectively. An example value for the "makeser" system property is "com.mycompany.project1.impl,com.mycompany.project1". The "ignoreser" system property must be declared similarly. If either of these system properties is missing, the SerializerTransformer class treats it as if it were an empty string. (This can have strange effects. If, for example, the "makeser" system property is not defined, no class is instrumented).
The SerializerTransformer first checks the class name (against untouchable packages and those that have been declared via the "makeser" and "ignoreser" system properties). It next checks the class bytes (by creating a (BCEL) JavaClass object and getting information about the class via this object). Finally, it instruments class bytes for those classes that it determines should be instrumented. The actual code that performs the instrumentation is surprisingly simple:

Using the serializer
To use the supplied source code, you must first build it using the supplied (ant) build script. The build produces two files named "dist/serializer.jar" and "dist/serializer.zip". Copy the zip file to a location where you want your applications to find it (eg. "/opt/serializer/" or "c:\ser\serializer\"), and unzip it there to create three files, namely "serializer.jar", "bcel-5.2.jar" and "src.zip". Include "bcel-5.2.jar" in your application's class path.
Next, modify the launch script of your application in the following manner:
- Add the JVM option: -javaagent:{locationOfSerializerJars}/serializer.jar
- Add the "makeser" system property by setting its value to a comma-separated list of package names that must be instrumented (eg. -Dmakeser=com.mycompany.myproject1,com.mycompany.commonlib).
- Add the "ignoreser" system property by setting its value to a comma separated list of package and class names that must not be instrumented.
That is it. When your application starts up, you will see a number of messages from the serializer (in your standard output) indicating the classes it has decided to instrument.
Source code and references
You can download the source code for the serializer here.
References for Java agents:
- 1. http://javahowto.blogspot.com/2006/07/javaagent-option.html
- 2. http://java.sun.com/javase/6/docs/api/java/lang/instrument/Instrumentation.html
Friday Jan 02, 2009
Introduction
In this article, I describe a simple Java expression evaluator that evaluates algebraic expressions including a limited number of trigonometric, logarithmic, transcendental and exponential functions. The expression evaluator described here functions by generating and caching custom Java expression evaluation classes per expression. The custom classes implement the following interface:

Expression evaluation classes are generated by a generator class (MathEvalGenerator). The "generate" method of this class accepts a String expression representing the expression to be evaluated, and returns an IMathEval object that evaluates the expression:
The "generate" method has to perform several steps in order to create an IMathEval object for the expression.
Generating evaluators for expressions
The "generate" method of MathEvalGenerator generates the source code of a class that implements IMathEval, compiles the generated source code, loads the bytes of the resulting CLASS file into an anonymous class loader, and, finally, returns an instance of this class. Let us walk through these steps using an example expression: "sin(x)+cos(x)".
The generator class caches the names of all public methods of the "java.lang.Math" class during (static) initialization. Given an expression (sin(x)+cos(x) in our example), the generator class first replaces every occurrence in the expression of a method name from its cache with "Math."+methodname. For our sample expression (sin(x)+cos(x)), the resulting expression becomes "Math.sin(x)+Math.cos(x)".
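One way to perform this rewriting is with reflection and a regular expression, as sketched below. This is a guess at a workable approach rather than the MathEvalGenerator source: the class name MathPrefixSketch and the method name "qualify" are hypothetical, and only identifiers that are immediately followed by an opening parenthesis are treated as method calls.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MathPrefixSketch {
    // Names of the public static methods of java.lang.Math, cached once.
    // The static check skips methods inherited from Object, such as wait().
    private static final Set<String> MATH_METHODS = new HashSet<String>();
    static {
        for (Method m : Math.class.getMethods()) {
            if (Modifier.isStatic(m.getModifiers())) {
                MATH_METHODS.add(m.getName());
            }
        }
    }

    // An identifier followed by "(" that is not already qualified ("a.b(")
    // and does not start in the middle of a longer identifier.
    private static final Pattern CALL =
            Pattern.compile("(?<![\\w.])([A-Za-z_]\\w*)\\s*\\(");

    static String qualify(String expression) {
        Matcher m = CALL.matcher(expression);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String call = m.group();
            String replacement =
                    MATH_METHODS.contains(m.group(1)) ? "Math." + call : call;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling MathPrefixSketch.qualify("sin(x)+cos(x)") yields "Math.sin(x)+Math.cos(x)", and a name that java.lang.Math does not define is left untouched.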
Next, for performance reasons, the generator class checks an internal cache of classes it has already generated to see whether the class for this expression is present. If so, meaning that the generator class has already generated the class for this expression before, it directly instantiates the cached class and returns the evaluation object without further ado.
If the generator class does not find a cached evaluation class for this expression, it proceeds to generate the source code of the evaluation class. For our example, the source code of the evaluation class looks like this:

Here, note that the class name "Eval0" is automatically generated to ensure uniqueness. Also, note that invoking "new Eval0().eval(0.25,0.5)" returns the value of the expression sin(0.25) + cos(0.5).
Having generated the source code, the generator class compiles it using the new compiler API provided by Java 1.6. The following steps are used to accomplish this:
- Create a temporary directory under the directory returned by "System.getProperty("java.io.tmpdir")".
- Save the generated source code to this directory.
- Export the source code of the IMathEval interface to this directory.
- Invoke the compiler on these two source files.
- Read the contents of the generated CLASS file (for the implementation class).
- Load these into an anonymous class loader.
- Instantiate the class and return the instance.
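The compile-and-load steps above can be sketched with the javax.tools API. Note the assumptions: the class name CompileSketch and the "eval-" directory prefix are made up, the sketch compiles a single self-contained source file (it does not export an IMathEval source), and it loads the result with a URLClassLoader pointed at the temporary directory instead of feeding the class bytes to an anonymous class loader as the generator does.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.net.URL;
import java.net.URLClassLoader;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class CompileSketch {
    static Class<?> compileAndLoad(String className, String source)
            throws Exception {
        // Create a temporary directory under java.io.tmpdir.
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "eval-" + System.nanoTime());
        if (!dir.mkdirs()) {
            throw new IOException("Cannot create " + dir);
        }
        // Save the generated source code to that directory.
        File sourceFile = new File(dir, className + ".java");
        Writer writer = new FileWriter(sourceFile);
        try {
            writer.write(source);
        } finally {
            writer.close();
        }
        // Invoke the compiler; it is only available when running on a JDK.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found");
        }
        int status = compiler.run(null, null, null,
                "-d", dir.getPath(), sourceFile.getPath());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed: " + status);
        }
        // Load the compiled class from the directory. The loader is left
        // open on purpose so the returned class stays usable.
        URLClassLoader loader =
                new URLClassLoader(new URL[] { dir.toURI().toURL() });
        return loader.loadClass(className);
    }
}
```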
Critique
While the technique shown here to evaluate expressions is simple, it is probably "too simple". For example, it sometimes requires an "ugly" expression syntax: (pow(sin(x),2)+pow(cos(x),2)) to evaluate "sin^2(x) + cos^2(x)". For another, it is limited to supporting only those expressions that the java.lang.Math class supports out of the box.
It is possible to improve the generator class by allowing it to save (and reload) generated classes between process lifetimes. Thus, the generator class could be extended so that it stores all generated class bytes in a repository (such as a ZIP file), and reloads these during initialization.
Complete source code for this project is available here.
Introduction
Object serialization and deserialization are pretty straightforward in Java. One serializes objects by creating an ObjectOutputStream and then writing objects to it. One deserializes objects by opening an ObjectInputStream over the serialized bytes and then reading objects from it. All this is well and good, until one encounters the problem, during deserialization, that the objects to be deserialized belong to classes that are known only to a custom class loader. In this case, attempting to deserialize objects results in ClassNotFoundExceptions.
This article shows a technique that allows deserialization of objects belonging to classes defined in custom class loaders. But first, let us take a look at the problem.
The problem
Let us assume that we create a "Point" class to model two dimensional points. The "Point" class is a Java bean class containing two "int" fields named "x" and "y" (representing the "x" and "y" co-ordinates of a two dimensional point, respectively).
We create the "Point" class dynamically using the technique described in an earlier blog posting about dynamic Java beans. Next, we instantiate a couple of Point objects:
Next, we serialize the point objects p1 and p2 to a byte array:

Having obtained the byte array bytes, we can demonstrate the problem we are trying to solve by attempting to deserialize the objects contained therein using the following code snippet:

To our chagrin, this throws a ClassNotFoundException trying to read the first object. It should be pretty obvious why this exception is thrown: essentially, the objects being read belong to a class ("Point") about which the application class loader knows nothing. This is so because the "Point" class has been defined within the context of a custom class loader.
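The round trip described above can be sketched as follows. The class and method names are made up for illustration, and the sketch uses objects whose classes are on the ordinary classpath, for which the plain ObjectInputStream works. With objects of the dynamically created "Point" class, the "readObject" call in "fromBytes" is where the ClassNotFoundException would surface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper illustrating serialization to, and from, a byte array.
final class SerializationRoundTrip {
    // Writes the given objects, in order, to a byte array.
    static byte[] toBytes(Object... objects) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (Object object : objects) {
                out.writeObject(object);
            }
        }
        return buffer.toByteArray();
    }

    // Reads "count" objects back using the default class resolution.
    static Object[] fromBytes(byte[] bytes, int count)
            throws IOException, ClassNotFoundException {
        Object[] result = new Object[count];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < count; i++) {
                // Fails here for classes the default resolution cannot find.
                result[i] = in.readObject();
            }
        }
        return result;
    }

    // Convenience wrapper that converts checked exceptions to unchecked ones.
    static Object[] roundTrip(Object... objects) {
        try {
            return fromBytes(toBytes(objects), objects.length);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```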
The solution
What we need during the deserialization process is a way to tell the ObjectInputStream to use a custom class loader which "knows" about the "Point" class. In other words, if we had a class loader with us at the point we invoked the deserialization code above that was able to "find" the "Point" class, we need some way to tell the ObjectInputStream to use this class loader as it attempted to read the objects from the stream. Looking at the deserialization API, however, there does not appear to be an obvious way to do this.
This article uses the following technique to accomplish the solution:
- Create a CustomObjectInputStream by extending ObjectInputStream. CustomObjectInputStream accepts an array of custom ClassLoaders to use during deserialization.
- Override the "resolveClass" method of ObjectInputStream in CustomObjectInputStream.
The "resolveClass" method accepts an ObjectStreamClass object and returns a Class object corresponding to this object (see the Java doc of ObjectInputStream). CustomObjectInputStream first attempts to use the application class loader to load the class corresponding to the name of the ObjectStreamClass. If this fails, it attempts to use each of its custom class loaders, one by one, to load the class. If even this fails, the overridden resolveClass throws a ClassNotFoundException:
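A minimal sketch of such a stream class is shown here. It follows the description above but is not the article's own listing: the constructor signature and the "roundTrip" helper are assumptions, and the first attempt is delegated to "super.resolveClass", which applies the JDK's default class resolution and stands in for the application class loader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Sketch of an ObjectInputStream that falls back to custom class loaders.
final class CustomObjectInputStream extends ObjectInputStream {
    private final List<ClassLoader> loaders;

    CustomObjectInputStream(InputStream in, ClassLoader... loaders) throws IOException {
        super(in);
        this.loaders = Arrays.asList(loaders.clone());
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Default resolution first (also handles primitive types).
            return super.resolveClass(desc);
        } catch (ClassNotFoundException notFoundByDefault) {
            // Then try each custom class loader in turn.
            for (ClassLoader loader : loaders) {
                try {
                    return Class.forName(desc.getName(), false, loader);
                } catch (ClassNotFoundException ignored) {
                    // Fall through to the next loader.
                }
            }
            throw notFoundByDefault;
        }
    }

    // Hypothetical helper: serialize a value and read it back through this class.
    static Object roundTrip(Serializable value, ClassLoader... loaders) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new CustomObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()), loaders)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```

To deserialize "Point" objects, one would pass the class loader that defined the "Point" class as one of the "loaders" arguments.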

Putting it all together
The following code snippet puts all of the above together. Note that we define the functionality of the CustomClassLoader anonymously in the code snippet below:

Source code for this article is provided in the form of a JUnit test. It is available for download here.
Wednesday Dec 31, 2008
Stack Trace Beautification
Introduction
Consider the following code snippet showing a "main" method:
public static void main(String[] args) {
    int status = 0;
    sLog.info("Started");
    try {
        Integer.parseInt("");
    } catch (Throwable exc) {
        status = 1;
        sLog.warn("Error", exc);
    } finally {
        System.exit(status);
    }
}
This code is guaranteed to throw a "NumberFormatException" at the line "Integer.parseInt("")". The stack trace of this exception looks like this:


This stack trace gives us some basic information that allows us to figure out where the exception was thrown from (line 18, in the "main" method of the class "com.subhajit.stacktrace.log4j.Test"). Now consider the following stack trace thrown by this same code snippet:
This stack trace shows the line of source code that caused the exception in the context of the surrounding source code. Having this context captured in the stack trace is useful for several reasons:
1. Developers inspecting the stack trace save the step of correlating line numbers with the source code: the source code surrounding the offending line is available in the stack trace itself.
2. Often, with older releases, line numbers of code in the latest release under development might not match the corresponding line numbers in the older release. Developers faced with a bare stack trace must a) figure out what version of the build caused the stack trace to be thrown, b) check out the older version from a source code repository and then c) figure out the offending code. Again, having the offending line of source code in the context of its immediate surroundings helps to save time in these situations.
In this article, I show how to accomplish this "stack trace beautification", and provide several techniques to integrate this functionality in your code (such as a log4j layout and an aspect-based load time weaving launch configuration).
The Basic problem
The basic problems to be solved are the capture and the formatting of exception stack traces as exceptions are thrown in a running system. We will look at the problem of capturing exceptions later: first, we look at what we must do to generate beautified stack traces from exceptions we have already caught.
Beautifying stack traces
The source code provided with this article contains a class named "com.subhajit.stacktrace.base.SourceCodeBeautifier" that performs stack trace beautification. This class depends upon a system property named "src" that provides a comma-separated list of source code locations (e.g. "/src,%JAVA_HOME%/src.zip,..."). This class examines the StackTraceElement[] of the exception, and, for every class it finds that has source code available, it picks out a bunch of lines of source code before and after the offending line. Then, it indents and formats these lines of source code properly, drawing an "arrow" shape to point at the offending line within this bunch of lines. For classes which do not have source code available, it appends an ordinary message showing the file and line number (if available), just as it would appear in a "normal" stack trace. Finally, it returns the properly formatted stack trace to the caller. See the overrides of the "printStackTrace" methods of this class for details.
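The formatting step can be sketched in a few lines. This is an illustration only, not the SourceCodeBeautifier implementation: the class name "SourceContext", the arrow shape and the layout are assumptions, and looking up the source file under the "src" locations (using the file name and line number of each StackTraceElement) is left out.

```java
import java.util.List;

// Hypothetical sketch: show the lines around an offending line, with an arrow.
final class SourceContext {
    /**
     * @param sourceLines all lines of the source file
     * @param lineNumber  1-based line number, as reported by StackTraceElement.getLineNumber()
     * @param radius      how many lines of context to show before and after
     */
    static String format(List<String> sourceLines, int lineNumber, int radius) {
        // Clamp the window to the bounds of the file.
        int first = Math.max(1, lineNumber - radius);
        int last = Math.min(sourceLines.size(), lineNumber + radius);
        StringBuilder out = new StringBuilder();
        for (int n = first; n <= last; n++) {
            out.append(n == lineNumber ? "--> " : "    ")   // arrow marks the offending line
               .append(n).append(": ")
               .append(sourceLines.get(n - 1))
               .append('\n');
        }
        return out.toString();
    }
}
```

Clamping the window to the first and last line of the file keeps the sketch safe for exceptions thrown near the top or bottom of a source file.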
A convenience class ("com.subhajit.stacktrace.base.ThrowableHolder2") is provided that can be used as follows:
new ThrowableHolder2(exc).printStackTrace() prints the beautified stack trace of the exception "exc" to System.err
new ThrowableHolder2(exc).printStackTrace(PrintStream ps) prints out the beautified stack trace to the given PrintStream, while new ThrowableHolder2(exc).printStackTrace(PrintWriter pw) prints out beautified stack trace information to the provided PrintWriter.
Capturing exceptions
It is little use having a stack trace beautification scheme unless it is easily integrated into your existing build and release practices. Also, I daresay that the less invasive this integration is (meaning, the fewer things you have to change to accommodate this scheme), the more readily you would consider using it. Accordingly, I provide two techniques to integrate this scheme into your own projects.
Capturing exceptions using log4j
The first technique assumes that you use log4j for logging, and you already have code and configuration that logs all interesting exceptions. In other words, your project already has a "log4j.properties" file that describes how you log messages, and your project already uses this file to configure log4j logging.
The "com.subhajit.stacktrace.log4j.CustomLayout" class extends the log4j PatternLayout class to provide beautified stack traces. The following snippet of a log4j properties file shows how to use the CustomLayout class:

To have the CustomLayout beautify stack traces, don't forget to define the system property "src" to point to a comma separated list of source code locations (which could be directories or zip files containing source code). On my computer, I use the following command line to run the "stack-trace-log4j.jar" test program:

AspectJ based capture
Using the log4j based scheme above requires changes to your application classpath (to include stack-trace-log4j.jar) and log4j configuration. Using the AspectJ based capture described below, no changes are required to your application components. Capture is accomplished simply by modifying your application launch configuration.
Using the scheme described here requires minimal to no understanding of AspectJ (http://www.eclipse.org/aspectj/). All you do is create a "launcher" script that launches your program. The "com.subhajit.stacktrace.aspectj.admin.CommandGenerator" class generates launcher scripts given the following information:
- An output directory where the generated launch scripts are saved, along with some supporting files.
- A comma separated list of source code locations (directories or zip files).
- A comma separated list of classpath elements (JAR files and directories containing CLASS files) required by your application to run.
- The name of the "main class" of your application (the class containing the "main" method).
Running the command generator using the following command line:

results in the creation of a directory named "temp" (specified by the "-dir" argument), containing the following files:

The two launch files generated, named "launch.cmd" and "launch.sh", serve to launch your application on Windows and Solaris/Linux, respectively. The Windows version of this file looks like this (the indented lines at the end of the file are actually part of the line starting with "java -cp "):

In the launch script above:
- ASPECTPATH is the location of the stack-trace-agent.jar file.
- SRCPATH is a comma separated list of directories and zip files containing source files.
- APPCLASSPATH is a list of application classpath elements, such as JAR files and directories containing CLASS files.
This launch script uses AspectJ's "load time weaving" functionality via its "WeavingURLClassLoader". I could not make this work using the new "agentlib" load time weaving mechanism.
Source code
The source code provided with this article contains several Eclipse projects:
| stack-trace-base | Contains base classes to format stack traces with embedded source code snippets. |
| stack-trace | Contains the AspectJ based exception capture mechanism. |
| stack-trace-admin | Contains tools (such as CommandGenerator) to generate AspectJ based launchers utilizing load time weaving to capture exceptions. |
| stack-trace-log4j | Contains a log4j Layout (CustomLayout) and a sample log4j.properties file showing how to update your log4j configuration to format stack traces. |
Source code for this article is available for download.
Conclusion
Capturing and logging exception stack traces containing source code snippets helps developers rapidly identify (or hypothesize about) problems. This article presents a simple mechanism to accomplish the inclusion of source code snippets in exception stack traces.
Monday Dec 29, 2008
Dynamic Java Beans
(Available under http://blogs.sun.com/adventures/)
Introduction
In this article, I show how to dynamically create Java bean classes and load them using application defined custom class loaders. Java beans are useful in a number of application domains [1,2]. Traditionally, Java beans are created by first writing source code for the desired bean and then compiling it. This works best for applications in which the Java bean classes to be used are known beforehand. There are some applications, however, that need to create Java beans dynamically, that is, they need to create ad-hoc beans as they run.
There are two tasks to the problem of creating dynamic Java beans at runtime, namely:
- Generating the byte code of the Java bean classes.
- Loading the generated byte code via specified class loaders.
In this exercise, we accomplish these tasks as follows:
- Specify a name for the Java bean to be created (“com.subhajit.synthetic.Point”).
- Specify the fields (names and types) that the Java bean must contain (“x” of type “int” and “y” of type “int”).
- Generate the byte codes of the Java bean class.
- Load these byte codes via a specified class loader.
Here is a snippet of code that shows how the above steps are performed:

Here, “BeanCreator” is the “com.subhajit.superbean.BeanCreator” class which is provided in the source code accompanying this article (see download link below).
Having gone over what we accomplish in the provided source code, let us delve deeper into steps 3 (generating byte code) and 4 (loading byte code).
Generating byte code
Byte code may be generated by several means. With the advent of the compiler API in Java 6 [3], one might generate source code for the intended Java bean, compile it, and obtain the bytes of the resulting class [4]. On the other hand, one might choose to use a byte code generation library (such as ASM [5] or Apache BCEL [6]) to generate byte code directly.
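To make the first option concrete, here is a rough sketch of generating the source code of a bean from a map of field names to type names. It illustrates the source-generation route only, not the BCEL route used by the accompanying code, and the class name "BeanSourceSketch" and its "generate" method are made up for this example. The resulting source text would still have to be compiled (for instance with the compiler API) to obtain byte code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: build Java source text for a simple bean class.
final class BeanSourceSketch {
    // "fields" maps property names to type names, in declaration order.
    static String generate(String packageName, String simpleName, Map<String, String> fields) {
        StringBuilder src = new StringBuilder();
        src.append("package ").append(packageName).append(";\n\n");
        src.append("public class ").append(simpleName).append(" {\n");
        // One private field per property.
        for (Map.Entry<String, String> field : fields.entrySet()) {
            src.append("    private ").append(field.getValue()).append(' ')
               .append(field.getKey()).append(";\n");
        }
        // One getter and one setter per property.
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = field.getKey();
            String type = field.getValue();
            String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            src.append("    public ").append(type).append(" get").append(suffix)
               .append("() { return ").append(name).append("; }\n");
            src.append("    public void set").append(suffix).append('(')
               .append(type).append(' ').append(name)
               .append(") { this.").append(name).append(" = ").append(name).append("; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("x", "int");
        fields.put("y", "int");
        System.out.print(generate("com.subhajit.synthetic", "Point", fields));
    }
}
```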
Recognizing that byte code may be generated in a number of different ways, we define an interface named “ClassGenerationStrategy” to hide the actual strategy used to generate the byte code:
You may define your own byte code generation strategy by implementing this interface, and construct an instance of the “BeanCreator” class. Alternatively, you may use the default implementation (“BcelClassGenerator”).
The “generateClassBytes” method of “BcelClassGenerator” shows how to use BCEL to generate byte code for Java beans. There are methods to declare the Java bean class, add a default constructor, add private fields for the desired bean properties and add “getter” and “setter” methods for each property. The code is somewhat involved if (like me) you are not familiar with the JVM's byte code format. If you really want to dig deeper into these steps, see [7,8].
Loading byte code
Generating byte code for a class is well and good, but the byte code by itself is of little use unless it can be loaded as a Java class. After all, only after we have a Java class is it possible to do useful things like instantiating the class to create new objects, setting up instances with different values, followed by presumably doing something useful with these instances.
The “standard” way to convert a byte array to a Java class is via one of the overloads of the “defineClass” method in the “java.lang.ClassLoader” class. This final method accepts a class name (of the resulting class), a byte array, an offset (within the byte array) and length of bytes (within the byte array, starting at the offset), and returns a Java class (see the Java doc for the “java.lang.ClassLoader” class for details). The “standard” way to get access to the “defineClass” method is to implement a custom class loader class, override the “findClass” method, and call “defineClass” therefrom.
In this exercise, we use a different technique. We modify the user defined URLClassLoader (passed in as a constructor argument to the “BeanCreator” class) by adding a special URL to its list of URL's. Generated byte code that we wish to load with this class loader is placed in this special URL. Subsequent requests to this class loader to load the class cause it to go through its built in class finding and loading mechanism, which ultimately results in the class being found and loaded.
The “special” URL added to the user defined URLClassLoader uses a special protocol (“mem”) made up to facilitate URL's that refer to memory locations instead of directories and files on disk. Basically, URL's using the “mem” protocol are backed by a singleton instance of a concurrent hash map in memory, which maps String names to byte arrays. Information is saved into memory URL's just as it would be saved into any other URL using the following steps:
- Connect to the URL and get an URLConnection object.
- Setup the URLConnection to perform output.
- Obtain an OutputStream from the URLConnection object.
- Write the byte array to the OutputStream.
- Flush and close the OutputStream.
The following code snippet illustrates these steps:
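A minimal, self-contained sketch of these steps is given here. Because a plain "mem" URL cannot be created without a handler, the sketch includes a small stand-in handler backed by a concurrent hash map and passes it directly to the URL constructor; this is an assumption made to keep the example runnable, and it is not the handler or the registration mechanism from the article's source code. All class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MemUrlSketch {
    // In-memory store shared by all "mem" URLs: URL text -> bytes.
    private static final Map<String, byte[]> STORE = new ConcurrentHashMap<>();

    // Stand-in handler for the made-up "mem" protocol.
    static final URLStreamHandler HANDLER = new URLStreamHandler() {
        @Override
        protected URLConnection openConnection(URL url) {
            return new MemConnection(url);
        }
    };

    private static final class MemConnection extends URLConnection {
        MemConnection(URL url) { super(url); }

        @Override public void connect() { connected = true; }

        @Override public InputStream getInputStream() throws IOException {
            byte[] data = STORE.get(url.toExternalForm());
            if (data == null) throw new FileNotFoundException(url.toExternalForm());
            return new ByteArrayInputStream(data);
        }

        @Override public OutputStream getOutputStream() {
            // Bytes become visible in the store when the stream is closed.
            return new ByteArrayOutputStream() {
                @Override public void close() {
                    STORE.put(url.toExternalForm(), toByteArray());
                }
            };
        }
    }

    // The five steps from the list above.
    static void write(URL url, byte[] data) throws IOException {
        URLConnection connection = url.openConnection();   // connect to the URL
        connection.setDoOutput(true);                      // set up for output
        try (OutputStream out = connection.getOutputStream()) {  // obtain the stream
            out.write(data);                               // write the byte array
            out.flush();                                   // flush; closed on exit
        }
    }

    static byte[] read(URL url) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = url.openConnection().getInputStream()) {
            byte[] chunk = new byte[4096];
            for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
                buffer.write(chunk, 0, n);
            }
        }
        return buffer.toByteArray();
    }

    // Writes the bytes to a "mem" URL and reads them back.
    static byte[] roundTrip(String spec, byte[] data) {
        try {
            URL url = new URL(null, spec, HANDLER);
            write(url, data);
            return read(url);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The "write" method itself is protocol independent: it works against any URL whose connection supports output.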

Before delving into the details of implementing this scheme, let us see what this scheme buys us from the perspectives of code that produces (generates) byte code, and code that consumes (uses) the generated byte code. Code that produces byte code “writes” the generated bytes to a special URL. Code that consumes byte code simply “loads” generated classes using the specified URLClassLoader.
To accomplish this scheme, we need to lay some groundwork:
- Make the “mem” URL protocol “known” to the Java runtime. Note that some protocols like “file” and “http” are built into the standard libraries, allowing the creation of new URL's using these protocols using code such as:
Trying to create an URL using the “mem” protocol results in an error (a MalformedURLException is thrown):
- Custom URL protocols (such as the “mem” protocol) are supported by creating user defined “url stream handler factories” and “registering” instances of these factories to handle specific custom protocols. Accordingly, we create a user defined class named “com.subhajit.memoryurl.MemoryURLStreamHandler” that implements the “java.net.URLStreamHandlerFactory” interface, and register a default instance thereof using the code:
The “createURLStreamHandler” method of the “URLStreamHandlerFactory” interface is implemented as follows:
The URLStreamHandler returned by this method creates connections of type “com.subhajit.memoryurl.MemoryURLConnection”, an extension of the “java.net.URLConnection” class.
The “BeanCreator” class modifies the URLClassLoader passed in by the user by appending a memory URL to the end of the URL's managed by the URLClassLoader. This memory URL uses the following scheme to identify classes : “mem://prefix/fullyQualifiedClassNameInJVMFormat.class” (eg. “mem://0/com/subhajit/beans/Point.class”). The prefix is uniquely created per BeanCreator instance so that different URLClassLoaders created by the application cannot “see” classes created on memory URL's added to other user defined URLClassLoader instances.
The following code (of the “setClassBytesInMemoryURL” method of the “BeanCreator” class) shows how a producer of Java byte code “publishes” classes:


In general, using undocumented functionality, especially when it is not intended to be used, is a very bad idea. In this case, we use it simply because it is a means to an end in accomplishing a dynamic activity (loading classes).
The “BeanCreator” class's “defineClass” method defines bean classes with the following twists:
- If the class that is sought to be defined has already been defined, the previously defined class is returned.
- If any of the constituent members of a bean that is being defined belong to classes that are not visible to this BeanCreator's modified class loader, the classes are read as resources and redefined in this class loader so that it can load the generated class.
Examples of use
The “com.subhajit.superbean.test.TestSuite” class illustrates the use of this code generation scheme. Some examples of use are presented below.
The following code snippet shows how to create a “Point” class containing two integer properties named “x” and “y”, followed by creating a “Line” class containing two “Point” properties named “start” and “end”:

References
- http://java.sun.com/javase/technologies/desktop/javabeans/index.jsp
- http://java.sun.com/javase/6/docs/technotes/guides/javac/index.html
- http://www.juixe.com/techknow/index.php/2006/12/13/java-se-6-compiler-api/
- http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html
- http://en.wikipedia.org/wiki/JVM
Code download
http://blogs.sun.com/adventures/resource/beancreator.zip
Friday Oct 31, 2008
This blog copyright 2009 by Subhajit Dasgupta

