Background
Last week I learned that there was more to Java class literals than I
expected - and not in a good way. If you're working in Java SE you can
stop reading now, but in Java ME this might be of interest...
Class literals were added to the Java language in 1.1 days - they
allow you to get a reference to a Class object as if any class had a
static field called "class". So you can get the Class object for the
String class by using the expression String.class instead of
calling Class.forName and dealing with exceptions.
This is a great convenience - not so much because it avoids some
try-catches (you could use a helper method for that), but because the
class name is checked at compile time instead of run time. If you change
the name of a class everywhere but in the string passed to Class.forName,
javac won't be any help.
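To make the difference concrete, here is a minimal sketch. The class name ClassLiteralDemo and the lookup() helper are made up for illustration; only String.class and Class.forName come from the discussion above.

```java
// Illustrative only: ClassLiteralDemo and lookup() are hypothetical names.
class ClassLiteralDemo {

    // Run-time lookup: the name is just a string, so javac cannot check it.
    static Class lookup(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException ex) {
            return null; // a typo in the name only shows up here, at run time
        }
    }

    public static void main(String[] args) {
        // Compile-time checked: a typo such as Strin.class would not compile.
        Class literal = String.class;

        // Both routes yield the same Class object when the name is right.
        System.out.println(literal == lookup("java.lang.String"));

        // A misspelled name compiles fine and only fails when executed.
        System.out.println(lookup("java.lang.Strin"));
    }
}
```

Running this should print true followed by null: the literal and the string lookup agree when the name is spelled correctly, and the misspelled name slips past javac entirely.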
Issue
In Java 1.5 the JVM Spec was updated to include direct support for
class literals (extending the ldc bytecode). But how did class literals
work in Java 1.1 through 1.4? And how do they work in Java ME CLDC now?
It turns out that javac just keeps generating code until it works :-)
There are a couple of variations in what it does depending on the Java
target version used, but I'll describe what I've seen for -target 1.2
or 1.3 (which includes most (all?) Java ME CLDC applications).
Consider this trivial class using a class literal:
public class ClassTest {
    static Class c2 = String.class;
}
Javac will generate code roughly like this:
public class ClassTest {
    static Class c2;
    static Class class$java$lang$String; // cache (would make sense with a different example)

    static {
        if (class$java$lang$String == null) {
            class$java$lang$String = class$("java.lang.String");
        }
        c2 = class$java$lang$String;
    }

    static Class class$(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException ex) {
            // Rethrow as an unchecked error so the caller needs no try-catch
            throw new NoClassDefFoundError(ex.getMessage());
        }
    }
}
Bigger Issue
OK, so this is getting a little messy - the simple expression
String.class is generating about 33 bytes of bytecode plus
overhead for an anonymous field and method. But what happens when
the class literal is referenced from an interface instead of a class?
Note that interfaces can't have static methods, so javac generates an
anonymous class to hold the anonymous method!
public interface InterfaceTest {
    static Class c2 = String.class;
}
Javac will generate code roughly like this:
public interface InterfaceTest {
    static Class c2;

    static {
        if (InterfaceTest$1.class$java$lang$String == null) {
            InterfaceTest$1.class$java$lang$String = InterfaceTest$1.class$("java.lang.String");
        }
        c2 = InterfaceTest$1.class$java$lang$String;
    }
}

public class InterfaceTest$1 {
    static Class class$java$lang$String; // the cache lives in the helper class

    static Class class$(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException ex) {
            // Rethrow as an unchecked error so the caller needs no try-catch
            throw new NoClassDefFoundError(ex.getMessage());
        }
    }
}
This results in the same amount of bytecode overhead, but now
we have extra class overhead. The class file for the anonymous class
InterfaceTest$1 is about 587 bytes. Now multiply this by
every interface that references a class literal.
Java Native Access
So why would anyone have interfaces that reference class literals?
Check out the Java Native Access (JNA) project. JNA is an interesting
system that lets you declare native functions you'd like to call from
Java, and JNA takes care of generating a callout, without generating
any JNI code. I'll be blogging about JNA in the future.
The typical JNA usage pattern is to declare an interface that
contains the functions you want to call (as interface methods), and
JNA creates an implementation that will actually do
the call. To match the interface and implementation to a dynamic
library containing the native code, the interface should declare a
static variable such as:
public interface CLibrary extends Library {
    CLibrary INSTANCE = (CLibrary)
        Native.loadLibrary((Platform.isWindows() ? "msvcrt" : "c"),
                           CLibrary.class);

    void printf(String format, Object... args);
}
So What? And Where's the Squawk Connection?
OK, to recap: in Java 1.5 and later this doesn't happen, because the
JVM can handle class literals directly. But in a
Java ME system with tight memory constraints, such as those Squawk is
designed for, this could add up.
But the Squawk VM handles class literals just fine; it's just that javac
doesn't know that. In the process of converting normal Java bytecodes
into Squawk bytecodes, we can do a lot of cleanup. The simplest trick
that comes to mind is to pattern match in the static initializer
"clinit", looking for the initialization of the anonymous field with
a call to the anonymous method.
If we simply replace the call to class$("java.lang.String")
with a load of the class literal, then the anonymous method will not
be called, and the dead method elimination (DME) phase will delete the
method.
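Here is a rough sketch of what that pattern match might look like. This is a toy model, not Squawk's translator: the Insn class and the opcode names (ldc_string, invokestatic, ldc_class, putstatic) are invented for illustration, and a real implementation would also have to check the method's owner and signature.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Insn and these opcode names are invented for illustration
// and are not Squawk's real translator data structures.
class ClassLiteralPeephole {

    static final class Insn {
        final String op;      // e.g. "ldc_string", "invokestatic", "ldc_class"
        final String operand; // string constant, method name, or class name

        Insn(String op, String operand) {
            this.op = op;
            this.operand = operand;
        }

        public String toString() {
            return op + " " + operand;
        }
    }

    // Replace each pair  ldc_string "X"; invokestatic class$  with  ldc_class X.
    static List<Insn> rewrite(List<Insn> code) {
        List<Insn> out = new ArrayList<Insn>();
        for (int i = 0; i < code.size(); i++) {
            Insn cur = code.get(i);
            boolean isPair = i + 1 < code.size()
                    && cur.op.equals("ldc_string")
                    && code.get(i + 1).op.equals("invokestatic")
                    && code.get(i + 1).operand.equals("class$");
            if (isPair) {
                out.add(new Insn("ldc_class", cur.operand));
                i++; // skip the call, so class$ is no longer referenced here
            } else {
                out.add(cur);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Insn> clinit = new ArrayList<Insn>();
        clinit.add(new Insn("ldc_string", "java.lang.String"));
        clinit.add(new Insn("invokestatic", "class$"));
        clinit.add(new Insn("putstatic", "class$java$lang$String"));
        System.out.println(rewrite(clinit));
    }
}
```

In this toy version the three-instruction sequence shrinks to two, and nothing in the rewritten code mentions class$ any more, which is what lets a later dead method pass remove it.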
Getting rid of the anonymous class is trickier than I first
thought, because the cache variable ("class$java$lang$String") gets
declared in the anonymous class. In theory we can handle this by
constant propagation - the cache field is effectively final (it is only
ever assigned one value), and is really initialized by a constant value,
so we can replace references to the cache with the load of the constant
class literal. If we implemented dead field elimination there
would be no references to any fields or methods of the anonymous
class and dead class elimination (DCE) could do its job!
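To show why the remaining references matter, here is a small sketch of dead class elimination as a plain reachability walk. Everything in it is an assumption for illustration: the DeadClassSketch name, the map of class-to-referenced-classes, and the idea of a root set are stand-ins, not how Squawk actually records references.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: the reference map is a stand-in for whatever a real
// translator records about which classes mention which other classes.
class DeadClassSketch {

    // Returns the classes in refs that cannot be reached from the roots.
    static Set<String> deadClasses(Map<String, Set<String>> refs, Set<String> roots) {
        Set<String> live = new HashSet<String>();
        Deque<String> work = new ArrayDeque<String>(roots);
        while (!work.isEmpty()) {
            String c = work.pop();
            if (live.add(c)) { // first visit: follow its outgoing references
                Set<String> targets = refs.get(c);
                if (targets != null) {
                    work.addAll(targets);
                }
            }
        }
        Set<String> dead = new TreeSet<String>(refs.keySet());
        dead.removeAll(live);
        return dead;
    }

    public static void main(String[] args) {
        Set<String> roots = new HashSet<String>();
        roots.add("InterfaceTest");

        // Before cleanup: the interface still reads the cache field.
        Map<String, Set<String>> before = new HashMap<String, Set<String>>();
        before.put("InterfaceTest", new HashSet<String>());
        before.get("InterfaceTest").add("InterfaceTest$1");
        before.put("InterfaceTest$1", new HashSet<String>());
        System.out.println(deadClasses(before, roots));

        // After constant propagation: no references to the helper remain.
        Map<String, Set<String>> after = new HashMap<String, Set<String>>();
        after.put("InterfaceTest", new HashSet<String>());
        after.put("InterfaceTest$1", new HashSet<String>());
        System.out.println(deadClasses(after, roots));
    }
}
```

The first call reports nothing dead, because the cache field access keeps InterfaceTest$1 alive; the second reports InterfaceTest$1 as removable once that last reference is gone.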
I've got to get dead class elimination checked in one of these days, but in the meantime I now know where all of these anonymous classes are coming from.
ps. I'd like to thank openjdk.java.net for having the sources for everything, in particular Lower.java.