Neal directed my attention to Laird Nelson's struggle with generics.
Many people have found themselves in a similar situation and I can certainly empathize. Generally speaking, the problem is the need to refer to the type of the current class. This is called self-types or the type of this. But I am getting ahead of myself; let us first examine whether there are ways to solve the problem in the current language.
Software development starts with the design phase. This is when you design your software in a small group at the whiteboard and draw informal diagrams. The boxes you draw on the whiteboard represent concepts in the application domain and will eventually result in classes and interfaces. You may also draw lines between concepts that are related. At this time you are not too concerned about how to implement the behavior but about how the concepts are related.
The design phase is the ideal time to decide what type variables you need. In most cases you will need a type parameter when a class aggregates (behaves as a collection of) other objects whose types vary depending on usage. For example, consider event handlers: if you design a general event handler that can handle all sorts of events, this event handler does not need to be parameterized with the type of event it handles. On the other hand, if you have multiple kinds of event handlers, one for mouse events, one for keyboard events, and one for timer events, then it may make sense to have a single event handler class that is parameterized with the event type.
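As a rough illustration of the second case, a single handler type parameterized with the event type might look like the sketch below. The names MouseMoved, KeyPressed, EventHandler, and RecordingHandler are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event types, invented for this sketch.
class MouseMoved {
    final int x, y;
    MouseMoved(int x, int y) { this.x = x; this.y = y; }
}

class KeyPressed {
    final char key;
    KeyPressed(char key) { this.key = key; }
}

// One handler concept, parameterized with the kind of event it handles.
interface EventHandler<E> {
    void handle(E event);
}

// A trivial handler that remembers the events it has seen.
class RecordingHandler<E> implements EventHandler<E> {
    private final List<E> seen = new ArrayList<E>();

    public void handle(E event) {
        seen.add(event);
    }

    public List<E> seen() {
        return seen;
    }
}
```

A RecordingHandler<KeyPressed> accepts only KeyPressed events; handing it a MouseMoved is rejected at compile time, which is the relationship you would have drawn on the whiteboard.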
Ignore how to implement the classes. If a type variable does not make sense on the whiteboard, it does not belong in your code.
It does not matter if you are new to generic types, a brilliant type theoretician, or a recovering C++ template meta-programming addict: it is too easy to use type variables for things they are not suited for!
This lesson is particularly hard to learn if you are used to C++ templates. Generic types are not templates and you should not use type parameters for implementation convenience.
Laird Nelson describes a scenario with objects, references, and adapters. Objects can be canonically identified and references can be dereferenced. I do not know why references are needed and I would personally go for something simpler. However, I will assume that there are good reasons for having all the classes and interfaces described by Laird Nelson but note that I do not fully understand the purpose of all of them.
The goal is to be able to reference objects type-safely as illustrated by this example:
Person p = null;
Reference</* what goes here? */> ref = p.getCanonicalReference();
Person p2 = ref.dereference();
So I put all the above theory to the test and started drawing Laird Nelson's example on my whiteboard. Types like Dereferenceable and Reference are naturally parameterized to specify the type of what they reference. Similarly, a BaseObjectAdapter can be parameterized over what it adapts. I am not so sure about CanonicallyIdentified, so I chose not to parameterize it. It is easy to do so if it makes sense, though. Since I do not know anything about BaseObject, I do not see a need to add any parameters. Certainly neither Party nor Person is parameterized.
So after the design phase, it could look like this:
class DereferenceException extends Exception {}
interface Dereferenceable<T extends BaseObject> {}
class Reference<T extends BaseObject> implements Dereferenceable<T> {}
interface CanonicallyIdentified {}
interface BaseObject extends CanonicallyIdentified {}
interface Party extends BaseObject {}
interface Person extends Party {}
class BaseObjectAdapter<T extends BaseObject> implements BaseObject {}
Compare to where Laird Nelson gave up:
My suggestion:
class DereferenceException extends Exception {}
interface Dereferenceable<T extends BaseObject> {}
class Reference<T extends BaseObject> implements Dereferenceable<T> {}
interface CanonicallyIdentified {}
interface BaseObject extends CanonicallyIdentified {}
interface Party extends BaseObject {}
interface Person extends Party {}
class BaseObjectAdapter<T extends BaseObject> implements BaseObject {}
Laird Nelson's original example after generics:
class DereferenceException extends Exception {}
interface Dereferenceable<T extends BaseObject<T>> {}
class Reference<T extends BaseObject<T>> implements Dereferenceable<T> {}
interface CanonicallyIdentified /* crap. */ {}
interface BaseObject<T extends BaseObject<T>> extends CanonicallyIdentified<T> {}
interface Party<T extends BaseObject<T>> extends BaseObject<T> {}
interface Person<T extends Person<T>> extends Party<T> {}
class BaseObjectAdapter<T extends BaseObject<T>> implements BaseObject<T> {}
During the design phase, implementation details are not too important. However, once the design is mature, we do need to worry about how to implement it. There are a few tricks to learn, but most of it is straightforward; at least Dereferenceable and Reference are:
interface Dereferenceable<T extends BaseObject> {
    T dereference() throws DereferenceException;
}
class Reference<T extends BaseObject> implements Dereferenceable<T> {
    public T dereference() throws DereferenceException {
        return null; // or something
    }
}
CanonicallyIdentified may not seem as straightforward and
what about Laird Nelson's example:
Person p = null;
Reference</* what goes here? */> ref = p.getCanonicalReference();
Person p2 = ref.dereference();
So clearly, I must parameterize CanonicallyIdentified?
No. I was unsure at the whiteboard and decided not to parameterize CanonicallyIdentified, and it is not used in the example anyway. I will use a wildcard:
interface CanonicallyIdentified { Reference<?> getCanonicalReference(); }
On the other hand, had I decided (at the whiteboard) that it did
make sense to parameterize CanonicallyIdentified it could
look like this:
interface CanonicallyIdentified<T> { Reference<? extends T> getCanonicalReference(); }
This decision is based on the design of the object hierarchies, not on where types flow. BaseObject looks like this:
interface BaseObject extends CanonicallyIdentified {}
If I chose to parameterize CanonicallyIdentified, it would
look like this:
interface BaseObject extends CanonicallyIdentified<BaseObject> {}
BaseObject is not parameterized and should not be.
Party and Person are not parameterized either but we
do want to know the type of the canonical reference. The answer is
to override the method and specialize the return type
(covariant return types):
interface Party extends BaseObject {
    String getSortName(); // or whatever
    Reference<? extends Party> getCanonicalReference();
}
interface Person extends Party {
    Reference<? extends Person> getCanonicalReference();
}
This also answers the question about what type argument to use in the example above:
Person p = null;
Reference<? extends Person> ref = p.getCanonicalReference();
Person p2 = ref.dereference();
Finally, BaseObjectAdapter:
class BaseObjectAdapter<T extends BaseObject> implements BaseObject {
    /* various instance fields... */
    private Reference<T> canonicalReference;

    // with the usual getters and setters
    public Reference<T> getCanonicalReference() {
        return canonicalReference;
    }
}
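To check that these pieces fit together, here is a self-contained sketch that should compile and run as written. It makes a few assumptions that are not in Laird Nelson's example: Reference simply holds its target in a field, BaseObjectAdapter has a setter, and PersonAdapter and CanonicalReferenceDemo are hypothetical classes added only to exercise the types.

```java
class DereferenceException extends Exception {}

interface Dereferenceable<T extends BaseObject> {
    T dereference() throws DereferenceException;
}

// Assumption: a reference just remembers its target.
class Reference<T extends BaseObject> implements Dereferenceable<T> {
    private final T target;

    Reference(T target) { this.target = target; }

    public T dereference() throws DereferenceException {
        if (target == null) throw new DereferenceException();
        return target;
    }
}

interface CanonicallyIdentified {
    Reference<?> getCanonicalReference();
}

interface BaseObject extends CanonicallyIdentified {}

interface Party extends BaseObject {
    String getSortName();
    Reference<? extends Party> getCanonicalReference(); // covariant return
}

interface Person extends Party {
    Reference<? extends Person> getCanonicalReference(); // covariant return
}

class BaseObjectAdapter<T extends BaseObject> implements BaseObject {
    private Reference<T> canonicalReference;

    public Reference<T> getCanonicalReference() { return canonicalReference; }

    public void setCanonicalReference(Reference<T> ref) { canonicalReference = ref; }
}

// Hypothetical adapter: Reference<Person> is a subtype of
// Reference<? extends Person>, so the inherited getter satisfies Person.
class PersonAdapter extends BaseObjectAdapter<Person> implements Person {
    private final String sortName;

    PersonAdapter(String sortName) {
        this.sortName = sortName;
        setCanonicalReference(new Reference<Person>(this));
    }

    public String getSortName() { return sortName; }
}

class CanonicalReferenceDemo {
    // Returns true when dereferencing the canonical reference yields the same object.
    static boolean roundTrip() {
        Person p = new PersonAdapter("Doe, Jane");
        Reference<? extends Person> ref = p.getCanonicalReference();
        try {
            Person p2 = ref.dereference();
            return p2 == p;
        } catch (DereferenceException e) {
            return false;
        }
    }
}
```

Note that none of BaseObject, Party, or Person needed a type parameter for the client code in roundTrip to stay free of casts.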
Is there a lesson to learn from this? Generic types are not
C++ templates. Design your types to have the type
parameters they naturally need and do not add unnecessary type
parameters to save yourself from typing.
Joe and I have been talking about best practices when using generics for software design at JavaOne 2005 and 2006. We recommend that you try to avoid unnecessary type variables. Sometimes the solution is to not use generics.
Type parameters on generic methods are different, but not by much. Never use type parameters on public methods if they only benefit the implementation. Instead, use a wildcard and a private generic method if you need to name a type when implementing the behavior. For example, consider how to implement Collections.reverse:
public static void reverse(List<?> list) {
    reverse0(list);
}

private static <T> void reverse0(List<T> list) {
    ListIterator<T> fwd = list.listIterator();
    ListIterator<T> rev = list.listIterator(list.size());
    for (int i = 0, mid = list.size() >> 1; i < mid; i++) {
        T tmp = fwd.next();
        fwd.set(rev.previous());
        rev.set(tmp);
    }
}
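The same wildcard-plus-private-helper idiom applies to any method that needs to name the element type only internally. Here is a minimal sketch using a hypothetical swapEnds method in a hypothetical ListUtil class (neither is part of java.util.Collections):

```java
import java.util.List;

class ListUtil {
    // Public signature: callers never see a type parameter.
    public static void swapEnds(List<?> list) {
        swapEnds0(list); // wildcard capture gives the unknown element type a name, T
    }

    // Private helper: T exists only so the implementation can declare a temporary.
    private static <T> void swapEnds0(List<T> list) {
        if (list.size() < 2) {
            return; // nothing to swap
        }
        int last = list.size() - 1;
        T first = list.get(0);
        list.set(0, list.get(last));
        list.set(last, first);
    }
}
```

Writing list.set(0, list.get(last)) directly inside swapEnds would not compile, because the element type of a List<?> cannot be named there. The helper exists purely for the implementation, which is why it stays private.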
What about self-types? The Java™ programming language does not have self-types right now. Self-types would allow you to refer to the type of the receiver (the type of this, the current class). Imagine we used this to denote a self-type; we could then avoid overriding getCanonicalReference:
interface CanonicallyIdentified {
    Reference<? extends this> getCanonicalReference();
}
interface Party extends BaseObject {
    String getSortName(); // or whatever
}
interface Person extends Party {}
Compare this to when I used covariant return types:
With self-types:
interface CanonicallyIdentified {
    Reference<? extends this> getCanonicalReference();
}
interface Party extends BaseObject {
    String getSortName(); // or whatever
}
interface Person extends Party {}
Without self-types:
interface CanonicallyIdentified {
    Reference<?> getCanonicalReference();
}
interface Party extends BaseObject {
    String getSortName(); // or whatever
    Reference<? extends Party> getCanonicalReference();
}
interface Person extends Party {
    Reference<? extends Person> getCanonicalReference();
}
The Strongtalk type system for Smalltalk has self-types. You can download the Strongtalk system from www.strongtalk.org. The LOOJ paper by Bruce and Foster includes a proposal for adding ThisClass to the Java programming language.
If self-types were added to the Java programming language, it
would be obvious to consider retrofitting this onto
Object.clone():
protected native this clone() throws CloneNotSupportedException;
Unfortunately, this is not possible because the specification of that method does not require:
x.clone().getClass() == x.getClass()
It is only recommended, and such a change could therefore break existing programs that follow the specification. Although we sometimes have to break source compatibility, breaking programs that follow the specification is not a viable option. Where a new API is defined, or where the subtypes of a class are controlled, it is possible to take advantage of self-types on the clone method:
class NewClass implements Cloneable {
    protected this clone() throws CloneNotSupportedException {
        return (this)super.clone(); // cast required as we cannot retrofit Object.clone()
    }
}
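To see why the specification cannot promise x.clone().getClass() == x.getClass(), consider the following sketch in today's Java. The classes Shape, Circle, and CloneContractDemo are hypothetical. Shape builds its copy with a constructor instead of calling super.clone(), which the specification permits (though it is discouraged), and Circle simply inherits that method.

```java
class Shape implements Cloneable {
    final String name;

    Shape(String name) { this.name = name; }

    // Permitted by the specification, but discouraged: the copy is built
    // with a constructor rather than with super.clone().
    @Override
    public Shape clone() {
        return new Shape(name);
    }
}

// Inherits clone() unchanged, so cloning a Circle produces a plain Shape.
class Circle extends Shape {
    Circle() { super("circle"); }
}

class CloneContractDemo {
    // Checks the recommended (but not required) property for a given object.
    static boolean sameClassAfterClone(Shape s) {
        return s.clone().getClass() == s.getClass();
    }
}
```

Here sameClassAfterClone holds for a Shape but not for a Circle. A program like this follows the specification today, yet it could not be typed if clone were declared to return the self-type.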
Since it is already possible to use covariant return types to simulate this behavior today, and we cannot retrofit Object.clone(), I consider it unlikely that we will add self-types to the Java programming language anytime soon.
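As a sketch of that simulation, each class can override clone() and narrow the return type to itself. The classes Point and ColoredPoint are hypothetical. The cost, compared with a real self-type, is that every subclass has to repeat the override by hand.

```java
class Point implements Cloneable {
    final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    // Covariant return type: callers get a Point without casting.
    @Override
    public Point clone() {
        try {
            return (Point) super.clone();
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}

class ColoredPoint extends Point {
    final String color;

    ColoredPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    // Each subclass must narrow the return type again by hand.
    @Override
    public ColoredPoint clone() {
        return (ColoredPoint) super.clone();
    }
}
```

Because both overrides end up in Object.clone(), the copy has the same runtime class as the original, so the cast in ColoredPoint is safe.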
Joseph D. Darcy and Alex Buckley provided me with a lot of useful feedback on the early drafts and helped me get the flow better.
Posted by quintesse on November 16, 2006 at 02:08 AM PST #
Posted by Peter von der Ahé on November 16, 2006 at 02:35 AM PST #
Posted by rullrich on November 16, 2006 at 06:44 AM PST #
Posted by Eugene Vigdorchik on November 16, 2006 at 10:57 AM PST #
Posted by Peter von der Ahé on November 16, 2006 at 11:27 AM PST #
Posted by Eugene Vigdorchik on November 16, 2006 at 11:35 AM PST #
Hi,
clone() is not the only method where a 'self' is needed,
at least in the JDK:
Throwable.initCause() exhibits the same problem.
And, perhaps more usefully, javax.swing.tree.TreeNode
could be retrofitted to something like this:
interface TreeNode {
int getChildCount();
this getChildAt(int index);
}
I wonder if self (or this) can be used as a type argument, for example,
class OhOh implements Comparable<this> {
}
Rémi Forax
Posted by Peter von der Ahé on November 20, 2006 at 07:02 PM PST #
I wrote an entry about a way to restrict a type variable
to the types that can be used as return type.
http://weblogs.java.net/blog/forax/archive/2006/11/javalangunreach.html
Posted by Rémi Forax on November 21, 2006 at 02:21 AM PST #
abstract class Renderer<C extends UIComponent>
{
public abstract void encode(C component);
...
}
public class TextFieldRenderer extends Renderer<TextField>
{
public void encode(TextField component) { ... }
}
Posted by Karl Peterbauer on November 23, 2006 at 03:47 PM PST #
Posted by Laird Nelson on November 30, 2006 at 08:11 AM PST #
Posted by Quite Improbable But Not Quite Impossible on December 04, 2006 at 02:07 AM PST #
Thanks for writing about this topic. Generics are still a black hole of knowledge for most of us I fear.
I would like to raise another use case for the this generic self-type. With regular mutable classes, set methods return void, so method return types are not an issue when you subclass. However, with immutable classes, the equivalent to set methods (with methods) need to return the type of the subclass (not the superclass where the method is declared).
public abstract class AbstractDate {
public int getYear() ...
public int getMonth() ...
public int getDay() ...
public this withYear(int y) ...
public this withMonth(int m) ...
public this withDay(int d) ...
public abstract this newInstance(int y, int m, int d);
}
public class GregorianDate extends AbstractDate {
public this newInstance(int y, int m, int d) ...
}
public class BuddhistDate extends AbstractDate {
public this newInstance(int y, int m, int d) ...
}
public class CopticDate extends AbstractDate {
public this newInstance(int y, int m, int d) ...
}
I've tried to keep the example simple, but I hope that the point is clear. With immutable classes, you need a self-type. Otherwise, you have to cut and paste each of the with methods into every subclass just to manually override the covariant return type. My argument is that immutable classes are becoming more and more important over time, notably with multi-core CPUs, and so we should be extending generics to support immutable classes properly.
Posted by Stephen Colebourne on December 04, 2006 at 05:34 AM PST #
There I tried to make the case for why selftype is a "must have" for reusable generic code, not a "nice to have."
Essentially, the self type pattern requires "abstract" and "concrete" classes. Abstract self-types can be extended. Concrete self-types cannot be extended. Imagine trying to use Java or any other OO language where you were only allowed to extend abstract classes, and not concrete ones. Covariant return does not solve this problem.
It seems that lots of people keep running up against this problem, and some people keep dancing around it, almost hoping to make it go away. The problem is real, and the self-type pattern is a bandaid at best. I feel that direct support for self-type is a necessary Java enhancement for reusable generic code. If someone has a complete solution to the self-type problem using the current state of generics I would appreciate hearing about it. In the meantime, the workaround I use, which I call the raw-type pattern (which is kind of ugly but it works) is described in the bug entry.
Posted by Jon Barrilleaux on December 11, 2006 at 09:13 PM PST #
Posted by Strider80 on December 21, 2006 at 11:22 AM PST #
Posted by asd on December 28, 2006 at 04:47 AM PST #