Self types (aka type of this)

Nov 16 2006, 01:23:45 AM PST » Java » Best Practices

Neal directed my attention to Laird Nelson's struggle with generics.

Many people have found themselves in a similar situation and I can certainly empathize. Generally speaking, the problem is the need to refer to the type of the current class. This is known as self-types, or the type of this. But I am getting ahead of myself; let us first examine whether the problem can be solved in the current language.

Design with Generic Types

Software development starts with the design phase. This is when you design your software in a small group by the whiteboard and draw informal diagrams. The boxes you draw on the whiteboard represent concepts in the application domain and will eventually result in classes and interfaces. You may also draw lines between concepts that are related. At this time you are not too concerned about how to implement the behavior but about how the concepts are related.

The design phase is the ideal time to decide what type variables you need. In most cases you will need a type parameter when a class aggregates (behaves as a collection of) other objects whose types vary depending on usage. For example, consider event handlers: if you design a general event handler that can handle all sorts of events, this event handler does not need to be parameterized with the type of event it handles. On the other hand, if you have multiple kinds of event handlers, one for mouse events, one for keyboard events, and one for timer events, then it may make sense to have a single event handler class that is parameterized with the event type.
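The second case might be sketched as follows. This is only an illustration: MouseEvent, KeyEvent, EventHandler, and RecordingHandler are hypothetical names invented here (they are not the AWT classes), and a real design would likely carry more state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event types, invented for this sketch.
class MouseEvent {
    final int x, y;
    MouseEvent(int x, int y) { this.x = x; this.y = y; }
}

class KeyEvent {
    final char key;
    KeyEvent(char key) { this.key = key; }
}

// One handler abstraction, parameterized by the kind of event it handles.
interface EventHandler<E> {
    void handle(E event);
}

// A handler that simply records what it receives; the client fixes E.
class RecordingHandler<E> implements EventHandler<E> {
    private final List<E> seen = new ArrayList<E>();

    public void handle(E event) {
        seen.add(event);
    }

    List<E> seen() {
        return seen;
    }
}

class EventHandlerSketch {
    static int demo() {
        RecordingHandler<MouseEvent> mouse = new RecordingHandler<MouseEvent>();
        RecordingHandler<KeyEvent> keys = new RecordingHandler<KeyEvent>();
        mouse.handle(new MouseEvent(3, 4));
        keys.handle(new KeyEvent('q'));
        // mouse.handle(new KeyEvent('q')); // would not compile: wrong event type
        return mouse.seen().size() + keys.seen().size();
    }
}
```

Here the type parameter is visible on the whiteboard: a handler is a handler of some kind of event, and clients choose which kind.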

Ignore how to implement the classes. If a type variable does not make sense on the whiteboard, it does not belong in your code.

It does not matter if you are new to generic types, a brilliant type theoretician, or a recovering C++ template meta-programming addict: it is too easy to use type variables for things they are not suited for!

This lesson is particularly hard to learn if you are used to C++ templates. Generic types are not templates and you should not use type parameters for implementation convenience.

Laird Nelson's Scenario

Laird Nelson describes a scenario with objects, references, and adapters. Objects can be canonically identified and references can be dereferenced. I do not know why references are needed and I would personally go for something simpler. However, I will assume that there are good reasons for having all the classes and interfaces described by Laird Nelson but note that I do not fully understand the purpose of all of them.

The goal is to be able to reference objects type-safely as illustrated by this example:

Person p = null;
Reference</* what goes here? */> ref = p.getCanonicalReference();
Person p2 = ref.dereference();

My Design

So I put all the above theory to the test and started drawing Laird Nelson's example on my whiteboard. Types like Dereferenceable and Reference are naturally parameterized to specify the type of what they reference. Similarly, a BaseObjectAdapter can be parameterized over what it adapts. I am not so sure about CanonicallyIdentified, so I chose not to parameterize it. It is easy to do so if it makes sense, though. Since I do not know anything about BaseObject, I do not see a need to add any parameters. Certainly neither Party nor Person is parameterized.

So after the design phase, it could look like this:

class DereferenceException extends Exception {}

interface Dereferenceable<T extends BaseObject> {}

class Reference<T extends BaseObject> implements Dereferenceable<T> {}

interface CanonicallyIdentified {}

interface BaseObject extends CanonicallyIdentified {}

interface Party extends BaseObject {}

interface Person extends Party {}

class BaseObjectAdapter<T extends BaseObject> implements BaseObject {}

Compare to where Laird Nelson gave up.

My suggestion:

class DereferenceException extends Exception {}

interface Dereferenceable<T extends BaseObject> {}

class Reference<T extends BaseObject>
    implements Dereferenceable<T> {}

interface CanonicallyIdentified {}

interface BaseObject extends CanonicallyIdentified {}

interface Party extends BaseObject {}

interface Person extends Party {}

class BaseObjectAdapter<T extends BaseObject>
    implements BaseObject {}

Laird Nelson's original example after generics:

class DereferenceException extends Exception {}

interface Dereferenceable<T extends BaseObject<T>> {}

class Reference<T extends BaseObject<T>>
    implements Dereferenceable<T> {}

interface CanonicallyIdentified /* crap. */ {}

interface BaseObject<T extends BaseObject<T>>
    extends CanonicallyIdentified<T> {}

interface Party<T extends BaseObject<T>>
    extends BaseObject<T> {}

interface Person<T extends Person<T>>
    extends Party<T> {}

class BaseObjectAdapter<T extends BaseObject<T>>
    implements BaseObject<T> {}

Implementing My Design

During the design phase, implementation details are not too important. However, once the design is mature we do need to worry about how to implement it. There are a few tricks to learn, but most of it is straightforward. At least Dereferenceable and Reference are:

interface Dereferenceable<T extends BaseObject> {
    T dereference() throws DereferenceException;
}

class Reference<T extends BaseObject> implements Dereferenceable<T> {
    public T dereference() throws DereferenceException {
        return null; // or something
    }
}

CanonicallyIdentified may not seem as straightforward. And what about Laird Nelson's example:

Person p = null;
Reference</* what goes here? */> ref = p.getCanonicalReference();
Person p2 = ref.dereference();

So clearly, I must parameterize CanonicallyIdentified? No. I was unsure at the whiteboard and decided not to parameterize CanonicallyIdentified, and it is not used in the example anyway. I will use a wildcard:

interface CanonicallyIdentified {
    Reference<?> getCanonicalReference();
}

On the other hand, had I decided (at the whiteboard) that it did make sense to parameterize CanonicallyIdentified it could look like this:

interface CanonicallyIdentified<T> {
    Reference<? extends T> getCanonicalReference();
}

This decision is based on the design of the object hierarchies, not on where types flow. BaseObject looks like this:

interface BaseObject extends CanonicallyIdentified {}

If I chose to parameterize CanonicallyIdentified, it would look like this:

interface BaseObject extends CanonicallyIdentified<BaseObject> {}

BaseObject is not parameterized and should not be.

Party and Person are not parameterized either but we do want to know the type of the canonical reference. The answer is to override the method and specialize the return type (covariant return types):

interface Party extends BaseObject {
    String getSortName(); // or whatever
    Reference<? extends Party> getCanonicalReference();
}

interface Person extends Party {
    Reference<? extends Person> getCanonicalReference();
}

This also answers the question about what type argument to use in the example above:

Person p = null;
Reference<? extends Person> ref = p.getCanonicalReference();
Person p2 = ref.dereference();

Finally, BaseObjectAdapter:

class BaseObjectAdapter<T extends BaseObject> implements BaseObject {
    /* various instance fields... */
    private Reference<T> canonicalReference; // with the usual getters and setters
    public Reference<T> getCanonicalReference() {
        return canonicalReference;
    }
}
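To check that the pieces fit together, the declarations above can be assembled into one compilable sketch. Two things here are assumptions made only for illustration: Reference holds its target directly (the original leaves the body as "return null; // or something"), and SimplePerson is a hypothetical concrete class that does not appear in Laird Nelson's scenario.

```java
class DereferenceException extends Exception {}

interface Dereferenceable<T extends BaseObject> {
    T dereference() throws DereferenceException;
}

// Assumption: the reference simply holds its target in memory.
class Reference<T extends BaseObject> implements Dereferenceable<T> {
    private final T target;
    Reference(T target) { this.target = target; }
    public T dereference() throws DereferenceException {
        if (target == null) throw new DereferenceException();
        return target;
    }
}

interface CanonicallyIdentified {
    Reference<?> getCanonicalReference();
}

interface BaseObject extends CanonicallyIdentified {}

interface Party extends BaseObject {
    String getSortName();
    Reference<? extends Party> getCanonicalReference(); // covariant override
}

interface Person extends Party {
    Reference<? extends Person> getCanonicalReference(); // narrowed again
}

// Hypothetical concrete class, invented for this sketch.
class SimplePerson implements Person {
    private final String sortName;
    SimplePerson(String sortName) { this.sortName = sortName; }
    public String getSortName() { return sortName; }
    public Reference<SimplePerson> getCanonicalReference() {
        return new Reference<SimplePerson>(this);
    }
}

class CovariantReferenceSketch {
    static String demo() {
        try {
            Person p = new SimplePerson("Doe, Jane");
            Reference<? extends Person> ref = p.getCanonicalReference();
            Person p2 = ref.dereference(); // no cast needed
            return p2.getSortName();
        } catch (DereferenceException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each level of the hierarchy narrows the return type of getCanonicalReference, so client code that holds a Person gets back a reference to a Person without any type parameter on Person itself.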

Is there a lesson to learn from this? Generic types are not C++ templates. Design your types to have the type parameters they naturally need and do not add unnecessary type parameters to save yourself from typing. Joe and I have been talking about best practices when using generics for software design at JavaOne 2005 and 2006. We recommend that you try to avoid unnecessary type variables. Sometimes the solution is to not use generics.

Generic Methods

Type parameters on generic methods are different, but not by much. Never use type parameters on public methods if they only benefit the implementation. Instead, use a wildcard and a private generic method if you need to name a type when implementing the behavior. For example, consider how to implement Collections.reverse:

public static void reverse(List<?> list) {
    reverse0(list);
}
private static <T> void reverse0(List<T> list) {
    ListIterator<T> fwd = list.listIterator();
    ListIterator<T> rev = list.listIterator(list.size());
    for (int i=0, mid=list.size()>>1; i<mid; i++) {
        T tmp = fwd.next();
        fwd.set(rev.previous());
        rev.set(tmp);
    }
}
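The same pattern applies to any method that must read an element and store it back. The following is a minimal sketch with hypothetical names (swapEnds, swapEnds0, WildcardCaptureSketch); it shows why the private helper is needed at all: inside the wildcard version the element type has no name, so the compiler rejects the store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardCaptureSketch {
    // API method: callers see no type parameter, just "a list of something".
    static void swapEnds(List<?> list) {
        // list.set(0, list.get(list.size() - 1)); // would not compile:
        // the element type is unknown here, so only null may be stored.
        swapEnds0(list); // wildcard capture gives the unknown type a name
    }

    // Private helper: T names the element type for the implementation only.
    private static <T> void swapEnds0(List<T> list) {
        if (list.size() < 2) {
            return;
        }
        int last = list.size() - 1;
        T tmp = list.get(0);
        list.set(0, list.get(last));
        list.set(last, tmp);
    }

    static List<String> demo() {
        List<String> names = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        swapEnds(names);
        return names;
    }
}
```

The type variable T never appears in the signature that clients see, so it can be renamed or removed later without affecting them.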

Self-Types

What about self-types? The Java™ programming language does not have self-types right now. Self-types would allow you to refer to the type of the receiver (the type of this, the current class). Imagine we used this to denote a self-type and we could avoid overriding getCanonicalReference:

interface CanonicallyIdentified {
    Reference<? extends this> getCanonicalReference();
}

interface Party extends BaseObject {
    String getSortName(); // or whatever
}

interface Person extends Party {}

Compare this to when I used covariant return types.

With self-types:

interface CanonicallyIdentified {
    Reference<? extends this> getCanonicalReference();
}

interface Party extends BaseObject {
    String getSortName(); // or whatever
}

interface Person extends Party {}

Without self-types:

interface CanonicallyIdentified {
    Reference<?> getCanonicalReference();
}

interface Party extends BaseObject {
    String getSortName(); // or whatever
    Reference<? extends Party> getCanonicalReference();
}

interface Person extends Party {
    Reference<? extends Person> getCanonicalReference();
}

The Strongtalk type system for Smalltalk has self-types. You can download the Strongtalk system from www.strongtalk.org. The LOOJ paper by Bruce and Foster includes a proposal for adding ThisClass to the Java programming language.

If self-types were added to the Java programming language, it would be obvious to consider retrofitting this onto Object.clone():

protected native this clone() throws CloneNotSupportedException;

Unfortunately, this is not possible because the specification of that method does not require:

x.clone().getClass() == x.getClass()

It is only recommended and such a change could then break existing programs that follow the specification. Although we sometimes have to break source compatibility, breaking programs that follow the specification is not a viable option. In the situation where new API is defined or the subtypes of a class are controlled, it is possible to take advantage of self-types on the clone method:

class NewClass implements Cloneable {
    protected this clone() {
        return (this)super.clone(); // cast required as we cannot retrofit Object.clone()
    }
}

Since it is already possible to use covariant return types to simulate this behavior today, and we cannot retrofit Object.clone(), I consider it unlikely that we will add self-types to the Java programming language anytime soon.
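For comparison, simulating the self-typed clone method with covariant return types might look like the sketch below. Point, ColorPoint, and CovariantCloneSketch are hypothetical classes invented for this illustration.

```java
// Hypothetical classes, invented for this sketch.
class Point implements Cloneable {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public Point clone() {
        try {
            return (Point) super.clone(); // Object.clone() returns Object
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: Point is Cloneable
        }
    }
}

class ColorPoint extends Point {
    String color;
    ColorPoint(int x, int y, String color) { super(x, y); this.color = color; }

    // Each subclass must repeat the override to narrow the return type.
    @Override
    public ColorPoint clone() {
        return (ColorPoint) super.clone();
    }
}

class CovariantCloneSketch {
    static boolean demo() {
        ColorPoint a = new ColorPoint(1, 2, "red");
        ColorPoint b = a.clone(); // no cast at the call site
        return b != a
            && b.getClass() == a.getClass()
            && b.x == 1 && b.y == 2
            && "red".equals(b.color);
    }
}
```

The cost is one small override per subclass, which is exactly the repetition a self-type would remove.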

Acknowledgments

Joseph D. Darcy and Alex Buckley provided me with a lot of useful feedback on the early drafts and helped me get the flow better.

Comments:

Nice article, it does make things a bit clearer for me and it confirms what I suspected myself at times: that I was trying to make things too complicated.

I do have one question though: would it be possible to expand a bit more on the "Generic Methods"? I don't really understand why the reverse() is best implemented the way you show it. And what you mean by the statement that it should not "only benefit the implementation".

Posted by quintesse on November 16, 2006 at 02:08 AM PST #

The short answer is that type variables are part of the API and there is no need to burden clients of your API with unnecessary implementation details. I'll try to elaborate some more at a later time.

Posted by Peter von der Ahé on November 16, 2006 at 02:35 AM PST #

@quintesse: regarding reverse(). If you look at "<T> void reverse(List<T> l) {...}" then you will see that the type parameter T occurs only once within the method signature; that is, it is not used to link the types of two or more arguments, or an argument and the return type, together. It is only there so you can reference it within the implementation. That means it "only benefits the implementation". As a rule of thumb: all type parameters on generic methods that occur only once within the signature should be replaced by wildcards. If this doesn't allow you to write your implementation as you wanted to, then implement it within a private method that uses this "superfluous" type parameter. (But because it is private, no one outside your code will see this; that is, it does not become part of the API.)

Posted by rullrich on November 16, 2006 at 06:44 AM PST #

Peter, You should not underestimate the value of self types for user programs. Many times I wished there were self types present when looking into our code. Also from jdk side not only clone() would be the candidate to retrofit, but also getClass() (if only you could solve reification problem), and probably many other methods.

Posted by Eugene Vigdorchik on November 16, 2006 at 10:57 AM PST #

Eugene, as I explained, Object.clone() cannot take advantage of this feature. Object.getClass() already returns the most precise type it can (almost, see RFE 6184881). All that being said, I would personally like to have self types but I also think that it is somewhat of a corner case. In my opinion, this is a "nice to have", not a "must have".

Posted by Peter von der Ahé on November 16, 2006 at 11:27 AM PST #

As it seems here, getClass() returns what it does because of a hack (stated in JLS), it would make much sense to extend the type system instead of writing obscure javadoc. Also I've encountered too many cases to consider the situation uncommon. So to me you should reconsider self types for the next version, at least together with other features that are now actively proposed.

Posted by Eugene Vigdorchik on November 16, 2006 at 11:35 AM PST #

Hi,
clone() is not the only method where a 'self' is needed; at least in the JDK, Throwable.initCause() exhibits the same problem.
And perhaps more useful, javax.swing.tree.TreeNode can be retrofitted to something like this:

interface TreeNode {
  int getChildCount();
  this getChildAt(int index);
}

I wonder if self (or this) can be used as a type argument, for example,

 
class OhOh implements Comparable<this> {
}

Rémi Forax

Posted by Forax on November 20, 2006 at 12:04 PM PST #

Rémi, you make a good point about Throwable.initCause. That method's specification is very clear and retrofitting this to its return type would work.
Unfortunately, no such luck for TreeNode.getChildAt which has a vague specification. If you have a complex Composite(163) tree structure such as com.sun.source.tree.*, then children may not have the same type as their parent node.
Your OhOh example would not be allowed. The this type variable is only type-safe in covariant places, such as return types.

Posted by Peter von der Ahé on November 20, 2006 at 07:02 PM PST #

I wrote an entry about a way to restrict a type variable to the types that can be used as return type.
http://weblogs.java.net/blog/forax/archive/2006/11/javalangunreach.html

Rémi

Posted by Rémi Forax on November 21, 2006 at 02:21 AM PST #

Peter, may I introduce a very natural "real world" example? In JSF, the abstract class UIComponent is the mother of all components. Low-level visualization by encoding the stuff into HTML is done by instances of class Renderer.
Renderers know how to encode components of a certain type. Omitting details, the Renderer has methods like encode(UIComponent comp). Vice versa, the UIComponent knows its renderer, so the UIComponent has methods like (again omitting details) getRenderer().
Now let's rewrite this API using Generics. It may be straight-forward for the Renderer:
  abstract class Renderer<C extends UIComponent>
  {
      public abstract void encode(C component);

      ...
  }

A concrete Renderer for, say, TextField components would be defined as
  public class TextFieldRenderer extends Renderer<TextField> 
  {
    public void encode(TextField component) { ... }
  }

My question: How do I define UIComponent, its getRenderer() and (which is not defined by JSF) a setRenderer(Renderer renderer) in a type-safe way without self-types? I'm no type theoretician, but a seasoned Java developer, so I would really appreciate a straightforward way of declaring this natural dependency between the two entities in a type-safe way. Any ideas?

Posted by Karl Peterbauer on November 23, 2006 at 03:47 PM PST #

Thanks for blogging about this, Peter. The best takeaway for my small brain here was the use of covariant return types, which is not really mentioned in this way in Gilad's tutorial. Thanks again.

Posted by Laird Nelson on November 30, 2006 at 08:11 AM PST #

[Trackback] There's something that has been bugging me for a while, and Peter von der Ahé's recent post on Java's lack of 'self types' has got me thinking about it again. Take a look at this code: interface Wrapped<T> { T unwrap(); } class...

Posted by Quite Improbable But Not Quite Impossible on December 04, 2006 at 02:07 AM PST #

Thanks for writing about this topic. Generics are still a black hole of knowledge for most of us I fear.

I would like to raise another use case for the this generic self-type. With regular mutable classes, set methods return void, so method return types are not an issue when you subclass. However, with immutable classes, the equivalent to set methods (with methods) need to return the type of the subclass (not the superclass where the method is declared).

public class AbstractDate {
  public int getYear() ...
  public int getMonth() ...
  public int getDay() ...
  public this withYear(int y) ...
  public this withMonth(int m) ...
  public this withDay(int d) ...
  public abstract this newInstance(int y, int m, int d);
}
public class GregorianDate extends AbstractDate {
  public this newInstance(int y, int m, int d) ...
}
public class BuddhistDate extends AbstractDate {
  public this newInstance(int y, int m, int d) ...
}
public class CopticDate extends AbstractDate {
  public this newInstance(int y, int m, int d) ...
}

I've tried to keep the example simple, but I hope that the point is clear. With immutable classes, you need a self-type. Otherwise, you have to cut and paste each of the with methods into every subclass just to manually override the covariant return type. My argument is that as immutable classes are becoming more and more important over time - notably with multi-core CPUs - and so we should be extending generics to support immutable classes properly.

Posted by Stephen Colebourne on December 04, 2006 at 05:34 AM PST #

Take a look at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6479372.

There I tried to make the case for why selftype is a "must have" for reusable generic code, not a "nice to have."

Essentially, the self type pattern requires "abstract" and "concrete" classes. Abstract self-types can be extended. Concrete self-types cannot be extended. Imagine trying to use Java or any other OO language where you were only allowed to extend abstract classes, and not concrete ones. Covariant return does not solve this problem.

It seems that lots of people keep running up against this problem, and some people keep dancing around it, almost hoping to make it go away. The problem is real, and the self-type pattern is a bandaid at best. I feel that direct support for self-type is a necessary Java enhancement for reusable generic code. If someone has a complete solution to the self-type problem using the current state of generics I would appreciate hearing about it. In the meantime, the workaround I use, which I call the raw-type pattern (which is kind of ugly but it works) is described in the bug entry.

Posted by Jon Barrilleaux on December 11, 2006 at 09:13 PM PST #

I think I am struggling with something similar. Unfortunately I don't really understand what Laird Nelson's classes are supposed to do and if your solutions could help me with my struggle. You seem to know a lot about generics, I hope you could help me with this? I am trying to make some classes for representing a graph structure. I have a Node class that refers to a list of other nodes that I put in a LinkedList, additionally each Node has an Integer Id. The graph is simply a Map<Integer, Node> mapping each Id to the corresponding node. So far so good but I am trying to extend the Node class to a DFSNode which keeps track of extra information (color etc...) for performing a depth-first search on the graph. This DFSNode should have children that are also of the type DFSNode... In addition, I have a GraphUtil class that performs graph operations like reversing the arrows in the graph. I want this to work on every subtype of Node (struggling with the cloning thing here too), but I want my DFS search method to only work on DFSNodes... I had all my stuff working but I still needed to cast children of DFSNode from Node to DFSNode. My forum post is at http://forum.java.sun.com/thread.jspa?messageID=9409057

Posted by Strider80 on December 21, 2006 at 11:22 AM PST #

Please read Object Oriented Software Construction (Bertrand Meyer). Best Regards

Posted by asd on December 28, 2006 at 04:47 AM PST #

Java is a trademark of Sun Microsystems, Inc.
Copyright © 2006,2007 Peter von der Ahé