Open sourcing of javac (and the JDK) is an opportunity to revisit our current practices and think about what is missing, what needs to be updated, and what should stay the same. One issue is how to attribute source code with @author tags and there are different policies in use in the open source community.
The position of the Apache Software Foundation (ASF) was explained by the then president of the ASF. To summarize, the ASF recommends against putting @author tags in source code for these reasons:
sells the code base as an ASF brand.
However, those concerns are not that convincing to me and are far outweighed by the need for programmers to take responsibility for the code they write. In The Pragmatic Programmer: From Journeyman to Master, Hunt and Thomas recommend that programmers sign their work (tip 70). They further say:
We want to see pride of ownership. "I wrote this, and I stand behind my work."
The former president of the ASF has a good point that @author tags should be kept up to date. I agree and think the solution is to ensure they are kept up to date with a clear policy and code reviews. For example, if you create a new class or make significant changes to its API, add an @author tag. If you make a minor change, make sure your name is in the project's contributor file but do not add an @author tag.
Another point made by the ASF concerns team effort and the strength of the brand. I agree that software is created by a team, and I think that all team members deserve to be mentioned in a file that lists all contributors. Such a file could be part of the source bundle and also be prominently linked from the website. I don't think the sense of team effort is harmed by adding @author tags to source files, and I don't see how it harms the brand: after all, the copyright statements remain and prominently identify the organization.
I am not a lawyer and this is not legal advice: it is plausible that there are people in other countries who should be careful about putting their names in source files because they can be sued. However, I do not see how this affects people who do not live in those countries.
I have long followed the Emacs Lisp library header conventions as described in the GNU Emacs Lisp Reference Manual. As a practical matter, I think these can be directly applied to Java source code. Where the Emacs Lisp convention talks about a file header, I read top-level class and package comments. In other words, only package and top-level class comments should contain @author tags.
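As a rough illustration of that convention, here is a minimal sketch. The class, its nested class, and the author name are invented for this example and are not taken from the JDK. The tag appears once, in the top-level class comment, and nowhere else:

```java
/**
 * Formats a person's name for display.
 *
 * @author Jane Doe (hypothetical author, for illustration only)
 */
class NameFormatter {

    /** Nested class: no @author tag here, the top-level comment covers it. */
    static final class Options {
        final String separator;

        Options(String separator) {
            this.separator = separator;
        }
    }

    /** Method comments carry no @author tag either. */
    static String format(String first, String last, Options options) {
        return last + options.separator + first;
    }
}
```

A package comment would carry its tag in the same way; it is left out here only to keep the sketch in a single file.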
It is possible to argue that we have a unique opportunity to go and clean up our sources right now before they are released to the public. However, I'm not convinced that this opportunity has not already passed. Some parts of the JDK were open sourced even before javac and HotSpot were opened in November, and the public API has been part of the JDK (src.zip) for many releases. Similarly, all the sources of the JDK have been available for download under JRL since late 2004. There are people who are very sensitive to having their @author tags removed; for example, consider how the developer exa felt after he handed over a project and discovered that the new maintainers had removed his name from the source code.
Although we have not consistently used @author tags in the JDK, or even javac, I think we should keep the ones we have already and develop a consistent policy (for example inspired by the Emacs lisp conventions). It may be a good idea to hide email addresses by obfuscating existing @author tags if they contain email addresses.
Thanks to Jonathan Gibbons, Joe Darcy, and Alex Buckley for their suggestions on this text.
Neal directed my attention to Laird Nelson's struggle with generics.
Many people have found themselves in a similar situation and I can certainly empathize. Generally speaking, the problem is the need to refer to the type of the current class. This is called self-types or the type of this. But I am getting ahead of myself; let us first examine if there are ways to solve the problem in the current language.
Software development starts with the design phase. This is when you design your software in a small group by the whiteboard and draw informal diagrams. The boxes you draw on the whiteboard represent concepts in the application domain and will eventually result in classes and interfaces. You may also draw lines between concepts that are related. At this time you are not too concerned about how to implement the behavior but how the concepts are related.
The design phase is the ideal time to decide what type variables you need. In most cases you will need a type parameter when a class aggregates (behaves as a collection of) other objects of varying types depending on usage. For example, consider event handlers: if you design a general event handler that can handle all sorts of events, this event handler does not need to be parameterized with the type of event it handles. On the other hand, if you have multiple kinds of event handlers, one for mouse events, one for keyboard events, and one for timer events, then it may make sense to have a single event handler class that is parameterized with the event type.
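The second case could be sketched roughly as follows. All names here (ClickEvent, KeyPressEvent, EventHandler, RecordingHandler) are hypothetical and chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event types, invented for this sketch.
final class ClickEvent {
    final int x;
    final int y;

    ClickEvent(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class KeyPressEvent {
    final char key;

    KeyPressEvent(char key) {
        this.key = key;
    }
}

// One handler abstraction, parameterized with the kind of event it handles.
interface EventHandler<E> {
    void handle(E event);
}

// A handler that simply records the events it was given.
final class RecordingHandler<E> implements EventHandler<E> {
    private final List<E> seen = new ArrayList<E>();

    public void handle(E event) {
        seen.add(event);
    }

    List<E> seen() {
        return seen;
    }
}
```

A RecordingHandler<ClickEvent> rejects a KeyPressEvent at compile time. The type parameter expresses a relationship you could have drawn on the whiteboard, not an implementation shortcut.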
Ignore how to implement the classes. If a type variable does not make sense on the whiteboard, it does not belong in your code.
It does not matter if you are new to generic types, a brilliant type theoretician, or a recovering C++ template meta-programming addict: it is too easy to use type variables for things they are not suited for!
This lesson is particularly hard to learn if you are used to C++ templates. Generic types are not templates and you should not use type parameters for implementation convenience.
Laird Nelson describes a scenario with objects, references, and adapters. Objects can be canonically identified and references can be dereferenced. I do not know why references are needed and I would personally go for something simpler. However, I will assume that there are good reasons for having all the classes and interfaces described by Laird Nelson but note that I do not fully understand the purpose of all of them.
The goal is to be able to reference objects type-safely as illustrated by this example:
Person p = null;
Reference</* what goes here? */> ref = p.getCanonicalReference();
Person p2 = ref.dereference();
So I put all the above theory to the test and started drawing
Laird Nelson's example on my whiteboard. Types like
Dereferenceable and Reference are naturally
parameterized to specify the type of what they
reference. Similarly, a BaseObjectAdapter can be
parameterized over what it adapts. I am not so sure about
CanonicallyIdentified so I chose not to parameterize it. It
is easy to do so if it makes sense, though. Since I do not know
anything about BaseObject I do not see a need to add any
parameters. Certainly neither Party nor Person
is parameterized.
So after the design phase, it could look like this:
class DereferenceException extends Exception {}
interface Dereferenceable<T extends BaseObject> {}
class Reference<T extends BaseObject> implements Dereferenceable<T> {}
interface CanonicallyIdentified {}
interface BaseObject extends CanonicallyIdentified {}
interface Party extends BaseObject {}
interface Person extends Party {}
class BaseObjectAdapter<T extends BaseObject> implements BaseObject {}
Compare to where Laird Nelson gave up:
My suggestion:
class DereferenceException extends Exception {}
interface Dereferenceable<T extends BaseObject> {}
class Reference<T extends BaseObject> implements Dereferenceable<T> {}
interface CanonicallyIdentified {}
interface BaseObject extends CanonicallyIdentified {}
interface Party extends BaseObject {}
interface Person extends Party {}
class BaseObjectAdapter<T extends BaseObject> implements BaseObject {}
Laird Nelson's original example after generics:
class DereferenceException extends Exception {}
interface Dereferenceable<T extends BaseObject<T>> {}
class Reference<T extends BaseObject<T>> implements Dereferenceable<T> {}
interface CanonicallyIdentified /* crap. */ {}
interface BaseObject<T extends BaseObject<T>> extends CanonicallyIdentified<T> {}
interface Party<T extends BaseObject<T>> extends BaseObject<T> {}
interface Person<T extends Person<T>> extends Party<T> {}
class BaseObjectAdapter<T extends BaseObject<T>> implements BaseObject<T> {}
During the design phase implementation details are not too
important. However, once the design is mature we do need to worry
about how to implement it. There are a few tricks to learn
but most of it is straightforward; at least
Dereferenceable and Reference are:
interface Dereferenceable<T extends BaseObject> {
    T dereference() throws DereferenceException;
}
class Reference<T extends BaseObject> implements Dereferenceable<T> {
    public T dereference() throws DereferenceException {
        return null; // or something
    }
}
CanonicallyIdentified may not seem as straightforward and
what about Laird Nelson's example:
Person p = null;
Reference</* what goes here? */> ref = p.getCanonicalReference();
Person p2 = ref.dereference();
So clearly, I must parameterize CanonicallyIdentified?
No. I was unsure at the whiteboard and decided not to
parameterize CanonicallyIdentified, and it is not used in
the example anyway. I will use a wildcard:
interface CanonicallyIdentified {
    Reference<?> getCanonicalReference();
}
On the other hand, had I decided (at the whiteboard) that it did
make sense to parameterize CanonicallyIdentified it could
look like this:
interface CanonicallyIdentified<T> {
    Reference<? extends T> getCanonicalReference();
}
This decision is based on the design of the object
hierarchies, not on where types flow. BaseObject looks like
this:
interface BaseObject extends CanonicallyIdentified {}
If I chose to parameterize CanonicallyIdentified, it would
look like this:
interface BaseObject extends CanonicallyIdentified<BaseObject> {}
BaseObject is not parameterized and should not be.
Party and Person are not parameterized either but we
do want to know the type of the canonical reference. The answer is
to override the method and specialize the return type
(covariant return types):
interface Party extends BaseObject {
    String getSortName(); // or whatever
    Reference<? extends Party> getCanonicalReference();
}
interface Person extends Party {
    Reference<? extends Person> getCanonicalReference();
}
This also answers the question about what type argument to use in the example above:
Person p = null;
Reference<? extends Person> ref = p.getCanonicalReference();
Person p2 = ref.dereference();
Finally, BaseObjectAdapter:
class BaseObjectAdapter<T extends BaseObject> implements BaseObject {
    /* various instance fields... */
    private Reference<T> canonicalReference;

    // with the usual getters and setters
    public Reference<T> getCanonicalReference() {
        return canonicalReference;
    }
}
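To check that the pieces fit together, here is one way the whole design might be assembled into something that compiles and runs. SimplePerson, the Reference constructor, and CanonicalReferenceDemo are assumptions added for this sketch; they are not part of Laird Nelson's example:

```java
class DereferenceException extends Exception {}

interface Dereferenceable<T extends BaseObject> {
    T dereference() throws DereferenceException;
}

class Reference<T extends BaseObject> implements Dereferenceable<T> {
    private final T target; // assumed: the reference simply holds its target

    Reference(T target) {
        this.target = target;
    }

    public T dereference() throws DereferenceException {
        if (target == null) {
            throw new DereferenceException();
        }
        return target;
    }
}

interface CanonicallyIdentified {
    Reference<?> getCanonicalReference();
}

interface BaseObject extends CanonicallyIdentified {}

interface Party extends BaseObject {
    String getSortName();
    Reference<? extends Party> getCanonicalReference();
}

interface Person extends Party {
    Reference<? extends Person> getCanonicalReference();
}

// Hypothetical implementation; it narrows the return type once more.
class SimplePerson implements Person {
    private final String sortName;

    SimplePerson(String sortName) {
        this.sortName = sortName;
    }

    public String getSortName() {
        return sortName;
    }

    public Reference<SimplePerson> getCanonicalReference() {
        return new Reference<SimplePerson>(this);
    }
}

// Hypothetical helper showing the original three-line example type-checking.
class CanonicalReferenceDemo {
    static Person roundTrip(Person p) {
        try {
            Reference<? extends Person> ref = p.getCanonicalReference();
            return ref.dereference();
        } catch (DereferenceException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

No cast is needed anywhere, and none of Party, Person, or BaseObject gained a type parameter.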
Is there a lesson to learn from this? Generic types are not
C++ templates. Design your types to have the type
parameters they naturally need and do not add unnecessary type
parameters to save yourself from typing.
Joe and I have been talking about best practices when using
generics for software design at JavaOne 2005 and 2006. We recommend that you try
to avoid unnecessary type variables. Sometimes the solution is to not use generics.
Type parameters on generic methods are different, but
not by much. Never use type parameters on public methods if
they only benefit the implementation. Instead use a wildcard
and a private generic method if you need to name a type when
implementing the behavior. For example, consider how to implement
Collections.reverse:
public static void reverse(List<?> list) {
    reverse0(list);
}

private static <T> void reverse0(List<T> list) {
    ListIterator<T> fwd = list.listIterator();
    ListIterator<T> rev = list.listIterator(list.size());
    for (int i=0, mid=list.size()>>1; i<mid; i++) {
        T tmp = fwd.next();
        fwd.set(rev.previous());
        rev.set(tmp);
    }
}
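The same wildcard-plus-private-helper pattern applies to smaller operations too. The following is a minimal sketch with a hypothetical ListUtil class (it is not the JDK's implementation of anything). The public signature stays free of type parameters, and the private method names the element type only because the body needs a temporary variable of that type:

```java
import java.util.List;

final class ListUtil {
    private ListUtil() {}

    // Public API: callers never see a type parameter.
    public static void swap(List<?> list, int i, int j) {
        swap0(list, i, j);
    }

    // Private helper: the wildcard is captured as T so the element can be named.
    private static <T> void swap0(List<T> list, int i, int j) {
        T tmp = list.get(i);
        list.set(i, list.get(j));
        list.set(j, tmp);
    }
}
```

Writing list.set(i, list.get(j)) directly in the public method would not compile, because the compiler cannot prove that an element read from a List<?> may be stored back into it.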
What about self-types? The Java™ programming language does not have self-types
right now. Self-types would allow you to refer to the type of the receiver
(the type of this, the current class). Imagine we used
this to denote a self-type; then we could avoid
overriding getCanonicalReference:
interface CanonicallyIdentified {
    Reference<? extends this> getCanonicalReference();
}
interface Party extends BaseObject {
    String getSortName(); // or whatever
}
interface Person extends Party {}
Compare this to when I used covariant return types:
With self-types:
interface CanonicallyIdentified {
    Reference<? extends this> getCanonicalReference();
}
interface Party extends BaseObject {
    String getSortName(); // or whatever
}
interface Person extends Party {}
Without self-types:
interface CanonicallyIdentified {
    Reference<?> getCanonicalReference();
}
interface Party extends BaseObject {
    String getSortName(); // or whatever
    Reference<? extends Party> getCanonicalReference();
}
interface Person extends Party {
    Reference<? extends Person> getCanonicalReference();
}
The Strongtalk type system for Smalltalk
has self-types. You can download the Strongtalk system from
www.strongtalk.org.
The LOOJ paper by Bruce and Foster
includes a proposal for adding ThisClass to the Java
programming language.
If self-types were added to the Java programming language, it
would be obvious to consider retrofitting this onto
Object.clone():
protected native this clone() throws CloneNotSupportedException;
Unfortunately, this is not possible because the specification of that method does not require:
x.clone().getClass() == x.getClass()
It is only recommended and such a change could then break existing programs that follow the specification. Although we sometimes have to break source compatibility, breaking programs that follow the specification is not a viable option. In the situation where new API is defined or the subtypes of a class are controlled, it is possible to take advantage of self-types on the clone method:
class NewClass implements Cloneable {
    protected this clone() {
        return (this)super.clone(); // cast required as we cannot retrofit Object.clone()
    }
}
Since it is already possible to use covariant return types to
simulate this behavior today and we cannot retrofit
Object.clone(), I consider it unlikely that we will add
self-types to the Java programming language anytime soon.
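For comparison, this is roughly what the covariant-return simulation looks like for clone in today's language. Point and ColoredPoint are hypothetical classes made up for this sketch. Each class in the hierarchy narrows the return type by hand, which is exactly the repetition a self-type would remove:

```java
class Point implements Cloneable {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Covariant return type: callers get a Point, not an Object.
    @Override
    public Point clone() {
        try {
            return (Point) super.clone();
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable.
            throw new AssertionError(e);
        }
    }
}

class ColoredPoint extends Point {
    final String color;

    ColoredPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    // Each subclass must override again to narrow the type further.
    @Override
    public ColoredPoint clone() {
        return (ColoredPoint) super.clone();
    }
}
```

The casts are safe here only because every class in this small hierarchy delegates to super.clone(), a guarantee that the specification of Object.clone() does not impose on arbitrary subclasses.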
Joseph D. Darcy and Alex Buckley provided me with a lot of useful feedback on the early drafts and helped me get the flow better.
On a related note, don't use ArrayList for declarations. Always "program to the interface": use the Collection implementations only for constructing new collections. For method and variable declarations, use the interfaces. For example, prefer:
List<String> l = new ArrayList<String>();
to
ArrayList<String> l = new ArrayList<String>();
This simple advice will allow you to easily change the implementation if, for example, you notice that LinkedList has better performance for your application.
UPDATE: Robert Konigsberg writes: At the same time, prefer Collection to List and Iterable to Collection, although for a different reason.
Which is also good advice. If you only need to iterate over a list or collection, why require List or Collection?
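A small sketch of that advice, using a hypothetical Words class: the method only iterates, so it asks for nothing more than an Iterable.

```java
final class Words {
    private Words() {}

    // Only iteration is needed, so Iterable is the least demanding parameter type.
    static int totalLength(Iterable<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }
}
```

Callers can pass an ArrayList, a LinkedList, a Set, or any other Iterable<String> without the method having to change.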