Coding Conventions and Attribution

Mar 12 2007, 08:45:28 PM PDT » Java » Best Practices

Open sourcing of javac (and the JDK) is an opportunity to revisit our current practices and think about what is missing, what needs to be updated, and what should stay the same. One issue is how to attribute source code with @author tags, and different policies are in use in the open source community.

The position of the Apache Software Foundation (ASF) was explained by the then president of the ASF. To summarize, the ASF recommends against putting @author tags in source code for these reasons:

  • Attributions in source files are not up to date.
  • Software is a team effort.
  • ASF sells the code base as an ASF brand.
  • Omitting names is a shield against lawsuits.

However, I do not find those concerns convincing, and they are far outweighed by the need for programmers to take responsibility for the code they write. In The Pragmatic Programmer: From Journeyman to Master, Hunt and Thomas recommend that programmers sign their work (tip 70). They further say:

We want to see pride of ownership. I wrote this, and I stand behind my work.

The former president of the ASF has a good point that @author tags should be kept up to date. I agree, and I think the solution is a clear policy enforced through code reviews. For example, if you create a new class or make significant changes to its API, add an @author tag. If you make a minor change, make sure your name is in the project's contributor file but do not add an @author tag.

Another point made by the ASF concerns team effort and the strength of the brand. I agree that software is created by a team, and I think that all team members deserve to be mentioned in a file that lists all contributors. Such a file could be part of the source bundle and also be prominently linked from the website. I don't think the sense of team effort is harmed by adding @author tags to source files, and I don't see how it harms the brand. After all, the copyright statements remain and prominently identify the organization.

I am not a lawyer and this is not legal advice: it is plausible that people in some countries should be careful about putting their names in source files because they can be sued. However, I do not see how this affects people who do not live in those countries.

I have long followed the Emacs Lisp library header conventions as described in the GNU Emacs Lisp Reference Manual. As a practical matter, I think these can be directly applied to Java source code. Where the Emacs Lisp convention talks about the file header, I read that as the top-level class comment and the package comment. In other words, only package and top-level class comments should contain @author tags.
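As a rough illustration of that reading (a sketch only: the class OptionParser, its behavior, and the author name are hypothetical and not taken from the JDK), the tag appears once in the top-level class comment and nowhere else:

```java
/**
 * Splits a hypothetical "name=value" option string.
 * The top-level class comment is the only place that carries @author.
 *
 * @author Jane Doe (hypothetical name, for illustration only)
 */
final class OptionParser {
    private OptionParser() {}

    /** A parsed option. Nested types carry no @author tag under this convention. */
    static final class Option {
        final String name;
        final String value;

        Option(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Methods carry no @author tag either. */
    static Option parse(String arg) {
        int eq = arg.indexOf('=');
        if (eq < 0) {
            // No '=' present: treat the whole argument as the name.
            return new Option(arg, "");
        }
        return new Option(arg.substring(0, eq), arg.substring(eq + 1));
    }
}
```

The package comment (package-info.java or package.html) would be the only other place for an @author tag.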

It is possible to argue that we have a unique opportunity to clean up our sources right now, before they are released to the public. However, I think this opportunity may already have passed. Some parts of the JDK were open sourced even before javac and HotSpot were opened in November 2006, and the source of the public API has been part of the JDK (src.zip) for many releases. Similarly, all the sources of the JDK have been available for download under the JRL since late 2004. Some people are also very sensitive to having their @author tags removed. Consider, for example, how the developer exa felt after he handed over a project and discovered that the new maintainers had removed his name from the source code.

Although we have not consistently used @author tags in the JDK, or even javac, I think we should keep the ones we already have and develop a consistent policy (for example, one inspired by the Emacs Lisp conventions). It may be a good idea to obfuscate any email addresses that appear in existing @author tags.
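One way such obfuscation could work is sketched below. This is an assumption about the approach, not an existing JDK tool: the class name AuthorTagObfuscator, the deliberately simplified email pattern, and the "at"/"dot" spelling are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites email addresses on lines that contain an @author tag. */
final class AuthorTagObfuscator {
    // Simplified email pattern (an assumption; real addresses can be more varied).
    private static final Pattern EMAIL =
        Pattern.compile("([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\\.[A-Za-z]{2,})");

    private AuthorTagObfuscator() {}

    /** Returns the line unchanged unless it contains an @author tag. */
    static String obfuscateLine(String line) {
        if (!line.contains("@author")) {
            return line;
        }
        Matcher m = EMAIL.matcher(line);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the address, then the rewritten address.
            out.append(line, last, m.start());
            out.append(m.group(1))
               .append(" at ")
               .append(m.group(2).replace(".", " dot "));
            last = m.end();
        }
        out.append(line, last, line.length());
        return out.toString();
    }
}
```

With this sketch, a line such as " * @author Jane Doe <jane.doe@example.com>" would become " * @author Jane Doe <jane.doe at example dot com>", while lines without an @author tag are left alone.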

Thanks to Jonathan Gibbons, Joe Darcy, and Alex Buckley for their suggestions on this text.
Comments:

One popular argument against adding @author tags is that omitting them encourages collective code ownership, so contributors or other team members won't expect the author to fix every issue. But I don't think that really justifies leaving author names out of the sources.

Posted by Eugene Kuleshov on March 12, 2007 at 09:17 PM PDT #

I just investigated this same issue and came to the same conclusion. Apparently some people try to reformat the source and then add an @author for themselves, but that isn't a reason to do away with the tags altogether. I like signing my work.

Posted by Bob Lee on March 12, 2007 at 11:25 PM PDT #

It's good to have an AUTHORS file but it doesn't give any insight in who did what and it's not always easy to figure out where the project's source repositories can currently be found (especially if it's an older project) or the attribution is not easily traced to an actual person. In the past I've been able to track people down just by the author tags (preferably not an alias) in source code so for me personally it's proof that they are actually useful :-)

Posted by quintesse on March 13, 2007 at 02:13 AM PDT #

I'm against @author tags. Except in the case where a source file has a single author, @author doesn't really convey enough information to be useful. A GNU-style ChangeLog file is more useful for finding out "who worked on what".

Posted by Dave Gilbert on March 13, 2007 at 03:42 AM PDT #

@Dave: you're AGAINST them? Just because they don't provide enough information or because there is actually something wrong with them? It just seems like a very strong statement without a good reason to back it up.

Posted by quintesse on March 13, 2007 at 05:48 AM PDT #

@quintesse: I'm against @author tags because I think they're a poor way of detailing attribution.(*) GNU-style ChangeLogs do a better job, in my experience. I wouldn't call that a "very strong statement" though, it's not like I said "death to @author tags and all who use them". I think it is a sensible policy to disallow @author tags, but there are vastly more important issues for projects to worry about.

(*) Bob Lee already mentioned an example of someone reformatting the source and then adding an @author for themselves - it happens.

Posted by Dave Gilbert on March 13, 2007 at 06:22 AM PDT #

As a consumer of source code, I really like to know who wrote which piece of code. And the @author tag provides that information handily. It also provides historical context through which the evolution of ideas can be traced. This is important not only to the product that contains the code. It is also important to students of programming (to seek out code written by certain developers), historians of software ideas, designs and implementations, and to computer science in general. So, please, please, please, use @author tags in OpenJDK code. As to the few who abuse their committer status and add their names to many source files, I think history will sort things out.

Posted by Weiqi Gao on March 13, 2007 at 08:07 AM PDT #

In a long lived piece of software like the JDK @author tags are almost always doomed to be misleading. I've seen files almost entirely replaced in content that had an @author tag of the first person who ever edited it and has basically no remaining content apart from that. Many people work on almost every file. Why should one person be *the* author? Adding additional ones is even more confusing. I don't think @author tags have any place in a big project like the JDK.

-phil.

Posted by Phil Race on March 13, 2007 at 08:23 AM PDT #

Personally, I don't find the @author tag to be useful in practice. In a code base with any reasonable degree of churn the author tag is almost immediately out of date. I find them to be misleading the majority of the time. I don't use them myself. For attribution, I much prefer to trust my source code management system to remember the details and tell me who to blame (usually me :).

Posted by Alex Miller on March 13, 2007 at 09:24 AM PDT #

You don't always have source control available. I learned a lot by seeking out code with "@author Josh Bloch" and "@author Doug Lea".

Posted by Bob Lee on March 13, 2007 at 09:49 AM PDT #

Author tags serve real needs. They're just not granular enough. Among the things they do:

  • Give you an idea of who might have insights into the "why" of code
  • Set your expectations with respect to coding standards (choice of names, indentation style, etc. To the degree they aren't enforced by some external standard, they will default to the programmer's style)

Among the things they don't do:

  • Tell you how much a person did, in general terms. So I might add myself as "contributor" in some cases, without having to pretend to "author" status. Or I might define my role as "maintainer".
  • Tell you what changed, specifically. For that you need the versioning system. (In an ideal world, the versioning system would designate your role by what you did, and take care of the tagging automatically.)

Posted by Eric Armstrong on March 13, 2007 at 11:22 AM PDT #

I don't like @author tags either. Even though it does feel good when my name is up there, in the sources. :) But in all the projects I've seen, there was no clear policy on how to use @author tags. And I think there is only one _really_ clear policy here - just don't include such tags at all. Any other one is hard to define, hard to make precise, and hard to remember and to follow. For one person, fixing 50 typos in the javadocs _is_ a significant modification. For another, only a major rewrite of the API is a significant modification.

Most of the time, the author tags are out of date and used inconsistently. So, I'm with those who think that the tags should be removed. And that's what we did during the sources cleanup before the open sourcing of our project...

Posted by Vladimir Sizikov on March 13, 2007 at 02:58 PM PDT #

Taking a semirandom nontrivial class from our source base:

CVS history

There is an author tag 'tom', meaning 'tzezula' (I happen to know). If you log the (HEAD) modifications in CVS, in reverse chronological order you will see the following authors:

mkleint
mkleint
saubrecht
jglick
jtulach
jrechtacek
jrechtacek
jrechtacek
jrechtacek
jrechtacek
jrechtacek
tzezula
jrechtacek
jrechtacek
phrebejk
jrechtacek
phrebejk
jrechtacek
tzezula
tzezula
phrebejk
ttran
jrechtacek
tzezula
jrechtacek
phrebejk
jrechtacek
phrebejk
jrechtacek
jrechtacek
jrechtacek
jrechtacek
jrechtacek
jrechtacek
jrechtacek
ttran
tzezula
tzezula
tzezula
tzezula
tzezula
tzezula
tzezula
tzezula
tzezula

Which are you going to believe: @author, or CVS? Certainly tzezula wrote version 1.1, and he is involved in maintaining it. But a lot of the code in the file was written by other people who are not credited because they didn't bother to update the @author tag. In some cases their changes were trivial, but some were not, and several trivial changes can add up.

Posted by Jesse Glick on March 15, 2007 at 04:46 PM PDT #

How is that an argument for removing Tom's author tag? If the subsequent authors thought their change was significant enough and wanted to take credit, they should have added an @author tag.

Posted by Bob Lee on March 15, 2007 at 04:49 PM PDT #

Subsequent editors could have added their own @author tag. But they didn't. Perhaps they were focussed on fixing bugs and didn't even look at the file header. It doesn't really matter why, if the result is that the tag is unreliable. I am just illustrating that VCS history can in some cases be a much better source of information. In this example, turning on "cvs annotate" information inside an IDE shows who last touched each line. If JDK sources were available in Mercurial, "hg annotate -u" would be nearly instantaneous since no network connection is needed.

Posted by Jesse Glick on March 15, 2007 at 05:08 PM PDT #

That assumes you checked out the source and aren't looking at the source distributed with the JDK.

Posted by Bob Lee on March 15, 2007 at 05:11 PM PDT #

Using @author seems quite pointless, as it is handled currently. It mainly contains out-of-date information with no further detail on what the author contributed to the source.
In technical and collaborative documentation in general, it is common practice to have an up-to-date table of changes, usually following the title and copyright page. Each entry states the date, author, and detail of a change. It's not far from what versioning systems provide for source code.
Why not have the IDE optionally take care of it by automatically adding a new @change tag to a class's description section on check-in, stating the author, date, and description of the change (or the lines touched)? For open source, this could do much better than @author. The @version tag provides something similar, but it is only updated on check-in, not added for each change. As an IDE will usually fold class comments, this would not get in the way of working with the code while keeping the useful information at hand. As comments are gone after compilation, they don't harm the classes either.

Posted by Stefan Schulz on March 18, 2007 at 08:23 AM PDT #

Back when this first came up (or at least when I was first aware of it), I defended @author tags as useful because they told you who was behind the design, as opposed to who was behind the numerous changes to the code. I recently removed all my @author tags at Apache though. The main reason was to emphasize that classes are not owned by people: if I have svn access to a class, then I believe it's open for me to make a change if I believe it's warranted. A lesser reason was that many of the @author tags had an email address beside them and I was tired of getting private emails asking for help. Your point on having a list of the team is pretty common (I think). Maven usage has popularized lists of developers and contributors to a project, and other projects have websites listing the members. On the topic of being sued - I seem to recall the UK being one of the countries. I've no idea if it was likely (and neither does the ASF afaik), but one member in the UK removed his @authors because he felt a new law would make him liable (as a member of the 'company').

Posted by Henri Yandell on March 20, 2007 at 11:26 PM PDT #

Java is a trademark of Sun Microsystems, Inc.
Copyright © 2006,2007 Peter von der Ahé