Joseph D. Darcy's Sun Weblog


20090527 Wednesday May 27, 2009

Project Coin: For further consideration, round 2

The first group of proposals selected for further consideration were:

After due deliberation, the next set of proposals meeting the Project Coin criteria for further consideration are:

All the selected proposals were reviewed and judged to have favorable effort to reward ratios and to preserve the essential character of the language.

Work should continue refining the selected proposals and producing prototypes. In particular, a unified proposal for integer literals should be produced.

Language change proposals not on the combined "for further consideration" list will not be included in JDK 7; there is no need for continued discussion about them on the Project Coin mailing list. Detailed rationales for why particular proposals were not selected will not be provided.

Final selection of the five or so proposals to be included in the platform will occur within the next few months.

(2009-05-27 20:43:57.0)

20090331 Tuesday March 31, 2009

Project Coin: The Call for Proposals Phase is Over!

Update: Added links to proposal for switch for all types and simple expressions and a link to a revised method chaining proposal.

Project Coin's call for proposals phase is now over! Thirty-four days long, the proposal period saw nearly 70 proposals sent to the mailing list, 19 of them in the last two days, and over 1100 messages on the list discussing those proposals and related topics. With the flurry of pre-deadline activity over, the more deliberative task of finishing the review and evaluation of the proposals awaits. Including several sent in a few hours after the deadline, the proposals received since week four are:

The figure below graphs when proposals were received; nothing like an impending deadline to focus the mind!

[Figure: Project Coin Proposal Submissions]

Sometime after the next for further consideration cut is made, I'll post some thoughts on and reaction to the call for proposals phase as a whole. This will not include a detailed analysis of why each proposal was or was not chosen; however, there will be discussion of common aspects of proposals that led them to be selected or not.

(2009-03-31 17:26:42.0)

20090327 Friday March 27, 2009

Project Coin: Week 4 Update

Update: Corrected to include Stephen Colebourne's enhanced enhanced for loop proposal.

Further update: Added links to updated large arrays and compile time access proposals.

Project Coin's fourth week saw continued lively traffic on the mailing list. As the submission deadline approaches, a flurry of new proposals was sent in:

The field of over two dozen proposals previously sent in over the first three weeks of Project Coin was narrowed to six proposals still in consideration for inclusion in JDK 7. The proposals submitted this week and until the end of the call for proposals period will be similarly evaluated for their appropriateness to be added to the language. Finally, the combined list of candidate changes will be produced.

(2009-03-27 17:36:11.0)

20090324 Tuesday March 24, 2009

A trail of coins

For those interested in specifically following Project Coin related posts, my Project Coin entries are tagged with "projectcoin":
http://blogs.sun.com/main/tags/projectcoin

(2009-03-24 15:40:20.0)

Project Coin: For further consideration...

In the first three weeks of Project Coin over two dozen proposals have been sent to the mailing list for evaluation. The proposals have run the gamut from new kinds of expressions, to new statement forms, to improved generics support. Thanks to all Java community members who have sent in interesting, thoughtful proposals and contributed to informative discussions on the list!

While there is a bit less than a week left in the call for proposals period, there has been enough discussion on the list to winnow the slate of proposals sent in so far to those that merit further consideration for possible inclusion in the platform.

First, Bruce Chapman's proposal to extend the scope of imports to include package annotations will be implemented under JLS maintenance, so further action is unnecessary on this matter as part of Project Coin. Second, since the JSR 294 expert group is discussing adding a module level of accessibility to the language, the decision of whether or not to include Adrian Kuhn's proposal of letting "package" explicitly name the default accessibility level will be deferred to that body. Working with Alex, I reviewed the remaining proposals. Sun believes that the following proposals are small enough, have favorable estimated reward to effort ratios, and advance the stated criteria of making things programmers do every day easier or supporting platform changes in JDK 7:

As this is just an initial cut and the proposals are not yet in a form suitable for direct inclusion in the JLS, work should continue to refine these proposed specifications and preferably also to produce prototype implementations to allow a more thorough evaluation of the utility and scope of the changes. The email list should focus on improving the selected proposals and on getting any remaining new proposals submitted; continued discussion of the other proposals is discouraged.

The final list of small language changes will be determined after the call for proposals is over so proposals sent in this week are certainly still in the running! The final list will only have around five items so it is possible not all the changes above will be on the eventual final list.

(2009-03-24 15:21:00.0)

20090320 Friday March 20, 2009

Project Coin: Week 3 Update

Project Coin's third week was another week of lively traffic on the mailing list. New proposals were sent in:

existing proposals were revised:

and discussion continued on ARM and other proposals. The scoping and utility of a few pre-proposals were discussed on the list too.

Ten days remain to get language change proposals in! (Pure library changes will be handled by other JDK 7 processes.)

(2009-03-20 10:37:34.0)

20090313 Friday March 13, 2009

Project Coin: Week 2 Update

After the vigorous start of week 1, the pace of new proposals being sent to the list slowed:

However, brisk discussion continued on refining and exploring ARM blocks and their variations.

(2009-03-13 09:30:00.0)

20090312 Thursday March 12, 2009

Design tips: Exception types

Expanding on a few slides from my JavaOne talk last year, here are a few tips to keep in mind when designing exception types.

First, all exceptions are serializable since Throwable implements Serializable; therefore, like all other serializable classes, exception types should declare a serialVersionUID field to ease evolving the type in the future. Using javac's -Xlint:serial option will warn about missing serialVersionUID fields on serializable classes, amongst other possible issues.

Second, when adding a new exception class, consider providing more information beyond just a distinct name, such as methods to return information about what specific situation triggered the exception and possibly how to recover from it. Providing this additional information can interact with being serializable; when the additional information is not logically serializable, the specification may need to allow the information to be unavailable after deserialization.
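As a minimal sketch of the first two tips, the hypothetical QuotaExceededException below declares a serialVersionUID, exposes extra information through accessor methods, and marks the part that is not logically serializable as transient. The class names, fields, and the roundTrip helper are all assumptions made for illustration; they are not taken from any real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical exception type carrying more than just a distinct name.
class QuotaExceededException extends Exception {
    private static final long serialVersionUID = 1L;

    private final long requestedBytes;               // plain data, serializes normally
    private final transient Thread requestingThread; // Thread is not serializable

    QuotaExceededException(String message, long requestedBytes, Thread requestingThread) {
        super(message);
        this.requestedBytes = requestedBytes;
        this.requestingThread = requestingThread;
    }

    /** Returns the size of the request that triggered the exception. */
    long getRequestedBytes() {
        return requestedBytes;
    }

    /** Returns the requesting thread, or null if unavailable (for example, after deserialization). */
    Thread getRequestingThread() {
        return requestingThread;
    }
}

class ExceptionSerializationDemo {
    /** Serializes and deserializes the exception in memory. */
    static QuotaExceededException roundTrip(QuotaExceededException e) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(e);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (QuotaExceededException) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

After a round trip the message and the byte count survive, while getRequestingThread() returns null, which is why its documentation has to permit the information being unavailable.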

Third, when multiple related exception types are added, a common direct superclass allows a single catch block to handle the related exceptions uniformly. (Having a common super-exception would still be useful even if multi-catch is added to the language in JDK 7.)
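The third tip might look like the following sketch. The Transfer* exception names and the attempt method are hypothetical and exist only to show one catch block handling two related exceptions through their common direct superclass.

```java
// Hypothetical family of related exceptions sharing a direct superclass.
class TransferException extends RuntimeException {
    private static final long serialVersionUID = 1L;

    TransferException(String message) {
        super(message);
    }
}

class TransferTimeoutException extends TransferException {
    private static final long serialVersionUID = 1L;

    TransferTimeoutException(String message) {
        super(message);
    }
}

class TransferRefusedException extends TransferException {
    private static final long serialVersionUID = 1L;

    TransferRefusedException(String message) {
        super(message);
    }
}

class TransferDemo {
    /** Simulates a transfer; mode 1 times out, mode 2 is refused, anything else succeeds. */
    static String attempt(int mode) {
        try {
            if (mode == 1) {
                throw new TransferTimeoutException("timed out");
            }
            if (mode == 2) {
                throw new TransferRefusedException("refused");
            }
            return "ok";
        } catch (TransferException e) {
            // One handler covers every exception in the family.
            return "failed: " + e.getMessage();
        }
    }
}
```

Callers that care about the difference can still catch the specific subclasses first.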

Looking at the exceptions in the JSR 269 API, various methods note the possible impact of serialization-deserialization on the returned values. JSR 269 provided a trio of similar exceptions for the situation of encountering a kind of object unknown in an earlier version of the language, such as a JDK 6 era annotation processor coming across a module from JDK 7:

However, the original JSR 269 API does not have a common direct superclass to group these related conditions. That deficiency was addressed in JDK 7 build 48 with the addition of javax.lang.model.UnknownEntityException as the direct superclass of these three exceptions (6794071). Retrofitting this change is binary compatible because UnknownEntityException directly extends RuntimeException as did the old exceptions and serialization compatibility is preserved by the existing serialVersionUID fields in the old exceptions.

(2009-03-12 12:00:07.0)

20090310 Tuesday March 10, 2009

All about JDK 7

Those interested in following JDK 7 happenings from Sun engineers can track "jdk7" tagged entries on http://blogs.sun.com/main/tags/jdk7.

(2009-03-10 11:27:20.0)

20090308 Sunday March 08, 2009

The crested butte of Crested Butte

Last week I was off attending my first Java Posse Roundup in scenic Crested Butte, Colorado, pictured below. There were many good discussions related to JDK 7 and other programming, and non-programming, topics.

Unfortunately, there were some difficulties with my flight back. Once again, the plane I was flying on had to be rebooted, but at least this time the passengers didn't need to be reinstalled! I missed my scheduled connection in Denver and caught the next flight to the bay area a few hours later. The wait in Denver was made more pleasant by the airport's free-after-a-short-ad wi-fi and recharging stations for electronic gear. More airports should have those amenities!

(2009-03-08 22:15:09.0)

20090306 Friday March 06, 2009

Project Coin: Week 1 Update

In its first week, Project Coin enjoyed a vigorous start with well over a dozen proposals submitted:

Traffic on the list has been high, with lots of feedback and analysis leading to some revised proposals.

Here are a few general comments on the proposals sent in so far, to help refine those proposals and improve future proposals before they are sent in.

The proposals submitted to Project Coin should already be well thought-through. The goal is to have in short order specifications approaching JLS quality, preferably with a prototype to help validate the design. The feedback on the list should be much closer to finding and illuminating any remaining dark corners of a proposal rather than fleshing out its basic structure. If a proposal does not cite chapter and verse of the JLS for parts of its specification, that is a good indication the proposal is too vague. All affected sections of the JLS should be listed, including binary compatibility and the flow analysis in definite assignment.

It is fine if someone posts to the list to solicit help writing a proposal for a given change.

Proposal writers should be aware of the size and scope parameters established for the project; for background see:

Also, proposal writers should search Sun's bug database for bugs related to the change. The URL for the database is http://bugs.sun.com; Java specification issues are in category Java SE and subcategory specification. Of course the database is also searchable with your favorite search engine restricted to that site. Besides the evaluation field from the bug database, the external comment can often also have valuable insight into and discussion of alternatives to solving the problem or reasons why the problem shouldn't be solved.

As has already been happening on the list, authors and advocates of a proposal are responsible for responding to feedback and incorporating changes into any subsequent iterations of the proposal. For now, I think it is adequate to just send the revised proposals to the list. Only if there turns out to be frequent change would a more formal tracking system be warranted. Keeping such discussions on the list is important both to allow easy, centralized tracking of the proposal drafts and also for future language archaeologists who are curious about why a particular decision was made.

After a few iterations of feedback and refinements, the specification and compilation strategy should be sufficiently detailed to provide high confidence that the proposal is practical and can be reduced to practice. For example, I think the initial proposal for the admittedly simple strings in switch change provides adequate detail on these fronts.

(2009-03-06 09:00:00.0)

20090301 Sunday March 01, 2009

Project Coin: Proposal for Strings in switch

Below is a Project Coin language proposal form I wrote for Strings in switch; send any comments to the Project Coin mailing list.


PROJECT COIN SMALL LANGUAGE CHANGE PROPOSAL FORM v1.0

AUTHOR(S): Joseph D. Darcy

OVERVIEW
Provide a two sentence or shorter description of these five aspects of the feature:
FEATURE SUMMARY: Should be suitable as a summary in a language tutorial.

Add the ability to switch on string values analogous to the existing ability to switch on values of the primitive types.

MAJOR ADVANTAGE: What makes the proposal a favorable change?

More regular coding patterns can be used for operations selected on the basis of a set of constant string values; the meaning of the new construct should be obvious to Java developers.

MAJOR BENEFIT: Why is the platform better if the proposal is adopted?

Potentially better performance for string-based dispatch code.

MAJOR DISADVANTAGE: There is always a cost.

Some increased implementation and testing complexity for the compiler.

ALTERNATIVES: Can the benefits and advantages be had some way without a language change?

No; chained if-then-else tests for string equality are potentially expensive and introducing an enum for its switchable constants, one per string value of interest, would add another type to a program without good cause.

EXAMPLES
Show us the code!
SIMPLE EXAMPLE: Show the simplest possible program utilizing the new feature.

String s = ...
switch(s) {
 case "foo":
   processFoo(s);
   break;
}

ADVANCED EXAMPLE: Show advanced usage(s) of the feature.

String s = ...
switch(s) {
 case "quux":
   processQuux(s);
   // fall-through

 case "foo":
 case "bar":
   processFooOrBar(s);
   break;

 case "baz":
   processBaz(s);
   // fall-through

 default:
   processDefault(s);
   break;
}

DETAILS
SPECIFICATION: Describe how the proposal affects the grammar, type system, and meaning of expressions and statements in the Java Programming Language as well as any other known impacts.

The lexical grammar is unchanged. String is added to the set of types valid for a switch statement in JLSv3 section 14.11. Since Strings are already included in the definition of constant expressions, JLSv3 section 15.28, the SwitchLabel production does not need to be augmented. The existing restrictions in 14.11 on no duplicate labels, at most one default, no null labels, etc. all apply to Strings as well. The type system is unchanged. The definite assignment analysis of switch statement, JLSv3 section 16.2.9, is unchanged as well.

COMPILATION: How would the feature be compiled to class files? Show how the simple and advanced examples would be compiled. Compilation can be expressed as at least one of a desugaring to existing source constructs and a translation down to bytecode. If a new bytecode is used or the semantics of an existing bytecode are changed, describe those changes, including how they impact verification. Also discuss any new class file attributes that are introduced. Note that there are many downstream tools that consume class files and that they may need to be updated to support the proposal!

One way to support this change would be to augment the JVM's lookupswitch instruction to operate on String values; however, that approach is not recommended or necessary. It would be possible to translate the switches to equivalent if-then-else code, but that would require unnecessary equality comparisons which are potentially expensive. Instead, a switch should occur on a predictable and fast integer (or long) function value computed from the string. The most natural choice for this function is String.hashCode, but other functions could also be used either alone or in conjunction with hashCode. (The specification of String.hashCode is assumed to be stable at this point.) If all the string labels have different lengths, String.length() could be used instead of hashCode. Generally a String.equals() check will be needed to verify the candidate string's identity in addition to the evaluation of the screening function because multiple string inputs could evaluate to the same result.

A single case label, a single case label with a default, and two case labels can be special-cased to just equality checks without function evaluations. If there are collisions in String.hashCode on the set of case labels in a switch block, a different function without collisions on that set of inputs should be used; for example, ((long) s.hashCode() << 32) + s.length() is another candidate function.
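To make the choice of screening function concrete, here is a small sketch of how a compiler might test whether a candidate function is collision-free on a given set of case labels. The ScreeningFunctions class and its method names are hypothetical helpers written for this illustration; only String.hashCode and String.length come from the platform.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToLongFunction;

// Hypothetical helper for choosing a screening function for a set of case labels.
class ScreeningFunctions {
    /** The combined candidate function mentioned in the proposal text. */
    static long hashAndLength(String s) {
        return ((long) s.hashCode() << 32) + s.length();
    }

    /** Returns true if the function maps every label to a distinct value. */
    static boolean isCollisionFree(ToLongFunction<String> function, List<String> labels) {
        Set<Long> seen = new HashSet<>();
        for (String label : labels) {
            if (!seen.add(function.applyAsLong(label))) {
                return false; // two labels share a value
            }
        }
        return true;
    }
}
```

For the labels used in the advanced example, String::hashCode already passes this check. The labels "Aa" and "BB" share a hash code, and because they also share a length, the combined function does not separate them either; it only helps when the colliding labels differ in length.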

Here are desugarings to currently legal Java source for the two examples above where the default hash codes do not collide:

// Simple Example
if (s.equals("foo")) { // cause NPE if s is null
 processFoo(s);
}


// Advanced example
{  // new scope for synthetic variables
 boolean $take_default = false;
 boolean $fallthrough = false;
 $default_label: {
     switch(s.hashCode()) { // cause NPE if s is null
     case 3482567: // "quux".hashCode()
         if (!s.equals("quux")) {
             $take_default = true;
             break $default_label;
         }
         processQuux(s);
         $fallthrough = true;
     case 101574: // "foo".hashCode()
         if (!$fallthrough && !s.equals("foo")) {
             $take_default = true;
             break $default_label;
         }
         $fallthrough = true;
     case 97299:  // "bar".hashCode()
         if (!$fallthrough && !s.equals("bar")) {
             $take_default = true;
             break $default_label;
         }
         processFooOrBar(s);
         break;

     case 97307: // "baz".hashCode()
         if (!s.equals("baz")) {
             $take_default = true;
             break $default_label;
         }
         processBaz(s);
         $fallthrough = true;

     default:
         $take_default = true;
         break $default_label;
     }
 }
 if($take_default)
     processDefault(s);
} 

In the advanced example, the boolean $fallthrough variable is needed to track whether a fall-through has occurred so the string equality checks can be skipped. If there are no fall-throughs, this variable can be removed. Likewise, if there is no default label in the original code, the $take_default variable is not needed and a simple break can be used instead.
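One way to gain confidence in such a desugaring is to run it side by side with the direct form on a compiler that accepts strings in switch. The sketch below assumes such a compiler and replaces the process* calls with entries in a trace list; the class and method names are hypothetical, and the synthetic variables are renamed without the $ prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness comparing a string switch with its hashCode-based desugaring.
class StringSwitchDesugarDemo {
    /** The advanced example written directly as a switch on a string. */
    static List<String> direct(String s) {
        List<String> trace = new ArrayList<>();
        switch (s) {
        case "quux":
            trace.add("quux");
            // fall-through
        case "foo":
        case "bar":
            trace.add("fooOrBar");
            break;
        case "baz":
            trace.add("baz");
            // fall-through
        default:
            trace.add("default");
            break;
        }
        return trace;
    }

    /** The same logic as a switch on hashCode() with equals() checks. */
    static List<String> desugared(String s) {
        List<String> trace = new ArrayList<>();
        boolean takeDefault = false;
        boolean fallthrough = false;
        defaultLabel: {
            switch (s.hashCode()) { // throws NullPointerException if s is null
            case 3482567: // "quux".hashCode()
                if (!s.equals("quux")) { takeDefault = true; break defaultLabel; }
                trace.add("quux");
                fallthrough = true;
            case 101574: // "foo".hashCode()
                if (!fallthrough && !s.equals("foo")) { takeDefault = true; break defaultLabel; }
                fallthrough = true;
            case 97299: // "bar".hashCode()
                if (!fallthrough && !s.equals("bar")) { takeDefault = true; break defaultLabel; }
                trace.add("fooOrBar");
                break;
            case 97307: // "baz".hashCode()
                if (!s.equals("baz")) { takeDefault = true; break defaultLabel; }
                trace.add("baz");
                fallthrough = true;
            default:
                takeDefault = true;
                break defaultLabel;
            }
        }
        if (takeDefault) {
            trace.add("default");
        }
        return trace;
    }
}
```

Both methods should produce the same trace for every input, including strings that match no label.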

In a translation directly to bytecode, the synthetic state variables can be replaced with goto's; expressing this in pseudo Java source with goto:

// Advanced example in pseudo Java with goto
switch(s.hashCode()) { // cause NPE if s is null
case 3482567: // "quux".hashCode()
   if (!s.equals("quux"))
       goto $default_label;
   processQuux(s);
   goto $fooOrBarCode_label;

case 101574: // "foo".hashCode()
   if (!s.equals("foo"))
       goto $default_label;
   goto $fooOrBarCode_label;

case 97299:  // "bar".hashCode()
   if (!s.equals("bar"))
       goto $default_label;

   $fooOrBarCode_label:
   processFooOrBar(s);
   break;

case 97307: // "baz".hashCode()
   if (!s.equals("baz"))
       goto $default_label;
   processBaz(s);

default:
$default_label:
   processDefault(s);
   break;
} 

Related to compilation, a compiler's existing diagnostics around falling through switches, such as javac's -Xlint:fallthrough option and @SuppressWarnings("fallthrough"), should work identically on switch statements based on Strings.

TESTING: How can the feature be tested?

Generating various simple and complex uses of the new structure and verifying the proper execution paths occur; combinations to test include switch statements with and without fall-throughs, with and without collisions in the hash codes, and with and without default labels.

LIBRARY SUPPORT: Are any supporting libraries needed for the feature?

No.

REFLECTIVE APIS: Do any of the various and sundry reflection APIs need to be updated? This list of reflective APIs includes but is not limited to core reflection (java.lang.Class and java.lang.reflect.*), javax.lang.model.*, the doclet API, and JPDA.

Only reflective APIs that model statements in the source language might be affected. None of core reflection, javax.lang.model.*, the doclet API, and JPDA model statements; therefore, they are unaffected. The tree API in javac does model statements, but the existing API for switch statements is general enough to model the revised language without any API changes.

OTHER CHANGES: Do any other parts of the platform need be updated too? Possibilities include but are not limited to JNI, serialization, and output of the javadoc tool.

No.

MIGRATION: Sketch how a code base could be converted, manually or automatically, to use the new feature.

Look for sequences of if ("constant string".equals(foo)) or if (foo.equals("constant string")) and replace accordingly.
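A sketch of such a conversion, assuming a compiler that supports the feature, is shown here. The MigrationSketch class, its method names, and the scheme-to-port mapping are hypothetical and chosen only to give the two forms something to compute.

```java
// Hypothetical before-and-after of migrating an if-equals chain to a string switch.
class MigrationSketch {
    /** Before: chained equality tests with the constant on the left. */
    static int portBefore(String scheme) {
        if ("http".equals(scheme)) {
            return 80;
        } else if ("https".equals(scheme)) {
            return 443;
        } else if ("ftp".equals(scheme)) {
            return 21;
        } else {
            return -1;
        }
    }

    /** After: the same dispatch as a switch on the string. */
    static int portAfter(String scheme) {
        switch (scheme) { // throws NullPointerException if scheme is null
        case "http":
            return 80;
        case "https":
            return 443;
        case "ftp":
            return 21;
        default:
            return -1;
        }
    }
}
```

The two forms agree on every non-null input. They differ on null: the constant-first chain returns -1, while the switch throws a NullPointerException, so a conversion from that pattern needs a null check if null can reach the code.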

COMPATIBILITY
BREAKING CHANGES: Are any previously valid programs now invalid? If so, list one.

All existing programs remain valid.

EXISTING PROGRAMS: How do source and class files of earlier platform versions interact with the feature? Can any new overloadings occur? Can any new overriding occur?

The semantics of existing class files and legal source files are unchanged by this feature.

REFERENCES
EXISTING BUGS: Please include a list of any existing Sun bug ids related to this proposal.

5012262 Using Strings and Objects in Switch case statements.

URL FOR PROTOTYPE (optional):

No prototype at this time.

(2009-03-01 10:00:00.0)

20090227 Friday February 27, 2009

Project Coin Now Live

The Project Coin OpenJDK page and mailing list are now live. The call for proposal period will run until March 30, 2009. Let the proposing begin!

(2009-02-27 15:17:18.0)

20090127 Tuesday January 27, 2009

Project Coin: Small Language Change Proposal Form Available

The name of the OpenJDK project hosting small language changes for JDK 7 will be Project Coin. Besides a coin literally being small change, to "coin a phrase" is to create a little bit of new language.

The website for the project and its mailing lists will come into being this February. In the meantime, the initial form to use to propose a language change is listed below. If you have an idea for a change, please work on the form and post it to the Project Coin mailing list once that gets started.

Small language changes I think would improve the language according to the previously discussed criteria include (related Sun bugs in parentheses):

  • Strings in switch: A simple change that improves regularity in the language (5012262, 4269827).

  • More concise calls to constructors with type parameters: A large pain point when using generics occurs with the need to specify type parameters both on the left hand side of a declaration of a generic variable as well as on the right hand side in a constructor call to initialize the variable (4879776).

  • Exception enhancements: multi-catch and final rethrow: Exception handling would be streamlined if semantically the same code for distinct exceptions could be shared (4432337).

  • Ability to call methods with exotic names: As the Java platform gets better support for non-Java languages via the Da Vinci Machine project, it is useful to allow Java programs to call methods from foreign languages hosted on the JVM, even if those languages have different naming restrictions than Java (6746458).

  • (Possibly) Bracket notation to access collections: Collections like List and Map are nearly ubiquitous in Java programs and they offer many advantages over arrays, including better integration with the generic type system, availability of growable data structures, and the possibility to use immutability. However, arrays do have their more concise [] notation; allowing lists and maps and similar data structures to use this notation would encourage their use even more widely (4632701, 4877954).


PROJECT COIN SMALL LANGUAGE CHANGE PROPOSAL FORM v1.0

INSTRUCTIONS: For a proposal to be considered, this document must be complete and stand-alone in and of itself. No URLs, citations of papers, etc. can appear except for the limited supplementary information requested in the "REFERENCES" section. A new class file version number can be assumed to be available for -target 7. The proposal must not remove existing features of the language; for example, "Get rid of checked exceptions" would not be considered. As part of being stand-alone, the proposal must not rely on any other language changes that have not already been accepted.

AUTHOR(S): Who are you?

OVERVIEW
Provide a two sentence or shorter description of these five aspects of the feature:
FEATURE SUMMARY: Should be suitable as a summary in a language tutorial.

MAJOR ADVANTAGE: What makes the proposal a favorable change?

MAJOR BENEFIT: Why is the platform better if the proposal is adopted?

MAJOR DISADVANTAGE: There is always a cost.

ALTERNATIVES: Can the benefits and advantages be had some way without a language change?

EXAMPLES
Show us the code!
SIMPLE EXAMPLE: Show the simplest possible program utilizing the new feature.

ADVANCED EXAMPLE: Show advanced usage(s) of the feature.

DETAILS
SPECIFICATION: Describe how the proposal affects the grammar, type system, and meaning of expressions and statements in the Java Programming Language as well as any other known impacts.

COMPILATION: How would the feature be compiled to class files? Show how the simple and advanced examples would be compiled. Compilation can be expressed as at least one of a desugaring to existing source constructs and a translation down to bytecode. If a new bytecode is used or the semantics of an existing bytecode are changed, describe those changes, including how they impact verification. Also discuss any new class file attributes that are introduced. Note that there are many downstream tools that consume class files and that they may need to be updated to support the proposal!

TESTING: How can the feature be tested?

LIBRARY SUPPORT: Are any supporting libraries needed for the feature?

REFLECTIVE APIS: Do any of the various and sundry reflection APIs need to be updated? This list of reflective APIs includes but is not limited to core reflection (java.lang.Class and java.lang.reflect.*), javax.lang.model.*, the doclet API, and JPDA.

OTHER CHANGES: Do any other parts of the platform need be updated too? Possibilities include but are not limited to JNI, serialization, and output of the javadoc tool.

MIGRATION: Sketch how a code base could be converted, manually or automatically, to use the new feature.

COMPATIBILITY
BREAKING CHANGES: Are any previously valid programs now invalid? If so, list one.

EXISTING PROGRAMS: How do source and class files of earlier platform versions interact with the feature? Can any new overloadings occur? Can any new overriding occur?

REFERENCES
EXISTING BUGS: Please include a list of any existing Sun bug ids related to this proposal.

URL FOR PROTOTYPE (optional):

(2009-01-27 09:05:00.0)

20081223 Tuesday December 23, 2008

Criteria for desirable small language changes

The two primary goals of making small language changes in JDK 7 are to:

  1. Make the things programmers do every day easier.

  2. Support other platform changes in JDK 7.

Over the years, certain common coding patterns have been recognized as needlessly verbose including:

  • if-equals-X-else-if-equals-Y testing chains on strings

  • duplicated catch blocks for different exception types

  • repeated type parameters when declaring and initializing a variable of parameterized type

These patterns can be replaced with new constructs that are more concise and more clear without fundamentally altering the language. Besides improvements to support existing Java programs, language changes should also be made to allow appropriate access to new JVM capabilities, such as those being enabled by the Da Vinci Machine project.
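For reference, the three verbose patterns listed above look roughly like this in code written without any new language constructs. The VerbosePatterns class and its methods are hypothetical examples, not code from any particular program.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical examples of the three verbose coding patterns.
class VerbosePatterns {
    /** Pattern 1: an if-equals-X-else-if-equals-Y chain on a string. */
    static int gramsPerUnit(String unit) {
        if (unit.equals("g")) {
            return 1;
        } else if (unit.equals("kg")) {
            return 1000;
        } else {
            return -1;
        }
    }

    /** Pattern 2: duplicated catch blocks for different exception types. */
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return fallback;
        } catch (NullPointerException e) {
            return fallback; // same handling, repeated
        }
    }

    /** Pattern 3: type parameters repeated on both sides of the initialization. */
    static Map<String, List<Integer>> emptyIndex() {
        Map<String, List<Integer>> index = new HashMap<String, List<Integer>>();
        return index;
    }
}
```

In each case the repetition carries no extra information for the reader, which is what makes these patterns candidates for a more concise construct.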

While language changes can fundamentally improve the modes of expression in a language, language changes have a number of drawbacks as solutions to programming problems:

  • Slow availability: Language changes occur in platform releases, which typically only occur every few years.

  • Heavyweight: The full extent of a language change can affect multiple components of the platform.

  • Changes may be needed at multiple points in the toolchain: Even after a language change is fully available in the JDK, independent libraries and tools may need to be updated as well before the changes can be fully utilized.

Therefore, language changes are rarely the preferred solution if other workable solutions are available. Since IDEs are now commonly used for Java development, mitigating or solving problems using IDE tooling is one possibility. As of Java SE 6, compliant compilers are required to support annotation processing as standardized by JSR 269, see javax.annotation.processing and javax.lang.model. Annotation processing provides a general meta-programming framework; beyond processing annotations directly, annotation processors can be used to implement many currently extra-lingual checks based on a program's structure. Checks which previously would have required language changes can now be implemented by developers and just used by convention. JSR 308, Annotations on Java Types, would enable more detailed checking by allowing annotations in more program locations.
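As an illustration of an extra-lingual check built on the JSR 269 API, the sketch below warns about top-level classes whose names do not start with an upper-case letter. The processor name and the convention it enforces are assumptions chosen for this example; to be run by javac through its -processor option, the class would normally be declared public in its own source file.

```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Hypothetical processor enforcing a naming convention on top-level classes.
@SupportedAnnotationTypes("*") // run on all source files, annotated or not
class ClassNameConventionProcessor extends AbstractProcessor {
    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    /** The convention being checked: names start with an upper-case letter. */
    static boolean followsConvention(CharSequence simpleName) {
        return simpleName.length() > 0 && Character.isUpperCase(simpleName.charAt(0));
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (Element element : roundEnv.getRootElements()) {
            if (element.getKind() == ElementKind.CLASS
                    && !followsConvention(element.getSimpleName())) {
                processingEnv.getMessager().printMessage(
                    Diagnostic.Kind.WARNING,
                    "class name should start with an upper-case letter",
                    element);
            }
        }
        return false; // do not claim any annotations
    }
}
```

Nothing in the language requires this convention; the processor simply reports violations during compilation, which is the sense in which such checks are used by convention rather than enforced by a language change.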

When judging whether or not any change to the platform is worthwhile, a useful notion is estimating the feature's "thrust to weight ratio," that is estimating whether the benefits of making the change exceed the full cost of implementing the change. For language changes, this metric is improved by having a larger fraction of programs potentially benefiting from the change. For example, it would be roughly the same amount of engineering to add numerical operator overloading support for classes like BigInteger and BigDecimal as to add support for bracket, "[]", syntax for Lists and Maps. Besides complications with the == operator in the numerical case, bracket syntax for Maps and Lists has much higher utility since many more Java programs use Collections than large numbers.

Especially with the maturity of the Java platform, the onus is on the proposer to make a convincing case that a language change should go in; the onus is not on others to prove the change should stay out.

Given the upcoming holidays, the language change proposal form and the seeding proposals will both be coming in January 2009.

(2008-12-23 09:00:00.0)

20081211 Thursday December 11, 2008

Guidance on measuring the size of a language change

Soon a project will be starting to consider adding a to-be-determined set of small language changes to JDK 7. Given the rough timeline for JDK 7 and other on-going efforts to change the language, such as modules and annotations on types, only a limited number of small changes can be considered for JDK 7. That does not imply that larger changes aren't appropriate or worthwhile at some point in the future; in the meantime such changes can be explored and honed for JDK 8 or later.

Separate from its size, criteria to evaluate the utility of a language change will be discussed in a future blog entry.

The JCP process defines three deliverables for a JSR:

  • Specification

  • Reference Implementation

  • Compatibility Tests

These three distinct aspects of a language change, specification, implementation, and general testing, exist whether or not the change is managed under a JSR. For this project, a language change will be judged small if it is simultaneously a small-enough effort under all three of specification, implementation, and testing. In other words, if a change is medium sized or larger in a single area, it is not a small change. (This corresponds to using an infinity norm to measure size; see "Norms: How to Measure Size".) Another concern is the size of change to developers, but if the change is small in these three areas, it is likely to be small for developers to learn and adopt too. Because there is limited fungibility between the people working on specification, implementation, and testing, a single oversize component can't necessarily be compensated for by the other two components being small enough to be managed on their own.
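The infinity-norm rule amounts to taking the maximum over the three areas. A minimal sketch, assuming a hypothetical four-step size scale (the scale and the names below are illustrative, not part of any process definition):

```java
// Hypothetical size scale and the "largest component wins" rule.
class ChangeSize {
    enum Size { TINY, SMALL, MEDIUM, LARGE }

    /** Overall size is the largest of the three component sizes (an infinity norm). */
    static Size overall(Size specification, Size implementation, Size testing) {
        Size max = specification;
        if (implementation.compareTo(max) > 0) {
            max = implementation;
        }
        if (testing.compareTo(max) > 0) {
            max = testing;
        }
        return max;
    }

    /** A change counts as small only if every component is SMALL or less. */
    static boolean isSmallChange(Size specification, Size implementation, Size testing) {
        return overall(specification, implementation, testing).compareTo(Size.SMALL) <= 0;
    }
}
```

Under this rule a change with a tiny specification and tiny tests but a medium implementation is a medium change; the small components do not offset the large one.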

The size of a specification change is not just related to the amount of text that is altered; it also depends on which text, how many new concepts are needed, and the complexity of those concepts. Similarly, the implementation effort can be large if a limited amount of tricky code is involved as well as if a large volume of prosaic code is needed. An estimate of the future maintenance effort should factor into judging the net implementation cost too. The specification size and implementation size are often not closely related; a small spec change can require large implementation efforts and vice versa. JCK-style conformance testing is based on testing assertions in the specification, so the size of this kind of testing effort should have some positive correlation with the size of the specification change. Likewise, regression testing should have at least a weak positive correlation with the size of the implementation change. However, adequate conformance testing can be disproportionately large compared to the size of the specification change depending on how the assertions interact and how many programs they affect.

Due to the complexity of the Java type system and the desire to maintain backwards compatibility, almost any type system change will be at least a medium-sized effort for the implementation, specification, or both. Each new feature of the type system can interact with all the existing features, as well as all the future ones, so type system changes must be approached with healthy skepticism.

As a point of reference, the set of Java SE 5 language features will be sized according to the above criteria; from smallest to largest:

  • Normal maintenance, Size: Tiny
    In the course of maintaining the platform, small changes and corrections are made to the Java Language Specification (JLS) and javac. These changes, even taken together, are not large enough to warrant a JSR separate from the platform umbrella JSR.

  • Hexadecimal floating-point literals, Size: Very small
    Hexadecimal floating-point literals were a small new feature added to the language in JDK 5 under maintenance. Only very localized grammatical changes were needed in the JLS together with well-bounded supporting library methods.

  • for-each loop, Size: Small
    Part of JSR 201, the enhanced for statement required a new section in the JLS and a straightforward desugaring by the compiler. However, there were still complications; calamity was narrowly averted in the new libraries needed to support the for loop. A new java.lang.Iterator type that would have broken migration compatibility was dropped in favor of reusing the less than ideal java.util.Iterator.

  • static import, Size: Small, but more complicated than expected
    Static import added more ways to influence the mapping of simple names in source code to the binary names in class files. The mapping already had complexities, including rules for hiding, shadowing, and obscuring; static import introduced more interactions.

  • enum types, Size: Medium
    By introducing a new kind of type, adding enum types included a type system modification and so was a medium-sized change. While the normative JLS text devoted to enums is brief, JVMS changes were also required, as well as surprisingly time-consuming and intricate libraries work, including interactions with IIOP serialization.

  • autoboxing and unboxing, Size: Medium
    The complications with autoboxing and unboxing come not from the feature directly, but from its interactions with generics and method resolution.

  • Annotation types, Size: Large
    As an enum was a new kind of specialized class, an annotation type, introduced in JSR 175, was a new kind of specialized interface. Besides being a type change, annotation types required coordinated JVM and library modifications as well as a new tool and framework, and a subsequent standardization, to fulfill the potential of the feature.

  • Generics, Size: Huge
    Generics were a pervasive change to the platform, introducing many new concepts in the specification, considerable change to the compiler, and far-reaching libraries updates.
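To make the two smallest language features in the list above concrete, here is a minimal sketch of a hexadecimal floating-point literal and of the enhanced for statement next to a hand-written approximation of its desugaring. The class and method names are made up for the example, and the desugared form is only roughly what a compiler produces.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative names only; nothing here comes from the JDK sources.
final class SmallFeatures {
    // Hexadecimal floating-point literal: 1.0 times 2 to the power -3.
    static final double EIGHTH = 0x1.0p-3;

    // Enhanced for statement over an Iterable.
    static int sumForEach(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Approximately what the loop above is desugared into:
    // an explicit java.util.Iterator plus an unboxing assignment.
    static int sumDesugared(List<Integer> values) {
        int sum = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            int v = it.next();
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(EIGHTH);                      // 0.125
        System.out.println(Double.toHexString(EIGHTH));  // 0x1.0p-3
        List<Integer> values = Arrays.asList(1, 2, 3);
        System.out.println(sumForEach(values) == sumDesugared(values)); // true
    }
}
```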

Some examples of bigger-than-small language changes that have been discussed in the community include:

  • BGGA closures: Independent of the technical merit of the proposal, BGGA closures would be a large change to the language and platform.

  • Properties: While a detailed judgment would have to be made against a specific proposal, as a new kind of type, properties would most likely be at least medium-sized.

  • Reification: The addition of information about the type parameters of objects at runtime would involve language changes, nontrivial JVM changes to maintain efficiency, and raise compatibility issues.

Specific small language changes we at Sun are advocating for JDK 7 will be discussed in the near future.

(2008-12-11 09:00:00.0)

20081208 Monday December 08, 2008

Coming Soon: A JSR for small language changes in JDK 7

I'm happy to announce that I'll be leading up Sun's efforts to develop a set of small language changes in JDK 7; we intend to submit a JSR covering those changes during the first half of 2009. However, before the JSR proposal is drafted and submitted to the JCP, we'll first be running a call for proposals so Java community members can submit detailed, thoughtful changes for consideration too. We'll be seeding the discussion with a few proposals we think would improve the language. More information on our proposed changes, guidance for measuring the size of a change, and criteria for judging the desirability of a language change will be coming over the next several weeks.

I've proposed an OpenJDK project to host the discussion of the proposals and potentially some prototype implementations.

Suggested Reading
So you want to change the Java Programming Language...

(2008-12-08 11:25:14.0)

20080512 Monday May 12, 2008

API Design: Interfaces versus Abstract Classes

Jake Gittes: Why are you doing it?
How much better can you eat?
What can you buy that you can't already afford?
Noah Cross: The future, Mr. Gitts, the future.
—Chinatown


Quoting Effective Java, first edition, Item 16, "Prefer interfaces to abstract classes":

To summarize, an interface is generally the best way to define a type that permits multiple implementations. An exception to this rule is the case where ease of evolution is deemed more important than flexibility and power.

As discussed in that item, the ease of evolution of abstract classes comes from the ability to add new methods having "reasonable default implementations" without almost surely breaking compilation of every existing subtype. The flexibility and power of interfaces involve ease of retrofitting to existing classes, allowing nonhierarchical type relations, and so on. An additional benefit of interfaces is the ability to use dynamic proxies; one notable use of dynamic proxies is creating the annotation objects returned at runtime by getAnnotation. One potential difference not worth considering with modern virtual machines is the speed difference between invoking a method on an interface versus invoking a method on a class.
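For readers who have not used dynamic proxies, the following is a minimal sketch built on java.lang.reflect.Proxy. The Greeter interface and the ProxyDemo class are hypothetical and exist only for this example; the point is that the mechanism works for an interface type but has no counterpart for an abstract class.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class ProxyDemo {
    // Hypothetical interface used only for this illustration.
    interface Greeter {
        String greet(String name);
    }

    // Builds a Greeter at runtime; every call is routed through one handler.
    static Greeter newGreeter() {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("greet")) {
                return "Hello, " + args[0];
            }
            // Other methods (including toString) are not handled in this sketch.
            throw new UnsupportedOperationException(method.getName());
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = newGreeter();
        System.out.println(greeter.greet("Duke"));                 // Hello, Duke
        System.out.println(Proxy.isProxyClass(greeter.getClass())); // true
    }
}
```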

While there is a sound rationale backing the conventional wisdom, in my estimation the compatible evolution advantages of abstract classes are smaller than they appear at first, further tipping the balance in favor of using interfaces in more situations.

The two alternatives to be considered to define the initial desired type abstraction are:

  • Declare an interface.

  • Declare an abstract class, all of whose initial methods are public and abstract.

In neither case are fields being defined. In both cases a skeletal abstract implementation class, like java.util.AbstractList, could be used to share implementation code. If the type abstraction is defined by an abstract class, the skeletal class and abstract class might be able to be combined, saving a type compared to the pair of an interface plus a skeletal class. However, forcing all implementations to be based on the same skeletal class may be awkward. Interfaces can easily have multiple independent skeletal helper classes. Subclasses can blunt inheritance issues by using an intermediate subclass to abstract-ify any problematic implementations from the parent.

Table 1 outlines the different kinds of compatibility impacts, source, binary, and behavioral, from adding a method to an interface and an abstract class. The effects of adding a method to an abstract class depend on whether or not the added method is abstract or has an implementation. For the purposes of discussion, we will assume the method does have an implementation (otherwise, there would be no advantage to using an abstract class).

Table 1 — Compatibility summary of adding a method
Binary compatibility
  Interface: Adding a method to an interface is binary compatible. Note that existing clients will continue to link, but attempted calls to the missing new method will result in an AbstractMethodError.
  Abstract class: Adding a method to an abstract class is binary compatible.
Source compatibility
  Interface: Adding a method to an interface has the full range of possible impacts, from being binary-preserving source compatible to breaking compilation.
  Abstract class: Adding a method to an abstract class has the full range of possible impacts, from being binary-preserving source compatible to breaking compilation.
Behavioral compatibility
  Interface: No direct behavioral impact to existing code calling existing methods.
  Abstract class: No direct behavioral impact for the cases under consideration.

Technically, adding a method to an interface and adding a method to an abstract class are both binary compatible since programs using those types will continue to link. However, in the case of an interface type, if a program calls the new method on an existing implementation of the interface (unless the implementation presciently had a method with a matching signature declared), an AbstractMethodError will be thrown, which is an awkward situation to recover from. Also, for the call to the new interface method to work on an existing implementor of the old interface, the method in the implementor must be an exact match, signature and return type, for the added method; if the return type in the implementor is a subtype of the added method's return type, a covariant return, a recompile of the implementor is needed to create the bridge method joining the method from the interface with the method declared in the class.
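The bridge method mentioned above can be observed with core reflection. In this sketch (Source and StringSource are hypothetical names) the interface and the implementor are compiled together, so the compiler emits the bridge; an implementor compiled before the interface method existed would lack it, which is why a recompile is needed in the covariant case.

```java
import java.lang.reflect.Method;

final class BridgeDemo {
    // Hypothetical interface whose method returns Object.
    interface Source {
        Object get();
    }

    // Covariant return: the implementor returns String, a subtype of Object.
    static final class StringSource implements Source {
        @Override
        public String get() {
            return "text";
        }
    }

    // Counts the compiler-generated bridge methods declared by a class.
    static int bridgeCount(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Lists both get() methods: the declared String get() and the
        // synthetic Object get() bridge that forwards to it.
        for (Method m : StringSource.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName()
                    + " bridge=" + m.isBridge());
        }
    }
}
```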

Adding a method to an interface has a wide range of possible source compatibility effects on existing code. It is possible that an implementation anticipated future developments and already has a method matching the newly added method. In that case, adding the method is binary-preserving source compatible with that particular class. Of course in general it is much more likely that existing implementations do not already have the new method, in which case they won't compile against the modified interface declaration. Therefore, the worst possible outcome is that existing implementations will stop compiling after the method is added to the interface; this worst case outcome is also the most likely outcome in the absence of other information.

Adding a concrete method to an abstract class also has a range of source compatibility outcomes. If no existing extending class has a method with the new name, there is no conflict and the addition is binary-preserving source compatible given the set of actual programs. If not the expected outcome, this is certainly the hoped-for outcome of adding a method to an abstract class! However, it is possible that existing subclasses already declare a method with the new name. If the parameter types match but the return types conflict, existing subclasses will stop compiling after the method is added. If the parameter types are not the same, an overloading situation is introduced or expanded. This can change method resolution of call sites using the existing subclass, which may or may not lead to behaviorally equivalent class files since different methods might be called. One technique to avoid changing resolution at existing call sites is for the new method to include in its parameter list a new type added at the same time as the method. If the new type is not related to existing types, then no method in an existing subclass will interact with the new method during method resolution. Therefore, the worst possible outcome is that some existing subclasses will stop compiling after the method is added to the abstract class; this can be avoided depending on the parameter list of the new method, at the potential cost of introducing new overloadings that change existing method resolution.
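The change in method resolution can be seen in a small sketch. The classes below are hypothetical; Base.describe(int) stands in for a method added in a later release, after Sub had already declared describe(long).

```java
final class OverloadDemo {
    // Evolving abstract class; describe(int) plays the role of the newly added method.
    static abstract class Base {
        public String describe(int value) {
            return "Base.describe(int)";
        }
    }

    // Existing subclass that declared its own describe(long) before the addition.
    static final class Sub extends Base {
        public String describe(long value) {
            return "Sub.describe(long)";
        }
    }

    static String callWithIntArgument() {
        // Without Base.describe(int), this call could only resolve to
        // Sub.describe(long). With it, both overloads are applicable and
        // describe(int) is more specific, so recompiling changes the target.
        return new Sub().describe(1);
    }

    public static void main(String[] args) {
        System.out.println(callWithIntArgument()); // Base.describe(int)
    }
}
```

Had the new method instead taken a parameter of a type introduced in the same release, no existing call site could have supplied an argument of that type, and the resolution of existing calls would be unchanged.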

Not counting introspective operations like core reflection, adding methods to an interface or abstract class does not have much direct appreciable behavioral compatibility impact because adding methods doesn't directly affect the code run by existing clients of the class. If an abstract class were not at the conceptual root of a type hierarchy, adding a concrete method could intercept calls to a method with the same signature in the superclass. However, if the children of an abstract superclass already have a concrete implementation for the newly added method, existing calls to the children's method would not be intercepted by the method added in the superclass.

Since adding a method to an interface or an abstract class is binary compatible and in both cases the worst case source compatibility outcome is breaking compilation of existing subtypes, any evolution advantage of abstract classes hinges on the ability to have a reasonable default implementation for new methods. But what can such a new method implementation really do? Some viable options are:

  • Throw new UnsupportedOperationException or some other exception.

  • Call existing methods on the abstract class.

  • A no-op method.

(Other sorts of behavior could potentially be added to skeletal classes, but those classes aren't an alternative to interfaces.) Adding a default implementation that throws an exception isn't necessarily very useful; throwing AbstractMethodError would mimic adding a method to an interface! If the functionality of the new method can be expressed in terms of existing methods on the abstract class, the new method could also be written as a convenience static method in a helper class. In that case, the convenience method could just as easily be written in terms of methods on an interface instead. Proposals for extension methods would add syntactic support for this helper class pattern. A no-op method could be added to optionally advise subclasses of some condition or event, but it would have no useful effect on existing subclasses. While it is straightforward to add simple concrete methods to an abstract class, with sufficient advance planning, such methods could also be automatically added to implementations of an interface at compile time.
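A minimal sketch of the helper class pattern follows. Inventory and Inventories are hypothetical names chosen for the example; the convenience methods are written purely in terms of the interface's existing method, so the interface itself never has to change.

```java
import java.util.Arrays;
import java.util.List;

final class HelperDemo {
    // Hypothetical interface; it is never modified in this sketch.
    interface Inventory {
        List<String> items();
    }

    // Convenience methods live in a separate helper class and work with
    // every implementation of the interface, old or new.
    static final class Inventories {
        private Inventories() {}

        static int count(Inventory inventory) {
            return inventory.items().size();
        }

        static boolean isEmpty(Inventory inventory) {
            return inventory.items().isEmpty();
        }
    }

    public static void main(String[] args) {
        Inventory inventory = () -> Arrays.asList("apple", "pear");
        System.out.println(Inventories.count(inventory));   // 2
        System.out.println(Inventories.isEmpty(inventory)); // false
    }
}
```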

Starting in JDK 6, Java compilers must support standardized annotation processing. Annotation processing is a general meta-programming framework not directly tied to annotations. Before annotation processing, the types being compiled can be incomplete, including references to types to be generated during annotation processing. The to-be-generated types can include the superclass of a class being compiled. Supporting the generation of superclasses is a very powerful technique for modifying the semantics of the child class. In this case, a class implementing an interface expected to change in the future could refer to a private superclass. With the original definition of the interface, the superclass would be empty. However, when methods were added to the interface, the annotation processor could generate implementations of those methods in the superclass. This would have the effect of adding the new methods to the class at compile time. Annotations could drive what the synthesized implementation actually did, such as throwing an exception or doing nothing.

Compared to adding methods to an interface, adding concrete methods to an abstract class seems to be much more compatible. However, both operations are binary compatible, and while adding a method to an abstract class usually has a better "average" impact on existing subtypes, the worst possible impact is the same, breaking the compilation of existing code. As for the functionality that can be added in a concrete method, convenience methods can be put in a separate class and the other sorts of limited functionality methods that can readily be added could also be generated via annotation processing for implementors of an interface. Therefore, the practical evolutionary benefits of using an abstract class rather than an interface should be considered carefully since interfaces may still be a better choice when limited evolution is anticipated.

(2008-05-12 18:43:29.0)

20080421 Monday April 21, 2008

Compatibly Evolving BigDecimal

Back in JDK 5, JSR 13 added true floating-point arithmetic to BigDecimal, which involved many new methods and constructors along with new supporting classes in the java.math package. I was actively involved in the JSR 13 expert group and integrated the code into the JDK. These changes had some surprising compatibility impacts which can be classified according to their source, binary, and behavioral effects.

The numerical values representable in BigDecimal are (unscaledValue × 10^-scale) where unscaledValue is a BigInteger and scale is a 32-bit integer. Before Java SE 5, scale was constrained to be positive or zero (in other words, 10 raised to a negative or zero exponent) and JSR 13 removed this restriction to allow any integer exponent. Consequently, prior to JSR 13 BigDecimal integral values with trailing zeros had to have them explicitly represented; for example the value one million had to be stored as (1,000,000 × 10^0) rather than (1 × 10^6) or (10 × 10^5), etc. One behavioral consequence of JSR 13 was that all the methods operating on BigDecimal values understand and accept numbers without the old exponent restriction.
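The different representations of one million can be inspected directly on a JDK 5 or later BigDecimal. This is a small sketch; the ScaleDemo class and its describe helper are names invented for the example.

```java
import java.math.BigDecimal;

final class ScaleDemo {
    // Renders a value as "unscaledValue x 10^exponent", where exponent is -scale.
    static String describe(BigDecimal value) {
        return value.unscaledValue() + " x 10^" + (-value.scale());
    }

    public static void main(String[] args) {
        BigDecimal explicitZeros = new BigDecimal("1000000"); // the only pre-JSR 13 form
        BigDecimal compact = new BigDecimal("1E+6");          // negative scale, JDK 5 and later

        System.out.println(describe(explicitZeros)); // 1000000 x 10^0
        System.out.println(describe(compact));       // 1 x 10^6

        // Same numerical value, different representation.
        System.out.println(explicitZeros.compareTo(compact) == 0); // true
        System.out.println(explicitZeros.equals(compact));         // false
    }
}
```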

The new API elements added by JSR 13 are listed in table 1; the additions will be examined under each kind of compatibility.

Table 1 — API changes made by JSR 13 to BigDecimal
New Fields
public static final BigDecimal ZERO
public static final BigDecimal ONE
public static final BigDecimal TEN

New Constructors
public BigDecimal(char[] in, int offset, int len)
public BigDecimal(char[] in, int offset, int len, MathContext mc)
public BigDecimal(char[] in)
public BigDecimal(char[] in, MathContext mc)
public BigDecimal(String val, MathContext mc)
public BigDecimal(double val, MathContext mc)
public BigDecimal(BigInteger val, MathContext mc)
public BigDecimal(BigInteger unscaledVal, int scale, MathContext mc)
public BigDecimal(int val)
public BigDecimal(int val, MathContext mc)
public BigDecimal(long val)
public BigDecimal(long val, MathContext mc)

New Methods
public static BigDecimal valueOf(double val)
public BigDecimal add(BigDecimal augend, MathContext mc)
public BigDecimal subtract(BigDecimal subtrahend, MathContext mc)
public BigDecimal multiply(BigDecimal multiplicand, MathContext mc)
public BigDecimal divide(BigDecimal divisor, int scale, RoundingMode roundingMode)
public BigDecimal divide(BigDecimal divisor, RoundingMode roundingMode)
public BigDecimal divide(BigDecimal divisor)
public BigDecimal divide(BigDecimal divisor, MathContext mc)
public BigDecimal divideToIntegralValue(BigDecimal divisor)
public BigDecimal divideToIntegralValue(BigDecimal divisor, MathContext mc)
public BigDecimal pow(int n)
public BigDecimal pow(int n, MathContext mc)
public BigDecimal abs(MathContext mc)
public BigDecimal negate(MathContext mc)
public BigDecimal plus()
public BigDecimal plus(MathContext mc)
public int precision()
public BigDecimal round(MathContext mc)
public BigDecimal setScale(int newScale, RoundingMode roundingMode)
public BigDecimal scaleByPowerOfTen(int n)
public BigDecimal stripTrailingZeros()
public String toEngineeringString()
public String toPlainString()
public BigInteger toBigIntegerExact()
public long longValueExact()
public int intValueExact()
public short shortValueExact()
public byte byteValueExact()
public BigDecimal ulp()

Binary Compatibility

Adding new public methods and constructors, even ones that overload existing names, is binary compatible. Adding public static final fields is binary compatible, meaning existing clients of the library will continue to link. However, there is a possible complication here since BigDecimal is not final and since it has public constructors, it can be subclassed. (As discussed in Effective Java, Item 13, Favor Immutability, this was a design oversight when the class was written.) Adding fields to classes can be binary incompatible, but the needed combination of circumstances does not arise in this case. Therefore, individually and as a whole, the BigDecimal API additions are binary compatible.

Source Compatibility

For source compatibility, we can distinguish between clients of a type and extenders/implementors of a type; certain changes can inconvenience extenders/implementors but not clients.

Adding the public static final fields is binary-preserving source compatible. If a subclass, say MyDecimal, already has a field with the same name as a field being added to BigDecimal, the existing declaration in MyDecimal hides the new declaration in the parent class BigDecimal. Therefore, existing uses of, say, MyDecimal.TEN, would continue to resolve to the same binary name.

Since constructors are not inherited and all the new constructors are public rather than protected, just the uses of constructors in clients need to be considered; there are no distinct special issues for subclasses. The constructors in BigDecimal during Java SE 1.4.x, the platform version immediately predating JSR 13, are listed in table 2.

Table 2 — 1.4.x era BigDecimal Constructors
Existing Constructors BigDecimal(BigInteger val)
BigDecimal(BigInteger unscaledVal, int scale)
BigDecimal(double val)
BigDecimal(String val)

To assess the source compatibility impact, we can compare the new constructors with the old constructors and see if any possible overload resolutions would change, including the possibility of stopping an existing compilation by removing the existence of a most specific method. Of the twelve new constructors, ten are clearly not problematic and binary-preserving source compatible; the ten either have more parameters than the existing constructors or are not applicable to the same invocations, see table 3. For example, eight of the new constructors have the new type MathContext as a parameter. Because of primitive subtyping, the other two new constructors, BigDecimal(int val) and BigDecimal(long val), are both applicable to invocations that would previously resolve to BigDecimal(double val) and are more specific than that constructor. Therefore, adding these two new constructors is not binary-preserving source compatible because a different constructor can be resolved for the same existing source code, code with one-argument calls to a BigDecimal constructor where the argument is a primitive type. These two constructors need a secondary screening to assess their behavioral equivalence.

Table 3 — Source Compatibility Analysis of New Constructors
New Constructor
    Source Compatibility Impact

public BigDecimal(char[] in, int offset, int len)
    Binary preserving; more parameters than existing constructors.
public BigDecimal(char[] in, int offset, int len, MathContext mc)
    Binary preserving; more parameters than existing constructors.
public BigDecimal(char[] in)
    Binary preserving; disjoint with existing one-parameter constructors.
public BigDecimal(char[] in, MathContext mc)
    Binary preserving; disjoint with existing two-parameter constructors.
public BigDecimal(String val, MathContext mc)
    Binary preserving; disjoint with existing two-parameter constructors.
public BigDecimal(double val, MathContext mc)
    Binary preserving; disjoint with existing two-parameter constructors.
public BigDecimal(BigInteger val, MathContext mc)
    Binary preserving; disjoint with existing two-parameter constructors.
public BigDecimal(BigInteger unscaledVal, int scale, MathContext mc)
    Binary preserving; disjoint with existing two-parameter constructors.
public BigDecimal(int val)
    Warning: not binary preserving since more specific than existing one-parameter constructor, behavioral equivalence must be assessed.
public BigDecimal(int val, MathContext mc)
    Binary preserving; disjoint with existing two-parameter constructors.
public BigDecimal(long val)
    Warning: not binary preserving since more specific than existing one-parameter constructor, behavioral equivalence must be assessed.
public BigDecimal(long val, MathContext mc)
    None; disjoint with existing two-parameter constructors.

Before JDK 5, the expressions new BigDecimal(123) and new BigDecimal(123L) in source code would resolve to a call to BigDecimal(double); as part of that resolution, a primitive widening conversion converts the argument expression to double before the constructor is invoked. All int values are exactly representable as double, and the double constructor when given an integral value will return a BigDecimal with the numerical value in question and a scale of zero. The new int constructor will also return a BigDecimal with the numerical value of the argument and a scale of zero. Therefore, adding the int constructor will result in behaviorally equivalent programs; although the new constructor will cause some invocations to resolve to a different constructor, calling the other constructor will still always result in an equivalent, bd1.equals(bd2)==true, BigDecimal. However, the new long constructor does not have behavioral equivalence for all values. Some long values are not exactly representable in double and the old long → double conversion can silently lose precision. For example, printing the value of (new BigDecimal(Long.MAX_VALUE)) gives
9223372036854775808
under JDK 1.4.2 but
9223372036854775807
under JDK 5. More dramatically, printing (new BigDecimal(0x4000000000000200L)) gives
4611686018427387904
under JDK 1.4.2 but
4611686018427388416
under JDK 5. While the new behavior is "better" in the sense of exactly capturing the long argument value, it is a subtle change to existing source code. Strictly speaking, among the spectrum of different source compatibility levels, adding this constructor only preserves the weakest property, maintaining the ability to compile. Since the resolution of constructors in existing code is changed, adding this constructor is not binary-preserving source compatible, nor is it behaviorally equivalent since a different BigDecimal will be returned for some inputs. Since the class already had a static factory method with a long parameter that would convert values exactly, the long constructor did not need to be added to exactly get a BigDecimal with a long's value in a single operation.
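The difference can be reproduced on a current JDK by making the old widening conversion explicit with a cast. In this sketch the class and method names are illustrative; viaDouble mimics what pre-JDK 5 compilation of new BigDecimal(someLong) amounted to, and viaLong is what the same source means from JDK 5 on.

```java
import java.math.BigDecimal;

final class LongConstructorDemo {
    // Old meaning: the long argument is widened to double, then converted.
    static BigDecimal viaDouble(long value) {
        return new BigDecimal((double) value);
    }

    // New meaning: the long constructor converts the argument exactly.
    static BigDecimal viaLong(long value) {
        return new BigDecimal(value);
    }

    public static void main(String[] args) {
        long value = 0x4000000000000200L;
        System.out.println(viaDouble(value)); // 4611686018427387904
        System.out.println(viaLong(value));   // 4611686018427388416

        // The static factory that predates the long constructor is also exact.
        System.out.println(BigDecimal.valueOf(value).equals(viaLong(value))); // true
    }
}
```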

Partially because of the unintentional, if beneficial, change in source meaning as well as some of the usual reasons (possibility to cache, etc.), in retrospect I think it would have been preferable for the functionality of all twelve new constructors to be provided through static factories instead. (While not directly applicable in BigDecimal, in general even if constructors aren't considered harmful, static factories can have better generics support.)

A similar analysis can be undertaken for all the new methods. Additionally, since subclasses are possible, inheritance conflicts need to be considered too. Note that the new methods taking MathContext and RoundingMode parameters cannot conflict with existing methods in subclasses so all those additions are binary-preserving source compatible. However, if all the parameters of a new method are existing types, a subclass could potentially have a conflicting method with an unrelated return type. For example, MyDecimal could have a (strange) public double divide(BigDecimal divisor) method which would conflict with the addition of public BigDecimal divide(BigDecimal divisor). While BigDecimal generally shouldn't be subclassed, the addition of some of these new methods could prevent existing subclasses from compiling, yet another reason to favor composition over inheritance.

Behavioral Compatibility

In terms of evolving the behavior of existing methods after introducing the expanded exponent range, the main issues were the behavior of arithmetic operations and text ↔ BigDecimal conversion operations; the latter would prove to be unexpectedly troublesome.

As summarized in table 4, the behavior of arithmetic operations was quite compatible with a number of strong invariants. Given input values a1 and b1 representable under the old system, and given an existing method, say add, and its result c1, in the old and new BigDecimal if the inputs to an operation are .equals, same numerical value and same representation, then the output is exactly equivalent too, same numerical value with the same representation. More generally, in the old and new BigDecimal if the inputs to an operation satisfy the weaker property of being compareTo() == 0, meaning they have same numerical value but with a possibly different representation, then the output will be numerically equal, but possibly with a different representation.

Table 4 — BigDecimal Arithmetic Properties
Old: c1 = a1.add(b1);
New: c2 = a2.add(b2);
If a1.equals(a2) AND b1.equals(b2), then c1.equals(c2)
If a1.compareTo(a2) == 0 AND b1.compareTo(b2) == 0, then c1.compareTo(c2) == 0
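Table 4 compares an old and a new library, which cannot be done inside one program, but the distinction between the two properties can be illustrated within a single JDK. The sketch below (class and method names are illustrative) contrasts inputs that are .equals with inputs that are only compareTo() == 0.

```java
import java.math.BigDecimal;

final class ArithmeticPropertiesDemo {
    // Parses both operands and adds them.
    static BigDecimal sum(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    public static void main(String[] args) {
        // Inputs with identical value and representation give identical results.
        System.out.println(sum("2.0", "3.0").equals(sum("2.0", "3.0"))); // true

        // 2.0 vs 2.00 and 3.0 vs 3.00 have the same value but different scales:
        // the sums are numerically equal, yet 5.0 and 5.00 are not .equals.
        BigDecimal c1 = sum("2.0", "3.0");
        BigDecimal c2 = sum("2.00", "3.00");
        System.out.println(c1.compareTo(c2) == 0); // true
        System.out.println(c1.equals(c2));         // false
    }
}
```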

A main advantage of decimal arithmetic over binary arithmetic is what-you-see-is-what-you-get for input and output values; the complicated vagaries of binary ↔ decimal conversion can be avoided and exact computation can be straightforward. Therefore, when removing the restriction on exponent values, being able to have a textual representation that readily mapped to all possible unscaled value and exponent pairs was paramount to make the new arithmetic usable. Before JSR 13, the toString method did not use exponential notation; all leading and trailing zeros were explicit. For fractional values, the length of the output grew linearly with the size of the exponent, as well as the number of digits of precision. Conversely, without negative exponents, the internal representation and string output of integer-valued BigDecimal numbers grew with the magnitude of the number, even when it was inherently low-precision. To take advantage of the new unrestricted exponent range, a textual notation was needed that allowed the positive or negative exponent to be recovered; this was accomplished by changing to using scientific notation in the toString output. When converting from text to BigDecimal, a positive exponent could be reconstructed from integer values that previously would have been forced to have a zero exponent. However, the new output was legal input to the old constructors, so properties similar to those of the old and new arithmetic behavior applied:

  • Within a given release, new BigDecimal(bd.toString()).equals(bd) == true, meaning converting to and from a string preserves numerical value and representation.

  • toString output from the old BigDecimal converted by the new BigDecimal yields a result equivalent to the old value.

  • New toString output converted by the old BigDecimal yields:

    • An equivalent result when the exponent is negative.

    • A numerically equal result when the exponent is positive (representation may differ).

If needed, in the new BigDecimal on textual input the old semantics on exponents is easy to code:

BigDecimal bd = new BigDecimal(myString);
if (bd.scale() < 0)
  bd = bd.setScale(0);

and a toPlainString method was added to provide the old-style output when needed.
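The textual behaviors described above can be checked on a JDK 5 or later BigDecimal. In this sketch, withOldSemantics is a hypothetical helper that wraps the scale adjustment shown earlier.

```java
import java.math.BigDecimal;

final class TextFormsDemo {
    // Applies the old exponent rule: a negative scale (positive exponent)
    // is replaced by a scale of zero; other values are left untouched.
    static BigDecimal withOldSemantics(String text) {
        BigDecimal bd = new BigDecimal(text);
        return bd.scale() < 0 ? bd.setScale(0) : bd;
    }

    public static void main(String[] args) {
        BigDecimal million = new BigDecimal("1E+6");

        System.out.println(million.toString());       // 1E+6
        System.out.println(million.toPlainString());  // 1000000
        System.out.println(withOldSemantics("1E+6")); // 1000000

        // Round trip through toString preserves value and representation.
        System.out.println(new BigDecimal(million.toString()).equals(million)); // true
    }
}
```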

Staying within the realm of old and new BigDecimal versions, these arrangements solidly preserve a very reasonable kind of behavioral compatibility: numerical value and representation are kept constant when possible; otherwise, numerical value is preserved, possibly with a different representation. Backwards serial compatibility is slightly weaker; rather than being converted to exponent-zero values as done for textual inputs, new serial streams holding positive exponents are rejected by old BigDecimal implementations. Unfortunately, despite these consistencies across JDK versions, some users of BigDecimal still ran into compatibility issues from the textual output changes made by JSR 13.

A common use for BigDecimal is interfacing to databases and while the new scientific notation was legal input to the old BigDecimal string constructors, scientific notation was not legal notation to databases. The addition of the toPlainString method did not help the situation without recompiling the source of the application in question; such recompilation could be unwanted since it would tie the application to JDK 5 with the new method. Other unpalatable workarounds include subclassing BigDecimal to enforce the old toString behavior or using reflection to see if the toPlainString method is available to call to avoid introducing a hard dependency on the new method.
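
The reflective workaround might look roughly like the following sketch. The helper class PlainStrings and its method plain are hypothetical names; the code assumes that on a pre-JDK 5 library, where toPlainString is absent, toString already produces the old non-scientific output.

```java
import java.lang.reflect.Method;
import java.math.BigDecimal;

// Hypothetical helper; the name PlainStrings is not from any library.
class PlainStrings {
    // Returns old-style (non-scientific) text without a hard dependency
    // on the toPlainString method.
    static String plain(BigDecimal bd) {
        try {
            // Look the method up by name so no link-time reference to it exists.
            Method toPlain = BigDecimal.class.getMethod("toPlainString");
            return (String) toPlain.invoke(bd);
        } catch (NoSuchMethodException e) {
            // Assumed pre-JDK 5 library: toString never used scientific notation.
            return bd.toString();
        } catch (Exception e) {
            throw new IllegalStateException("toPlainString call failed", e);
        }
    }

    public static void main(String... args) {
        System.out.println(plain(new BigDecimal("1E+3")));
    }
}
```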

While the changes in textual input and output of BigDecimal were reasonable in the context of direct Java compatibility, the expert group underestimated the behavioral compatibility impact of these changes when dealing with databases. While the changes remain justifiable in terms of supporting the new values, had the compatibility cost been known, the expert group could have and should have worked with database vendors to mitigate the migration cost associated with this change.

Conclusion

Fully understanding the compatibility impact of changes is subtle and shortcomings are quick to lead to user anger. Merely maintaining binary compatibility is not sufficient for many purposes. Following good coding guidelines from the beginning can pay silent rewards when later evolving the class by reducing the space of possible concerns.

Acknowledgments

Alex provided helpful comments on a draft of this entry.


Further Reading

  1. Joseph D. Darcy and Mike Cowlishaw, JavaOne 2004 BOF 1638, Big News for BigDecimal

  2. Be careful when you are using Oracle and Java 5

  3. BigDecimal and JDBC since Java 5.0

  4. Java 5 BigDecimal.toString() and Oracle 10g jdbc

(2008-04-21 19:36:57.0) Permalink Comments [3]

20080417 Thursday April 17, 2008

Kinds of Compatibility: Source, Binary, and Behavioral

Every change is an incompatible change. A risk/benefit analysis is always required.
—Martin Buchholz
Veteran JDK Engineer

When evolving the JDK, compatibility concerns are taken very seriously. However, different standards are applied to evolving various aspects of the platform. From a certain point of view, it is true that any observable difference could potentially cause some unknown application to break. Indeed, just changing the reported version number is incompatible in this sense because, for example, a JNLP file can refuse to run an application on later versions of the platform. Therefore, since not making any changes at all is clearly not viable for evolving the platform, changes need to be evaluated against and managed according to a variety of compatibility contracts.

For Java programs, there are three main categories of compatibility:

  1. Source: Source compatibility concerns translating Java source code into class files.

  2. Binary: Binary compatibility is defined in The Java Language Specification as preserving the ability to link without error.

  3. Behavioral: Behavioral compatibility includes the semantics of the code that is executed at runtime.

Note that non-source compatibility is sometimes colloquially referred to as "binary compatibility." Such usage is incorrect since the JLS spends an entire chapter precisely defining the term binary compatibility; often behavioral compatibility is the intended notion instead.

There are many other observable aspects of the JDK not related to Java programs, such as file layout, etc. Those will not be further discussed in this note.

The basic challenge of compatibility is the difficulty of finding and modifying all the software and systems impacted by a change. In a closed-world scenario where all the clients of an API are known and can in principle be simultaneously changed, introducing "incompatible" changes is just a matter of being able to coordinate the engineering necessary to evaporate the liquid in a small body of water, perhaps only a puddle or pot on a stove. In contrast, for APIs that are used as widely as the JDK, rigorously finding all the possible programs impacted by an incompatible change is as impractical as boiling the oceans, so evolving such APIs is quite constrained by comparison.

Generally, we will consider whether a program P is compatible in some fashion (or not) with respect to two versions of a library, L1 and L2, that differ in some way. (We will not consider the compatibility impact of such changes on independent implementers of L.) Sometimes only a particular program is of interest: is the change from L1 to L2 compatible with this program? When evaluating how the platform should evolve, a broader consideration of the programs of concern is used. For example, does the change from L1 to L2 cause a problem for any program that currently exists? If so, what fraction of existing programs is affected? Finally, the broadest consideration is whether the change affects any program that could exist. Often once a platform version is released, the latter two notions are similar because imperfect knowledge about the set of actual programs means it can be more tractable to consider the worst possible outcome for any potential program rather than estimate the impact over actual programs. Stated more formally, depending on the change being considered, judging the change based on the worst possible outcome for any program is more appropriate than judging based on some other kind of norm of the disruption over the space of known programs.

Generally each kind of compatibility has both positive and negative aspects; that is, the positive aspect of keeping things that "work" working and the negative aspect of keeping things that "don't work" not working. For example, the TCK tests for Java compilers include both positive tests of programs that must be accepted and negative tests of programs that must be rejected. In many circumstances, preserving or expanding the positive behavior is more acceptable and important than maintaining the negative behavior, and we will focus on positive compatibility in this entry.

In terms of relative severity, source compatibility problems are usually the mildest since there are often straightforward workarounds, such as adjusting import statements or switching to fully qualified names. Gradations of source compatibility are identified and discussed below. Behavioral compatibility problems can have a range of impacts while true binary compatibility issues are problematic since linking is prevented.


Source Compatibility

The basic job of any linker or loader is simple: It binds more abstract names to more concrete names, which permits programmers to write code using the more abstract names. (Linkers and Loaders)

A Java compiler's job also includes mapping more abstract names to more concrete ones, specifically mapping simple and qualified names appearing in source code into binary names in class files. Source compatibility concerns this mapping of source code into class files, not only whether or not such a mapping is possible, but also whether or not the resulting class files are suitable. Source compatibility is influenced by changing the set of types available during compilation, such as adding a new class, as well as changes within existing types themselves, such as adding an overloaded method. There is a large set of possible changes to classes and interfaces examined for their binary compatibility impact. All these changes could also be classified according to their source compatibility repercussions, but only a few kinds of changes will be analyzed below.

The most rudimentary kind of positive source compatibility is whether code that compiles against L1 will continue to compile against L2; however, that is not the entirety of the space of concerns since the class file resulting from compilation might not be equivalent. Java source code often uses simple names for types; using information about imports, the compiler will interpret these simple names and transform them into binary names for use in the resulting class file(s). In a class file, the binary name of an entity (along with its signature in the case of methods and constructors) serves as the unique, universal identifier to allow the entity to be referenced. So different degrees of source compatibility can be identified:

  • Does the code still compile (or not compile)?

  • If the code still compiles, do all the names resolve to the same binary names in the class file?

  • If the code still compiles and the names do not all resolve to the same binary names, does a behaviorally equivalent class file result?

Whether or not a program is valid can also be affected by language changes. Usually previously invalid programs are made valid, as when generics were added, but sometimes existing programs are rendered invalid, as when keywords were added (strictfp, assert, and enum). The version number of the resulting class file is also an external compatibility issue of sorts since that affects which platform versions the code can be run on.

Full source compatibility with any existing program is usually not achievable because of * imports. For example, consider L1 with packages foo and bar where foo includes the class Quux. Then L2 adds class bar.Quux. This program

import foo.*;
import bar.*;

public class HelloQuux {
    public static void main(String... args) {
        Object o = Quux.class;
        System.out.println("Hello " + o.toString());
    }
}

will compile under L1 but not under L2 since the name "Quux" is now ambiguous as reported by javac:

HelloQuux.java:6: reference to Quux is ambiguous, both class bar.Quux in bar and
 class foo.Quux in foo match
        Object o = Quux.class;
                   ^
1 error

An adversarial program could almost always include * imports that conflict with a given library.1 Therefore, judging source compatibility by requiring all possible programs to compile is an overly restrictive criterion. However, when naming their types, API designers should not reuse "String", "Object", and other names of core classes from packages like java.lang and java.util to avoid this kind of annoying name conflict.

Due to the * import wrinkle, a more reasonable definition of source compatibility considers programs transformed to only use fully qualified names. Let FQN(P, L) be program P where each name is replaced by its fully qualified form in the context of libraries L. Call such a library transformation from L1 to L2 binary-preserving source compatible with source program P if FQN(P, L1) equals FQN(P, L2). This is a strict form of source compatibility that will usually result in class files for P using the same binary names when compiled against both versions of the library. Class files with the same binary names will result when each type has a distinct fully qualified name. Multiple types can have the same fully qualified name but differing binary names; those cases do not arise when the standard naming conventions are being followed.2

Adding overloaded methods has the potential to change method resolution and thus change the signatures of the method call sites in the resulting class file. Whether or not such a change is problematic with respect to source compatibility depends on what semantics are required and how the different overloaded methods operate on the same inputs, which interacts with behavioral equivalence notions. Assume class C originally has a method void m(T t) and then an overload void m(S s) is added. Some cases of interest include:

  • S and T are both reference types:

    • If there is no typing relationship between S and T, overload resolution will not be affected.

    • If there is a typing relationship between S and T, such as T is a subtype of S, call sites in existing source may now resolve to the new method. Well-written programs will follow the Liskov substitution principle and C will do "the same" operation on the argument no matter which overloaded method is called. Less than well-written programs may fail to follow this principle.

  • S and T are both primitive types: By extension, if a numerical value can be represented in multiple primitive types, overloaded methods taking a type with that value should usually perform an equivalent operation. However, the silent loss of precision in primitive widening conversion can affect the actual value that gets passed to an overloaded method.

    Concretely, consider class C with methods m(int) and m(double). The call site "m(123L)" will undergo primitive widening conversion, converting the argument value to double before m(double) is called. Now if m(long) is added to C, the call site will resolve to the new method. Even assuming each m method does an equivalent operation when passed a numerically equal value, there can still be differences after the third method is added since some long values lose precision when converted to double, for example, Long.MAX_VALUE. Therefore, a client compiled against the two versions of C can have different runtime behavior even if each m method behaves reasonably. This kind of subtle change in overloading behavior occurred with the addition of a BigDecimal constructor taking a long as part of JSR 13.

  • One of S and T is a reference type, the other is primitive: Before generics were added to the language, two methods which differed in the primitive/reference status of the ith parameter could not possibly be applicable to the same arguments. But along with generics came boxing and unboxing conversions that can map, for example, a value of an int primitive type to a java.lang.Integer object with a reference type, and vice versa. These mappings have the potential to introduce ambiguities in method resolution such that adding a method could prevent previously valid code from compiling; however, the rules for method invocation expressions were updated to avoid such potential ambiguities from boxing/unboxing as well as var-args.
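
The primitive case can be sketched in a few lines. OldC and NewC below are hypothetical stand-ins for the two versions of class C, placed side by side so that the same call site can be compiled against each; the exactAsDouble helper is likewise an illustrative assumption, using the exact double-to-BigDecimal conversion to detect rounding.

```java
import java.math.BigDecimal;

// Hypothetical stand-ins for two versions of class C.
class OverloadDemo {
    static class OldC {
        static String m(int i)    { return "m(int)"; }
        static String m(double d) { return "m(double)"; }
    }

    static class NewC {
        static String m(int i)    { return "m(int)"; }
        static String m(double d) { return "m(double)"; }
        static String m(long l)   { return "m(long)"; }  // the added overload
    }

    // The same source-level call site compiled against each version.
    static String callOld() { return OldC.m(123L); }
    static String callNew() { return NewC.m(123L); }

    // True if widening v to double keeps its exact numerical value.
    static boolean exactAsDouble(long v) {
        double d = v;  // primitive widening conversion; may round
        return new BigDecimal(d).compareTo(BigDecimal.valueOf(v)) == 0;
    }

    public static void main(String... args) {
        System.out.println(callOld());
        System.out.println(callNew());
        System.out.println(exactAsDouble(123L));
        System.out.println(exactAsDouble(Long.MAX_VALUE));
    }
}
```

For 123L both overloads see the same numerical value, so only the resolved signature differs; for Long.MAX_VALUE the value that m(double) receives has already been rounded.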

If a new method cannot change resolution, then it is a binary-preserving source transformation. If a new method can change resolution but the different class file that results has acceptably similar behavior, the change may still be acceptable; changing resolution in a way that does not preserve semantics is likely problematic. Changing a library in such a way that current clients no longer compile is seldom appropriate.

Figure: Source compatibility levels of FQN programs

Binary Compatibility

JLSv3 §13.2 What Binary Compatibility Is and Is Not
A change to a type is binary compatible with (equivalently, does not break binary compatibility with) preexisting binaries if preexisting binaries that previously linked without error will continue to link without error.

The JLS defines binary compatibility strictly according to linkage; if P links with L1 and continues to link with L2, the change made in L2 is binary compatible. The runtime behavior after linking is not included in binary compatibility:

JLSv3 13.4.22 Method and Constructor Body
Changes to the body of a method or constructor do not break [binary] compatibility with pre-existing binaries.

As an extreme example, if the body of a method is changed to throw an error instead of compute a useful result, while the change is certainly a compatibility issue, it is not a binary compatibility issue since client classes would continue to link. Also, it is not a binary compatibility issue to add methods to an interface. Class files compiled against the old version of the interface will still link against the new interface despite the class not having an implementation of the new method. If the new method is called at runtime, an AbstractMethodError is thrown; if the new method is not called, the existing methods can be used without incident. (Adding a method to an interface is a source incompatibility that can break compilation though.)

A design requirement from the addition of generics via JSR 14 was migration compatibility. Migration compatibility requires that a library can be generified and existing (nongeneric) clients can continue to compile and link against the generic version. Meeting this constraint led to the use of erasure, a controversial aspect of the generics design. During JSR 14, it was not known how to add generics in a way that supported both reification and migration compatibility; future work might address this shortcoming.


Behavioral Compatibility

Intuitively, behavioral compatibility should mean that with the same inputs program P does "the same" or an "equivalent" operation under different versions of libraries or the platform. Defining equivalence can be a bit involved; for example, even just defining a proper equals method in a class can be nontrivial. Formalizing this concept would require an operational semantics for the JVM covering the aspects of the system a program was interested in. For example, there is a fundamental difference in visible changes between programs that introspect on the system and those that do not. Examples of introspection include calling core reflection, relying on stack trace output, using timing measurements to influence code execution, and so on. For programs that do not use, say, core reflection, changes to the structure of libraries, such as adding new public methods, are entirely transparent. In contrast, a (poorly behaved) program could use reflection to look up the set of public methods on a library class and throw an exception if any unexpected methods were present. A tricky program could even make decisions based on information like a timing side channel. For example, two threads could repeatedly run different operations and make some indication of progress, such as incrementing an atomic counter, and the relative rates of progress could be compared. If the ratio is over a certain threshold, some unrelated action could be taken, or not. This allows a program to create a dependence on the optimization capabilities of a particular JVM, which is generally outside a reasonable behavioral compatibility contract.
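
To make the reflection case concrete, here is a minimal sketch of such a poorly behaved client. LibV1 and LibV2 are hypothetical library versions that differ only by one added public method, and FragileClient and its methods are illustrative names, not part of any real API.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class FragileClient {
    // Hypothetical library class, before and after a public method is added.
    static class LibV1 { public void run() {} }
    static class LibV2 { public void run() {} public void stop() {} }

    // Names of the public methods declared directly by class c.
    static Set<String> declaredPublicMethods(Class<?> c) {
        Set<String> names = new TreeSet<String>();
        for (Method m : c.getDeclaredMethods())
            if (Modifier.isPublic(m.getModifiers()))
                names.add(m.getName());
        return names;
    }

    // Poorly behaved: rejects any library whose public methods differ
    // from the exact set this client expects.
    static void requireExactly(Class<?> c, String... expected) {
        Set<String> actual = declaredPublicMethods(c);
        if (!actual.equals(new HashSet<String>(Arrays.asList(expected))))
            throw new IllegalStateException("Unexpected methods: " + actual);
    }

    public static void main(String... args) {
        requireExactly(LibV1.class, "run");  // passes
        requireExactly(LibV2.class, "run");  // throws: stop() was added
    }
}
```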

The evolution of a library is constrained by the library's contract included in its specification; for final classes this contract doesn't usually include a prohibition of adding new public methods! While an end-user may not care why a program does not work with a newer version of a library, what contracts are being followed or broken should determine which party has the onus for fixing the problem. That said, there are times in evolving the JDK when differences are found between the specified behavior and the actual behavior (for example 4707389, 6365176). The two basic approaches to fixing these bugs are to change the implementation to match the specified behavior or to change the specification (in a platform release) to match the implementation's (perhaps long-standing) behavior; often the latter option is chosen since it has a lower de facto impact on behavioral compatibility.


Case Study

Consider two versions of a simple enum representing the crew of the USS Enterprise, one for the first season:

public enum StarTrekCast {
    JAMES_T_KIRK("Jim"),
    LEONARD_MCCOY("Bones"),
    JANICE_RAND("Yeoman Rand"),
    MONTGOMERY_SCOTT("Scotty"),
    SPOCK("Spock"),
    HIKARU_SULU("Sulu"),
    UHURA("Uhura"); // Any first name for Uhura is non-canon.

    private String nickname;
    StarTrekCast(String nickname) {
        this.nickname = nickname;
    }

    public String nickname() { return nickname;}
}

and another for the second season:

public enum StarTrekCast {
    JAMES_T_KIRK("Jim"),
    SPOCK("Spock"),
    MONTGOMERY_SCOTT("Scotty"),
    LEONARD_MCCOY("Bones"),
    /* JANICE_RAND("Yeoman Rand"), */ // Only in 8 episodes!
    HIKARU_SULU("Sulu"),
    PAVEL_CHEKOV("Chekov"), // Introduced in season 2.
    UHURA("Uhura"); // Any first name for Uhura is non-canon.

    private String nickname;
    StarTrekCast(String nickname) {
        this.nickname = nickname;
    }

    public String nickname() { return nickname;}
}

Compared to the first season, the second season:

  1. Deletes yeoman Janice Rand

  2. Adds Pavel Chekov

  3. Reorders Bones, Scotty, and Spock to better reflect the order of who commands the ship if the Captain and others are unavailable.

These changes have varying source, binary, and behavioral compatibility effects:

  1. Deleting JANICE_RAND is source incompatible, able to break compilations. The deletion is also binary incompatible. Besides being observable via reflection, the deletion affects the behavior of various built-in methods on the enum, including values and valueOf. In addition, the deletion will break previously serialized streams with this constant.

  2. Adding PAVEL_CHEKOV is binary-preserving source compatible. Likewise, the addition of a new public static final field is binary compatible. However, the addition of a new constant is visible to reflection and alters the behavior of built-in enum methods. Existing serialized instances continue to work after a new constant is added.

  3. Reordering McCoy, Scotty, and Spock is a binary-preserving source compatible and binary compatible change, but the reordering changes the behavior of built-in methods, most notably compareTo.
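
The behavioral effects can be observed directly. The sketch below uses abbreviated copies of the two enum versions, without nicknames; Season1Cast and Season2Cast are hypothetical names so that both versions can coexist in one file, whereas in reality they would be two versions of the single type StarTrekCast.

```java
// Abbreviated, renamed copies of the two StarTrekCast versions.
class EnumEvolutionDemo {
    enum Season1Cast {
        JAMES_T_KIRK, LEONARD_MCCOY, JANICE_RAND, MONTGOMERY_SCOTT,
        SPOCK, HIKARU_SULU, UHURA
    }

    enum Season2Cast {
        JAMES_T_KIRK, SPOCK, MONTGOMERY_SCOTT, LEONARD_MCCOY,
        HIKARU_SULU, PAVEL_CHEKOV, UHURA
    }

    // valueOf throws IllegalArgumentException for a constant that does not exist.
    static boolean season2Knows(String name) {
        try {
            Season2Cast.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String... args) {
        // compareTo follows declaration order, so the reordering flips the sign.
        System.out.println(Season1Cast.SPOCK.compareTo(Season1Cast.LEONARD_MCCOY) > 0);
        System.out.println(Season2Cast.SPOCK.compareTo(Season2Cast.LEONARD_MCCOY) < 0);
        // Deletion and addition change what valueOf accepts.
        System.out.println(season2Knows("JANICE_RAND"));
        System.out.println(season2Knows("PAVEL_CHEKOV"));
    }
}
```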


JDK Platform and Update Release Compatibility Policies

The compatibility policies we apply to platform releases, like JDK 7, differ from those applied to maintenance and update releases, like JDK 6 updates. For both kinds of releases, binary compatibility must be maintained for JCP-managed APIs. Update releases must maintain source compatibility, but platform releases are able to break source compatibility given sufficient justification. In update releases, behavioral compatibility is regarded as very important; programs may be relying on specified-to-be-unspecified behavior of a particular implementation and switching to another update in the same release family should be seamless whenever possible. In contrast, platform releases have fewer restrictions on changing such behavior. So, for example, modifying the order of iteration of elements in a HashMap to allow faster hashing algorithms would be quite appropriate for a platform release ("This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time."), but would be much less suited to an update release.
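
As an illustration of relying on specified-to-be-unspecified behavior, the following sketch contrasts code that depends on HashMap iteration order with code that imposes its own order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {
    // Fragile: the result depends on HashMap's unspecified iteration order.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        return new ArrayList<String>(map.keySet());
    }

    // Robust: imposes its own order instead of relying on the implementation's.
    static List<String> keysSorted(Map<String, Integer> map) {
        List<String> keys = new ArrayList<String>(map.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String... args) {
        Map<String, Integer> crew = new HashMap<String, Integer>();
        crew.put("spock", 1);
        crew.put("kirk", 2);
        crew.put("mccoy", 3);
        System.out.println(keysInIterationOrder(crew)); // order may vary by release
        System.out.println(keysSorted(crew));           // same order on any release
    }
}
```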


Managing Compatibility

Original Preface to JLS
Except for timing dependencies or other non-determinisms and given sufficient time and sufficient memory space, a program written in the Java programming language should compute the same result on all machines and in all implementations.

The above statement from the original JLS could be regarded as vacuously true about any platform: except for the non-determinisms, a program is deterministic. The difference was that in Java, with programmer discipline, the set of deterministic programs was nontrivial and the set of predictable programs was quite large. In other words, the platform provider and the programmer both have responsibilities in making programs portable in practice; the platform should abide by the specification and conversely programs should tolerate any valid implementation of the specification.

To make continued evolution of the platform more tractable, it may be helpful to introduce more structured ways of tracking behavioral changes so that programs could in principle be audited for depending on aspects of the platform in ways that are not recommended. For example, potentially annotations could be used to:

  1. Mark classes and methods whose specification has changed in a release (analogous to change bars in a written document).

  2. Record stability information about a method's contract, such as deterministic, non-deterministic, or volatile (expected to change over time); for example, whether the hashCode of a class is specified to return particular values or just obey the general contract.

  3. Using com.sun.* annotations, annotate constructs whose implementations we have changed in our specific implementation in a particular release, such as HashMap ordering.

Annotation processing is a general purpose meta-programming framework, standardized as part of the platform as of JDK 6. Annotation processors, probably also using the tree API, could be written to check for usage of changed or problematic APIs in source code. The D compiler in DTrace can enforce analogous limits on the stability levels and dependency classes of D scripts.
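
To give a flavor of what such markup might look like, here is a rough sketch. Every name in it (SpecChanged, ContractStability, Stability, Sample) is hypothetical; no such annotations exist in the JDK. For brevity the check uses runtime reflection rather than an annotation processor, which is an assumption made to keep the example self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class StabilityAudit {
    // Hypothetical stability levels for a method's contract.
    enum Stability { DETERMINISTIC, NON_DETERMINISTIC, VOLATILE }

    // Hypothetical marker for specifications that changed in a release.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface SpecChanged { String since(); }

    // Hypothetical record of how stable a method's contract is.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ContractStability { Stability value(); }

    // Hypothetical library class carrying the markup.
    static class Sample {
        @SpecChanged(since = "7")
        @ContractStability(Stability.VOLATILE)
        public int orderSensitive() { return 0; }

        @ContractStability(Stability.DETERMINISTIC)
        public int stable() { return 1; }
    }

    // Names of the declared methods whose contract is marked VOLATILE.
    static List<String> volatileMethods(Class<?> c) {
        List<String> names = new ArrayList<String>();
        for (Method m : c.getDeclaredMethods()) {
            ContractStability s = m.getAnnotation(ContractStability.class);
            if (s != null && s.value() == Stability.VOLATILE)
                names.add(m.getName());
        }
        return names;
    }

    public static void main(String... args) {
        System.out.println(volatileMethods(Sample.class));
    }
}
```

An annotation processor could perform the analogous check over client source code at compile time, flagging call sites that use methods carrying such markers.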

While there would be considerable cost and complication to designing such a scheme and retrofitting it onto at least a subset of the JDK, the ability to define and then programmatically test policies for behavioral compatibility issues could enable platform providers and programmers to have a smoother joint stewardship of keeping applications running and Java usage growing.


Conclusion

Compatibility is a multifaceted concept, with nuances within each broad category. In the future, annotation processors or other program analyzers might help manage source, binary, and behavioral compatibility through direct analysis or program markup.


Acknowledgments

Éamonn McManus gave useful feedback on a draft of this entry.


Notes

1 There are some cases where such an adversarial program could be thwarted in practice. For example, when the Unicode version supported by the JDK platform is upgraded, previously illegal identifier strings are often allowed. A new JDK platform class could use the newly valid names not open to preexisting malicious clients, although new adversaries could afterward use the new name. This assumes the compatibility threat model only includes class files generated from Java sources. As of class file version 49.0 for JDK 5 and later, at the JVM level many more identifiers are legal than those accepted in Java source.

2 Even code that always uses fully qualified names is not completely immune from ambiguities and unintended (or malicious) changes in the meaning of names stemming from changes in the library environment since distinct types can have the same fully qualified name. For example, the type name "a.b.C" could refer to:

  • class C in package a.b:

    package a.b;
    public class C {}
    

  • class C nested inside class b where class b is a member of package a:

    package a;
    public class b {
        public static class C{}
    }
    

  • class C nested inside class b which is in turn nested inside class a where class a is a member of an unnamed package (unnamed packages are not Immortal):

    public class a {
        public static class b {
            public static class C{}
        }
    }
    

These three classes cannot all be compiled together ("package a.b clashes with class of same name"); however, they can be compiled separately to the same output location and so can all appear on a classpath when another file is compiled. If all three are on the classpath together, when other code is compiled the qualified name "a.b.C" resolves to the doubly-nested class C in an unnamed package.

To avoid such name collisions, binary names use "$" instead of "." to separate the name of an enclosing class from a nested class, leading to the distinct binary names "a.b.C", "a.b$C", and "a$b$C", respectively, for the classes in question. Following the recommended naming conventions avoids such name clashes. Therefore, such name clashes should be rare in practice when compiling against libraries following the conventions, as JCP moderated java.* and javax.* APIs should do. As an extreme case, do not write this program:

public class java {
    public static class lang {
        public static class String {
            String(Object o){}
        }
    }

    public static void main(String... args) {
        java.lang.String s =
            new java.lang.String("I don't think this means " +
                                 "what you think it means.");
        if (!s.getClass().getName().equals("java.lang.String"))
            System.out.println("Inconceivable!");
    }
}

In this perverse example, the nested class java.lang obscures the venerable java.lang package and the local java.lang.String declaration shadows the standard java.lang.String.
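
The "$" convention for binary names described in this note can be observed with Class.getName, which returns the binary name, next to Class.getCanonicalName, which uses "." throughout. The sketch below is a hypothetical nesting that mirrors the doubly-nested case, placed inside a demo class named BinaryNames so the file compiles on its own.

```java
// Hypothetical nesting mirroring the doubly-nested "a.b.C" case, one level down.
class BinaryNames {
    static class b {
        static class C {}
    }

    // Binary name: "$" separates an enclosing class from a nested class.
    static String binaryName()    { return b.C.class.getName(); }

    // Canonical name: "." separates every component.
    static String canonicalName() { return b.C.class.getCanonicalName(); }

    public static void main(String... args) {
        System.out.println(binaryName());
        System.out.println(canonicalName());
    }
}
```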


Further Reading

  1. Evolving Java-based APIs, by Jim des Rivières.

  2. Different kinds of compatibility by Alex Buckley.

(2008-04-17 19:31:38.0) Permalink
