Joseph D. Darcy's Sun Weblog


20091118 Wednesday November 18, 2009

Project Coin at Devoxx 2009

Wednesday evening local time I gave a talk about Project Coin at Devoxx 2009. The video of the talk will be available on Parleys in due course; in the meantime, my slides are online. Tempting the wrath of the demo gods, I enjoyed showing the support Project Coin features have today in a developer build of NetBeans.

(2009-11-18 16:39:28.0) Permalink Comments [0]

Project Coin: Milestone 5 Language Features in NetBeans

To go along with the language changes available in JDK 7 milestone 5, the NetBeans team has created a developer build of NetBeans supporting the same set of language changes, including improved integer literals, the diamond operator, and strings in switch.

In addition to just accepting the new syntax, the NetBeans build has some deeper support too. For example, when auto-completing on a constructor with type arguments, the diamond operator is offered as a completion. To see what bounds were computed for a diamond instance, you can just hit Ctrl and click on the constructor; the bounds will appear in a pop-up. To encourage use of strings in switch, NetBeans will recognize the pattern of "if (s.equals("foo")) {...} else if (s.equals("bar")) ..." and offer to transform that into strings in switch; just click on the "Convert to switch" lightbulb on the left hand side.
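As a minimal sketch of the kind of rewrite such a hint performs, the two methods below express the same logic first as a chain of equals tests and then as a strings in switch statement. The class name, method names, and return values are made up for illustration; the exact code NetBeans produces may differ.

```java
// Hypothetical example; the names here are illustrative only.
class ConvertToSwitchDemo {
    // Before: a chain of equals() tests of the kind the hint recognizes.
    static int withIfChain(String s) {
        if (s.equals("foo")) {
            return 1;
        } else if (s.equals("bar")) {
            return 2;
        } else {
            return 0;
        }
    }

    // After: the same logic as a strings in switch statement (needs -source 7 or later).
    static int withSwitch(String s) {
        switch (s) {
            case "foo":
                return 1;
            case "bar":
                return 2;
            default:
                return 0;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"foo", "bar", "baz"}) {
            System.out.println(s + " -> " + withIfChain(s) + " / " + withSwitch(s));
        }
    }
}
```

Both forms throw a NullPointerException when s is null, so the conversion preserves behavior in that case as well.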

This NetBeans build with Coin support is based on NetBeans 6.8, but when 6.8 ships later this year it will not include support for Project Coin features. Project Coin support will be included in subsequent NetBeans releases.

(2009-11-18 16:08:07.0) Permalink Comments [0]

20091104 Wednesday November 04, 2009

Project Coin: Anatomy of adding strings in switch to javac

Earlier this week, I happily pushed an implementation of Project Coin's strings in switch language feature into a repository being used for JDK 7 milestone 5. JDK 7 binaries with strings in switch will be available for downloading in due course.

The javac compiler uses the standard compiler architecture of successive phases, and adding strings in switch required modifications to several links in the chain of phases.

Since JDK 1.4.x, javac has been multiple compilers in one because it supports multiple source settings to specify what version of the language to use. (But when cross-compiling to an older language version, make sure to set the bootclasspath!) Information about which features are supported in a given version is encapsulated in javac's class com.sun.tools.javac.code.Source. When other parts of the compiler need to query if a feature is supported, they call a boolean method like Source.allowFeature(). Strings in switch added another method to this class; in diff notation:

+ public boolean allowStringsInSwitch() {
+ return compareTo(JDK1_7) >= 0;
+ }
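
To make the pattern concrete, here is a simplified, self-contained model of a source-level enum with feature predicates. It is only a sketch: SourceLevel and SourceLevelDemo are hypothetical names, and the real com.sun.tools.javac.code.Source class has more levels and more predicates than shown.

```java
// Simplified, hypothetical model of a source-level enum; not javac's actual Source class.
enum SourceLevel {
    JDK1_5("1.5"),
    JDK1_6("1.6"),
    JDK1_7("1.7");

    // Human-readable name of the level, usable in error messages.
    final String name;

    SourceLevel(String name) {
        this.name = name;
    }

    // Enum constants compare by declaration order, so ">= JDK1_7" means "7 or later".
    boolean allowStringsInSwitch() {
        return compareTo(JDK1_7) >= 0;
    }
}

class SourceLevelDemo {
    public static void main(String[] args) {
        for (SourceLevel level : SourceLevel.values()) {
            System.out.println(level.name + ": strings in switch allowed = "
                    + level.allowStringsInSwitch());
        }
    }
}
```

Because the predicate is phrased as a comparison rather than an equality test, any later source level added after JDK1_7 would enable the feature without further changes.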

The text of a source file being compiled is first turned into an abstract syntax tree (AST). The lexing and parsing processes which create the trees only verify the syntactic structure of the code. Semantic checks are done in subsequent phases of the compiler; many of those type checks are done during "attribution" by code in com.sun.tools.javac.comp.Attr. For switch statements, the constant-ness of case labels, having no repeated labels, and the labels' types matching the type of the expression being switched on are all handled in Attr. Attr also enforces the type restrictions on the switch expression; the expression can have an integer type, an enum type (as of JDK 5), and now a string type (as of JDK 7).

Because the trees in javac were sufficiently abstract, modifying Attr to support strings in switch was very straightforward. Most of the code was just for adding the allowStringsInSwitch check and generating an error message specific for a non-constant string. As shown in the patch below, no code implementing the actual semantic checks for non-repeated labels, etc. had to be adjusted to support strings in switch; all the existing checking logic was reused without modification.

--- a/src/share/classes/com/sun/tools/javac/comp/Attr.java	Thu Aug 27 13:40:48 2009 +0100
+++ b/src/share/classes/com/sun/tools/javac/comp/Attr.java	Mon Nov 02 21:36:59 2009 -0800
@@ -115,6 +115,8 @@ public class Attr extends JCTree.Visitor
         allowBoxing = source.allowBoxing();
         allowCovariantReturns = source.allowCovariantReturns();
         allowAnonOuterThis = source.allowAnonOuterThis();
+        allowStringsInSwitch = source.allowStringsInSwitch();
+        sourceName = source.name;
         relax = (options.get("-retrofit") != null ||
                  options.get("-relax") != null);
         useBeforeDeclarationWarning = options.get("useBeforeDeclarationWarning") != null;
@@ -166,6 +168,16 @@ public class Attr extends JCTree.Visitor
      * API warnings.
      */
     boolean enableSunApiLintControl;
+
+    /**
+     * Switch: allow strings in switch?
+     */
+    boolean allowStringsInSwitch;
+
+    /**
+     * Switch: name of source level; used for error reporting.
+     */
+    String sourceName;
 
     /** Check kind and type of given tree against protokind and prototype.
      *  If check succeeds, store type in tree and return it.
@@ -886,7 +898,15 @@ public class Attr extends JCTree.Visitor
         boolean enumSwitch =
             allowEnums &&
             (seltype.tsym.flags() & Flags.ENUM) != 0;
-        if (!enumSwitch)
+        boolean stringSwitch = false;
+        if (types.isSameType(seltype, syms.stringType)) {
+            if (allowStringsInSwitch) {
+                stringSwitch = true;
+            } else {
+                log.error(tree.selector.pos(), "string.switch.not.supported.in.source", sourceName);
+            }
+        }
+        if (!enumSwitch && !stringSwitch)
             seltype = chk.checkType(tree.selector.pos(), seltype, syms.intType);
 
         // Attribute all cases and
@@ -909,7 +929,8 @@ public class Attr extends JCTree.Visitor
                     Type pattype = attribExpr(c.pat, switchEnv, seltype);
                     if (pattype.tag != ERROR) {
                         if (pattype.constValue() == null) {
-                            log.error(c.pat.pos(), "const.expr.req");
+                            log.error(c.pat.pos(),
+                                      (stringSwitch ? "string.const.req" : "const.expr.req"));
                         } else if (labels.contains(pattype.constValue())) {
                             log.error(c.pos(), "duplicate.case.label");
                         } else {

Two new error messages were added for strings in switch. The first reports that a constant string expression is required rather than just a constant expression. The second reports that a string switch statement has been found in a source level where that feature is not allowed. The javac convention for this kind of error message is to report the source level being used in the current compile and the source level where the structure starts being supported:

+compiler.err.string.const.req=\
+    constant string expression required

+compiler.err.string.switch.not.supported.in.source=\
+    strings in switch are not supported in -source {0}\n\
+(use -source 7 or higher to enable strings in switch)

The bulk of the compiler changes to support strings in switch were made in com.sun.tools.javac.comp.Lower, the compiler phase which translates away syntactic sugar, lowering structures to trees implementing the formerly sugar-coated functionality. For example, Lower already had a method to translate enum switches into a switch on an integer value retrieved from an enum → int map. The initial strings in switch implementation uses a similar technique: a single string switch in source code is lowered into a series of two switches. The first switch is a new synthesized switch on the string's hash code, which gets mapped to a label's ordinal position on the list of case statements. The second switch is structurally identical to the original string switch from the source except that the string case labels are replaced by integer positions and the computed position from the synthesized switch is the expression being switched on.
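
The two-switch translation described above can be imitated by hand. The sketch below is an illustration under assumptions, not the code javac synthesizes: the class and variable names are hypothetical, and the equals tests stand in for the additional checks needed because different strings can share a hash code.

```java
// Hand-written illustration of the two-switch lowering; all names are hypothetical.
class LoweredStringSwitchDemo {
    // Original form, as it would be written in source.
    static String original(String s) {
        switch (s) {
            case "Aa": return "first";
            case "BB": return "second";
            case "c":  return "third";
            default:   return "other";
        }
    }

    // Rough equivalent after lowering.
    static String lowered(String s) {
        int position = -1; // stays -1 when no case label matches

        // First switch: map the hash code to the ordinal position of a case label.
        switch (s.hashCode()) {
            case 2112: // "Aa" and "BB" share this hash code
                if (s.equals("Aa")) position = 0;
                else if (s.equals("BB")) position = 1;
                break;
            case 99: // hash code of "c"
                if (s.equals("c")) position = 2;
                break;
        }

        // Second switch: same shape as the original, with integer labels.
        switch (position) {
            case 0:  return "first";
            case 1:  return "second";
            case 2:  return "third";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        // "C#" also hashes to 2112 but is not a label, so it must reach default.
        for (String s : new String[] {"Aa", "BB", "c", "C#", "zzz"}) {
            System.out.println(s + " -> " + original(s) + " / " + lowered(s));
        }
    }
}
```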

Discussion of various design issues and implementation alternatives can be found in the full code changes in Lower; see the webrev of the changeset for details. The chained series of switch statements to implement strings in switch is a refinement of the implementation approaches outlined in the original Project Coin strings in switch proposal.

Various smaller implementation points about the code in Lower are worth noting too.

A pair of maps are built to hold information about translating string case labels → positions and hash codes → case labels with that hash. In order to make the compiler's output predictable and repeatable, the maps and sets used in these data structures are LinkedHashMaps and LinkedHashSets rather than just HashMaps and HashSets. In terms of functional correctness of code generated during a given compile, using HashMap and HashSet would be fine; the iteration order does not matter. However, we find it beneficial to have javac's output not vary based on implementation details of system classes.
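
A small sketch of that pair of maps, built with the insertion-ordered collections, is shown below. The helper names are hypothetical and the code uses plain strings rather than javac's tree nodes; it only illustrates why the iteration order becomes predictable.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helpers; javac's Lower works on tree nodes, not plain strings.
class CaseLabelMapsDemo {
    // Maps each case label to its ordinal position in the list of labels.
    static Map<String, Integer> labelToPosition(String... labels) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = 0;
        for (String label : labels) {
            result.put(label, position++);
        }
        return result;
    }

    // Maps each hash code to the set of case labels having that hash code.
    static Map<Integer, Set<String>> hashToLabels(String... labels) {
        Map<Integer, Set<String>> result = new LinkedHashMap<>();
        for (String label : labels) {
            result.computeIfAbsent(label.hashCode(), h -> new LinkedHashSet<>()).add(label);
        }
        return result;
    }

    public static void main(String[] args) {
        // Iteration follows insertion order, independent of hash table layout.
        System.out.println(labelToPosition("Aa", "BB", "c")); // {Aa=0, BB=1, c=2}
        System.out.println(hashToLabels("Aa", "BB", "c"));    // {2112=[Aa, BB], 99=[c]}
    }
}
```

With HashMap and HashSet the same entries would be present, but the order in which they are visited could change between library implementations.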

The code synthesis in Lower uses a compiler-internal tree maker API. The resulting trees are unprocessed; the various consistency properties and derived information calculated by earlier compiler phases are not directly enforced. The code in Lower is responsible for generating correct code and computing necessary derived information; if this is not done correctly, later phases like Gen, which translates trees into byte code, can fail. For example, trees have an associated source position; this is preserved in debugging information. The position for the initial synthesized switch is set to the source position of the start of the original switch statement. The tree node for a break statement stores information about the target of the break, which is another tree node. Therefore, break statements synthesized as part of the initial switch need this target information to be set, and any break nodes from the original switch statement need to have their targets reset to the newly generated enclosing switch where integer labels have replaced the string ones. This target patching logic gets thoroughly exercised by the tests with nested strings in switch statements!

The new tests verify that the proper structural checks are enforced for string switches as well as verify that the proper execution paths are taken on different inputs for switches with a variety of control flow shapes, including multiple case labels, case labels with colliding hash codes, and nested switches.

Now that strings in switch are specified, implemented, and tested, I'm looking forward to working on the remaining Project Coin features on the final acceptance list.

(2009-11-04 14:21:36.0) Permalink Comments [3]

20090923 Wednesday September 23, 2009

Java Posse #279: A View from an Eggshell-Colored Clock Tower

Dick Wall and the rest of the Java Posse graciously offered to interview Alex Buckley and me one evening at the JVM Language Summit last week. Our discussion about general evolution of the Java programming language, including Project Coin and reactions to previous podcasts and Google group threads, is now available as episode #279 of the Java Posse podcast.

(2009-09-23 13:10:29.0) Permalink Comments [1]

20090914 Monday September 14, 2009

Java Posse #277 Feedback: Still not a view from an ivory tower

A follow-up entry to Dick Wall's Google Group post to my earlier reaction to Java language evolution and management concerns raised in the first twenty minutes of episode #277 of the Java Posse podcast.


Anyone can have an opinion. Having an informed opinion takes some effort. Implementing the conclusions of an informed opinion can take considerably more effort.

Project Coin != http://bugreport.sun.com/bugreport/

For many years people with ideas for language changes (and other matters) have been welcome to submit them to bugs.sun.com; there is no expectation that something other than a rough idea is required. These ideas are evaluated and there are well over 100 open "requests for enhancement" (rfes) related to language changes. I reviewed all of these open ideas before embarking on Project Coin. Many other submitted language rfes have been considered and subsequently closed.

Project Coin explicitly offered a different social contract than bugs.sun.com; beyond just a vague idea, contributors were invited to participate in the work of bringing a language change into the platform. To be clear, the abundance of open language rfes means an additional idea for language change in and of itself has essentially no value. Instead, the coin of the realm in Project Coin was the analysis of the impacts and benefits of a language change and code for a prototype implementation. Those are valuable because they are essential components of the work needed to bring a language change to the platform. The Project Coin Proposal form [1] guided the analysis and the OpenJDK langtools repository gave a starting point for a compiler prototype. People could collaborate on different aspects of a proposal and the Project Coin list explicitly made such requests for assistance on-topic. The recommendation for prototypes was not made for punitive purposes; rather it was made so that more accurate information could be gathered about the language change. Quoting a comment left by Mark Mahieu on my blog [2]:

"By the time I posted a link to a runnable prototype [of the Elvis operator] to the list, my own understanding was much, much clearer, and very different; after delving into the real detail I'd come to the conclusion that it's not a good enough fit (for Java) in the form proposed."

In contrast to this productive discourse, take the brouhaha over not including multi-catch in the Coin final five left in comments on my blog. [3] My message announcing the final five makes clear that this decision was made based on resourcing concerns rather than the merits of the idea itself. Not one of the people leaving comments full of wailing and gnashing of teeth about the omission offered to do anything to help implement the feature.

It is far easier to impose demands than to satisfy them. When there is no "cost connection" between those imposing demands and those satisfying them, ridiculous expectations can result, such as this individual [4] whose series of requests to jdk7-dev in June I will paraphrase:

"Hi. I'm some random Java developer with admittedly little technical expertise [5] and no money. I read one blog entry written by Neal Gafter several years ago [6]; despite my lack of money and admitted lack of technical expertise, I think reading that single blog entry written by someone else should imbue me with enough authority to dictate [7] how other people should allocate their resources to work on the cool Java language changes I personally want to see."

I have exactly zero respect for this line of thinking and see no reason to tolerate it.

If someone says he doesn't know what he is talking about, I believe him. I also take the next logical step of not giving much credence to his conclusions and demands.

No one is stopping this fellow or any other interested person from taking a compiler course, downloading the OpenJDK sources to javac, reading the considerable programming language literature of generics, and working on an implementation of reified generics or some other language variant. (The careful reader will note that Microsoft's papers describing reified generics in CLR emphasize how fast they were able to make List<int> go and do not focus on the performance penalty paid by List<Integer> because of the extra level of indirection between an object and its dispatch table.)

On the subject of listening to developers' ideas for changes, I posted some related thoughts to the coin list back in July [8]:

"Design by committee" is often derided as an inappropriate way to manage technical projects. Simple polling about technical issues is design by committee where the committee can be arbitrarily large and any pretense of expertise in the area is removed. Therefore, polling would be a poor way to drive specific technical decisions about the Java Programming Language. One of the benefits of working in a technical field is that technical problems often have right answers, regardless of how many people agree or disagree with them.

This is not intended to be a slight against Java programmers who contributed suggestions informed by their daily experiences with the language to the Coin alias, to Sun's bug database, or elsewhere. Rather, it is a recognition that, just as being a player and being a coach are distinct roles requiring distinct skills, using the language and evolving it are distinct tasks with related but separate skill sets. Polling can provide a good indication about what pain points people are experiencing; that is an important input, but only one kind of input, to how the language should change.

Most Java programmers do not need to be language experts. This is very much a feature and not a bug of the Java ecosystem. Not having to be a language expert means Java programmers have time to be experts about other things :-)

Moreover, the responsibilities of stewardship include preserving the conceptual integrity of the platform, which does not necessarily follow from point decisions.

I don't understand who is being accused of preventing or impeding use of, say, Scala. For its part, Sun has encouraged experimentation with the Java language changes and has funded work to improve the support of non-Java languages on top of the JVM too. Lack of corporate sponsorship of particular other languages should certainly not be equated to impeding their use. Conversely, Java developers are in no way obliged to participate in Project Coin or OpenJDK activities. However, if the extent of a developer's interaction with those working on the language is leaving childish comments on blogs, don't expect to have much influence over the results.

[1] Project Coin: Small Language Change Proposal Form Available
[2] http://blogs.sun.com/darcy/entry/project_coin_final_five#comment-1252023525000
[3] http://blogs.sun.com/darcy/entry/project_coin_final_five#comments
[4] http://mail.openjdk.java.net/pipermail/jdk7-dev/2009-June/000666.html
[5] http://mail.openjdk.java.net/pipermail/jdk7-dev/2009-June/000704.html
[6] http://mail.openjdk.java.net/pipermail/jdk7-dev/2009-June/000675.html
[7] http://mail.openjdk.java.net/pipermail/jdk7-dev/2009-June/000686.html
[8] http://mail.openjdk.java.net/pipermail/coin-dev/2009-July/002120.html

(2009-09-14 16:33:35.0) Permalink Comments [14]

20090910 Thursday September 10, 2009

Unhashing a String

My work on the strings in switch implementation has gotten swapped back in. The implementation is following the translation strategy outlined in the strings in switch proposal: find a perfect hash function for the input strings and semantically replace case "Foo": with case hash("Foo"): (along with some additional checks and logic) where hash("Foo") is computed as a constant by the compiler.

With this approach, to write the regression tests it is helpful to be able to construct a string with a given hash value to force collisions and test the alternate logic, which led me to write the "unhash" method below to create a string with a given hash code:

/**
  * Returns a string with a hash equal to the argument.
  * @return string with a hash equal to the argument.
  */
 public static String unhash(int target) {
    StringBuilder answer = new StringBuilder();
    if (target < 0) {
        // String with hash of Integer.MIN_VALUE, 0x80000000
        answer.append("\u0915\u0009\u001e\u000c\u0002");

        if (target == Integer.MIN_VALUE)
            return answer.toString();
        // Find target without sign bit set
        target = target & Integer.MAX_VALUE;
    }
        
    unhash0(answer, target);
    return answer.toString();
}

private static void unhash0(StringBuilder partial, int target) {
    int div = target / 31;
    int rem = target % 31;

    if (div <= Character.MAX_VALUE) {
        if (div != 0)
            partial.append((char)div);
        partial.append((char)rem);
    } else {
        unhash0(partial, div);
        partial.append((char)rem);
    }
}

The algorithm for hashing strings multiplies the old hash value by 31 and adds in the integer value of the next character:

h = 0;
for (int i = 0; i < len; i++) {
    h = 31*h + val[off++];
}
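
As a quick check of that recurrence, the sketch below restates the loop against the public String API and compares it with String.hashCode(). The class and method names are hypothetical; the real loop runs over the string's internal character array.

```java
// Hypothetical restatement of the hashing loop using only public String methods.
class StringHashDemo {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Multiply the old hash by 31 and add the next character; int overflow wraps.
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "Foo", "strings in switch"}) {
            System.out.println("\"" + s + "\": " + hash(s) + " vs " + s.hashCode());
        }
    }
}
```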

The unhash method works in reverse; for the non-negative values handled by unhash0, divide the target value by 31:

  • if the quotient is less than or equal to Character.MAX_VALUE, the quotient and remainder can be fully captured using at most two characters.

  • if the quotient is greater than Character.MAX_VALUE, unhash the quotient and append to that string a character for the remainder.

With some additional care, negative values can reuse the same process. The key observation is that if a string hashes to Integer.MIN_VALUE (0x80000000), subsequent multiplications by 31 (0x1f) do not change the result since
0x1f × 0x80000000 = 0xf80000000
which is again 0x80000000 when limited to int range. Therefore, the sign bit can be set and then the remaining bits handled as if the target were positive.
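
The observation can be checked directly. The sketch below uses a hypothetical class name and the five-character string from the unhash listing; the suffix "AB" is an arbitrary choice with a non-negative hash.

```java
// Hypothetical check of the sign-bit observation; the class name is illustrative.
class MinValueHashDemo {
    // The five-character prefix used by unhash for negative targets.
    static final String MIN_VALUE_PREFIX = "\u0915\u0009\u001e\u000c\u0002";

    public static void main(String[] args) {
        // The prefix hashes to Integer.MIN_VALUE.
        System.out.println(MIN_VALUE_PREFIX.hashCode() == Integer.MIN_VALUE);

        // Multiplying by 31 in int arithmetic leaves Integer.MIN_VALUE unchanged.
        System.out.println(31 * Integer.MIN_VALUE == Integer.MIN_VALUE);

        // Appending a suffix with a non-negative hash just sets the sign bit on that hash.
        String suffix = "AB";
        int expected = Integer.MIN_VALUE | suffix.hashCode();
        System.out.println((MIN_VALUE_PREFIX + suffix).hashCode() == expected);
    }
}
```

All three comparisons hold because 31 is odd: the only bit of 0x80000000 is the top one, and multiplying by an odd number keeps that bit set while everything above it is discarded.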

Using a few minutes of computer time, I tested the unhash method on all integer values and it always returned a correct string. When available, exhaustive testing is pleasantly simple and reassuring! The generated strings are relatively short; as shown in the table below, for non-negative values the average length is slightly over four characters.

Distribution of String Lengths of Unhashed Non-negative Values

Length      Frequency
1                  31
2           2,031,585
3          60,948,480
4       1,889,402,880
5         195,100,672
Total   2,147,483,648

The unhash of 0 could be special-cased to return the empty string of length zero rather than a length-one string of the \u0000 character, but this was not necessary for the purposes at hand. Likewise, generating somewhat shorter strings for negative hash values may be possible, but further investigation is not needed just to be able to generate collisions. As it stands, unhash will return a string whose length is at most ten; five characters for a negative sign bit and at most another five characters for the non-sign bits.
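
A round-trip check over a range of values might look like the sketch below. The unhash and unhash0 methods are copied from the listing in this entry; the UnhashCheck class, the roundTrips helper, and the chosen ranges are assumptions made for illustration, and widening the range to all int values gives the exhaustive test.

```java
// Hypothetical harness; unhash and unhash0 are copied from the listing in this entry.
class UnhashCheck {
    static String unhash(int target) {
        StringBuilder answer = new StringBuilder();
        if (target < 0) {
            // String with hash of Integer.MIN_VALUE, 0x80000000
            answer.append("\u0915\u0009\u001e\u000c\u0002");
            if (target == Integer.MIN_VALUE)
                return answer.toString();
            // Find target without sign bit set
            target = target & Integer.MAX_VALUE;
        }
        unhash0(answer, target);
        return answer.toString();
    }

    private static void unhash0(StringBuilder partial, int target) {
        int div = target / 31;
        int rem = target % 31;
        if (div <= Character.MAX_VALUE) {
            if (div != 0)
                partial.append((char) div);
            partial.append((char) rem);
        } else {
            unhash0(partial, div);
            partial.append((char) rem);
        }
    }

    // True if unhash(t).hashCode() == t and the length is at most ten for lo <= t <= hi.
    static boolean roundTrips(int lo, int hi) {
        // A long loop variable avoids overflow when hi is Integer.MAX_VALUE.
        for (long t = lo; t <= hi; t++) {
            String s = unhash((int) t);
            if (s.hashCode() != (int) t || s.length() > 10) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(-1_000_000, 1_000_000));
        System.out.println(roundTrips(Integer.MIN_VALUE, Integer.MIN_VALUE + 1_000));
        System.out.println(roundTrips(Integer.MAX_VALUE - 1_000, Integer.MAX_VALUE));
    }
}
```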

(2009-09-10 11:50:30.0) Permalink Comments [7]

20090904 Friday September 04, 2009

Project Coin: Solidarity with C++ Evolution

Recently I read with interest Bjarne Stroustrup's HOPL III paper Evolving a language in and for the real world: C++ 1991-2006. Despite the numerous technical differences between Java and C++, I was struck by some of the similarities in community involvement and expectations in the evolution of both languages. Selected excerpts from the paper are in block quotes below.

In particular, this very open process [in the C++ committee] is vulnerable to disruption by individuals whose technical or personal level of maturity doesn’t encourage them to understand or respect the views of others. Part of the consideration of a proposal is a process of education of the committee members. Some members have claimed — only partly in jest — that they attend to get that education.

The Project Coin mailing list is a world-readable and world-writable list. While this approach does let anyone join in, the traffic can be very high and at times the signal to noise ratio was quite low. In the future, I'll be inclined to impose temporary moderation on the list to quell unproductive email storms.

The answer to “Why didn’t we provide a much more useful library?” is simpler: We didn’t have the resources (time and people) to do significantly more than we did.


The most common reaction to these extensions among developers is “that was about time; why did it take you so long?” and “I want much more right now”. That’s understandable (I too want much more right now — I just know that I can’t get it), but such statements reflect a lack of understanding what an ISO committee is and can do.


As ever, there are far more proposals than the committee could handle or the language could absorb. As ever, even accepting all the good proposals is infeasible. As ever, there seems to be as many people claiming that the committee is spoiling the language by gratuitous complicated features as there are people who complain that the committee is killing the language by refusing to accept essential features. If you take away consistent overstatement of arguments, both sides have a fair degree of reason behind them. The balancing act facing the committee is distinctly nontrivial.

Viewed over the long term, one goal in evolving a platform is trying to maximize value delivered over time. This is analogous to a net present value-style consideration from economics. A feature delivered in the future is less valuable than having the feature today, but the value of choosing to do a feature needs to be weighed against the opportunity costs of doing something else instead. Developers are chronically optimistic and eager to deliver something sooner rather than later, especially when the next release vehicle may be in the relatively distant future. As previously indicated, I too would prefer to see additional language changes as part of Project Coin in JDK 7. However, given the available resources, overcommitting to a large set of features is not responsible; either the large set won't get done in the end, it won't get done well, or the schedule would slip, all of which lead to reduced value too.

Much of the best standards work is invisible to the average programmer and appears quite esoteric and often boring when presented. The reason is that a lot of effort is expended in finding ways of expressing clearly and completely “what everyone already knows, but just happens not to be spelled out in the manual” and in resolving obscure issues that—at least in theory—don’t affect most programmers. The maintenance is mostly such “boring and esoteric” issues.

Some attendees of my JavaOne talk this year were not happy with the length of time spent relating complications with adding enum types in JDK 5. However, I included such a large section on the apparent simplicity of enums still leading to many surprising complexities to help convey the disproportionate efforts that adding even modest features to the language can take.

I fully expect to be surprised in the future with novel interactions and issues as experience is gained with the Project Coin features and prudent planning anticipates the need to deal with such surprises.

(2009-09-04 17:39:48.0) Permalink Comments [8]

20090903 Thursday September 03, 2009

Java Posse #277 Feedback: Not a view from an ivory tower

The entry below is a slightly edited copy of a message I used to start a new thread on the Java Posse's Google Group, largely in response to comments made by Dick Wall in the first twenty minutes of episode #277 of the Java Posse podcast.


After listening to episode 277, I'm led to conclude I'm thought of by some as one of the "ivory tower guys" who "just says no" to ideas about changing the Java programming language.

I have a rather different perspective.

In November 2006, Sun published javac and related code under the familiar GPLv2 with Classpath exception. [1]

Shortly thereafter in January 2007, no less a Java luminary than James Gosling endorsed the Kitchen Sink Language (KSL) project. [2] In James' words KSL is "A place where people could throw language features, no matter how absurd, just so that folks could play around" since he has "... never been real happy with debates about language features, I'd much rather implement them and try them out." [3]

KSL received no significant community response.

Later in 2007, after the remaining core components of the platform were published as open source software as part of OpenJDK during JavaOne, in November Kijaro was created. [4] Kijaro is similar in spirit to KSL, but does not require contributors to sign the Sun Contributor Agreement (SCA). Before Project Coin, Kijaro saw a modest number of features developed, fewer than ten, which is also not a particularly vigorous community response given the millions of Java developers in the world.

The earliest posts on what would become Project Coin mentioned the utility of prototypes, the Project Coin proposal form included a section to provide a link to an optional prototype, and I repeatedly stated throughout Project Coin the helpfulness of providing a prototype along with a proposal.

Despite the availability of the OpenJDK sources for javac and the repeated suggestions to produce prototypes, only a handful of prototypes were developed for the 70 proposals sent into Project Coin.

Dick asked rhetorically during the podcast whether alternative projects exploring language changes were inevitable as the "only approach given strict control exercised over the JVM [by Sun]."

IMO, such approaches are inevitable only if Sun's repeated efforts to collaborate continue to be ignored.

Classes on compilers are a core component of many undergraduate computer science curricula. All the Java sources in the JDK 7 langtools repository add up to about 160,000 lines of code and javac itself is a bit over 70,000 lines currently. These are far from trivial code bases and some portions of them are quite tricky, but implementing certain changes isn't that hard. Really. Try it out.

Dick says toward the end of the opening segment "If people do want to do this stuff, right now they are being told they can't."

I certainly do not have the authority to tell others what they can and cannot do. Indeed, I have advocated producing prototypes of language changes as a much more productive outlet than whining and pouting, to little avail, that other people aren't busy implementing the language changes you want. Others have already noted in previous messages to this group the irony of Lombok using the annotation processing facility I added to javac in JDK 6 as an alternate way to explore language changes (together with an agent API to rewrite javac internal classes!). However, way back before JDK 5 shipped in 2004, we at Sun recognized that annotation processors by themselves would be a possible way to implement certain kinds of de facto language changes. The apt tool and later javac were always designed to be general meta-programming frameworks not directly tied to annotations; for example, an annotation processor can process a type containing no annotations to, say, enforce a chosen extra-linguistic check based on the structure of the program. [Such as the naming convention checker shipped as a sample annotation processor in JDK 6.]
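
To illustrate the idea of a processor that examines types whether or not they carry annotations, here is a minimal sketch using the standard javax.annotation.processing API. It is not the sample checker shipped with the JDK; the class name and the naming rule (type names start with an upper-case letter) are assumptions chosen for brevity.

```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;

// Hypothetical sketch. To use it with javac -processor, declare the class public
// in its own source file so the compiler can instantiate it.
@SupportedAnnotationTypes("*") // "*" requests all types, annotated or not
class TypeNameCheckProcessor extends AbstractProcessor {

    // Assumed convention: a type's simple name starts with an upper-case letter.
    static boolean isConventionalTypeName(String simpleName) {
        return !simpleName.isEmpty() && Character.isUpperCase(simpleName.charAt(0));
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // Root elements are the top-level declarations being compiled in this round.
        for (Element root : roundEnv.getRootElements()) {
            boolean isType = root.getKind().isClass() || root.getKind().isInterface();
            if (isType && !isConventionalTypeName(root.getSimpleName().toString())) {
                processingEnv.getMessager().printMessage(Diagnostic.Kind.WARNING,
                        "type name should start with an upper-case letter", root);
            }
        }
        return false; // do not claim any annotations
    }
}
```

This version only looks at top-level types; a fuller checker would also walk nested types, methods, and fields.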

As an example of what can be done just using annotation processing, long-time annotation processing hacker Bruce Chapman implemented "multi-line strings" as part of his rapt project [5]; the value of the string is populated from a multi-line comment. After repeatedly outlining how it would be possible to do so on the annotation processing forum [6], I've gotten around to hacking up a little proof-of-concept annotation processor based implementation of Properties. [7] The user writes code like

public class TestProperty extends TestPropertyParent {
   protected TestProperty() {};

   @ProofOfConceptProperty
   protected int property1;

   @ProofOfConceptProperty(readOnly = false)
   protected long property2;

   @ProofOfConceptProperty
   protected double property3;

   public static TestProperty newInstance(int property1,
                      long property2,
                      double property3) {
       return new TestPropertyChild(property1, property2, property3);
   }
}

and the annotation processor generates the superclass and subclass to implement the desired semantics, including the getter and setter methods, etc. Using annotation processors in this way is a bit clunky compared to native language support, but if people want to experiment, the capabilities have been standardized as part of the platform since JDK 6.

It is technically possible to take the OpenJDK sources and construct a version of javac that accepts language extensions; after all, this is how we generally evolve the language and also how the JSR 308 changes were developed before going back into the JDK 7 code base. Additionally, the IcedTea project and the shipping of OpenJDK 6 in Linux distributions has provided an existence proof that folks other than Sun can take the entire OpenJDK code base, add various patches and additional components to it, and ship it as a product.

Given the OpenJDK sources Sun has published, subject to the license and trademark terms and conditions, anyone is free to implement and use language changes, as long as they assume the costs and responsibilities for doing so. Experimentation has long been encouraged and experiences from experiments using language changes on real code bases would certainly better inform language evolution decisions. Unfortunately, others have generally not done these experiments, or if the experiments have been done, the results have not been shared.

I also do not have the power to prevent others from using non-Java languages on the JVM or to force others to run anything on the JVM, nor would I want to exercise such powers even if I had them. Indeed, for some time Sun has endorsed having additional languages for the platform and the main beneficiary of John Rose's JSR 292 work will not be the Java language, but all the non-Java languages hosted on top of the JVM.

I do have the authority to speak on what Sun will and will not spend our resources on in relation to Project Coin, certainly a right any organization reserves to do with its resources.

If there are frustrations waiting for Java language changes, I assure you there are also frustrations working on Java language changes. For example, I find it frustrating (and self-inconsistent) that people state "I don't have technical expertise in this area" while simultaneously expecting their preferences to be selected without any contribution on their part. [8]

Finally, going back to a white paper from 1996, the design of Java quite intentionally said "No!" to various widely-used features from the C/C++ world including a preprocessor and multiple inheritance. Again since the beginning, Java admittedly borrowed features from many other established languages. [9] Given the vast number of potential ways to change the language that have been proposed, many language changes will continue to be called and few will continue to be chosen. In any endeavor there is a tension to balance stability and progress. For the Java language, given the vast numbers of programmers and existing code bases, we try to err on the side of being conservative (saying "No." by default) first to do no harm, but also to preserve the value of existing Java sources, class files, and programmer skills.

There are many other fine languages which run on the JVM platform and I expect the Java language to continue to adopt changes, big and small, informed both by direct experiments with prototypes and by experiences with other languages.

[1] http://blogs.sun.com/ahe/entry/javac_open_sourced
[2] https://ksl.dev.java.net
[3] http://blogs.sun.com/jag/entry/compiler_fun
[4] https://kijaro.dev.java.net
[5] https://rapt.dev.java.net; see also Bruce's https://hickory.dev.java.net
[6] http://forums.sun.com/forum.jspa?forumID=514
[7] http://blogs.sun.com/darcy/entry/properties_via_annotation_processing
[8] http://blogs.sun.com/darcy/entry/project_coin_final_five#comments
[9] http://java.sun.com/docs/white/langenv

(2009-09-03 15:04:18.0) Permalink Comments [43]

20090828 Friday August 28, 2009

Project Coin: The Final Five (Or So)

First, thanks to all those who submitted interesting proposals and thoughtful comments to Project Coin; a demonstrably vibrant community wants to evolve the Java programming language!

Without further ado, the final Project Coin changes accepted for inclusion in JDK 7 are:

The specification, implementation, and testing of these changes are not final and will continue to evolve as interactions are explored and issues cleared. Two of the listed items are combinations of multiple submitted proposals. The omnibus proposal for improved integer literals includes at least binary literals and underscores in numbers; a way to better handle unsigned literals is desired too. Language support for Collections covers collection literals and allows for developing indexing access for Lists and Maps, assuming the technical problems are resolved.

That leaves a few proposals that went through further consideration, but were not accepted on the final list:

Improved exception handling would be a fine change for the language, but it ventures near the type system and we do not estimate we have resources to fully develop the feature within JDK 7. I would like to see improved exception handling reconsidered in subsequent efforts to evolve the language. While the Elvis operator and related operators are helpful in Groovy, differences between Groovy and Java, such as the presence of primitive types and interactions with boxing/unboxing render the operators less useful in Java. JDK 7 will support other ways to ease the pain of null-handling, such as the null checking enabled by JSR 308. Aggregates natively supporting more than 32-bits worth of entries will no doubt be increasingly needed over time. Language support for collections may allow such facilities to be developed as libraries until the platform directly handles large data structures.

The coin-dev list will remain available for the continued refinement of the accepted changes. Further discussion of proposals not selected is off-topic for the list.

The majority of the implementation of the Project Coin changes should be in JDK 7's development Mercurial repositories by the end of October 2009.

In due course, we intend to file a JSR covering the Project Coin changes.

(2009-08-28 17:45:48.0) Permalink Comments [65]

20090819 Wednesday August 19, 2009

Generics and the Mandelbrot set

Elaborating on some slides from a JavaOne talk, the Mandelbrot set is defined recursively as the set of values c in the complex plane C where the iterations z_{n+1} = z_n^2 + c remain bounded, giving rise to a familiar and complex shape.

The Mandelbrot Set

Determining whether a particular point is inside or outside the boundary of the Mandelbrot set can be difficult because of the fractal nature of the curve; however, good approximations are possible. First at a coarse level, if the absolute value of a point is greater than 2, it is definitely not part of the Mandelbrot set so all of the Mandelbrot set is contained within a circle of radius 2 centered at the origin. Second, there are two primary curves within the set:

  • A circle of radius ¼ centered at (-1, 0)

  • A heart-shaped cardioid whose boundary is c = e^{it}/2 - (e^{it}/2)^2

The overall area of the Mandelbrot set is a bit over 1.5; the circle has area ≈0.1963, 13.0% of the total, and the cardioid has area ≈1.178, 78.1% of the total. Therefore, together the circle and cardioid contain over 90% of the area of the whole set and it is comparatively easy to determine if a point is inside or outside the union of the circle and the cardioid.
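As a rough sketch of how cheap those two tests are, the code below checks the circle and the cardioid in closed form and only falls back to iterating z_{n+1} = z_n^2 + c for the remaining points. The class and method names are made up for this example, the cardioid inequality is the standard closed-form test rather than anything taken from the slides, and the iteration cap is an arbitrary assumption.

```java
public class MandelbrotApprox {

    // Main cardioid test: with q = (x - 1/4)^2 + y^2, the point is inside
    // when q * (q + (x - 1/4)) <= y^2 / 4.
    static boolean inMainCardioid(double x, double y) {
        double dx = x - 0.25;
        double q = dx * dx + y * y;
        return q * (q + dx) <= 0.25 * y * y;
    }

    // Circle of radius 1/4 centered at (-1, 0).
    static boolean inCircle(double x, double y) {
        double dx = x + 1.0;
        return dx * dx + y * y <= 1.0 / 16.0;
    }

    // Fallback: iterate and report whether |z| ever exceeds 2.
    static boolean escapes(double x, double y, int maxIterations) {
        double zr = 0.0, zi = 0.0;
        for (int i = 0; i < maxIterations; i++) {
            double nextR = zr * zr - zi * zi + x;
            double nextI = 2.0 * zr * zi + y;
            zr = nextR;
            zi = nextI;
            if (zr * zr + zi * zi > 4.0) {
                return true;
            }
        }
        return false;
    }

    // Approximate membership: the quick tests settle most points in the set.
    static boolean probablyInSet(double x, double y, int maxIterations) {
        if (inMainCardioid(x, y) || inCircle(x, y)) {
            return true;
        }
        return !escapes(x, y, maxIterations);
    }

    public static void main(String[] args) {
        System.out.println(probablyInSet(0.0, 0.0, 100));  // inside the cardioid
        System.out.println(probablyInSet(-1.0, 0.0, 100)); // inside the circle
        System.out.println(probablyInSet(1.0, 0.0, 100));  // escapes after a few steps
    }
}
```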

The Mandelbrot Set Approximated

Using generics in Java has some similarities to the Mandelbrot set. Generics can be recursive, such as in the f-bound in the declaration of java.lang.Enum: public abstract class Enum<E extends Enum<E>>..., and it can be tricky to determine if a use of generics is reasonable. Fortunately, another similarity is that there are two primary use-cases for generics that cover the vast majority of sensible scenarios:

  • Generic aggregates, List<T>, Set<T>, etc.

  • Type tokens, Class<T> (or even super type tokens)

Aggregates like subtypes of java.util.Collection are the heart of generics usage and using generic collections is usually straightforward; Effective Java's PECS mnemonic (producer-extends, consumer-super) provides guidance for some of the trickier cases. The second most common use of generics is for type tokens, Class<T>, which embody type information both at compile-time and at runtime. For example, type tokens are used to retrieve annotations.
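The two use-cases can be sketched in a few lines. The class GenericsUseCases and its helper methods are hypothetical names for this illustration; the only library calls used are the standard collection methods and Class.getAnnotation.

```java
import java.lang.annotation.Annotation;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class GenericsUseCases {

    // PECS: src only produces T values (extends); dst only consumes them (super).
    static <T> void copyAll(Collection<? extends T> src, Collection<? super T> dst) {
        for (T item : src) {
            dst.add(item);
        }
    }

    // Type token: Class<A> carries the annotation type at compile time and at runtime.
    static <A extends Annotation> boolean isAnnotatedWith(Class<?> type, Class<A> token) {
        A annotation = type.getAnnotation(token);
        return annotation != null;
    }

    @Deprecated
    static class Old {}

    public static void main(String[] args) {
        List<Integer> ints = Arrays.asList(1, 2, 3);
        List<Number> numbers = new ArrayList<Number>();
        copyAll(ints, numbers); // a List<Integer> can feed a List<Number>
        System.out.println(numbers);
        System.out.println(isAnnotatedWith(Old.class, Deprecated.class));
    }
}
```

Without the wildcards, copyAll would only accept two collections with exactly the same element type.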

Be wary of other uses of generics in Java. Java's generics have significant technical differences from templates in C++; Java generics are by design not a Turing-complete meta-language! Attempting to use Java generics to simulate features in other languages, like Haskell's pattern matching, is unlikely to lead to pleasant or idiomatic Java code. In a Java program, using pervasive type parameters to pass along other information throughout a program, such as to address code evolution issues, is also not a pleasant fit. A warning sign in API design is a Java class having more than two type parameters; this likely signals generics are being used in an awkward way.

(2009-08-19 09:30:00.0) Permalink

20090818 Tuesday August 18, 2009

Project Coin: Elephants, Knapsacks, and Language Features

Paraphrasing some thoughts already sent to the Project Coin list and discussed at JavaOne this year, there continues to be traffic on the list and elsewhere about the criteria for proposal selection (and non-selection) and those criteria are worth elaborating ahead of the final proposal list being determined in the near future.

First, a reminder from some earlier blog entries describing the context for Project Coin:

"Especially with the maturity of the Java platform, the onus is on the proposer to convince that a language change should go in; the onus is not to prove the change should stay out."
Criteria for desirable small language changes, December 23, 2008

"Given the rough timeline for JDK 7 and other on-going efforts to change the language, such as modules and annotations on types, only a limited number of small changes can be considered for JDK 7."
Guidance on measuring the size of a language change, December 11, 2008

With nearly 70 proposals submitted to the mailing list and the Sun bug database having well over 100 open "requests for enhancements" (rfe's) for the language, the large majority of those proposals and rfe's will not be included in JDK 7 as part of Project Coin or any other effort.

Project Coin will be limited to around 5 proposals total. That's it.

Therefore for Project Coin, in addition to determining whether a proposal to change the language is in and of itself appropriate, a determination also has to be made as to whether the change is more compelling than all but four or so other proposals. In economic terms, there is an opportunity cost in the proposal selection; that is, because of finite resources, choosing to have a particular proposal in the platform removes the opportunity to do other proposals. There will be good, compelling proposals that would improve the language not selected for Project Coin because there is a full set of better, more compelling proposals that are more useful to include instead. Having available prototypes for proposals, running the existing tests, and writing new tests can only better inform the continuing proposal evaluation and selection.

Part of evaluating proposals is judging their utility to the Java platform as a whole. In this way, I've long thought the Java platform is like the elephant in the parable about the blind men and the elephant:

Six Blind Dukes and an Elephant

While each Duke and each of us may know and understand our own usage of Java quite well and have valid ideas about how Java could be changed to improve programming in our own areas of interest (properties! operator overloading! design by contract! your favorite feature!), putting together all that accurate local information might just result in a patchwork elephant:

Patchwork Elephant

Rather than just a collection of local views, a broad perspective is needed to have an accurate, unified picture of the platform so Java can keep evolving and improving as a general-purpose programming language. This approach favors features with broader applicability. For example, a usability improvement to generics, like the diamond operator which allows type parameters to be elided during constructor calls, is usable more often than, say, one of the various proposals to allow extensible or abstract enum types, a change that would only be helpful in a small minority of enum declarations.

Even with a broad perspective, there are complexities in feature selection because choosing a set of language proposals is a kind of knapsack problem. That is, each feature has some discrete size and complexity to implement and confers some improvement to the language. There is a bounded size and complexity budget and the goal is maximizing the value held in the knapsack, the value of the set of improvements shipped in a release. Of note is that implementing a language change has much more of a discrete size (or a small selection of possible sizes) rather than a continuous range of possible sizes. In other words, because of the coordinated set of deliverables associated with a language change, it may be reasonable to implement 0%, 50%, or 100% of a possible feature but no other fraction. And doing 50% of the feature might take one quarter of the effort of doing the whole thing or three fourths of that effort.

Even when precise costs and benefits can be quantified, because of these discrete sizes the "greedy" algorithm of putting the highest value / cost item in the knapsack first can lead to globally poor results. If nothing else, having a pre-pass to reduce the number of proposals being considered for further review greatly reduces the combinatorial possibilities of subsets of features that could be included in a release.
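A toy instance shows how the greedy choice can go wrong with discrete sizes. The costs, values, and budget below are invented numbers, not estimates for any real proposal, and FeatureKnapsack is a hypothetical class name; the exact answer is computed with the textbook 0/1 knapsack dynamic program.

```java
import java.util.Arrays;

public class FeatureKnapsack {

    // Greedy: consider items by decreasing value/cost ratio, taking each one that fits.
    static int greedyValue(int[] cost, int[] value, int budget) {
        Integer[] order = new Integer[cost.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Compare ratios by cross-multiplication to stay in integer arithmetic.
        Arrays.sort(order, (a, b) -> Integer.compare(value[b] * cost[a], value[a] * cost[b]));
        int remaining = budget;
        int total = 0;
        for (int i : order) {
            if (cost[i] <= remaining) {
                remaining -= cost[i];
                total += value[i];
            }
        }
        return total;
    }

    // Exact 0/1 knapsack: best[w] is the best value achievable with budget w.
    static int optimalValue(int[] cost, int[] value, int budget) {
        int[] best = new int[budget + 1];
        for (int i = 0; i < cost.length; i++) {
            for (int w = budget; w >= cost[i]; w--) {
                best[w] = Math.max(best[w], best[w - cost[i]] + value[i]);
            }
        }
        return best[budget];
    }

    public static void main(String[] args) {
        int[] cost = {6, 5, 5};   // made-up implementation costs
        int[] value = {9, 6, 6};  // made-up benefits
        int budget = 10;
        System.out.println(greedyValue(cost, value, budget));  // prints 9
        System.out.println(optimalValue(cost, value, budget)); // prints 12
    }
}
```

The first item has the best ratio, but taking it leaves too little budget for either of the others; the two smaller items together are worth more.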

(2009-08-18 01:26:27.0) Permalink Comments [2]

20090817 Monday August 17, 2009

JDK Release Types and Compatibility Regions

There are three primary kinds of compatibility of concern when evolving the JDK: source, binary, and behavioral. These can be visualized as defining a three dimensional space:

Compatibility Axes

The farther away a point is from the origin, the more incompatible a change is; the origin itself represents perfect compatibility (no changes). A more nuanced diagram would separate positive and negative compatibility, that is distinguish between keeping things that work working versus keeping things that don't work not working, but in this article the diagrams will just represent the magnitude of allowable compatibility change.

In turn, there are three main kinds of JDK releases:

  • platform

  • maintenance

  • update

JDK 7 is a platform release since it is a new version of the platform; there are many new APIs added and thousands of bug fixes and enhancements. The JDK 6 update releases are representative of update releases; the same platform specification is implemented, Java SE 6 in this case, and there are typically dozens to a few hundred bug fixes and enhancements in a release (6 update train release notes). Like update releases, maintenance releases implement the same base specification as a previous platform release, such as JDK 1.4.1 and JDK 1.4.2 both being additional implementations of J2SE 1.4, but they have more bug fixes than an update release, on the order of one thousand to two thousand changes (JDK 1.4.2 release notes). While maintenance releases have not been formally issued since JDK 1.4.2, the changes in 6u10 were more on par with a maintenance release rather than a regular update release.

The general evolution policy for APIs in the JDK is:

  1. Don't break binary compatibility (as defined in the Java Language Specification)

  2. Avoid introducing source incompatibilities

  3. Manage behavioral compatibility change

While these policies hold for all three kinds of releases, the allowable compatibility regions differ for each kind of release. For update and maintenance releases, the reference point to measure compatibility against is an earlier implementation of the same platform specification, such as the initial reference implementation of the platform or an earlier update release.

Maintenance and Update Release Compatibility

Since binary incompatible changes are not allowed, the acceptable compatibility region for update and maintenance releases is confined to the (Behavioral × Source) plane, with more latitude on the behavioral axis. For update releases, a limited amount of behavioral change is acceptable, where behavioral change is broadly considered to be any observable aspect of the platform. While programs should only rely on specified interfaces, they can often accidentally rely on implementation details of the release's behavior so update releases limit the overall change in behavior. Some minor changes affecting source compatibility can occur in an update release, for example, the version of an endorsed standard or standalone technology included in the release can be upgraded. The JAX-WS component was upgraded from 2.0 to 2.1 in 6u4 (6535162). Such upgrades should generally preserve the meaning of existing programs that compile and possibly allow new programs to compile. The main compatibility effect should be that the negative compatibility region may get smaller; programs that "don't work" or "don't compile" can become programs that "work" or "compile." Maintenance releases are generally similar, but more behavioral change is allowed and expected since there are a greater number of bug fixes and enhancements.

The compatibility reference point for a platform release is an implementation of the previous platform specification. Compared to the previous platform specification, a platform release can add APIs and language features that impact source compatibility (new keywords, etc.) and the implementation can have many changes in behavior (such as changing the iteration order of HashMap). In exceptional circumstances, there is the possibility of a sliver of binary incompatibility, such as to address a security issue in a rarely-used corner of the platform, but the central policy of preserving binary compatibility holds for platform releases as well.
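The HashMap case is a handy illustration of relying on unspecified behavior. In the sketch below, with hypothetical class and method names, the first listing depends on an iteration order that HashMap does not specify and that a release is free to change, while the second uses LinkedHashMap, whose insertion order is part of its specification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IterationOrder {

    // Fills the given map with the keys in argument order and lists them as iterated.
    static List<String> keysAsIterated(Map<String, Integer> map, String... keys) {
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // Unspecified order: code depending on this output may break across releases.
        System.out.println(keysAsIterated(new HashMap<String, Integer>(), "zeta", "alpha", "mid"));
        // Specified insertion order: [zeta, alpha, mid] on every conforming release.
        System.out.println(keysAsIterated(new LinkedHashMap<String, Integer>(), "zeta", "alpha", "mid"));
    }
}
```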

Platform Release Compatibility

Comparing one build of an in-progress platform release to another, there may be large changes in binary compatibility before a new API is finalized. As a matter of policy, certain kinds of source incompatibility will not be introduced into the platform anymore. For example, the Java language in JDK 7 will have no new keywords that invalidate existing sources; instead of a full new keyword, JSR 294 is making "module" a restricted keyword whose use as a keyword versus an identifier will be disambiguated by the compiler.

Previously, there was a sharp jump down in the behavioral change allowed in a platform release compared to the first update of that new platform. A more helpful policy may be to allow greater behavioral change to new features in the first few updates of a new platform so that the implementation can be improved before widespread adoption justifies greater caution in managing behavioral change.

(2009-08-17 13:16:04.0) Permalink Comments [2]

20090813 Thursday August 13, 2009

JDK 7: New Component Delivery Model in the Works

The JDK includes many logically distinct sets of APIs. Some of the APIs naturally live in the JDK and evolve at the pace of the JDK; other APIs are effectively maintained externally, but are also shipped as part of the public API provided by the JDK. Two APIs in the latter camp are jaxp and jax-ws, both of which natively live in the GlassFish project.

Currently, those components are maintained under separate version control as part of OpenJDK in the jaxp and jax-ws repositories, respectively. Code in these components is periodically synced with changes from the upstream masters, with some nontrivial overhead.

To reduce the overhead of updating the components and thereby make it possible to update them more frequently, we're in the process of changing the delivery model of these externally maintained components into the JDK. First for JDK 7 and later for OpenJDK 6, instead of tracking these code bases under independent version control in the JDK, the JDK build will logically get the source for those components from a source bundle. The upstream teams will be responsible for providing source bundles and the JDK build will be configurable to use a particular source bundle.

Kelly has been working on implementing this new model for jaxp in the JDK 7 build, including working out the detailed logistics with the upstream teams. An initial version of the change is out for review.

This new approach is the same basic model the IcedTea project uses to provide changes and functionality on top of OpenJDK so there is lots of evidence large code bases can be handled using this model.

(2009-08-13 09:00:00.0) Permalink Comments [2]

20090811 Tuesday August 11, 2009

JDK 7 Build Prepped for Language Changes

After build hacking by Jon and Kelly, the JDK 7 build now uses -source 7 and -target 7 to compile the Java code, meaning the build is prepped to allow use of new language features as they become available and to take advantage of version 51.0 class file features (6854244, 6827026).

(For bootstrapping purposes, the langtools repository housing javac and friends will remain buildable with JDK 6.)

(2009-08-11 17:29:00.0) Permalink Comments [1]

20090810 Monday August 10, 2009

Reflective Operation Exceptions

Applying previously published advice on designing exception types, as of JDK 7 build 68, exceptions thrown by core reflection operations have been retrofitted to have a common superclass, ReflectiveOperationException (6857789).

Inserting a new level into the superclass hierarchy in this fashion is a binary compatible change (JLSv3 § 13.4.4 Superclasses and Superinterfaces) since the exceptions remain subclasses of the original superclass java.lang.Exception. The change in direct superclass should be transparent other than to reflective operations that specifically query the superclass.

All the exception classes in question already had explicit serialVersionUID fields so changing their superclass is compatible from a serialization perspective too. If explicit serialVersionUID fields were not already present, they would need to be added since the direct superclass figures into the default serial version computation.
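In practice the common superclass lets one catch clause stand in for several. The sketch below uses a made-up helper, instantiate, that returns null on any reflective failure; that error-handling choice is an assumption made to keep the example short.

```java
public class ReflectiveCatch {

    // Creates an instance through the no-argument constructor, or returns null on failure.
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers ClassNotFoundException, NoSuchMethodException, InstantiationException,
            // IllegalAccessException, and InvocationTargetException in a single clause.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiate("java.lang.StringBuilder") != null);
        System.out.println(instantiate("no.such.Type") == null);
        // The inserted level is visible to code that queries the superclass reflectively.
        System.out.println(ClassNotFoundException.class.getSuperclass().getName());
    }
}
```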

(2009-08-10 16:29:32.0) Permalink Comments [3]

20090806 Thursday August 06, 2009

Build Advice: Set Source, Target, and Encoding

When using javac to compile nontrivial programs, it is almost always prudent to explicitly set the source, target, and encoding options (-source, -target, and -encoding, respectively). Leading by example, the JDK 7 build was recently changed to use explicit source and target settings ahead of upgrading those settings to 7 (6854244, 6827026).

The source option picks which version of the language to accept. Note that to perform a proper cross-compile to an older version of the platform the bootclasspath also needs to be set to an appropriate library. The target option selects which class file version to use for output. The same source construct, for example, a class literal, may be compiled differently and with slightly different semantics under various source and target settings. More directly, the target setting affects which JDK versions the resulting class files will run on. The default source and target change over time; specifying these options explicitly rather than relying on the defaults requests and documents the desired semantics the compiler should use for the input sources and output class files.

The encoding option controls the initial mapping of bytes from the physical file into a raw stream of Unicode characters comprising the source file. (Further translations can occur on the raw stream of Unicode characters before the logical stream of tokens is constructed.) If not set explicitly, the platform's default encoding is used to perform the initial mapping. The default encoding for a platform is stored in the file.encoding system property. Encoding errors can cause contents of a file to be interpreted as gibberish and may lead to a compiler error.
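The effect of a mismatched encoding is easy to reproduce outside the compiler. This sketch, with a hypothetical EncodingDemo class, decodes the same bytes under two charsets; it mimics only the initial byte-to-character mapping, not javac itself, and the non-ASCII character is written as a Unicode escape so the example does not depend on how its own source file is encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Maps raw file bytes to characters using the given encoding.
    static String decode(byte[] fileBytes, Charset encoding) {
        return new String(fileBytes, encoding);
    }

    public static void main(String[] args) {
        // Bytes as they would sit on disk for a UTF-8 encoded file containing "caf\u00e9".
        byte[] fileBytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        System.out.println(decode(fileBytes, StandardCharsets.UTF_8).length());      // 4 characters
        System.out.println(decode(fileBytes, StandardCharsets.ISO_8859_1).length()); // 5 characters

        // The default that applies when no encoding is given explicitly.
        System.out.println(Charset.defaultCharset());
    }
}
```

Read with the wrong charset, the single accented character turns into two unrelated characters, which is the kind of gibberish the paragraph above warns about.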

One kind of program where using the default source and target is reasonable is regression tests for the compiler and related tools. Unnecessary source and target options were removed from the JDK 7 langtools repository (6843761) as part of the overall effort to update the default source and target to 7. Many of the now extraneous options in those regression tests were introduced in JDK 5 before the default source was switched to 1.5 in that release. Switching the default source setting early in a feature release avoids the need to have explicit options in tests of new language features.

(2009-08-06 09:30:00.0) Permalink

20090805 Wednesday August 05, 2009

Source, target, class file version decoder ring

Correct usage of javac's -source and -target settings is important to generate class files with the required properties. As Alex has written recently, the compiler's source and target settings interact with various other versioned aspects of the platform. The primary effect of the -source option is to select which version of the Java Programming Language to accept and the primary effect of the -target option is to specify the class file version to output. The first table below summarizes the default source and target settings used by javac in different JDK releases when no explicit -source or -target options are specified. As explained in the Cross-Compilation Options section of the javac man page, the default target varies depending on the source setting.

In the beginning and through the JDK 1.1 series, javac only had a single (implicit) source and a single (implicit) target setting. One feature of how inner classes were added in 1.1 was that the class file format did not need to be changed. Therefore, 1.1 and 1.0 share the same target setting. Other subsequent language changes have required later class file versions to be used, sometimes only to force the availability of platform features rather than to intrinsically use new features of the class file format itself. In JDK 1.2, the strictfp modifier was added to the language and to the new class file format selected with target 1.2. The javac in JDK 1.2 always accepts the new keyword, but only generates the new class files supporting the feature if explicitly requested with the -target 1.2 option. Likewise, the JDK 1.3 releases only accept a single implicit source level, unchanged from 1.2, but accept targets ranging from 1.1 to 1.3, with a default of 1.1.

The 1.4 source version added an assert keyword so there was a practical need to be able to compile existing code that happened to use "assert" as an identifier in addition to the new uses of the assert language construct. Therefore, an explicit -source flag was added in JDK 1.4 and retained in subsequent JDK releases. In JDKs 5 and 6, new source and target settings were added. The 1.5 and 1.6 target settings correspond to distinct class file versions; the 1.5 and 1.6 source settings are nearly identical. As an operational difference, javac handles encoding problems differently under those two source levels; an encoding problem generates a warning with source 1.5 but is treated as an error with source 1.6. There is a small semantic difference between the handling of the @Override annotation in the two source levels, but fully elaborating the complications around @Override, source settings, and JDK versions is a story for another day.

The default source and target javac applies in JDK 7 were recently upgraded to source and target 7; previously the same defaults as in JDK 6 were used.

Default javac Source and Target Settings

JDK/J2SDK | Default Source | Source Range | Default Target | Target Range
1.0.x | 1.0 | | 1.1 |
1.1.x | 1.1 | | 1.1 |
1.2.x | 1.2 | | 1.1 | 1.1 - 1.2
1.3.x | 1.2/1.3 | | 1.1 | 1.1 - 1.3
1.4.x | 1.2/1.3 | 1.2 - 1.4 (*) | 1.2 | 1.1 - 1.4
5 | 1.5 | 1.2 - 1.5 | 1.5 | 1.1 - 1.5
6 | 1.5 | 1.2 - 1.6 | 1.6 | 1.1 - 1.6
7 | 1.7 | 1.2 - 1.7 | 1.7 | 1.1 - 1.7

(*) JDK 1.4.0 and 1.4.1 only accepted source 1.3 and 1.4.

Given the same source code, compilers from different releases configured to use the same source and same target (and same bootclasspath!) can still generate different class files because of bug fixes or changes to compiler-internal contracts. As an example of a bug fix, javac -target 1.2 in JDK 1.4.2 elides Miranda methods, extra methods synthesized by the compiler to work around early JVM bugs calling interface methods. Evidence of Miranda methods and other past sins is recorded in javac's com.sun.tools.javac.jvm.Target class. One compiler-internal contract that has evolved over time is the idiom used to access private members of an enclosing class.

While often updated in small ways, the fundamental structure of class files has been very stable across many releases. The first new bytecode, invokedynamic, is being added in JDK 7; although the pair of jsr/ret bytecodes was effectively removed as of target 1.6. The most common way to add capabilities to the class file format has been the addition of new predefined class file attributes; see table 4.6 in the draft of the Java VM Specification, Third Edition for more information. The table below shows the mapping of javac target settings to class file major.minor version numbers.

Mapping of Targets to Class file major.minor numbers

Target | Major.minor | Description
1.1 | 45.3 | The original shipped version.
1.2 | 46.0 | Supports the strictfp modifier.
1.3 | 47.0 | Small update.
1.4 | 48.0 | Small update.
5 (1.5) | 49.0 | New attributes to support generics and other features. Many more strings accepted as legal identifiers.
6 (1.6) | 50.0 | StackMaps are supported.
7 (1.7) | 51.0 | invokedynamic is supported.
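The major.minor pair can be read straight from the first eight bytes of a class file: a four-byte magic number, then the minor version, then the major version. The sketch below is one way to do that, assuming the class file can be found as a resource next to the class; ClassFileVersion and versionOf are names invented for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileVersion {

    // Returns {major, minor} from the header of the class file backing the given class.
    static int[] versionOf(Class<?> c) {
        String name = c.getName();
        // Resource name relative to the class's own package, e.g. "Object.class".
        String resource = name.substring(name.lastIndexOf('.') + 1) + ".class";
        try (InputStream raw = c.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalArgumentException("No class file found for " + name);
            }
            DataInputStream in = new DataInputStream(raw);
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file: " + name);
            }
            int minor = in.readUnsignedShort(); // minor_version comes first
            int major = in.readUnsignedShort(); // then major_version
            return new int[] {major, minor};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] version = versionOf(Object.class);
        // The value printed depends on the JDK running the example.
        System.out.println(version[0] + "." + version[1]);
    }
}
```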

(2009-08-05 19:14:59.0) Permalink

20090721 Tuesday July 21, 2009

Pick a path, any path

As heard at JavaOne this year, classpath is dead, being killed by the modularity features in JDK 7, JSR 294 and Project Jigsaw.

Today the full logical classpath available to an application or used in javac is actually a twisty little maze of concatenated sub-paths, starting with the bootclasspath, followed by the extension directories, and finally the classpath setting itself. The endorsed standards override mechanism is used to selectively update various components logically included on the bootclasspath as part of the JDK, either components for standards that evolve outside of the JCP, like Corba, or standalone JSRs also shipped with the JDK, like JAX-WS. The extension mechanism can be used to support technologies not shipped with the JDK, such as an independent JSR or even a site-wide library.

The table below shows the different command line options to java and javac that can be used to configure the sub-paths. These options have evolved over time. The bootclasspath and extension directories were added in JDK 1.2, the ability to prepend to the bootclasspath was added in JDK 1.3, endorsed standards were added in JDK 1.4, JDK 5 harmonized the path options on the java and javac command lines, and JDK 6 allowed classpath wildcards, but only for the strict classpath component and not for bootclasspath or the Class-Path Jar manifest attribute. Check the documentation for the JDK release in question for matching configuration information.

There is certainly plenty of opportunity for modularity in JDK 7 to simplify the configuration options needed to resolve dependencies!

Path Setting in java and javac
 | bootclasspath/p: | Endorsed standards | bootclasspath[/a:] | Extension Directories | classpath
java options | -Xbootclasspath/p: | -Djava.endorsed.dirs= | -Xbootclasspath:, -Xbootclasspath/a: | -Djava.ext.dirs= | -cp, -classpath, $CLASSPATH
javac options | -Xbootclasspath/p: | -Djava.endorsed.dirs= | -bootclasspath, -Xbootclasspath:, -Xbootclasspath/a: | -extdirs, -Djava.ext.dirs= | -cp, -classpath, $CLASSPATH
Classloaders | bootstrap | bootstrap | bootstrap | extension | application
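The loader behind each sub-path can be observed from a running program. In this sketch (LoaderChain and describe are hypothetical names), a null class loader is reported as the bootstrap loader, which is how the platform represents it; the loader class names printed for the rest of the chain vary by JDK release.

```java
public class LoaderChain {

    // A null ClassLoader reference denotes the bootstrap loader.
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // Core library classes come from the bootclasspath.
        System.out.println("String: " + describe(String.class.getClassLoader()));

        // Walk from the loader of this class up through its parents.
        for (ClassLoader l = LoaderChain.class.getClassLoader(); l != null; l = l.getParent()) {
            System.out.println("parent chain: " + describe(l));
        }

        // The classpath setting proper, as seen by the running JVM.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```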

(2009-07-21 17:58:59.0) Permalink Comments [9]

20090716 Thursday July 16, 2009

Project Coin: Literal Grammar Hackery

Correction: External to this blog, it has been pointed out to me that the original grammar disallowed two-digit numbers, which was unintended. The fix is to make the DigitsAndUnderscores component in Digit DigitsAndUnderscores Digit optional, as done in the corrected grammar below.

Circling back to look at some unresolved technical details of the underscores in numbers proposal, I wrote up a combined grammar to allow binary literals as well as underscores as separators between digits. That is, underscores cannot appear as the first or last character in a sequence of digits.

The basic grammar change is to convert the definition of Digits (in any base) from the simple left recursive list of digits found in JLSv3, like

Digits:
    Digit
    Digits Digit

to a list where underscores can appear between digits but the list must start and end with a digit:

Digits:
    Digit
    Digit DigitsAndUnderscores_opt Digit

DigitsAndUnderscores:
    DigitOrUnderscore
    DigitsAndUnderscores DigitOrUnderscore

This grammar is unambiguous, but as written it requires a look ahead of more than 1 because the recursion is in the middle of the Digits production. I have not attempted any of the usual grammar refactorings to restore a look ahead of 1 since in practice purging the underscores will be implemented by a small amount of additional logic in the scanner as opposed to the actual parsing machinery.
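
The kind of scanner-side logic described above might look roughly like the following. It is a sketch only, not the javac implementation: the class UnderscorePurge and its purge method are hypothetical names, and the method handles just the Digits-style rule (start and end with a digit, underscores only in between) for a given radix.

```java
// Hypothetical helper, not taken from javac.
class UnderscorePurge {
    /**
     * Checks that the sequence starts and ends with a digit of the given radix,
     * then returns the sequence with all underscores removed.
     */
    static String purge(String digits, int radix) {
        if (digits.isEmpty()) {
            throw new NumberFormatException("empty digit sequence");
        }
        char first = digits.charAt(0);
        char last = digits.charAt(digits.length() - 1);
        if (Character.digit(first, radix) < 0 || Character.digit(last, radix) < 0) {
            throw new NumberFormatException("must start and end with a digit: " + digits);
        }
        StringBuilder sb = new StringBuilder(digits.length());
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c == '_') {
                continue; // separators carry no value, so drop them
            }
            if (Character.digit(c, radix) < 0) {
                throw new NumberFormatException("illegal character '" + c + "' in " + digits);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(Long.parseLong(purge("1_000_000", 10), 10));
        System.out.println(Long.parseLong(purge("1010_1010", 2), 2));
    }
}
```

Once the underscores are gone, the remaining characters can be handed to the existing numeric conversion code unchanged.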

The existing rules for distinguishing decimal and octal literals cause minor grammar complications to accommodate underscores immediately after the first digit. Octal numbers must start with a leading zero digit and nonzero decimal numbers must start with a nonzero digit, requirements reflected in rules like NonZeroDigit Digits_opt. To allow underscores after the first digit, a new rule requiring at least one underscore is added, such as NonZeroDigit Underscores Digits. The structure of binary literals is straightforward and entirely analogous to hexadecimal ones. Changing the digit-level productions automatically allows underscores in floating-point literals without the need to explicitly update the rules for those literals.

Productions in blue below are additional or changed productions to existing non-terminals; the other non-terminals below are newly introduced to support the enhanced literal syntax.

IntegerLiteral:
    DecimalIntegerLiteral
    HexIntegerLiteral
    OctalIntegerLiteral
    BinaryIntegerLiteral

BinaryIntegerLiteral:
    BinaryNumeral IntegerTypeSuffix_opt

BinaryNumeral:
    0 b BinaryDigits
    0 B BinaryDigits

DecimalNumeral:
    0
    NonZeroDigit Digits_opt
    NonZeroDigit Underscores Digits

Underscores:
    _
    Underscores _

Digits:
    Digit
    Digit DigitsAndUnderscores_opt Digit

DigitsAndUnderscores:
    DigitOrUnderscore
    DigitsAndUnderscores DigitOrUnderscore

DigitOrUnderscore:
    Digit
    _

HexDigits:
    HexDigit
    HexDigit HexDigitsAndUnderscores_opt HexDigit

HexDigitsAndUnderscores:
    HexDigitOrUnderscore
    HexDigitsAndUnderscores HexDigitOrUnderscore

HexDigitOrUnderscore:
    HexDigit
    _

OctalNumeral:
    0 OctalDigits
    0 Underscores OctalDigits

OctalDigits:
    OctalDigit
    OctalDigit OctalDigitsAndUnderscores_opt OctalDigit

OctalDigitsAndUnderscores:
    OctalDigitOrUnderscore
    OctalDigitsAndUnderscores OctalDigitOrUnderscore

OctalDigitOrUnderscore:
    OctalDigit
    _

BinaryDigits:
    BinaryDigit
    BinaryDigit BinaryDigitsAndUnderscores_opt BinaryDigit

BinaryDigitsAndUnderscores:
    BinaryDigitOrUnderscore
    BinaryDigitsAndUnderscores BinaryDigitOrUnderscore

BinaryDigitOrUnderscore:
    BinaryDigit
    _

BinaryDigit: one of
    0 1

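
For experimenting with which strings the grammar accepts, the productions can be transcribed into a regular expression. The sketch below is an approximation under stated assumptions: the class name IntegerLiteralSketch is hypothetical, the hexadecimal prefix (0x or 0X followed by HexDigits) is taken from the existing JLSv3 rule rather than from the productions listed here, and ASCII digits only are considered.

```java
import java.util.regex.Pattern;

// Hypothetical recognizer for experimentation; not how javac scans literals.
class IntegerLiteralSketch {
    /** Regex for "Digit | Digit DigitsAndUnderscores_opt Digit" over the given digit class. */
    private static String digits(String digitClass) {
        return "[" + digitClass + "](?:[" + digitClass + "_]*[" + digitClass + "])?";
    }

    private static final Pattern INTEGER_LITERAL = Pattern.compile(
        "(?:"
        + "0[xX]" + digits("0-9a-fA-F")            // hex: prefix assumed from JLSv3
        + "|0[bB]" + digits("01")                  // BinaryNumeral
        + "|0_*" + digits("0-7")                   // OctalNumeral, underscores allowed after the 0
        + "|0"                                     // DecimalNumeral: plain zero
        + "|[1-9](?:_*" + digits("0-9") + ")?"     // DecimalNumeral: NonZeroDigit forms
        + ")[lL]?");                               // IntegerTypeSuffix_opt

    static boolean isIntegerLiteral(String s) {
        return INTEGER_LITERAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "12", "1_000_000", "0b1010_1010", "0_7", "_1", "1_", "0x_FF" };
        for (String s : samples) {
            System.out.println(s + " -> " + isIntegerLiteral(s));
        }
    }
}
```

Two-digit numbers such as 12 are accepted because the DigitsAndUnderscores component is optional, which is exactly the case the correction at the top of this entry addresses.
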
(2009-07-16 16:28:41.0) Permalink Comments [2]

20090714 Tuesday July 14, 2009

Deprecation in the JDK

A quick note on the deprecation policy used in the JDK, a question that comes up from time to time. The general policy over the last several feature releases has been that core JDK components are only marked as deprecated if they are actively harmful. If using a class or method is merely ill-advised, that is usually not sufficient to earn the deprecated mark.

The platform javadoc stops short of deprecating, but does discourage, the use of certain API elements, from particular members, like the Boolean(boolean) constructor, to entire classes, like Vector and Hashtable. At some point, this kind of advice might be formalized with a less-harmful-than-deprecated "denigration" facility based on a combination of javadoc tags and annotations, allowing programmatic checks to be made for usage of these less harmful API elements too (4941777, 6583872).

When an API element is deprecated, the recommended practice is to both apply the @Deprecated annotation and use the "@deprecated" javadoc tag. Using the annotation places more of the semantics of the code in the source code proper, as opposed to a comment, while using the javadoc tag allows alternate functionality to be recommended along with the specification for the element.
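
As an illustration of using the annotation and the javadoc tag together, here is a small sketch. The Worker class and its methods are invented for the example and do not correspond to any JDK API.

```java
// Hypothetical class used only to illustrate the two deprecation markers.
class Worker {
    private volatile boolean stopRequested;

    /**
     * Stops the worker immediately, possibly leaving shared state inconsistent.
     *
     * @deprecated This method is inherently unsafe; use {@link #requestStop()} instead.
     */
    @Deprecated
    void forceStop() {
        requestStop();
    }

    /** Asks the worker to stop at its next safe point. */
    void requestStop() {
        stopRequested = true;
    }

    boolean isStopRequested() {
        return stopRequested;
    }

    /** Reports whether the named no-argument method carries the @Deprecated annotation. */
    static boolean isMarkedDeprecated(String methodName) {
        try {
            return Worker.class.getDeclaredMethod(methodName)
                               .isAnnotationPresent(Deprecated.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

The annotation is what tools can check programmatically, as isMarkedDeprecated does here through reflection, while the javadoc tag carries the human-readable pointer to the replacement.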

(2009-07-14 16:23:30.0) Permalink Comments [7]
