Jonathan Gibbons

Monday Aug 11, 2008

Improving javac diagnostics: update

Back in May, I talked about some ideas for improving javac diagnostics. Maurizio has now started making those ideas a reality. See his own recent blog entry for an update.

Sunday Jul 06, 2008

Towards a multi-threaded javac

I just finished a vacation with my family, and in the in-between times, I made significant progress towards a multi-threaded javac.

Before you get too excited, let me qualify that by saying that there is some low-hanging fruit for this task, and there's a complete rewrite of the compiler. I'm only talking about the former; there are no plans to do the latter.

The difficulty with a multi-threaded javac is that the Java language is quite complicated these days, and as a result the compiler is also quite complicated internally, to cope with all the consequences of interdependent source files. The current compiler is not set up for concurrent operation, and adapting it would be error-prone and destabilizing. (For more details on the compiler's operation, see Compilation Overview on the OpenJDK Compiler Group web pages.)

The low hanging fruit comes by considering the compilation pipeline in three segments: input, process, and output. The source files can all be read and parsed in parallel, because there are no interdependencies there. (Well, almost none. More on that later.) Likewise, once a class declaration has the contents of its class file prepared internally, it can be written out in the background while the compiler begins to work on the next class.

We've known about this low hanging fruit for a while, and Tom Ball recently submitted a patch with a code for parsing source files in parallel. So, faced with a family vacation, I loaded up my laptop with the bits needed to explore this further.

Parallel parsing

If you're parsing files in parallel, the primary conflict is access to the Log, the main class used by the rest of the compiler to generate diagnostics. It is reasonably obvious that even if you're parsing the source files in parallel together, you don't want to see interleaved any diagnostics that might be generated: you want to see all the diagnostics for each file grouped together. Initially, I was thinking to create custom code in the parser to group parser diagnostics together, but since the scanner (lexer) can also generate diagnostics, it seemed better and less intrusive to give each thread its own custom Log that could save up diagnostics until they can all be reported together. The previous work on the Log class made it somewhat "hourglass shaped", with a bunch of methods which are used by the rest of the compiler to create and report diagnostics, and a back end that knows how to present the diagnostics that are generated. In between is a single "report" method, which was originally introduced to make it easy to subtype Log to vary the way that diagnostics are presented. Now, however, that method provided an excellent place to divide Log in two, into an abstract BasicLog, that provides the front end API used by the body of the compiler, and subtypes to handle the diagnostics that are reported. The main compiler uses Log it always did -- one of the Big Rules for the compiler is to minimize change -- but the threads for the new parser front end can now use a new subtype of BasicLog that buffers up diagnostics and reports them together when the work of the thread is complete.

This refactoring forced one other cleanup in Log, which was an ugly hangover from the introduction of JSR 199, the Compiler API. The Diagnostic objects that get created had an ugly hidden reference bug to the Log that created them, which if used incorrectly could provoke NullPointerExceptions or other problems if you tried to access the source code containing the diagnostic. For those that are interested, it's because of interaction with the Log.useSource method which sets temporary state in Log, but the bottom line is that one more refactoring later, the DiagnosticSource interface became a much better DiagnosticSource object, providing a much cleaner standalone abstraction for information about the source code containing the location of the diagnostic.

(Log used to be one big do-everything class; it has slowly been getting better over the years, and watch out for the upcoming exciting new work that Maurizio is doing to improve the quality and presentation of diagnostics. Luckily, these refactorings I'm describing here will not interfere with that work too much.)

There are some other shared resources used by the Parser: most notably, the compiler's name table, but these were easily fixed by synchronizing a few strategic methods.

That, then was sufficient for the first goal — to parse source files in parallel. :-)   Writing the class files concurrently was somewhat more interesting.

Background class generation and writing

Apart from the general refactoring for Log, the work to parse source files in parallel turned out to be very localized, almost to a single method in JavaCompiler, which is responsible for parsing all the source files on the command line. That one method can choose whether to parse the source files sequentially, as before, or in parallel. There is no such easy method for writing out class files. This is because the internal representation of a generated class file may be quite large, and the compiler pipeline is normally set up to write out class files as soon as possible, and to reclaim the resources used. Because of the memory issues and the primarily serial nature of the upstream pipeline, the general goal was not to write all the class files in parallel, but merely to be able to do the file IO in the background. Thus the design goal was to write classes using a single background thread fed by a limited capacity blocking queue, and so improving the flexibility of the compiler pipeline would improve the ability to write out class files in the background. In particular, it was also desirable to fix an outstanding bug such that either all the classes in a source file should be generated, or none should. The current behavior of generating classes for the contents of a source file until any errors are detected does not fit well with simple build systems like make and Ant that use simple date stamps to determine if the compiled contents of a class file are up to date with respect to the source file itself.

There were already some ideas for reorganizing the compiler pipeline within the main JavaCompiler class. Previously, a big "compile" method in JavaCompiler had been broken up into methods attribute, flow, desugar and generate, representing the different stages of processing for each class to be compiled. These methods could be composed in various ways depending on the compilation policy, which is an internal control within the compiler. The methods communicated via lists of work to be processed, and although the concept was good, it never paid off quite as well as anticipated because of the memory required to build all of the items on the lists before handing the list to the next stage. The latest idea that had been developing was to use iterators or queues to connect the compilation phases, rather than lists.

Another refactoring later, it turned out that queues were the way to go (as in java.util.Queue), because they fit the abstraction required and caused less change elsewhere in the compiler.

In a related improvement, the main "to do" list was also updated. Previously, it was just a simple list of items to be processed, using a simple javac ListBuffer. It was updated to implement Queue, and more importantly, to provide additional access to the contents grouped according to the original source file. This made it easier to process all the classes for a source file together, including any anonymous inner classes. Previously, anonymous inner classes were handled much later than their enclosing classes, because while top level and nested classes are discovered and put on the "to do" list very early, anonymous inner classes are not discovered until much later.

However, an earlier bug fix got in the way of being able to effectively complete processing the contents of a single source file all together.

Normally, the compiler uses a very lazy approach to the overall compilation strategy, advancing the processing of each class as needed, with a "to do" list to make sure that everything that needs to be done eventually really does get done. However, limitations in the pipeline precluded that approach in the desugaring phase. If the supertypes of a class are being compiled in the same compilation as the subtype, they need to be analyzed before the subtype gets desugared, because desugaring is somewhat destructive. The previous implementation could not do on demand processing of the supertypes, so instead the work on the subtypes was deferred by putting them back on the "to do" list to be processed later, after any supertypes had been processed, thus defeating any attempt to process these files together. The new, better implementation is simply to advance the processing of the supertypes as needed.

All this refactoring was somewhat easier to implement than to describe here, and again per the Big Rules, the work was reasonably localized to the JavaCompiler and ToDo classes, with little or no changes to the main body of the compiler. The net result is more flexibility in the compiler pipeline, with a better implementation of the code to generate code file by file, rather than class by class. And, to bring the story back to the original goal, it makes it easier adapt the final method of the pipeline so that it was do its work serially, or with a background queue for writing class files in the background. :-)

And now ...

So where is this work now? Right now, it's here on my laptop with me in a plane somewhere between Iceland and Greenland, so let's hope for a safe journey the rest of the way back to California. The work needs some cleaning up, and more testing, on more varied machines. I've been running all the compiler regression tests and building OpenJDK with this new compiler, and it looks good so far. Finally, it will need to be code reviewed, and pushed into the OpenJDK repositories, probably as a series on smaller changesets, rather than one big one. So watch out for this work coming soon to a repository near you ...

Wednesday May 07, 2008

Improving javac diagnostics

In javac land, we're looking at improving the diagnostic messages generated by the compiler ...

Here's one of my anti-favorite messages that I got from the compiler recently:

/w/jjg/work/jsr294/7/langtools/test/tools/javac/superpackages/CompileErrorTest.java:90: getTask(java.io.Writer, javax.tools.JavaFileManager, javax.tools.DiagnosticListener<? super javax.tools.JavaFileObject>, java.lang.Iterable<java.lang.String>, java.lang.Iterable<java.lang.String>, java.lang.Iterable<? extends javax.tools.JavaFileObject>) in javax.tools.JavaCompiler cannot be applied to (<nulltype>, javax.tools.StandardJavaFileManager, <nulltype>, java.util.List<java.util.List<java.lang.String>>, <nulltype>, java.lang.Iterable<capture#81 of ? extends javax.tools.JavaFileObject>) JavaCompiler.CompilationTask task1 = c.getTask(

It is my pet peeve message type ("can't apply method...") and also includes wildcards, captured type variables, and a <nulltype>. The text of the error message, excluding source file name and highlighted lines is a whopping 577 characters :-) Who says we don't need to improve this?

We have various ideas in mind. This first list is about the content and form of the messages generated by javac.

  • Omit package names from types when the package is clear from the context. For example, use Object instead of java.lang.Object, String instead of java.lang.String, etc, when those are the only classes named Object, String etc, in the context.

  • Don't embed long signatures in message. Instead, restructure the message into a shorter summary plus supporting information.

    For example,

           Method name cannot be applied to given types
           required: types
           found: types
    
  • Don't embed captured and similar types in signatures, since they inject wordy non-Java constructions into the context of a Java signature. Instead, use short placeholders, and a key.

    For example, in the message above, replace
      java.lang.Iterable<capture#81 of ? extends javax.tools.JavaFileObject>
    by
       java.lang.Iterable<#1>
    with a note following the rest of the message:
       where #1 is a capture of ? extends javax.tools.JavaFileObject

Put these suggestions together, and the message above becomes:

/w/jjg/work/jsr294/7/langtools/test/tools/javac/superpackages/CompileErrorTest.java:90: method JavaxCompiler.getTask cannot be applied to given types
  required: (Writer, FileManager, DiagnosticListener<? super JavaFileObject>, Iterable<String>, Iterable<? extends JavaFileObject>)
  found: (#null, StandardFileManager, #null, List<List<String>>, #null, Iterable<#1>)
  where:
    #1 is the capture of ? extends JavaFileObject

It is a lot shorter (less than half the length of the original message, if you're counting), and more importantly, it breaks the message down into segments that are easier to read and understand, one at a time. It still has a long file name in it, and I'll address that below.

The following ideas are more about the presentation of messages. javac is typically used in two different ways: batch mode (output to a console), and within an IDE, where the messages might be presented as "popup" messages near the point of failure, and in a log window within the IDE.

When used in batch mode, either directly from the command line or in a build system, the compiler could allow the user to control the verbosity of the diagnostics. If you're compiling someone else's library, you might not be worried about the details in any warnings that might be generated. If you're compiling your own code, you might be comfortable with a quick summary of each diagnostic, or you might want as much detail as possible.

When used in an IDE, it would be good to provide the IDE with more access to the component elements of the diagnostic, so that the IDE could improve the presentation of the message. For example,

  • display the base name of the file containing the error, and link it to the compilation unit, instead of displaying the full file name as above
  • use different fonts for the message text and the Java code or signatures contained within it
  • hyperlink types used in the diagnostic to their declaration in the source code
  • given the resource key for the message, an IDE could use the key as an index into additional documentation specific to the type of the error message, explaining the possible causes for the error, and more importantly, what might be done to fix the problem.
To support these suggestions, the compiler could be instructed to generate diagnostics in XML, so that they could be "pretty-printed" in the IDE log window.

Here's how these ideas could be used to improve the presentation of the example message.

CompileErrorTest.java:90: method JavaxCompiler.getTask cannot be applied to given types    [More info...]
  required: (Writer, FileManager, DiagnosticListener<? super JavaFileObject>, Iterable<String>, Iterable<? extends JavaFileObject>)
  found: (#null, StandardFileManager, #null, List<List<String>>, #null, Iterable<#1>)
  where:
    #1 is a capture of ? extends JavaFileObject    [More info...]

OK, I'll leave the real presentation design to the UI experts, but I hope you get the idea of the sort of improvements that might be possible.

Finally, we'll be looking at improving the focus of error messages. For example, this means that if the compiler can determine which of the arguments is at fault in a particular invocation, it should give a message about that particular argument, instead of about the invocation as a whole. However, care must also be taken not to narrow the focus of an error message incorrectly, so that the message becomes misleading. A typical example of that is when the compiler is parsing source code, and having determined that the next token is not one of A or B, it then checks C. If that is not found the compiler may then report "C expected", when a better message would have be "A, B or C expected." This means that such optimizations have to be studied carefully on a case by case basis, whereas all of the preceding suggestions can be applied more generally to all diagnostics.

So, do you have any "pet peeve" messages you get from the compiler? Do you have suggestions on how the messages could be improved, or how they get presented? Add a comment here, or mail your suggestions to the OpenJDK compiler group mailing list, compiler-dev at openjdk.java.net.

Thanks to Maurizio and others for contributing some of the suggestions here.

See Also

Tuesday May 06, 2008

Evolving KSL

As some of you may know, we've made changes recently to the KSL project that was started last year.

We thought it would be fun to share some of the ideas we had along the way.

For inspiration, we decided to throw a bunch of ideas into a kitchen sink, for real, to see what that might inspire. See how many of your favorite language ideas are represented here.

Hint: There are no wrong answers, but there have been some creative ones. :-)

Of course, as soon as we did that, this little guy wanted to get in on the act. He may not quite understand the gist of KSL, but you can't fault his enthusiasm: there's a keyboard, a couple of mice, a couple of monitors and even a KVM cable in the sink, if you look carefully!

For information about what we finally came up with, see the Compiler Group KSL page.

(Thanks to Alex for helping with the first photo, and to Greg Quayle for creating the sculpture in the second.)

Monday May 05, 2008

jtharness vs jtreg

Now that the source code for the OpenJDK Regression Test Harness (jtreg) is available, this provides an overview of how jtreg relates to JT Harness.

In the beginning, there was a need to run compatibility tests on all Java plaforms, and thus was born the "Java Compatibility Kit" (JCK), and the test harness to run those tests (JavaTest). Not long after, there was a need to run a different type of test, regression tests, for JDK. They are fundamentally a different style of test (think white box and black box), but rather than write another, different harness, we adapted JavaTest, and thus was born "jtreg". Eventually, we needed to be able to run tests on different types of platforms, such as Personal Java, a precursor to the J2ME platform. JavaTest evolved to fill the need, and gave rise to the basic architecture you see today.

As you can see, JavaTest, now open sourced under the name JT Harness, is composed of a number of distinct components

  • The core harness. This provides the general ability to represent tests, test results, and to sequence through the execution of a series of tests.
  • Utilities. These provide support to the other JT Harness components.
  • The user interface. JT Harness provides a rich graphical user interface, as well as a command line interface.

And, deliberately, there's a hole. By itself, JT Harness cannot directly run tests in any test suite. It needs to be given a plugin for each test suite, that define how to locate and read tests, how to configure and execute tests, and other additional information, often used by the GUI, to provide documentation specific to the test suite. The plugin code is not complicated and is often a direct use or simple extension of the utility classes provided by JT Harness. Note: most test suites do not need to use all of the JT Harness utility classes, which explains why JT Harness may have more build-time dependencies than are required to build and run any individual test suite.

Thus, the complete picture is more as follows:

This is the form used for most test suites, such as JCK and other TCKs. The code to plugin to the harness is bundled in the test suite, along with a copy of JavaTest or JT Harness itself. These test suites all leverage the user interfaces provided by the harness, both the CLI and GUI. When the user runs JT Harness, and tells it to open a test suite, the harness looks inside the test suite for details of the plugin to be used in conjunction with that test suite.

Because jtreg evolved from JavaTest early on, and specifically, before the JavaTest user interfaces evolved to what you see today, jtreg has a slightly different architecture.

jtreg still utilizes the harness' plugin architecture to configure how to read and execute the tests in the test suite, but it provides its own custom command line interface for setting up a test run. The jtreg CLI provides options tailored to running the JDK regression test suite. It can also invoke the JT Harness GUI, but this is just for monitoring a test run, and does not (yet) provide much support for reconfiguring a test run with different options from within the GUI, as you can do for other test suites.

jtreg is also different from other test suites in that jtreg is distributed separately from the tests that it gets to execute. TCKs are generally built and distributed as self-contained products in their own right, whereas the JDK regression tests do not get prebuilt: they exist and are executed in their source form in each JDK developer's repository. Per the The JDK Test Framework: Tag Language Specification, the tests are compiled and executed as needed.

For more information, see ...

Thursday May 01, 2008

OpenJDK Regression Test Harness (jtreg) - open source

The OpenJDK Regression Test Harness, also known as "jtreg", is now available with an open source license.[Read More]

Tuesday Mar 04, 2008

M&Ms: Milestones and Members

M&M's are great, so here's a few with no calories whatever.[Read More]

Friday May 18, 2007

The King is dead. Long live the King!

Well, the phrase may not be entirely apt, but it does carry the right sentiment, especially to the monarchists amongst us.

As you may have read, Peter is moving on to new adventures, but you can be sure that there are still plenty of adventures still to be had with javac, and the rest of us on the javac team are looking forward to carrying on with everything that is coming up.

With the compiler being open sourced last year, and the recent announcements regarding OpenJDK, and JDK 7 ahead, these are exciting times, albeit somewhat turbulent at times. So, stay tuned for more details, and in the meantime, thanks go to Peter for all his contributions.

Monday May 14, 2007

Unit Testing Wisdom From An Ancient Software Start-up

JavaOne contained many gems of wisdom, of varying sizes. This one indirectly came my way during the week, and is fun enough to pass along. It is self-described as "Good advice on developer and unit testing, packaged as twelve cryptic bits of ancient Eastern wisdom." You can find it here. It is also available in PDF. Enjoy -- and thanks to Alberto Savoia for making it available.

Friday Nov 17, 2006

Office Tango

Tom Lehrer sang a song called "The Masochism Tango". Here at Sun, we call it "Office Moves".[Read More]

Thursday Nov 16, 2006

com.sun.javatest.*: JT Harness API

JavaTest provides some useful APIs for specialized usage ...[Read More]

Wednesday Nov 15, 2006

javatest.png: JavaTest in pictures

A look at the evolution of the JavaTest harness, over the years.[Read More]

Tuesday Nov 14, 2006

Way to go, JT Harness

A brief background overview of the harness underlying jtreg: the JavaTest harness, which is now available as JT Harness. [Read More]

Monday Nov 13, 2006

wwwww.jtreg: The Who What Where When Why of jtreg

A brief background overview of jtreg, the JDK regression test harness. [Read More]

Wednesday Oct 04, 2006

Hello World

Who am I?[Read More]

Calendar

Feeds

Search

Links

Navigation

Referrers