Joseph D. Darcy's Sun Weblog


Tuesday June 12, 2007

Balance of Error

Given limited resources, optimizing quality doesn't just involve minimizing the number of errors; it also involves balancing different kinds of errors. There are two basic types of errors one can make:

  1. Doing things you should not do.

  2. Not doing things you should do.

For the theologically inclined, these would be sins of commission and sins of omission, respectively. Much of statistics deals with bounding the probability of making errors in judgments. In statistical terms, except in very specialized circumstances, you can't simultaneously control for both type I and type II errors. Therefore, the worse kind of error is generally phrased as a type I error and then the machinery of statistics is applied to the problem in that form; p-values are a bound on the probability of making a type I error. However, that doesn't imply that the other type of error is unimportant or should be ignored. A well-publicized example of the need to balance both kinds of errors has occurred in the FDA's drug approval process. A pharmaceutical company must demonstrate the safety and efficacy of a new drug before it goes to market. The FDA is primarily concerned with preventing the type I error of releasing an unsafe drug to the public. However, the type II error of keeping useful drugs off the market can also be problematic and was raised as an issue during the early AIDS crisis.

In a software release, a type I error would be putting back a bad fix which introduces a bug and a type II error would be not fixing an issue that should be addressed. The more time that is spent verifying a fix is good in various ways (code review, testing, more code reviews, more testing), the less time that is available to address other issues. Overall, neither extreme of focusing only on reducing type I errors nor of focusing only on reducing type II errors leads to a global quality optimum for a given amount of resources. Qualitatively, for a given total number of errors the relationship between quality and the ratio of errors is a roughly bell-shaped curve:
[Figure: Bell-shaped trade-off between type I and type II errors]

Toward the left side of the curve, type I errors are the dominant cause of reduced quality. As oversight is increased, the type I error rate is reduced and quality increases. However, the amount quality improves for each additional unit of oversight decreases as more oversight is added. Eventually, if enough time is spent reviewing each fix, the marginal change in quality is negative because those resources would have been better directed at producing other fixes. As illustrated in the right half of the graph, as fewer and fewer changes are made, type I errors are very few, but type II errors are numerous and total quality suffers.

Therefore, the mere absence of type I errors does not imply a high-quality release because the release could be fraught with type II errors from missing functionality. An added challenge is that recognizing that a type II error has been made is often much harder than recognizing that a type I error has occurred, since the consequences of a type I error may be seen immediately (e.g. the build breaks) while evidence for a type II error may only accumulate over time in the form of an escalation or as diffusely lowered perceived quality or utility.

While not having any defects of either kind is a laudable goal, it is usually not achievable because of the high costs involved. The Mythical Man-Month (TMMM) suggests that it is nearly an order of magnitude more expensive to deliver a mature "programming systems product" compared to just a working "program." Additionally, rather than scaling linearly, the cost of software seems to go up as the amount of code raised to the 1.5 power, so larger projects cost disproportionately more. Adding resources can certainly improve quality, but only adding resources without adjusting processes might not be a very efficient means toward that end. A well-balanced low-resource project could achieve better quality than a poorly balanced high-resource project.
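To make the 1.5-power rule of thumb concrete, here is a small sketch of the arithmetic. The class and method names are made up for illustration, and the model (cost proportional to code size raised to the 1.5 power, with no constant factor) is only the approximation quoted above, not a calibrated estimate.

```java
class CostScaling {
    // Assumed model: cost is proportional to (code size)^1.5, the rule of thumb cited in the text.
    static final double EXPONENT = 1.5;

    /** Returns how many times more a project costs when its code size grows by sizeFactor. */
    static double relativeCost(double sizeFactor) {
        return Math.pow(sizeFactor, EXPONENT);
    }

    public static void main(String[] args) {
        // Compare a few growth factors against what linear scaling would predict.
        for (double factor : new double[] {2, 4, 10}) {
            System.out.printf("%.0fx the code: about %.1fx the cost%n",
                              factor, relativeCost(factor));
        }
    }
}
```

Under this assumption, doubling the code costs roughly 2.8 times as much, quadrupling it costs 8 times as much, and a tenfold increase costs over 31 times as much.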

In the graph above, the relative impact of type I and II errors is symmetrical. However, a project could be more sensitive to one kind of error or the other. For example, a young software project may be judged as being more sensitive to type II errors from missing functionality, such as during a beta release, while a mature project will be less tolerant toward the introduction of type I problems. TMMM summarizes an OS study that found over time repairs to the system become more and more likely to introduce as large a flaw as was resolved; the probability of making type I errors increased with system age.

[Figure: Differing Error Sensitivities]

Since the green line peaks before the balanced one, that corresponds to a project which is more sensitive to type II errors than type I errors. Conversely, the blue graph is more sensitive to type I errors so it peaks after the balanced line.

Assume that, to a first approximation, engineers work to maximize their contribution to a software release. The process costs will therefore shape what an engineer tries to get done and, along with the error rates of the processes, will affect the overall error ratio. Balancing the processes can alter both the natural error ratio and the efficiency of engineering. Two factors which can help manage a project more effectively are:

  • Estimating the shape of a project's error sensitivity graph

  • Determining in which region of that graph the project is being run

Recognizing the different sensitivities helps shape the project's goals. Next, determining where the project is running should guide process changes to improve quality. If there are too many type I errors, more stringent processes should be instituted to catch problems earlier. If there are too many type II errors, the processes should be streamlined to allow more changes to be implemented.

While identifying that a project is operating at either extreme of the graph should be uncontroversial, finding the maximum is hard, especially since the error rates are difficult to measure. Some notions from numerical optimization may aid in this search and will be discussed in future blog entries.

Thanks to Alex for feedback on earlier drafts of this entry.

(2007-06-12 16:48:32.0) Permalink

Monday June 11, 2007

Relative Ordering of Java and C++

Recently, Alex Miller has made the incendiary suggestion that C++ be renamed Java--. Years ago, Bill Joy's initial reaction to C++ was that he instead wanted "C++++-=, a little bit more but a whole lot less." Java came about a few years later.

I'm sure others have noted the Oak seedling on the cover of Stroustrup's The Design and Evolution of C++ along with the following quote on page 207:

Within C++, there is a much smaller and cleaner language struggling to get out.

However, Stroustrup says neither Java nor C# is that language.

(2007-06-11 16:39:14.0) Permalink

Tuesday June 05, 2007

Nested, Inner, Member, and Top-Level Classes

One way declared types in Java differ from one another is whether the type is a class (which includes enums) or an interface (which includes annotation types). An independent property of a type is its relation to the surrounding lexical context. A top-level class does not appear inside another class or interface. If a type is not top level, it is nested. However, there are a number of distinct ways a type can be nested. First, a type can be a member of another type; a member type is directly enclosed by another type declaration. All member types have names; however, some member types are inner classes and others are not. If a type is explicitly or implicitly static, it is not an inner class; all member interfaces are implicitly static.

Inner classes also include local classes, which are named classes declared inside of a block like a method or constructor body, and anonymous classes, which are unnamed classes whose instances are created in expressions and statements. Anonymous classes are used to implement specialized enum constants. Inner classes have access to instance variables of any lexically enclosing instance; that is, the fields of the object an inner class is created in reference to. However, not all inner classes have enclosing instances; inner classes in static contexts, like an anonymous class used in a static initializer block, do not.
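As a rough illustration of these categories, the sketch below declares one type of each kind. All of the names are made up for the example; the only library calls are Class.getModifiers and Modifier.isStatic from core reflection.

```java
import java.lang.reflect.Modifier;

class NestingKinds {
    private final int base = 41;

    // Member types: named and directly enclosed by NestingKinds.
    static class StaticMember {}            // explicitly static: nested, but not inner
    interface MemberInterface {}            // implicitly static: nested, but not inner
    class InnerMember {                     // non-static member class: an inner class
        int next() { return base + 1; }     // reads a field of the enclosing instance
    }

    Object local() {
        class Local {                       // local class: named, declared in a block, inner
            @Override public String toString() { return "local:" + base; }
        }
        return new Local();
    }

    // An anonymous class in a static context is still an inner class, but it has
    // no enclosing instance, so it could not refer to 'base' here.
    static final Runnable IN_STATIC_CONTEXT = new Runnable() {
        @Override public void run() {}
    };

    /** Reports whether the reflected modifiers of c include static. */
    static boolean isStatic(Class<?> c) {
        return Modifier.isStatic(c.getModifiers());
    }

    public static void main(String[] args) {
        NestingKinds outer = new NestingKinds();
        InnerMember inner = outer.new InnerMember();  // needs an enclosing instance
        System.out.println(inner.next());
        System.out.println(outer.local());
        System.out.println(isStatic(StaticMember.class));
        System.out.println(isStatic(MemberInterface.class));
        System.out.println(isStatic(InnerMember.class));
    }
}
```

Note the `outer.new InnerMember()` syntax: creating an inner member class from a static context requires naming the enclosing instance explicitly, whereas StaticMember can be instantiated with no NestingKinds object at all.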

The Venn diagram below shows how these distinctions relate and combine; in particular, member-ness and inner-ness are not orthogonal properties.

[Figure: Venn Diagram of Class Nesting Kinds; the original post also linked a PDF of the diagram]

A reflective API providing a complete model of the language needs to allow these differences to be determined. Two reflective APIs providing this information are core reflection as of JDK 5 and javax.lang.model from JSR 269 in JDK 6; however, each API exposes the data differently. (The legacy apt mirror API finesses the issue by not modeling local and anonymous classes.)

Core reflection uses java.lang.Class to model types. When inner classes were introduced back in JDK 1.1, a getDeclaringClass method was added to Class. While this supports member types, relevant information about local and anonymous classes was not directly available. To remedy this, JDK 5 added a number of methods to return the enclosing entity, if any, and identify what kind of nesting a type may have:

  • getEnclosingClass

  • getEnclosingMethod

  • getEnclosingConstructor

  • isAnonymousClass

  • isLocalClass

  • isMemberClass

Because of the lack of a usable supertype, the different kinds of enclosing elements (classes, methods, and constructors) must be retrieved from different methods. If the class is not so enclosed, a null is returned. Therefore the code to find the enclosing element is a sequence of if-methodA-not-equal-null-else-if-methodB-not-equal-null tests.
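The chain of null tests might look like the following sketch. EnclosingFinder and its nested sample classes are hypothetical names for this example; getEnclosingMethod, getEnclosingConstructor, and getEnclosingClass are the JDK 5 methods on java.lang.Class. The result has to be typed as Object precisely because Method, Constructor, and Class have no more specific common supertype that would be useful here.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

class EnclosingFinder {
    final Class<?> declaredInConstructor;

    EnclosingFinder() {
        class InConstructor {}              // local class inside a constructor
        declaredInConstructor = InConstructor.class;
    }

    static class Member {}                  // member class

    static Class<?> declaredInMethod() {
        class InMethod {}                   // local class inside a method
        return InMethod.class;
    }

    /**
     * Returns the Method, Constructor, or Class immediately enclosing c,
     * or null if c is a top-level class.
     */
    static Object enclosingElement(Class<?> c) {
        // Methods and constructors are tested first: getEnclosingClass is also
        // non-null for classes declared inside a method or constructor body.
        Method method = c.getEnclosingMethod();
        if (method != null) {
            return method;
        }
        Constructor<?> constructor = c.getEnclosingConstructor();
        if (constructor != null) {
            return constructor;
        }
        return c.getEnclosingClass();       // null for a top-level class
    }

    public static void main(String[] args) {
        System.out.println(enclosingElement(declaredInMethod()));
        System.out.println(enclosingElement(new EnclosingFinder().declaredInConstructor));
        System.out.println(enclosingElement(Member.class));
        System.out.println(enclosingElement(String.class));
    }
}
```

A class declared in an initializer block or field initializer has neither an enclosing method nor an enclosing constructor, so it falls through to the final getEnclosingClass call.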

Starting with a clean slate, javax.lang.model was able to provide a cleaner way of modeling these distinctions. First, the functionality of getDeclaringClass and the three getEnclosingFoo methods is provided by the single getEnclosingElement method, which returns the immediately lexically enclosing element regardless of whether it is a class, or method, or constructor. Second, for types the getNestingKind method returns one of the NestingKind enum constants, ANONYMOUS, LOCAL, MEMBER, or TOP_LEVEL. Those constants clearly correspond to the possible alternatives displayed in the diagram above. While it would be technically possible to add a getNestingKind method to Class, that would create an undesired dependency of a java.lang.* class on a javax.* package.
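To show how the NestingKind constants line up with the core reflection predicates, here is a sketch of a helper that lives outside java.lang and so avoids the dependency problem. NestingKindOf and its nestingKind method are hypothetical, not JDK APIs; only the NestingKind enum and the isAnonymousClass, isLocalClass, and isMemberClass methods come from the platform.

```java
import javax.lang.model.element.NestingKind;

class NestingKindOf {
    static class Member {}                  // sample member class

    static Class<?> sampleLocal() {
        class Local {}                      // sample local class
        return Local.class;
    }

    /**
     * Hypothetical helper: classifies a Class object the way getNestingKind
     * classifies a type element. Assumes c represents a declared class or
     * interface, so arrays and primitive types are rejected.
     */
    static NestingKind nestingKind(Class<?> c) {
        if (c.isArray() || c.isPrimitive()) {
            throw new IllegalArgumentException("not a declared type: " + c);
        }
        if (c.isAnonymousClass()) {
            return NestingKind.ANONYMOUS;
        }
        if (c.isLocalClass()) {
            return NestingKind.LOCAL;
        }
        if (c.isMemberClass()) {
            return NestingKind.MEMBER;
        }
        return NestingKind.TOP_LEVEL;
    }

    public static void main(String[] args) {
        System.out.println(nestingKind(String.class));
        System.out.println(nestingKind(Member.class));
        System.out.println(nestingKind(sampleLocal()));
        System.out.println(nestingKind(new Object() {}.getClass()));
    }
}
```

The three predicates are mutually exclusive for a declared type, so the order of the tests does not change the answer; a type that satisfies none of them is treated as top level.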

(2007-06-05 17:11:35.0) Permalink Comments [7]
