Monday April 21, 2008
Joseph D. Darcy's Sun WeblogJoseph D. Darcy's Sun Weblog Compatibly Evolving BigDecimal Back in JDK 5, JSR 13 added true floating-point arithmetic to BigDecimal, which involved many new methods and constructors along with new supporting classes in the java.math package. I was actively involved in the JSR 13 expert group and integrated the code into the JDK. These changes had some surprising compatibility impacts which can be classified according to their source, binary, and behavioral effects. The numerical values representable in BigDecimal are (unscaledValue × 10-scale) where unscaledValue is a BigInteger and scale is a 32-bit integer. Before Java SE 5, scale was constrained to be positive or zero (in other words, 10 raised to a negative or zero exponent) and JSR 13 removed this restriction to allow any integer exponent. Consequently, prior to JSR 13 BigDecimal integral values with trailing zeros had to have them explicitly represented; for example the value one million had to be stored as (1,000,000 × 100) rather than (1 × 106) or (10 × 105), etc. One behavioral consequence of JSR 13 was that all the methods operating on BigDecimal values understand and accept numbers without the old exponent restriction. The new API elements added by JSR 13 are listed in table 1; the additions will be examined under each kind of compatibility.
Binary CompatibilityAdding new public methods and constructors, even ones that overload existing names is binary compatible. Adding public static final fields is binary compatible, meaning existing clients of the library will continue to link. However, there is a possible complication here since BigDecimal is not final and since it has public constructors, it can be subclassed. (As discussed in Effective Java, Item 13, Favor Immutability, this was a design oversight when the class was written.) Adding fields to classes can be binary incompatible, but the needed combination of circumstances does not arise in this case. Therefore, individually and as a whole, the BigDecimal API additions are binary compatible. Source CompatibilityFor source compatibility, we can distinguish between clients of a types and extenders/implementors of a type; certain changes can inconvenience extenders/implementors but not clients. Adding the public static final fields is binary-preserving source compatible. If a subclass, say MyDecimal, already has a field with the same name as a field being added to BigDecimal, the existing declaration in MyDecimal hides the new declaration in the parent class BigDecimal. Therefore, existing uses of, say, MyDecimal.TEN, would continue to resolve to the same binary name. Since constructors are not inherited and all the new constructors are public rather than protected, just the uses of constructors in clients needs to be considered; there are no distinct special issues for subclasses. The constructors in BigDecimal during Java SE 1.4.x, the platform version immediately predating JSR 13, are listed in table 2.
To assess the source compatibility impact, we can compare the new constructors with the old constructors and see if any possible overload resolutions would change, including the possibility of stopping an existing compilation by removing the existence of a most specific method. Of the twelve new constructors, ten are clearly not problematic and binary-preserving source compatible; the ten either have more parameters than the existing constructors or are not applicable to the same invocations, see table 3. For example, eight of the new constructors have the new type MathContext as a parameter. Because of primitive subtyping the other two new constructors, BigDecimal(int val) and BigDecimal(long val) are both applicable to and more specific than invocations that would previously resolve to BigDecimal(double val). Therefore, adding these two new constructors is not binary-preserving source compatible because a different constructor can be resolved for the same existing source code, code with one-argument calls to a BigDecimal constructor where the argument is a primitive type. These two constructors need a secondary screening to assess their behavioral equivalence.
Before JDK 5, the expressions BigDecimal(123) and BigDecimal(123L) in source code would resolve to a call to BigDecimal(double); as part of that resolution primitive widening conversion converts the argument expression to double before the constructor is invoked. All int values are exactly representable as double and the double constructor when given an integral value will return a BigDecimal with the numerical value in question and a scale of zero. The new int constructor will also return a BigDecimal with the numerical value of the argument and a scale of zero. Therefore, adding the int constructor will result in behavioral equivalent programs; although the new constructor will cause some invocations to resolve to a different constructor, calling the other constructor will still always result in an equivalent,
bd1.equals(bd2)==true, BigDecimal. However, the new long constructor does not have behavioral equivalence for all values. Some long values are not exactly representable in double and the old long → double conversion can silently lose precision. For example, printing the value of (new BigDecimal(Long.MAX_VALUE)) gives Partially because of the unintentional, if beneficial, change in source meaning as well as some of the usual reasons (possibility to cache, etc.), in retrospect I think it would have been preferable for the functionality of all twelve new constructors to be provided through static factories instead. (While not directly applicable in BigDecimal, in general even if constructors aren't considered harmful, static factories can have better generics support. A similar analysis can be undertaken for all the new methods. Additionally, since subclasses are possible, inheritance conflicts need to be considered too. Note that the new methods taking MathContext and RoundingMode parameters cannot conflict with existing methods in subclasses so all those additions are binary-preserving source compatible. However, if all the parameters of a new method are existing types, a subclass could potentially have a conflicting method with an unrelated return type. For example, MyDecimal could have a (strange) public double divide(BigDecimal divisor) method which would conflict with the addition of public BigDecimal divide(BigDecimal divisor). While BigDecimal generally shouldn't be subclassed, the addition of some of these new methods could prevent existing subclasses from compiling, yet another reason to favor composition over inheritance. Behavioral CompatibilityIn terms of evolving the behavior of existing methods after introducing the expanded exponent range, the main issues were the behavior of arithmetic operations and text ↔ BigDecimal conversion operations; the latter would prove to be unexpectedly troublesome. As summarized in table 4, the behavior of arithmetic operations was quite compatible with a number of strong invariants. Given input values a1 and b1 representable under the old system, and given an existing method, say add, and its result c1, in the old and new BigDecimal if the inputs to an operation are .equals, same numerical value and same representation, then the output is exactly equivalent too, same numerical value with the same representation. More generally, in the old and new BigDecimal if the inputs to an operation satisfy the weaker property of being compareTo() == 0, meaning they have same numerical value but with a possibly different representation, then the output will be numerically equal, but possibly with a different representation.
A main advantage of decimal arithmetic over binary arithmetic is what-you-see-is-what-you-get for input and output values, the complicated vagaries of binary ↔ decimal conversion can be avoided and exact computation can be straightforward. Therefore, when removing the restriction on exponent values, being able to have a textual representation that readily mapped to all possible unscaled value and exponent pairs was paramount to make the new arithmetic usable. Before JSR 13, the toString method did not use exponential notation, all leading and trailing zeros were explicit. For fractional values, the length of the output grew linearly with the size of the exponent, as well as the number of digits of precision. Conversely, without negative exponents, the internal representation and string output of integer-valued BigDecimal numbers grew with the magnitude of the number, even when it was inherently low-precision. To take advantage of the new unrestricted exponent range, a textual notation was needed that allowed the positive or negative exponent to be recovered; this was accomplished by changing to using scientific notation in the toString output. When converting from text to BigDecimal, a positive exponent could be reconstructed from integer values that previously would have been forced to have a zero exponent. However, the new output was legal input to the old constructors, so similar properties similar to the old and new arithmetic behavior applied:
If needed, in the new BigDecimal on textual input the old semantics on exponents is easy to code: BigDecimal bd = new BigDecimal(myString); if (bd.scale() < 0) bd = bd.setScale(0); and a toPlainString method was added to provide the old-style output when needed. Staying within the realm of old and new BigDecimal versions, these arrangements solidly preserve a very reasonable kind of behavioral compatibility, numerical value and representation are kept constant when possible, otherwise, numerical value is preserved possibly with a different representation. Backwards serial compatibility is slightly weaker; rather than being converted to exponent-zero values as done for textual inputs, new serial streams holding positive exponents are rejected by old BigDecimal implementations. Unfortunately, despite these consistencies across JDK versions, some users of BigDecimal still ran into compatibility issues from the textual output changes made by JSR 13. A common use for BigDecimal is interfacing to databases and while the new scientific notation was legal input to the old BigDecimal string constructors, scientific notation was not legal notation to databases. The addition of the toPlainString method did not help the situation without recompiling the source of the application in question; such recompilation could be unwanted since it would tie the application to JDK 5 with the new method. Other unpalatable workarounds include subclassing BigDecimal to enforce the old toString behavior or using reflection to see if the toPlainString method is available to call to avoid introducing a hard dependency on the new method. While the changes in textual input and output of BigDecimal were reasonable in the context of direct Java compatibility, the expert group underestimated the behavioral compatibility impact of these change when dealing with databases. While the changes remain justifiable in terms of supporting the new values, if the compatibility cost were known, the expert group could have and should have worked with database vendors to mitigate the migration cost associated with this change. ConclusionFully understanding the compatibility impact of changes is subtle and shortcomings are quick to lead to user anger. Merely maintaining binary compatibility is not sufficient for many purposes. Following good coding guidelines from the beginning can pay silent rewards when later evolving the class by reducing the space of possible concerns. AcknowledgmentsAlex provided helpful comments on a draft of this entry. Further Reading
Trackback URL: http://blogs.sun.com/darcy/entry/compatibly_evolving_bigdecimal
Post a Comment: |
Calendar
RSS Feeds
All /Annotation Processing /General /Java /JavaOne /Numerics /OpenJDK SearchLinks
NavigationReferersToday's Page Hits: 646 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Now all we need is operator overloading.
Posted by 75.161.0.244 on April 26, 2008 at 07:26 AM PDT #
Why should i be careful using java%
Posted by mika on April 27, 2008 at 02:19 PM PDT #
Operator overloading would be a fine topic for a future blog entry...
Posted by Joe Darcy on April 28, 2008 at 09:51 AM PDT #