Thursday December 04, 2008
Joseph D. Darcy's Sun Weblog
Hexadecimal Floating-Point Literals
One of the more obscure language changes included back in JDK 5 was the addition of hexadecimal floating-point literals to the platform. As the name implies, hexadecimal floating-point literals allow literals of the float and double types to be written primarily in base 16 rather than base 10. The underlying primitive types use binary floating-point, so a base 16 literal avoids various decimal ↔ binary rounding issues when there is a need to specify a floating-point value with a particular representation.
The conversion rule for decimal strings into binary floating-point values is that the binary floating-point value nearest the exact decimal value must be returned. When converting from binary to decimal, the rule is more subtle: the shortest string that allows recovery of the same binary value in the same format is to be used. While these rules are sensible, surprises are possible from the differing bases used for storage and display. For example, the numerical value 1/10 is not exactly representable in binary; it is a binary repeating fraction just as 1/3 is a repeating fraction in decimal. Consequently, the numerical values of 0.1f and 0.1d are not the same; the exact numerical value of the comparatively low precision float literal 0.1f is 0.100000001490116119384765625.
Since floating-point arithmetic is almost always approximate, dealing with some rounding error on input and output is usually benign. However, in some cases it is important to exactly specify a particular floating-point value. For example, the Java libraries include constants for the largest finite double value, numerically equal to $(2 - 2^{-52}) \cdot 2^{1023}$, and the smallest nonzero value, numerically equal to $2^{-1074}$. In such cases there is only one right answer, and these particular limits are derived from the binary representation details of the corresponding IEEE 754 double format. Just based on those binary limits, it is not immediately obvious how to construct a minimal length decimal string literal that will convert to the desired values.
Another way to create floating-point values is to use a bitwise conversion method, such as doubleToLongBits and longBitsToDouble. However, even for numerical experts this interface is inhumane, since all the gory bit-level encoding details of IEEE 754 are exposed and values created in this fashion are not regarded as constants.
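The points above can be illustrated with a small sketch. It uses java.math.BigDecimal to print the exact binary value behind 0.1f and 0.1d, and shows the bitwise route to the largest finite double. The class and method names (DecimalBinaryDemo, exactValue, maxFromBits) are made up for illustration.

```java
import java.math.BigDecimal;

final class DecimalBinaryDemo {
    /** Exact decimal expansion of the binary value stored in a float. */
    static String exactValue(float f) {
        // Widening float to double is exact, and BigDecimal(double) is exact as well.
        return new BigDecimal((double) f).toString();
    }

    /** Exact decimal expansion of the binary value stored in a double. */
    static String exactValue(double d) {
        return new BigDecimal(d).toString();
    }

    /** The largest finite double, built from its raw IEEE 754 bit pattern. */
    static double maxFromBits() {
        // Sign 0, biased exponent 0x7fe, all 52 fraction bits set.
        return Double.longBitsToDouble(0x7fefffffffffffffL);
    }

    public static void main(String[] args) {
        System.out.println(exactValue(0.1f)); // 0.100000001490116119384765625
        System.out.println(exactValue(0.1d)); // 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(0.1f == 0.1d);     // false: the two literals denote different values
        System.out.println(maxFromBits() == Double.MAX_VALUE); // true
    }
}
```

The bit pattern in maxFromBits is correct but hard to audit by eye, which is the readability problem described above.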
Therefore, for some use cases it is helpful to have a textual representation of floating-point values that is simultaneously human readable, clearly unambiguous, and tied to the binary representation in the floating-point format. Hexadecimal floating-point literals are intended to have these three properties, even if the readability is only in comparison to the alternatives! Hexadecimal floating-point literals originated in C99 and were later included in the recent revision of the IEEE 754 floating-point standard. The grammar for these literals in Java is given in JLSv3 §3.10.2:
This readily maps to the sign, significand, and exponent fields defining a finite floating-point value: sign 0x significand p exponent. This syntax allows the literal 0x1.8p1 to be used to represent the value 3; $1.8_{\text{hex}} \times 2^{1} = 1.5_{\text{decimal}} \times 2 = 3$.
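A minimal sketch of the literal syntax in use, assuming nothing beyond the standard java.lang classes. The class name HexLiteralDemo and its constant and method names are hypothetical. The library methods Double.toHexString and Double.parseDouble also understand the same notation.

```java
final class HexLiteralDemo {
    // 1.8 (hex) x 2^1 = 1.5 (decimal) x 2 = 3
    static final double THREE = 0x1.8p1;

    // Largest finite double: (2 - 2^-52) * 2^1023
    static final double MAX = 0x1.fffffffffffffp1023;

    // Smallest nonzero double, 2^-1074, written two equivalent ways
    static final double MIN = 0x1.0p-1074;
    static final double MIN_ALT = 0x0.0000000000001p-1022;

    // The float nearest to 1/10; the trailing f marks a float literal
    static final float TENTH_F = 0x1.99999ap-4f;

    /** Hexadecimal string form of a double. */
    static String toHex(double d) {
        return Double.toHexString(d);
    }

    /** Parses a hexadecimal floating-point string into a double. */
    static double fromHex(String s) {
        return Double.parseDouble(s);
    }

    public static void main(String[] args) {
        System.out.println(THREE);                      // 3.0
        System.out.println(MAX == Double.MAX_VALUE);    // true
        System.out.println(MIN == Double.MIN_VALUE);    // true
        System.out.println(MIN_ALT == Double.MIN_VALUE); // true
        System.out.println(TENTH_F == 0.1f);            // true
        System.out.println(toHex(3.0));                 // 0x1.8p1
        System.out.println(fromHex("0x1.8p1"));         // 3.0
    }
}
```

Unlike values produced by longBitsToDouble, these literals are compile-time constant expressions.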
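More usefully, the maximum value of $(2 - 2^{-52}) \cdot 2^{1023}$ can be written as 0x1.fffffffffffffp1023 and the minimum value of $2^{-1074}$ can be written as 0x1.0p-1074 or 0x0.0000000000001p-1022.
Retroactively reviewing the possible steps needed to add hexadecimal floating-point literals to the language: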
In terms of language changes, adding hexadecimal floating-point literals is about as simple as a language change can be: only straightforward and localized changes were needed to the JLS and compiler, and the library support was clearly separated. Hexadecimal floating-point literals aren't applicable to that many programs, but when they can be used, they have extremely high utility in allowing the source code to clearly reflect the precise numerical intentions of the author.
(2008-12-04 00:00:02.0)
I was heartened to recently come across the article Java's new math, Part 1: Real numbers, which detailed some of the additions I made to Java's math libraries over the years in JDK 5 and 6, including hyperbolic trigonometric functions (sinh, cosh, tanh), cube root, and base-10 log. A few comments on the article itself: I would describe java.lang.StrictMath as java.lang.Math's fussy twin rather than evil twin. The availability of the StrictMath class allows developers who need cross-platform reproducible results from the math library to get them. Just because floating-point arithmetic is an approximation to real arithmetic doesn't mean it shouldn't be predictable! There are non-contrived circumstances where numerical programs are helped by having such strong reproducibility available. For example, to avoid unwanted communication overhead, certain parallel decomposition algorithms rely on different nodes being able to independently compute consistent numerical answers. While the java.lang.Math class is not constrained to use the particular FDLIBM algorithms required by StrictMath, any valid Math class implementation still must meet the stated quality of implementation criteria for the methods.
The criteria usually include a low worst-case relative error, as measured in ulps (units in the last place), and semi-monotonicity: whenever the mathematical function is non-decreasing, so is the floating-point approximation; likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation.
Simply adding more FDLIBM methods to the platform was quite easy to do; much of the effort for the math library additions went toward developing new tests, both to verify that the general quality of implementation criteria were being met and to verify that the particular algorithms were being used to implement the StrictMath methods. I'll discuss the techniques I used to develop those tests in a future blog entry.
(2008-10-29 13:54:42.0)
At times it is useful to summarize a set of values, say a vector of real numbers, as a single number representing the set's size. For example, benchmark subcomponent scores can be distilled into an overall score. One way to do this is to use a norm. Mathematically, a norm maps from a vector V of a given number of elements to a real number length such that the following properties hold:
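For reference, the usual textbook statement of the norm axioms is sketched below. The notation ($\|\cdot\|$ for the norm, $U$ for a second vector, $\alpha$ for a real scalar) is assumed here and is not necessarily the author's own wording.

```latex
\begin{align*}
  &\|V\| \ge 0, \quad \text{and } \|V\| = 0 \text{ if and only if } V = 0
     && \text{(positive definiteness)} \\
  &\|\alpha V\| = |\alpha| \, \|V\|
     && \text{(homogeneity)} \\
  &\|U + V\| \le \|U\| + \|V\|
     && \text{(triangle inequality)}
\end{align*}
```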
There are a few commonly used norms:
The first two norms are instances of p-norms. A p-norm adds up the result of raising the absolute value of each vector component to the pth power (squaring, or cubing, etc.) and then takes the pth root of the sum. The ∞-norm is the limit as p goes to infinity.
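The p-norm recipe just described can be sketched in a few lines of Java. The class and method names (Norms, norm1, norm2, normInf, pNorm) are hypothetical, and the code is deliberately naive: it makes no attempt to avoid overflow or underflow in the intermediate powers.

```java
final class Norms {
    /** 1-norm: sum of the absolute values of the components. */
    static double norm1(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += Math.abs(x);
        return sum;
    }

    /** 2-norm: square root of the sum of squares (Euclidean length). */
    static double norm2(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x * x;
        return Math.sqrt(sum);
    }

    /** Infinity-norm: largest absolute value of any component. */
    static double normInf(double[] v) {
        double max = 0.0;
        for (double x : v) max = Math.max(max, Math.abs(x));
        return max;
    }

    /** General p-norm for p >= 1: pth root of the sum of |x|^p. */
    static double pNorm(double[] v, double p) {
        double sum = 0.0;
        for (double x : v) sum += Math.pow(Math.abs(x), p);
        return Math.pow(sum, 1.0 / p);
    }

    public static void main(String[] args) {
        double[] v = {3, -4, 0};
        System.out.println(norm1(v));      // 7.0
        System.out.println(norm2(v));      // 5.0
        System.out.println(normInf(v));    // 4.0
        System.out.println(pNorm(v, 100)); // close to 4, approaching the infinity-norm
    }
}
```

Raising p makes the largest component dominate the sum, which is why the p-norm tends toward the ∞-norm.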
Given multiple possible norms, which one should be used? The 2-norm is often easier to work with since it is a differentiable function of the vector components, unlike the 1-norm and ∞-norm. On the other hand, the ∞-norm captures the worst-case behavior. Sometimes one norm is easier to compute than the others.
Another norm might make an error analysis more tractable.
For vectors, in some sense it doesn't matter which norm is used because any two norms, norm_a and norm_b, are equivalent in the following sense: there are positive constants c_1 and c_2 such that
$c_1 \cdot \mathrm{norm}_a(V) \le \mathrm{norm}_b(V) \le c_2 \cdot \mathrm{norm}_a(V)$
This means that if one norm is tending toward zero, all other norms are tending toward zero too. For example, commonly in numerical linear algebra there is an iterative process that terminates once the norm of the error is small enough. Concretely, for vectors of size n, the common norms are related as follows:
$\mathrm{norm}_2(V) \le \mathrm{norm}_1(V) \le \sqrt{n} \cdot \mathrm{norm}_2(V)$
So to guarantee that the 1-norm is less than epsilon, it is enough to show that the 2-norm is less than epsilon/sqrt(n).
However, in other ways the different norms are not equivalent; the norms can give different answers on the relative size of different vectors. Consider the three vectors A, B, and C:
A = [5, 0, 0]
Each vector is considered the largest under one of the norms.
I've found the notion of norms to be useful in many different contexts. The performance differences between quicksort and mergesort can be described as quicksort having a better 1-norm but mergesort having a better ∞-norm. Buying more insurance coverage raises the 1-norm of your costs, but lowers your ∞-norm. A more conservative evaluation tends to focus on the worst-case outcome and thus favors something like the ∞-norm. For example, in the math library the relative size of the error at any location must be less than the stated number of ulps (units in the last place). It is not good enough to have a low average error if a few locations, or even one location, have a very inaccurate result.
During software development, risk assessments evolve with the release life cycle. A change that is welcome early in the release may be rejected as too risky a few weeks before shipping; one way to view this phenomenon is that a larger value of p is being used to compute risk assessments later in the release.
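The inequality relating the 1-norm and 2-norm above, and the observation that different norms can rank vectors differently, can both be checked numerically. The sketch below uses A = [5, 0, 0] from the text together with two made-up vectors, X = [3, 3, 3] and Y = [4, 4, 0]; these are illustrative choices, not the B and C of the original post. All class and method names are hypothetical.

```java
import java.util.function.ToDoubleFunction;

final class NormComparison {
    // A is taken from the text; X and Y are illustrative vectors chosen for this sketch.
    static final double[][] VECTORS = {
        {5, 0, 0}, // A
        {3, 3, 3}, // X
        {4, 4, 0}  // Y
    };

    static double norm1(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += Math.abs(x);
        return sum;
    }

    static double norm2(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x * x;
        return Math.sqrt(sum);
    }

    static double normInf(double[] v) {
        double max = 0.0;
        for (double x : v) max = Math.max(max, Math.abs(x));
        return max;
    }

    /** Checks norm2(v) <= norm1(v) <= sqrt(n) * norm2(v), allowing for rounding error. */
    static boolean satisfiesBounds(double[] v) {
        double n1 = norm1(v);
        double n2 = norm2(v);
        double slack = 1e-12 * n1; // small tolerance for floating-point rounding
        return n2 <= n1 + slack && n1 <= Math.sqrt(v.length) * n2 + slack;
    }

    /** Index of the vector with the largest value under the given norm. */
    static int indexOfLargest(ToDoubleFunction<double[]> norm, double[][] vectors) {
        int best = 0;
        for (int i = 1; i < vectors.length; i++) {
            if (norm.applyAsDouble(vectors[i]) > norm.applyAsDouble(vectors[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (double[] v : VECTORS) {
            System.out.println(satisfiesBounds(v)); // true for each vector
        }
        System.out.println(indexOfLargest(NormComparison::norm1, VECTORS));   // 1 (X)
        System.out.println(indexOfLargest(NormComparison::norm2, VECTORS));   // 2 (Y)
        System.out.println(indexOfLargest(NormComparison::normInf, VECTORS)); // 0 (A)
    }
}
```

With these vectors, each one is the largest under exactly one of the three norms: X under the 1-norm, Y under the 2-norm, and A under the ∞-norm.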
References
Applied Numerical Linear Algebra, James W. Demmel
Matrix Computations, Gene H. Golub and Charles F. Van Loan
Numerical Linear Algebra, Lloyd N. Trefethen and David Bau, III
(2007-03-01 15:20:32.0)
What Every Computer Programmer Should Know About Floating-Point Arithmetic, Redux
Next week on Wednesday, October 11, at the Silicon Valley ACCU meeting in San Jose, I'll be giving a version of my talk on What Every Computer Programmer Should Know About Floating-Point Arithmetic, previously seen at Stanford and JavaOne. The meeting is open to the public and free of charge, so if you've ever wondered why adding up ten copies of
After the talk, I'll post a copy of the slides. Update: The slides.
(2006-10-04 00:00:01.0)
For a number of years, the venerable IEEE 754 standard for binary floating-point arithmetic has been undergoing revision and the committee's results will soon be up for ballot. Back in 2003, I was editor of the draft for a few months and helped incorporate the decimal material. The balloting process provides the opportunity for interested parties, such as consumers of the standard, to weigh in with comments; instructions for joining the ballot are available. The deadline for signing up has been extended to October 21, 2006. Major changes from 754 include:
What Every Computer Programmer Should Know About Floating-Point Arithmetic
I'm a part-time master's student in Stanford's ICME program and at the departmental seminar I recently gave a talk, What Every Computer Programmer Should Know About Floating-Point Arithmetic. This is a refinement and update of JavaOne talks I've given with a similar title.
(2006-06-23 15:07:23.0)