Joseph D. Darcy's Sun Weblog

Joseph D. Darcy's Sun Weblog


20081204 Thursday December 04, 2008

Hexadecimal Floating-Point Literals

One of the more obscure language changes included back in JDK 5 was the addition of hexadecimal floating-point literals to the platform. As the name implies, hexadecimal floating-point literals allow literals of the float and double types to be written primarily in base 16 rather than base 10. The underlying primitive types use binary floating-point so a base 16 literal avoids various decimal ↔ binary rounding issues when there is a need to specify a floating-point value with a particular representation.

The conversion rule for decimal strings into binary floating-point values is that the binary floating-point value nearest the exact decimal value must be returned. When converting from binary to decimal, the rule is more subtle: the shortest string that allows recovery of the same binary value in the same format is to be used. While these rules are sensible, surprises are possible from the differing bases used for storage and display. For example, the numerical value 1/10 is not exactly representable in binary; it is a binary repeating fraction just as 1/3 is a repeating fraction in decimal. Consequently, the numerical values of 0.1f and 0.1d are not the same; the exact numeral value of the comparatively low precision float literal 0.1f is
0.100000001490116119384765625
and the shortest string that will convert to this value as a double is
0.10000000149011612.
This in turn differs from the exact numerical value of the higher precision double literal 0.1d,
0.1000000000000000055511151231257827021181583404541015625. Therefore, based on decimal input, it is not always clear what particular binary numerical value will result.

Since floating-point arithmetic is almost always approximate, dealing with some rounding error on input and output is usually benign. However, in some cases it is important to exactly specify a particular floating-point value. For example, the Java libraries include constants for the largest finite double value, numerically equal to (2-2-52)·21023, and the smallest nonzero value, numerically equal to 2-1074. In such cases there is only one right answer and these particular limits are derived from the binary representation details of the corresponding IEEE 754 double format. Just based on those binary limits, it is not immediately obvious how to construct a minimal length decimal string literal that will convert to the desired values.

Another way to create floating-point values is to use a bitwise conversion method, such as doubleToLongBits and longBitsToDouble. However, even for numerical experts this interface is inhumane since all the gory bit-level encoding details of IEEE 754 are exposed and values created in this fashion are not regarded as constants. Therefore, for some use cases it helpful to have a textual representation of floating-point values that is simultaneously human readable, clearly unambiguous, and tied to the binary representation in the floating-point format. Hexadecimal floating-point literals are intended to have these three properties, even if the readability is only in comparison to the alternatives!

Hexadecimal floating-point literals originated in C99 and were later included in the recent revision of the IEEE 754 floating-point standard. The grammar for these literals in Java is given in JLSv3 §3.10.2:

HexFloatingPointLiteral:

HexSignificand BinaryExponent FloatTypeSuffixopt

This readily maps to the sign, significand, and exponent fields defining a finite floating-point value; sign0xsignificandpexponent. This syntax allows the literal

0x1.8p1

to be to used represent the value 3; 1.8hex × 21 = 1.5decimal × 2 = 3. More usefully, the maximum value of
(2-2-52)·21023 can be written as
0x1.fffffffffffffp1023
and the minimum value of
2-1074 can be written as
0x1.0P-1074 or 0x0.0000000000001P-1022, which are clearly mappable to the various fields of the floating-point representation while being much more scrutable than a raw bit encoding.

Retroactively reviewing the possible steps needed to add hexadecimal floating-point literals to the language:

  1. Update the Java Language Specification: As a purely syntactic changes, only a single section of the JLS had to updated to accommodate hexadecimal floating-point literals.

  2. Implement the language change in a compiler: Just the lexer in javac had to be modified to recognize the new syntax; javac used new platform library methods to do the actual numeric conversion.

  3. Add any essential library support: While not strictly necessary, the usefulness of the literal syntax is increased by also recognizing the syntax in Double.parseDouble and similar methods and outputting the syntax with Double.toHexString; analogous support was added in corresponding Float methods. In addition the new-in-JDK 5 Formatter "printf" facility included the %a format for hexadecimal floating-point.

  4. Write tests: Regression tests (under test/java/lang/Double in the JDK workspace/repository) were included as part of the library support (4826774).

  5. Update the Java Virtual Machine Specification: No JVMS changes were needed for this feature.

  6. Update the JVM and other tools that consume classfiles: As a Java source language change, classfile-consuming tools were not affected.

  7. Update the Java Native Interface (JNI): Likewise, new literal syntax was orthogonal to calling native methods.

  8. Update the reflective APIs: Some of the reflective APIs in the platform came after hexadecimal floating-point literals were added; however, only an API modeling the syntax of the language, such as the tree API might need to be updated for this kind of change.

  9. Update serialization support: New literal syntax has no impact on serialization.

  10. Update the javadoc output: One possible change to javadoc output would have been supplementing the existing entries for floating-point fields in the constant fields values page with hexadecimal output; however, that change was not done.

In terms of language changes, adding hexadecimal floating-point literals is about as simple as a language change can be, only straightforward and localized changes were need to the JLS and compiler and the library support was clearly separated. Hexadecimal floating-point literals aren't applicable to that many programs, but when they can be used, they have extremely high utility in allowing the source code to clearly reflect the precise numerical intentions of the author.

(2008-12-04 00:00:02.0) Permalink Comments [2]

Build architectures and managing dependencies

Since the OpenJDK source was released, there have been various discussions about aspects of the build architecture, including how dependencies on third parties libraries should be managed. For example, on a Linux system should the JDK use its own libzip or the libzip that comes with the distribution? I think the appropriate answers to these and related question hinge on whether the final deliverable from the OpenJDK project is viewed as the source code itself or a binary built from the source.

Traditionally before OpenJDK, the end result of the Sun's JDK project that most people used were the JDK and JRE binaries. These binaries were meant to be universal in the sense of being usable on a given processor family across a version range of operating systems. For example, there was a single windows x86 binary for use on, say, Windows NT, Windows XP, etc., a single Solaris SPARC binary for use across Solaris 8 through Solaris 10, and effectively a single Linux x86 binary for use across different Linux distributions.

This "single binary" model drives decisions about what platform to produce the binary on, generally an older release, and what assumptions can be made about available system resources, generally rather weak ones. With a single binary deliverable, since fewer environment resources can be relied on, making the JDK build self-contained is necessary for it to be reliable in a wide variety of environments. With this delivery architecture, there is some justification to, for example, including a copy a library like libzip in the JDK build rather than relying on a system library, even though there are increased maintenance costs.

However, when an OpenJDK/IcedTea binary is being built on a particular Linux distribution for use only on that distribution, the constraints are different. If the build is being done by the OS vendor, the vendor controls the OS contents and knows whether or not system libraries like libzip are reliable and kept up to date. Since stronger assumptions can be made about the host environment, weaker conditions need to be fulfilled by the JDK source tree. For an OS vendor, relying on a single copy of native libraries for the OS and the JDK is preferable to building (and maintaining) multiple copies.

Going forward, I'd expect the JDK build to evolve to better accommodate options to use host platform resources. Perhaps modules systems in the future can help manage such dependencies more transparently.

(2008-12-04 00:00:01.0) Permalink Comments [1]

Calendar

« December 2008 »
SunMonTueWedThuFriSat
 
1
5
6
7
9
10
12
13
14
15
16
17
18
19
20
21
22
24
25
26
27
28
29
30
31
   
       
Today

RSS Feeds

XML
All
/Annotation Processing
/General
/Java
/JavaOne
/Numerics
/OpenJDK

Search

Links

    Blogroll
  • Download the JRE

    News

Navigation



Referers

Today's Page Hits: 417