David Dice's Weblog
Java Memory Model concerns on Intel and AMD systems
The Java Memory Model (JMM) was recently clarified by JSR-133, with the corresponding changes incorporated into chapter 17 of the Java Language Specification, 3rd edition. Doug Lea's excellent JSR-133 Cookbook reinterprets JSR-133 from the perspective of, and for the benefit of, JVM implementers. A JVM must reconcile the JMM with the memory consistency model of the underlying platform. Intel/AMD (x86) and SPARC Total Store Order (TSO) define relatively strong memory consistency models; the only architectural reordering of concern is that a store followed by a load in program order can be reordered by the platform such that the load executes before the store becomes visible to other processors. If we require the store to become visible before the load executes, then a serializing instruction (typically an atomic instruction such as CAS, or a fence such as MFENCE or MEMBAR #StoreLoad) must execute between the store and the load in question.
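To make the store-to-load reordering concrete, here is a toy model of a TSO machine in Java. It is a hypothetical sketch, not a specification of any real x86 or SPARC processor: each CPU has a FIFO store buffer, a load first checks its own CPU's buffer and otherwise reads shared memory, and a fence cannot retire until its CPU's buffer has drained. The names (`TsoModel`, `outcomes`, `storeBuffering`) are made up for illustration. The model exhaustively explores every interleaving of the classic store-buffering (Dekker-style) test, with and without a fence between each store and the following load.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model of a TSO machine, for illustration only.
class TsoModel {
    enum Op { STORE, LOAD, FENCE }

    record Insn(Op op, String addr, int value, String reg) {}
    record Pending(String addr, int value) {}

    static Insn store(String addr, int value) { return new Insn(Op.STORE, addr, value, null); }
    static Insn load(String reg, String addr) { return new Insn(Op.LOAD, addr, 0, reg); }
    static Insn fence() { return new Insn(Op.FENCE, null, 0, null); }

    // Machine state: shared memory, registers, one FIFO store buffer and one pc per CPU.
    static final class State {
        final Map<String, Integer> memory = new HashMap<>();
        final Map<String, Integer> regs = new TreeMap<>();
        final List<ArrayDeque<Pending>> buffers = new ArrayList<>();
        final int[] pc;

        State(int cpus) {
            pc = new int[cpus];
            for (int i = 0; i < cpus; i++) buffers.add(new ArrayDeque<>());
        }

        State copy() { // deep copy so each explored branch is independent
            State s = new State(pc.length);
            s.memory.putAll(memory);
            s.regs.putAll(regs);
            for (int i = 0; i < pc.length; i++) {
                s.buffers.get(i).addAll(buffers.get(i));
                s.pc[i] = pc[i];
            }
            return s;
        }
    }

    /** Every reachable final register assignment, formatted like "{r1=0, r2=0}". */
    static Set<String> outcomes(List<List<Insn>> program) {
        Set<String> results = new TreeSet<>();
        explore(program, new State(program.size()), results);
        return results;
    }

    private static void explore(List<List<Insn>> prog, State s, Set<String> results) {
        boolean done = true;
        for (int cpu = 0; cpu < prog.size(); cpu++) {
            // Step kind 1: this CPU's oldest buffered store becomes visible in memory.
            if (!s.buffers.get(cpu).isEmpty()) {
                done = false;
                State next = s.copy();
                Pending p = next.buffers.get(cpu).removeFirst();
                next.memory.put(p.addr(), p.value());
                explore(prog, next, results);
            }
            // Step kind 2: this CPU executes its next instruction.
            if (s.pc[cpu] < prog.get(cpu).size()) {
                done = false;
                Insn insn = prog.get(cpu).get(s.pc[cpu]);
                // A fence may only retire once the CPU's store buffer is empty.
                if (insn.op() == Op.FENCE && !s.buffers.get(cpu).isEmpty()) continue;
                State next = s.copy();
                next.pc[cpu]++;
                switch (insn.op()) {
                    case STORE -> next.buffers.get(cpu).addLast(new Pending(insn.addr(), insn.value()));
                    case LOAD -> next.regs.put(insn.reg(), read(next, cpu, insn.addr()));
                    case FENCE -> { }
                }
                explore(prog, next, results);
            }
        }
        if (done) results.add(s.regs.toString());
    }

    // The newest store to addr in this CPU's own buffer wins; otherwise read memory (initially 0).
    private static int read(State s, int cpu, String addr) {
        Iterator<Pending> it = s.buffers.get(cpu).descendingIterator();
        while (it.hasNext()) {
            Pending p = it.next();
            if (p.addr().equals(addr)) return p.value();
        }
        return s.memory.getOrDefault(addr, 0);
    }

    // CPU 0: x = 1; r1 = y.   CPU 1: y = 1; r2 = x.   Optionally a fence between store and load.
    static List<List<Insn>> storeBuffering(boolean fenced) {
        List<Insn> t0 = fenced ? List.of(store("x", 1), fence(), load("r1", "y"))
                               : List.of(store("x", 1), load("r1", "y"));
        List<Insn> t1 = fenced ? List.of(store("y", 1), fence(), load("r2", "x"))
                               : List.of(store("y", 1), load("r2", "x"));
        return List.of(t0, t1);
    }

    public static void main(String[] args) {
        System.out.println("no fence: " + outcomes(storeBuffering(false)));
        System.out.println("fenced:   " + outcomes(storeBuffering(true)));
    }
}
```

Without fences the outcome r1 = r2 = 0 is reachable (both stores are still sitting in their buffers when the loads execute); with a fence between each store and load it is not. The model deliberately captures only the store buffer, which is the single reordering the paragraph above identifies as relevant on TSO.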
The JMM defines a strong memory model akin to sequential consistency (SC) for volatile accesses. On Intel, AMD and SPARC processors, it's sufficient for the JVM to execute a fence instruction after every volatile store. In practice this means that, while translating Java bytecode to native code, the just-in-time compiler (JIT) emits a fence after each volatile store. In addition to avoiding architectural reordering through the use of fence instructions, the JIT also avoids compile-time reordering of volatile accesses. To be somewhat more precise, a volatile load has acquire semantics and a volatile store has release semantics.
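At the Java level, the pattern the JIT has to protect looks like the store-buffering test below. It is a rough sketch rather than a rigorous stress harness: the class and method names are hypothetical, and because each trial starts fresh threads, genuine overlap between the two threads is rare. Because `x` and `y` are volatile, the JMM forbids the outcome r1 == 0 && r2 == 0, so `countForbidden` should return 0 on any conforming JVM; on TSO hardware, the fence after each volatile store is what rules that outcome out.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical store-buffering (Dekker-style) litmus harness using volatile fields.
class VolatileDekker {
    static volatile int x, y;
    static int r1, r2; // plain fields: read by main only after join(), which orders them

    /** Runs the test `trials` times; returns how many trials ended with r1 == 0 && r2 == 0. */
    static int countForbidden(int trials) {
        int forbidden = 0;
        for (int i = 0; i < trials; i++) {
            x = 0;
            y = 0;
            CountDownLatch go = new CountDownLatch(1);
            // On x86/SPARC TSO a JIT typically emits a StoreLoad fence (e.g. MFENCE or a locked
            // instruction) after each volatile store, so neither load can pass the preceding store.
            Thread t1 = new Thread(() -> { await(go); x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { await(go); y = 1; r2 = x; });
            t1.start();
            t2.start();
            go.countDown(); // release both threads at roughly the same time
            join(t1);
            join(t2);
            if (r1 == 0 && r2 == 0) forbidden++;
        }
        return forbidden;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + t.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes observed: " + countForbidden(10_000));
    }
}
```

Declaring `x` and `y` as plain `int` instead would make r1 == r2 == 0 a legal outcome, since the JIT would then be free to omit the fence.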
Of late, however, both Intel, in their Intel® 64 Architecture Memory Ordering White Paper, and AMD, in section 7.2 "Multiprocessor Memory Access Ordering" of their recently updated systems programming guide, have relaxed the definitions of their platform memory models. Under the previously defined memory models, for instance, if MFENCE instructions appeared between all store-load pairs you'd effectively have sequential consistency. That no longer holds. Instead of sequential consistency, we'll have the slightly weaker causal consistency. (As an aside, I wonder whether these specification changes apply to processors already in the field, that is, whether they simply clarify the behavior of existing processors, or whether they reflect future or planned processors. I'd hope the latter.) Intel claims to have analyzed a large body of existing code in the field and believes that no programs will observe the change or be adversely affected. Strictly speaking, however, existing JVMs that emit MFENCE instructions after volatile stores would violate the JMM when running on processors that actually implemented causal consistency instead of the previous TSO-like model. Collectively, we could clarify the JMM yet again to admit causal consistency for volatiles. Another option would be to change the JIT's code emission to use locked instructions or XCHG instead of MFENCE. By my reading of the new Intel and AMD documents, that would be sufficient to bring the JVM into compliance with the JMM on processors with the relaxed memory model. That's likely slower, however.
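The IRIW (independent reads of independent writes) litmus test is a standard way to tell causal consistency apart from sequential consistency. Two writers each store to a different volatile, and two readers read both variables in opposite orders. Causal consistency permits the readers to disagree about the order of the two independent writes, whereas the JMM forbids that disagreement for volatiles. Note that the readers execute no stores at all, so a JIT that only emits a fence after volatile stores places no fence in either reader. The sketch below is a hypothetical harness (the names are made up, and starting threads per trial makes real races rare), not a conformance test.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical IRIW litmus harness: two independent volatile writes, two readers.
class VolatileIriw {
    static volatile int x, y;
    static int r1, r2, r3, r4; // plain fields: read by main only after join()

    /** Counts trials in which the two readers saw the two writes in opposite orders. */
    static int countForbidden(int trials) {
        int forbidden = 0;
        for (int i = 0; i < trials; i++) {
            x = 0;
            y = 0;
            CountDownLatch go = new CountDownLatch(1);
            Thread[] threads = {
                new Thread(() -> { await(go); x = 1; }),
                new Thread(() -> { await(go); y = 1; }),
                new Thread(() -> { await(go); r1 = x; r2 = y; }), // reader A: x then y
                new Thread(() -> { await(go); r3 = y; r4 = x; }), // reader B: y then x
            };
            for (Thread t : threads) t.start();
            go.countDown(); // release all four threads at roughly the same time
            for (Thread t : threads) join(t);
            // A saw x's write but not y's, while B saw y's write but not x's.
            if (r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0) forbidden++;
        }
        return forbidden;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + t.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden IRIW outcomes observed: " + countForbidden(5_000));
    }
}
```

With volatiles, all four accesses fall into a single total synchronization order, and in any such order the two readers must agree on which write came first. That agreement is exactly what a causally consistent machine need not provide.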
Readers interested in this topic would also likely enjoy Hans Boehm's presentation "Getting C++ Threads Right", which touches on the analogous problem for the new C++0x memory model (YouTube video).
Posted at 11:45AM Jan 16, 2008 by David Dice in General | Comments[4]
Always enjoy reading your posts! I didn't realize Intel and AMD were making this kind of change.
Posted by huntch (aka charlie hunt) on January 19, 2008 at 11:59 AM EST #
Doug Lea's JSR-133 Cookbook, while an excellent intro doc, is sometimes a little bit confusing.
For example, it states that "on the processors discussed below, a StoreLoad is strictly necessary only for separating stores from subsequent loads of the same location(s) as were stored before the barrier".
At first, I interpreted this to mean that StoreLoads can be eliminated when they separate a store from a load of a *different* location. But consider the following program.
--------------------------------
Volatiles u, v.
Initially u = v = 0.
--------------------------------
T1 | T2
---------------|-----------------
11: u = 1; | 21: v = 1;
12: u = 2; | 22: v = 2;
13: r1 = v; | 23: r2 = u;
---------------|-----------------
The JMM doesn't allow the final result r1 = r2 = 1 with volatiles u, v. I cannot see how to impose a total synchronization order compatible with program order that allows *both* reads to see 1. Sequential consistency is safe.
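The program above translates directly into a Java sketch. The class and method names are hypothetical, and, as with any harness that starts fresh threads per trial, real overlap between the threads is rare, so this illustrates the setup rather than proving anything. With `u` and `v` declared volatile, a conforming JVM should never report r1 == 1 && r2 == 1.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical harness for the two-stores-then-load program with volatiles u, v.
class VolatileUV {
    static volatile int u, v;
    static int r1, r2; // plain fields: read by main only after join()

    /** Counts trials that ended with r1 == 1 && r2 == 1, which the JMM forbids here. */
    static int countForbidden(int trials) {
        int forbidden = 0;
        for (int i = 0; i < trials; i++) {
            u = 0;
            v = 0;
            CountDownLatch go = new CountDownLatch(1);
            Thread t1 = new Thread(() -> { await(go); u = 1; u = 2; r1 = v; }); // lines 11-13
            Thread t2 = new Thread(() -> { await(go); v = 1; v = 2; r2 = u; }); // lines 21-23
            t1.start();
            t2.start();
            go.countDown(); // release both threads at roughly the same time
            join(t1);
            join(t2);
            if (r1 == 1 && r2 == 1) forbidden++;
        }
        return forbidden;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + t.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes observed: " + countForbidden(10_000));
    }
}
```

The reasoning for why the outcome is forbidden: r1 = 1 places T1's read of v after v = 1 but before v = 2 in the synchronization order, and r2 = 1 places T2's read of u after u = 1 but before u = 2. Combined with program order, that yields the cycle read(v) < v = 2 < read(u) < u = 2 < read(v), so no valid total order exists.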
On the other hand, on machines where all barriers other than StoreLoad are no-ops, the above program seems not to need any barrier at all, since, according to my understanding of the quotation above, the potential StoreLoads between 12 and 13 and between 22 and 23 could be removed. But this would then be exactly the same as a program with non-volatile u, v, for which the JMM allows r1 = r2 = 1. This contradicts the reasoning above. The barriers are needed, even though the quotation above may suggest otherwise.
Fortunately, the cookbook later prescribes issuing a StoreLoad barrier to separate a volatile store from a volatile load without further case analysis.
Moreover, I interpret the JSR-133 Cookbook as saying that the purpose of barriers is to avoid reorderings at the processor level (including write buffers, caches and execution units), not necessarily at the shared memory level. To me, this means that the memory actions are *issued* to memory according to the barriers, not that they are actually *seen* by other CPUs in the same order as issued.
This, too, is a little bit confusing to me.
On the other hand, the hardware docs of both Intel and AMD are even more confusing. For example, the Intel doc you refer to doesn't mention xFENCE instructions at all. What about their guarantees? The doc chooses to remain silent.
Posted by Raffaello Giulietti on January 28, 2008 at 11:02 AM EST #
Hi Dave
In answer to your parenthetical "aside" in para 3 of this entry, we've specifically asked Intel the question of whether this represents a change in behaviour or a simple clarification of existing behaviour.
The response from the authors of the Intel document is that this is a clarification. There is no change in the behaviour of new CPUs relative to the behaviour of older CPUs.
Of course it's absolutely true to say that this means that mfence is not an adequate enforcement of SC semantics, but then again it turns out that it never was.
Cheers
Paul.
Posted by Paul Murray on February 20, 2008 at 05:56 AM EST #
Hi David,
I think the JMM also needs an update regarding its guarantees for final fields. Also, some examples in the JMM and some of the J2SE sources use non-final fields as if they were final. I've put notes on it at http://negev.wordpress.com/java-memory-brief/
Posted by Peter Kehl on April 26, 2008 at 04:39 AM EDT #