David Dice's Weblog

Friday May 29, 2009

Instruction selection for volatile fences : MFENCE vs LOCK:ADD

In the past the JVM has used MFENCE but, because of latency issues on AMD processors and potential pipeline issues on modern Intel processors it appears that a LOCK:ADD of 0 to the top of stack is preferable (Gory Details).

Sunday May 24, 2009

memcpy() concurrency curiosities

I was recently asked to diagnose a problem a customer was encountering that involved Java and the JNI getIntArrayElements() and releaseIntArrayElements() primitives. The outcome of the exploration was sufficiently interesting that I thought I'd mention it here in order that other JNI users might avoid the same pitfall. Briefly, the customer was running a highly threaded application on a machine with lots of available concurrency. The Java threads would repeatedly read from an array of int[]s that was effectively immutable. References to the array were also passed to native C code via JNI calls. Those native methods would access the array via the getIntArrayElements() and releaseIntArrayElements() primitives, but in no case did the native code ever modify the array contents. The problem was that the Java readers would occasionally observe scenarios that suggested that the contents of the array were transiently "flickering" to unexpected values -- ephemeral corruption -- and then returning to the expected values. (Recall that the array was effectively immutable, so this shouldn't have been happening). I carefully checked the Java code and native code and quickly convinced myself the source of the bug was elsewhere.

I next switched my attention to the implementation of getIntArrayElements() and releaseIntArrayElements(). Briefly GetIntArrayElements() will, as used by this particular application, (a) switch the caller's thread state back to "InVM". If there's a stop-the-world garbage collection event going on the state transition operator should block the caller until the GC completes. At that point the thread can safely access the heap to extract the array data. The "InVM" thread state is such that GC should be inhibited while we're copying the data out. (b) malloc a properly sized array to hold the data; (c) memcpy() the data from the heap data into that just allocated array; (d) restore the thread state; and (e) return a pointer to the just allocated array to the caller. ReleaseIntArrayElements() operates in a similar fashion but reverses the memcpy() direction (copying into the heap) and then releases the allocated array. Note that in our case the source and destination memcpy() regions are completely disjoint. My first thought was that something might have been wrong with the state transition operations, allowing what should have been stop-the-world GC to inadvertently run concurrently with the memcpy() operations. Further investigation ruled that out as there were no GC events when the apparent corruption was observed.

Upon reflection, however, I recalled that some optimized memcpy() implementations may execute explicitly coded benign speculative stores into the destination region. Another possibility is the block-initializing-store (BIS) used by memcpy(). A concurrent observer might fetch from a memcpy() target cache line in the narrow timing window between when the line was zeroed and when the contents were copied back into the line. In either case, the memcpy() in releaseIntArrayCritical could transiently mutate the contents of the java array in the heap to some unexpected value. By the time the memcpy() finishes, of course, the destination buffer will again contain the expected contents. I quickly confirmed this scenario might occur by writing a quick throw-away multi-threaded C test case that modeled the JNI behavior and demonstrated the effect. The underlying mechanism at work deserves a bit more explanation. Lets say we're writing C code and have an effectively immutable array A. A contains a known set of values. Concurrently, we have a 1st thread reading from A and 2nd thread memcpy()ing precisely the same set of values known to be in A back into A. At first glance it's not unreasonable to assume that the reader (the 1st thread) will always see the expected values in A. That is, we're assuming that memcpy()ing the same value back into an array is idempotent. The memcpy() operation is certainly idempotent from the point of view of the caller of memcpy(), but interestingly it's not idempotent from the point of view concurrent observers. With optimized memcpy() implementations concurrent observers can see "odd" intermediate states. Naively, you'd expect that concurrent observers would see the array as unchanging, but that isn't the case on all platforms. The case at hand the memcpy() in releaseIntArrayElements() is the source of the problem, as array in heap might "flicker". To further confirm this diagnosis I provided the customer with a simplistic byte-by-byte unoptimized memcpy() implementation in an LD_PRELOAD-able mempcy.so shared object. And indeed, the problem wasn't evident when running with this simple memcpy() instead of the optimized system form.

While technicall legaly, such behavior in memcpy() -- and the JNI operators that call memcpy() -- is, at best, surprising and violates the principle of least astonishment. Strictly speaking, memcpy() provides no particular guarantees about concurrently observable intermediate states, but rather only specifies the state of the destination buffer at the end of the invocation. That is, while a memcpy() is in flight the destination buffer might contain transient or ephemeral values which could be observed by other threads. Overwriting an array with an exact copy of the array will be idempotent from the perspective of the caller after memcpy() completes, but concurrent observers of the array are permitted to see transient ephemeral "garbage" values. My first thought when we ran into this behavior some years ago was that it was a bug in the memcpy() implementation, but upon deeper reflection I concluded the bug was in my errant assumptions about how memcpy() worked. Having said that, while such behavior is technically permissible I'm also sympathetic that such effects are, at best, unexpected. I's not unreasonable to assume the destination buffer would remain unmolested. One could imagine customers easily falling into this trap, in both C code and Java.

Copyright © 2006-2008 Dave Dice, All Rights Reserved.

Calendar

Feeds

Search

Links

Navigation

Referers

XML Feed
Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 License.