OOo my Threading...
This starts where Stephan's OOoCon talk left off - by taking his problem statement and maybe a gist of the proposed concepts into OOo's core (which happens to be C++ instead of Haskell. Which some folks might bemoan, but still, ~90% of our code consists of that language).
To recap, the free lunch is over for us application software developers, i.e. either OOo gets seriously multi-threaded soon, or we stop adding features and only optimize for speed, or OOo will get significantly slower within the next few years. But as you, the gurus and yours faithfully have noticed by now, becoming concurrent with osl::MutexGuard, osl::Thread & friends is tedious, error-prone, and neither race nor dead-lock free for a project of OOo's size. With the expressive power of a general programming language, it is even theoretically impossible to verify against these properties (for a sufficiently large project. OOo probably satisfies this constraint).
So, Stephan already showed what's possible in other languages. To boot, the trick is to raise the level of abstraction above digging-in-the-dirt, hand-made, explicit, and ugly locking or thread fiddling. Whereever possible, parallelism should be detected either implicitely, or at least denoted declaratively (instead of manually telling the machine what to do). John Ousterhout has a nice writeup about that. It's basically another instance of the lack of transparency and ease of use that is so ingrained with day-to-day tasks in C++. For stuff like object lifetime management, serialization and so on, boost provides transparent solutions. For large-scale concurrency management, the UNO threading framework does. For local, statement-level parallelization, there ain't much - or is there?
In fact, there actually are several attempts to extend C++ in that direction. To name a few:
From a coarse inspection, Boost.Act seems appealing (TBB really isn't an option for OOo, no?) - boost is high-quality anyway, and I think that the Action concept is a richer one than Futures.
Basically, the advanced approaches listed above employ declarative semantics, i.e. one no longer needs to create threads by hand, assign a thread function, and manually serialize access to shared state. Instead, one simply states that a loop body shall be executed in parallel:
parallel_transform( src.begin(), src.end(), dest.begin(), bind(_1 * 42) );
This multiplies each element in src with 42, and stores
the result in dest. How many threads are created, and
where they are taken from is completely up to the implementation - a
typical threading runtime on a machine with N cores will
hold a static pool of N threads, each of them receiving a
chunk of the parallel_transform workload from above. In
the old way of doing multi-threading, this behaviour would potentially
be spread over the whole code base (making it a cross-cutting
concern, just like explicit method guarding is today - with thousands
of osl::MutexGuard(m_aMutex) invocations all over the
place).
One of the crucial points of doing stuff like the above in a safe way
is to avoid hidden access to shared state. Access to the container can
be serialized by parallel_transform, but if the called
functor modifies shared objects concurrently, all bets are off. Thus,
the listed frameworks usually have some safe-guards in place, most of
them using a way of explicitely annotating free-standing functions and
classes to be thread-safe (note that this tagging of thread-safeness
can easily be achieved by means of the UNO
threading framework as well - Kay already has different
uno::Reference types on his list, to distinguish between
thread-safe and thread-unsafe components during compile-time). Another
important concept noteworthy here are pure functions,
i.e. functions that, given the same input parameters, always return
the same result, no matter where or when called (related, but not
exactly the same are referentially
transparent functions, which could also be exploited by an
optimizing parallel execution runtime). It's always safe to call pure
functions on disjunct parameters concurrently with no extra
synchronization effort. A safe (because syntactically limited) way of
defining pure functions is via boost's bind or
lambda library. There, expressions that modify shareable state could
simply be banned from the available syntax (they aren't - but it's
possible to do that).
Need an example? Good, take the standard one, that so naturally asks
for parallelization, that it's usually been sped up via SIMD in the past (you
might have noticed that those parallel_something
algorithms principally apply the same instructions (the functor to
run) in parallel to all container elements - that's SIMD), namely operating on image
pixel:
// color-convert an image from RGB to CMYK color space
parallel_transform(src_img.begin(),
src_img.end(),
dst_img.begin(),
bind(&rgb2cmyk,_1,cref(conversion_parms));
This is the archetype of inherent parallelism, having a huge amount of data (image pixel), that all need to have the same operation applied to. Disregarding memory bandwidth for the moment, this snippet scales linearly with the number of available cores.
Note the conversion_parms parameter to the bound
function, which can pose problems if rgb2cmyk silently
const_casts it to something mutable. This is the dark
corner of C/C++'s non-enforcable constness, and therefore, a minimal
amount of developer discipline is necessary: if objects or functions
employ magic behind their public interfaces (on-demand creation of
data, buffering, caching, copy-on-write), this still needs to be made
thread-safe explicitely. Fortunately, for each of the aforementioned
items, there's a thread-safe helper available in OOo (use rtl::Static
for late-initialization of static data, boost::shared_ptr
for lazy initialization of class members, caching and buffering, and
cow_wrapper
for copy-on-write).
To be continued.
Posted at 10:02PM Oct 25, 2006 by thorsten in OpenOffice.org |