Saturday Nov 04, 2006

OOo my Threading (2)

I've already talked a bit about the status quo of threading in OOo, and listed some largely useful stuff that others have been doing to remedy the problems that shared-state multi-threading poses.

Why not go further? If there's a way to know that a function is pure, or that an object is thread-safe, then it would be nice to have automatic parallelization employed. The underlying issue that needs to be controlled here is races, i.e. unserialized access to shared state must be prohibited. So, either a function is pure, i.e. does not depend on shared state at all, or the called subsystem protects itself against concurrent modification (this is most easily achieved by the environment concept of the UNO threading framework: the UNO runtime implicitly serializes external access to thread-unsafe apartments).

Although C++ provides no portable way to express those concepts at the language level, for pure functions there is a way of using a limited subset of lambda expressions that inhibits access to shared state at the syntax level. And it's perfectly possible to mark objects (or even subsets of a class' methods) as thread-safe. One straightforward way to do this is to use specializations of UNO interface references, i.e. ones that denote thread-safe components.
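To make the tagging idea a bit more concrete, here is a minimal sketch, assuming a C++11 compiler. None of these names are real UNO API: Ref, ThreadSafeRef, IsThreadSafe, mayCallConcurrently and Canvas are all hypothetical stand-ins, and the real thing would hang off the UNO interface reference type instead.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for a plain interface reference
// (NOT the real UNO reference class).
template <typename T>
struct Ref {
    std::shared_ptr<T> ptr;
};

// Hypothetical tagged variant: same payload, but the type itself
// states that the referenced component is thread-safe.
template <typename T>
struct ThreadSafeRef : Ref<T> {};

// Compile-time query: false for any reference type by default...
template <typename R>
struct IsThreadSafe : std::false_type {};

// ...and true only for the tagged variant.
template <typename T>
struct IsThreadSafe<ThreadSafeRef<T>> : std::true_type {};

// A scheduler could use this to decide whether calls through a
// reference need to be serialized against each other.
template <typename R>
constexpr bool mayCallConcurrently(const R&) {
    return IsThreadSafe<R>::value;
}

// Hypothetical component type, only here to instantiate the templates.
struct Canvas {};

static_assert(!IsThreadSafe<Ref<Canvas>>::value,
              "plain references are treated as unsafe");
static_assert(IsThreadSafe<ThreadSafeRef<Canvas>>::value,
              "tagged references are treated as thread-safe");
```

The point of doing this in the type system is that the decision costs nothing at runtime, and an untagged reference is treated as unsafe by default.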

Given all of this, we can form statements that contain:

    unsafe method calls

    thread-safe method calls

    pure function calls

    impure function calls

So, in fact, a runtime engine could reason about which subexpressions can be executed concurrently, and which must be serialized. If you treat method calls as what they are, i.e. as implicitly carrying a this pointer argument, a possible data flow graph might look like this:

new object1                      new object2
 |                                |
 +->object1::methodA              +->object2::methodA
            |                               |
            +------------------------------>+->object2::methodB(object1)
                                                         |
                                                         v
                                                 object1::methodC

new object3
 |
 +->object3::methodA

That is, the this pointer is carried along as a target for modifications, and as soon as two methods have access to the same object, they need to be called sequentially. This does not apply to UNO interface calls or objects that are tagged as thread-safe, of course. To be specific, a forest of data flow trees can be generated, which defines a partial ordering over the subexpressions. If neither exp1<exp2 nor exp1>exp2 can be deduced from this ordering, those two subexpressions can be executed in parallel. Really basic stuff that compiler optimizers do as well - only that plain C/C++ doesn't provide that many clues to safely parallelize. From the example above, it is obvious that object3::methodA can be executed concurrently with all other methods, that object1::methodC must be executed strictly after object2::methodA, and that object1::methodA and object2::methodA can also be executed concurrently.
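The ordering for the example above can be sketched in a few lines of C++ (C++11 assumed). This is only an illustration of the reasoning, not an actual scheduler: Call, Order, buildPartialOrder, canRunInParallel and exampleCalls are names made up for this sketch, objects are identified by plain strings, the "new objectN" nodes are left out, and every call is assumed to be unsafe (no thread-safe tags, no pure functions).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One call of a statement sequence, together with the objects it may
// modify: the implicit this pointer plus any object passed as argument.
struct Call {
    std::string name;
    std::vector<std::string> objects;
};

// before[i][j] == true means: call i must have finished before call j starts.
using Order = std::vector<std::vector<bool>>;

Order buildPartialOrder(const std::vector<Call>& calls) {
    const std::size_t n = calls.size();
    Order before(n, std::vector<bool>(n, false));
    std::map<std::string, std::size_t> lastUser; // object -> last call touching it

    for (std::size_t j = 0; j < n; ++j) {
        for (const std::string& obj : calls[j].objects) {
            const auto it = lastUser.find(obj);
            if (it == lastUser.end())
                continue; // first use of this object, nothing to wait for
            const std::size_t i = it->second;
            before[i][j] = true;
            // Inherit everything that already had to precede call i,
            // which keeps the relation transitively closed.
            for (std::size_t k = 0; k < n; ++k)
                if (before[k][i])
                    before[k][j] = true;
        }
        // Only now register call j as the latest user of its objects.
        for (const std::string& obj : calls[j].objects)
            lastUser[obj] = j;
    }
    return before;
}

// Two distinct calls may run in parallel if the ordering relates them
// in neither direction.
bool canRunInParallel(const Order& before, std::size_t a, std::size_t b) {
    return a != b && !before[a][b] && !before[b][a];
}

// The calls from the data flow graph above, in program order.
std::vector<Call> exampleCalls() {
    return {
        {"object1::methodA", {"object1"}},            // index 0
        {"object2::methodA", {"object2"}},            // index 1
        {"object2::methodB", {"object2", "object1"}}, // index 2
        {"object1::methodC", {"object1"}},            // index 3
        {"object3::methodA", {"object3"}},            // index 4
    };
}
```

With the calls from exampleCalls(), canRunInParallel(order, 0, 1) holds for the two methodA calls, index 4 is unrelated to everything else, and order[1][3] records that object1::methodC has to wait for object2::methodA.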

Okay, this is largely crack-smoking. But there is something to be made of it. Stay tuned.
