Monday Nov 13, 2006

OOo my Threading (3)

Okay, what does all of this really buy us, right now? Lets assume we want to speed up Calc's way of parsing and calculating cell values. Given the following dependency tree (cell 1 refers to the result of cell4, cell 6 to the result of cell2, and so forth):

    parse_cell_4                                parse_cell_5
         |                                           |
    parse_cell_1        parse_cell_2            parse_cell_3
         |                   |                       |
         -------------- parse_cell_6 -----------------

The partial ordering of cell evaluation equivalent to this tree is as follows:

parse_cell_4<parse_cell_1
parse_cell_5<parse_cell_3
parse_cell_1<parse_cell_6
parse_cell_2<parse_cell_6
parse_cell_3<parse_cell_6

Wanting to employ what we've talked about earlier, to parallelize this calculation, one would need a static expression (i.e. one fixed at compile time):

as_func(
    as_func(
        parse,
        as_func(
            parse,
            4),
        1),
    as_func(
        parse,
        2), 
    as_func(
        parse,
        as_func(
            parse,      
            5),
        3))

(with as_func denoting that the argument should be evaluated lazily, and parse being a Calc cell parsing unary function, expecting the cell number as its sole parameter).

Now, having the formula dependencies fixed in the code isn't of much value for a spreadsheet, thus we need to handle this a bit more dynamically. Futures and Actions in principle provide the functionality that we need here, only that there's one slight catch: the pool of threads processing the Actions or Futures might actually be too small to have all cells resolved, that in turn are referenced from other ones (with circular cell references being the extreme example. But see below). This is because forcing a Future or Action to evaluate will block the thread requesting that value, which will eventually lead to starvation, unless there's at least one more thread active to process cell values other Actions are waiting for.

Of course, one could remedy this by using N threads when dealing with N cells. But that would get out of hand pretty quickly. Alternatively, the cell parsing function can be split into two parts: the first generates a parse tree, thus extracting the referenced cells (depending on the heap allocation overhead, one could also throw away the parser outcome, except for the cell references. But OTOH, with a decent thread-local allocator, the full parsing could happen concurrently. YMMV). Given the list of references for each cell, one again gets the partial ordering over the value calculation:

vector< vector< int > >  intermediate;
parallel_transform( cells.begin(), cell.end(),
                    intermediate.begin(),
                    &parse_step_1 );

For each cell, this step yields a vector of preconditions (other cells, that need to be calculated before). Pushing the actual cell value calculation functions into a job queue, and handing it the dependency graph (represented by the individual cell's references) generated above, we arrive at a fully dynamic version of the concurrent cell parsing example:

int              i=0;
job_queue        queue;
vector< double > results(intermediate.size(),0.0);
transform( intermediate.begin(), 
           intermediate.end(),
           results.begin(),
           bind( &job_queue::add_job,
                 ref(queue),
                 bind( &parse_step_2,
                       ref(results),
                       ref(cells),
                       var(i)++ ),
                _1 ));
queue.run();

This looks weird, but peeking at the prototypes of the involved functions might help clear things up:

/** adds a job functor

    @param functor
    Functor to call when job is to be executed

    @param prerequisites
    Vector of indices into the job queue, that must be processed
    strictly before this job. Permitted to use not-yet-existing 
    indices.
 */
template job_queue::add_job( Func               functor,
                                            vector const& prerequisites );

and

/** calculates cell value

    @param io_results
    In/out result vector. Receives resulting cell value, and is used
    to read referenced cell's input values.

    @param content
    Cell content

    @param cell_index
    Index of cell to process
 */
parse_step_2( vector& io_results,
              string const&   content,
              int             cell_num );

This job queue can then decide, either globally or via policies, what to do in various situations:

    Circular dependencies: either refuse working on such a job, or use a default value for an arbitrary member of the cycle (to break it)

    Whether to execute jobs in parallel or not. Depending on the number of cores available (both physically and load-wise), the queue could decide to stay single-threaded, if the number of jobs is low, or multi-threaded for a larger number of jobs. Note that this decision might be highly influenced by the amount of work a single job involves, and therefore external hints to the queue might be necessary. Kudos to mhu for the hint, that it's wise to parallelize ten jobs that take one hour each, but not so for twenty jobs that only take a few microseconds to complete.

At any rate, fine-tuning this to various hardware, operating systems and deployment settings is much easier than for manual thread creations. Plus, given a few (falsifiable) attributes of the functions called, it's also much safer.

Comments:

"This is because forcing a Future or Action to evaluate will block the thread requesting that value" Evaluation of the future can proceed on the blocked thread, so that you will never run out of threads (you will potentially need larger stacks, though). Where circular dependencies lead to deadlock in the original setup, they lead to stack overflow in this one.

Posted by Stephan Bergmann on November 13, 2006 at 01:46 PM CET #

Erm, forget my previous comment, please. I was thinking about <VAR>X</VAR> while writing about <VAR>Y</VAR>.

Posted by Stephan Bergmann on November 14, 2006 at 09:35 AM CET #

Post a Comment:
Comments are closed for this entry.