Thursday June 16, 2005

Task Queues in OpenSolaris

Overview of Task Queues

It is common for a kernel programmer to postpone processing of some tasks and delegate their execution to another kernel thread. There may be several reasons for doing this:

In all these cases the programmer, in essence, needs to execute a piece of code (a task) in a different context, where context usually means another kernel thread with a different set of locks held and, possibly, a different priority.

Until the introduction of task queues in Solaris 8 there was no generic OS facility for such an in-kernel context change. Every subsystem used its own ad-hoc mechanism, usually utilizing "worker threads" together with a list of jobs to give them. The task queue interface abstracts the common code out of these mechanisms and provides a simple way of scheduling asynchronous tasks.

A task queue consists of a list of tasks, together with one or more threads to service the list. If a task queue has a single service thread, all tasks are guaranteed to execute in the order they were dispatched. Otherwise they can be executed in any order. Note that since tasks are placed on a list, the execution of one task should not depend on the execution of another task, or a deadlock may occur.
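To make the "list of tasks plus a service thread" idea concrete, here is a minimal userland sketch in C. It is not the Solaris implementation: the toy_* names are hypothetical, there is no locking, and the service thread is modelled by an ordinary function call. It only shows why a single servicing thread runs tasks in dispatch order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A task is a pair {f, a}: a function and its single pointer argument. */
typedef void (*task_func_t)(void *);

typedef struct toy_task {
    task_func_t      func;
    void            *arg;
    struct toy_task *next;
} toy_task_t;

/* Hypothetical toy queue: a singly linked FIFO list of pending tasks. */
typedef struct toy_taskq {
    toy_task_t *head;
    toy_task_t *tail;
} toy_taskq_t;

/* Append a task to the tail of the list; 0 on success, -1 if out of memory. */
int
toy_dispatch(toy_taskq_t *tq, task_func_t func, void *arg)
{
    toy_task_t *t;

    assert(tq != NULL && func != NULL);
    t = malloc(sizeof (*t));
    if (t == NULL)
        return (-1);
    t->func = func;
    t->arg = arg;
    t->next = NULL;
    if (tq->tail != NULL)
        tq->tail->next = t;
    else
        tq->head = t;
    tq->tail = t;
    return (0);
}

/*
 * What a single service thread would do: take tasks from the head of the
 * list one by one and call f(a). Returns the number of tasks executed.
 */
int
toy_service(toy_taskq_t *tq)
{
    int n = 0;

    while (tq->head != NULL) {
        toy_task_t *t = tq->head;

        tq->head = t->next;
        if (tq->head == NULL)
            tq->tail = NULL;
        t->func(t->arg);
        free(t);
        n++;
    }
    return (n);
}

/* Example task: record the integer it was given in a small log. */
int toy_log[8];
int toy_nlogged;

void
toy_record(void *arg)
{
    if (toy_nlogged < 8)
        toy_log[toy_nlogged++] = *(int *)arg;
}
```

Dispatching toy_record three times and then calling toy_service() once leaves the three values in toy_log in the order they were dispatched. With several service threads pulling from the same list, that ordering would no longer be guaranteed.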

DDI interface for task queues

Kernel users should use the documented DDI interface for all taskq operations. These interfaces are defined in the usr/src/uts/common/sys/sunddi.h header file. The exported interface consists of the following functions:

ddi_taskq_create():
Creates a new taskq object with the specified number of threads servicing it. All threads will run with a single specified priority. The priority may have the special value TASKQ_DEFAULTPRI, meaning that the priority will be chosen by the system. The arguments are:

dip
The pointer to the dev_info_t structure. Some subsystems do not have a dip pointer and may pass NULL instead.
name
Descriptive string.
nthreads
Number of threads servicing the task queue.
pri
Priority of threads servicing the task queue. Drivers and modules should specify TASKQ_DEFAULTPRI.
flags
Should always be zero. This argument is reserved for future extensions.

ddi_taskq_dispatch():
Schedules a task for a specified taskq. A task is just a pair {f, a}, where f is a function accepting a single pointer argument and a is its argument value. Additional flags specify whether the dispatch may or may not sleep waiting for resources. Once the task is dispatched it will be scheduled asynchronously at some later time, and there is no way to cancel a task that has been dispatched but has not been executed yet. All tasks are executed with the fixed priority specified at the time of taskq creation. The arguments are:

tq
The taskq pointer returned by ddi_taskq_create().
func
The callback function to call. The function should accept a single argument.
arg
Argument to the callback function.
flags
Flags controlling the dispatch behavior. Possible flags are:

DDI_SLEEP
Allow sleeping (blocking) until memory is available.
DDI_NOSLEEP
Return DDI_FAILURE immediately if memory is not available.

ddi_taskq_wait():
Waits for all previously scheduled tasks to complete. Note that this function does not stop any new task dispatches. Its single argument is the taskq to wait for.

ddi_taskq_suspend():
Suspends all task execution until ddi_taskq_resume() is called. Although ddi_taskq_suspend() attempts to suspend pending tasks, there are no guarantees that they will be suspended. The only guarantee is that tasks dispatched after ddi_taskq_suspend() will not be executed until the taskq is resumed. Because it will trigger a deadlock, ddi_taskq_suspend() should never be called by a task executing on a taskq. Its single argument is the taskq to suspend.

ddi_taskq_suspended():
Returns B_TRUE if the taskq is suspended, and B_FALSE otherwise. It is intended to be used to ASSERT that the task queue is suspended. Its single argument is the taskq to check.

ddi_taskq_resume():
Resumes taskq execution. Its single argument is the taskq to resume.

ddi_taskq_destroy():
Waits for all pending tasks to complete and destroys the task queue and all associated threads. Its single argument is the taskq to destroy.
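The calls above fit together roughly as in the following sketch of a create, dispatch, wait, destroy cycle. Everything prefixed xx_ is hypothetical driver code invented for illustration. So that the sketch can also be built outside the kernel, the #else branch supplies simplistic userland stand-ins for the DDI types, constants and functions; these are assumptions made only for this example, not the real implementation, and the macro values there are placeholders.

```c
#ifdef _KERNEL
#include <sys/ddi.h>
#include <sys/sunddi.h>
#else
/*
 * ASSUMPTION: userland stand-ins so the sketch builds outside the kernel.
 * They are NOT the Solaris implementation: dispatched tasks are only
 * remembered, and they run when ddi_taskq_wait() is called.
 */
#include <assert.h>
#include <stddef.h>
typedef struct dev_info dev_info_t;
typedef int pri_t;
typedef unsigned int uint_t;
#define TASKQ_DEFAULTPRI (-1)
#define DDI_SLEEP        0
#define DDI_SUCCESS      0
#define DDI_FAILURE      (-1)
#define STUB_MAX         16
typedef struct ddi_taskq {
    void (*func[STUB_MAX])(void *);
    void *arg[STUB_MAX];
    int   n;
} ddi_taskq_t;
static ddi_taskq_t stub_tq;

static ddi_taskq_t *
ddi_taskq_create(dev_info_t *dip, const char *name, int nthreads,
    pri_t pri, uint_t cflags)
{
    (void) dip; (void) name; (void) nthreads; (void) pri; (void) cflags;
    stub_tq.n = 0;
    return (&stub_tq);
}

static int
ddi_taskq_dispatch(ddi_taskq_t *tq, void (*func)(void *), void *arg,
    uint_t dflags)
{
    (void) dflags;
    assert(tq != NULL && func != NULL);
    if (tq->n == STUB_MAX)
        return (DDI_FAILURE);
    tq->func[tq->n] = func;
    tq->arg[tq->n] = arg;
    tq->n++;
    return (DDI_SUCCESS);
}

static void
ddi_taskq_wait(ddi_taskq_t *tq)
{
    int i;

    for (i = 0; i < tq->n; i++)
        tq->func[i](tq->arg[i]);
    tq->n = 0;
}

static void
ddi_taskq_destroy(ddi_taskq_t *tq)
{
    ddi_taskq_wait(tq);
}
#endif /* _KERNEL */

/* Hypothetical per-driver state updated by the deferred task. */
typedef struct xx_state {
    int xx_events;      /* deferred events handled so far */
} xx_state_t;

/* The task itself: runs later, in the context of a taskq thread. */
static void
xx_handle_event(void *arg)
{
    xx_state_t *sp = arg;

    sp->xx_events++;
}

/* Defer two events to a single-threaded taskq, wait for them, clean up. */
int
xx_defer_events(dev_info_t *dip, xx_state_t *sp)
{
    ddi_taskq_t *tq;
    int rv = DDI_SUCCESS;

    tq = ddi_taskq_create(dip, "xx_taskq", 1, TASKQ_DEFAULTPRI, 0);
    if (tq == NULL)
        return (DDI_FAILURE);
    /* DDI_SLEEP: this caller is allowed to block waiting for memory. */
    if (ddi_taskq_dispatch(tq, xx_handle_event, sp, DDI_SLEEP) != DDI_SUCCESS ||
        ddi_taskq_dispatch(tq, xx_handle_event, sp, DDI_SLEEP) != DDI_SUCCESS)
        rv = DDI_FAILURE;
    ddi_taskq_wait(tq);     /* tasks dispatched so far have completed */
    ddi_taskq_destroy(tq);
    return (rv);
}
```

A real driver would more likely create the taskq once in its attach(9E) routine and destroy it in detach(9E), dispatching from interrupt or ioctl paths in between; the single function here just keeps the whole lifecycle in one place.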

Observability

Counters

Every taskq created in the system keeps a set of kstat counters associated with it. Try running the following command on your system:


$ kstat -c taskq
module: unix                            instance: 0     
name:   ata_nexus_enum_tq               class:    taskq
        crtime                          53.877907833
        executed                        0
        maxtasks                        0
        nactive                         1
        nalloc                          0
        priority                        60
        snaptime                        258059.249256749
        tasks                           0
        threads                         1
        totaltime                       0

module: unix                            instance: 0     
name:   callout_taskq                   class:    taskq
        crtime                          0
        executed                        13956358
        maxtasks                        4
        nactive                         4
        nalloc                          0
        priority                        99
        snaptime                        258059.24981709
        tasks                           13956358
        threads                         2
        totaltime                       120247890619
 ...

The kstat information above includes:

You can use the power of the kstat command to observe how some counter increases over time:


$ kstat -p unix:0:callout_taskq:tasks 1 5  
unix:0:callout_taskq:tasks      13994642

unix:0:callout_taskq:tasks      13994711

unix:0:callout_taskq:tasks      13994784

unix:0:callout_taskq:tasks      13994855

unix:0:callout_taskq:tasks      13994926

...
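Since the counter only grows, the interesting number is usually the difference between consecutive samples. The helper below is a small illustrative sketch (counter_deltas is a made-up name, not part of any kstat library) that turns a series of samples such as the ones printed above into per-interval deltas.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill deltas[i] with samples[i + 1] - samples[i] for a monotonically
 * increasing counter. The caller provides room for n - 1 deltas.
 * Returns the number of deltas written (0 if fewer than two samples).
 */
size_t
counter_deltas(const unsigned long long *samples, size_t n,
    unsigned long long *deltas)
{
    size_t i;

    if (n < 2)
        return (0);
    for (i = 0; i + 1 < n; i++) {
        /* kstat counters of this kind are not expected to go backwards */
        assert(samples[i + 1] >= samples[i]);
        deltas[i] = samples[i + 1] - samples[i];
    }
    return (n - 1);
}
```

Fed the five values shown above, taken one second apart, it yields 69, 73, 71 and 71, so callout_taskq was handling roughly 70 tasks per second on that machine.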

DTrace SDT Probes

The taskq implementation also provides several useful SDT probes. All the probes described below have two arguments: the taskq pointer and the pointer to the taskq_ent_t structure. The latter can be used to extract the function and its argument in a D script.

Developers can use these probes to collect precise timing information about individual task queues and the individual tasks being executed through them. For example, the following script prints, every 10 seconds, which functions were scheduled via task queues:

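#!/usr/sbin/dtrace -qs

/* Fires when a task is placed on a task queue. */
sdt:genunix::taskq-enqueue
{
  this->tq  = (taskq_t *)arg0;      /* arg0: the taskq pointer */
  this->tqe = (taskq_ent_t *)arg1;  /* arg1: the taskq_ent_t pointer */
  /* Count dispatches per queue name, instance and task function. */
  @[this->tq->tq_name,
    this->tq->tq_instance,
    this->tqe->tqent_func] = count();
}

/* Every 10 seconds, print the counts and reset the aggregation. */
tick-10s
{
  printa("%s(%d): %a called %@d times\n", @);
  trunc(@);
}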

Running this on my desktop produced the following output (see footnote 1):


callout_taskq(1): genunix`callout_execute called 51 times
callout_taskq(0): genunix`callout_execute called 701 times
kmem_taskq(0): genunix`kmem_update_timeout called 1 times
kmem_taskq(0): genunix`kmem_hash_rescale called 4 times
callout_taskq(1): genunix`callout_execute called 40 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 256 times
callout_taskq(0): genunix`callout_execute called 702 times
kmem_taskq(0): genunix`kmem_update_timeout called 1 times
kmem_taskq(0): genunix`kmem_hash_rescale called 4 times
callout_taskq(1): genunix`callout_execute called 28 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 228 times
callout_taskq(0): genunix`callout_execute called 706 times
callout_taskq(1): genunix`callout_execute called 24 times
USB_hid_81_pipehndl_tq_1(14): usba`hcdi_cb_thread called 141 times
callout_taskq(0): genunix`callout_execute called 708 times

Dynamic Task Queues

Suppose that two friends, Bob and Alice, are standing in the cafeteria line with Alice standing behind Bob. The cashier checks Bob's tray and it turns out that Bob doesn't have enough money, so he wants to borrow from Alice. But Alice is not sure whether she has enough cash until she knows the cost of her lunch. This is a typical deadlock situation: neither Bob nor Alice can make any forward progress, because each is waiting for the other. The same kind of deadlock may occur if two tasks A and B are placed on a queue which is served by a single thread when there is a resource dependency between A and B. One way to prevent such a deadlock is to guarantee that A and B are processed by two different threads, so that when A stalls for B, the thread processing A will block until B makes enough progress and can provide the needed resource to A.

Dynamic task queues provide exactly such a deadlock-free way of scheduling potentially dependent tasks on the same queue. They guarantee that every task is processed by a separate thread. Since the number of tasks that can be scheduled at the same time is not known in advance, dynamic task queues maintain a dynamic thread pool that grows when the workload increases and shrinks when the workload dries up.
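The difference between one servicing thread and a thread per task can be seen in a toy simulation. The sketch below is not how dynamic task queues are implemented; the sim_* names and the two example tasks are hypothetical, and "threads" are just slots in an array. Each simulated thread stays with the task it picked up until that task finishes, as a real taskq thread calling f(a) would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAXTASKS 8

/* A step returns true when the task has finished, false while it waits. */
typedef bool (*sim_step_t)(void *);

typedef struct sim_task {
    sim_step_t step;
    void      *arg;
} sim_task_t;

/*
 * Run ntasks queued tasks on nthreads simulated service threads.
 * Returns true if every task completed, false if the run deadlocked.
 */
bool
sim_run(const sim_task_t *tasks, size_t ntasks, size_t nthreads)
{
    size_t running[SIM_MAXTASKS];   /* tasks currently on a thread */
    size_t nrunning = 0, next = 0, done = 0;

    assert(ntasks <= SIM_MAXTASKS && nthreads >= 1);
    while (done < ntasks) {
        bool progress = false;
        size_t i = 0;

        /* Idle threads take tasks from the head of the queue, FIFO. */
        while (nrunning < nthreads && next < ntasks) {
            running[nrunning++] = next++;
            progress = true;
        }
        /* Let each busy thread try to advance its task. */
        while (i < nrunning) {
            const sim_task_t *t = &tasks[running[i]];

            if (t->step(t->arg)) {
                running[i] = running[--nrunning]; /* thread is idle again */
                done++;
                progress = true;
            } else {
                i++;
            }
        }
        if (!progress)
            return (false);     /* every thread is stuck: deadlock */
    }
    return (true);
}

/* Task A (Bob): cannot finish until the shared resource is available. */
bool
task_a(void *arg)
{
    return (*(bool *)arg);
}

/* Task B (Alice): provides the resource that A is waiting for. */
bool
task_b(void *arg)
{
    *(bool *)arg = true;
    return (true);
}
```

With A queued ahead of B, sim_run() reports a deadlock for one thread, because A occupies the only thread and B never gets to run. With two threads B runs alongside A, provides the resource, and both complete.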

Dynamic task queues cannot (yet) be used via the DDI interfaces. Some kernel subsystems use the internal taskq calls directly to create and use dynamic task queues. The system also maintains one shared dynamic task queue called system_taskq. It can be used by specifying system_taskq as the taskq argument to the taskq_dispatch() function. It is a good idea to also add "TQ_NOSLEEP | TQ_NOQUEUE" to the flags when using system_taskq.

Implementation Notes

Each taskq is implemented as a list of tasks protected by a per-taskq lock. One or more worker threads take tasks one by one and execute them by calling f(a) and then sleep, waiting for new entries. A taskq created with a single servicing thread has an important property: it guarantees that all its tasks are executed in the order they are scheduled. When a task queue is created with several servicing threads, task execution order is not predictable.

If you want to look at the actual implementation you need to look at the following files:

History

The first taskq implementation was done by Jeff Bonwick for Solaris 8. It was successfully used to replace many calls to the low-level thread_create() function. I added Dynamic Task Queues in Solaris 9 and used them to completely re-implement the STREAMS scheduler. In Solaris 10 I added DDI interfaces for task queues and also added kstat counters and DTrace probes.


Footnotes:

1 For curious minds: the callout_taskq is used to handle system timers. As an exercise in your DTrace skills, try to figure out what actual timers are firing on each CPU. Hint - use the callout-start SDT probe, which has a pointer to the callout_t structure as its sole argument.


( Jun 16 2005, 05:00:22 PM PDT )