Tuesday June 19, 2007

CPU Caps: thread accounting

The CPU Caps project is now integrated into OpenSolaris, and I am working on back-porting it to an S10 update. This seems like a good time to write down some notes on its implementation details. The Implementation guide gives a good high-level overview, so here I'd like to concentrate on the bottom-up view.

Before we can penalize the CPU usage of some threads, we need to know how much CPU is consumed by every project. There are two main approaches: sampling and monitoring. The difference can be illustrated with a freeway speed control example. Imagine that the local police decide to crack down on freeway speeders (in fact, this is exactly what happened in San Jose). The common method is to hide a police vehicle in the bushes and wait until some unlucky schmuck races by. The chance of catching any particular speeder is not very high, but over time the method works because enough speeders are eventually caught. Some lucky ones, though, will never meet the friendly policeman.

Another approach is to record the location and the time for each car when it enters and exits the freeway. Assuming that the speed is more or less constant, it is easy to calculate the average speed and penalize a speeding car when it exits the freeway. This method is much more accurate.

With sampling, we periodically check which threads are running on each CPU and estimate their CPU usage from that. For example, once every clock tick we find the thread running on each CPU and charge it one clock tick's worth of CPU time. Some threads may have just arrived on the CPU while others may have been sitting there longer, but for long-running threads we should get a good enough estimate. This is the simplest approach and it was used in the initial CPU Caps prototype. The main trouble is that one tick is quite a long time on modern fast CPUs, and a lot of thread activity may happen between two ticks.
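The tick-based charging described above can be sketched in a few lines of user-space C. This is only an illustration of the idea, not the prototype's code: the names sample_acct_t, sample_tick, NCPU and NO_THREAD are made up for the example, and threads are identified by a plain array index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU      4      /* hypothetical number of CPUs in the model */
#define NO_THREAD (-1)   /* marks an idle CPU */

/* Hypothetical per-thread record: clock ticks charged so far. */
typedef struct {
	uint64_t ticks_charged;
} sample_acct_t;

/*
 * Called once per clock tick. running[cpu] holds the index of the
 * thread found on that CPU at the moment of the tick, or NO_THREAD.
 * Whoever is caught on a CPU is charged one full tick, no matter how
 * long it has actually been running.
 */
static void
sample_tick(sample_acct_t *threads, size_t nthreads, const int running[NCPU])
{
	for (int cpu = 0; cpu < NCPU; cpu++) {
		int tid = running[cpu];

		if (tid == NO_THREAD)
			continue;
		assert((size_t)tid < nthreads);
		threads[tid].ticks_charged++;
	}
}
```

A thread that runs and leaves the CPU between two calls to sample_tick() is never charged at all, which is exactly the weakness of this method.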

Thread monitoring lets us know exactly how much CPU time was consumed by a thread. We do this by recording the time at which a thread boards a CPU and the time at which it leaves. Since we are only interested in short-term CPU usage (over a tick), we also need to check the threads currently running on a CPU and get their on-CPU time as well.
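As a rough sketch of the monitoring idea, the following user-space C keeps a running total per thread and timestamps each boarding. The type mon_acct_t and the functions mon_board, mon_leave and mon_usage are hypothetical names for this example, and the timestamps are assumed to come from a single monotonic clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record for the monitoring approach. */
typedef struct {
	int64_t total_ns;     /* sum of all completed on-CPU intervals */
	int64_t on_since_ns;  /* time of the last boarding, valid if on_cpu */
	int     on_cpu;       /* nonzero while the thread is on a CPU */
} mon_acct_t;

/* The thread boards a CPU: remember when. */
static void
mon_board(mon_acct_t *a, int64_t now_ns)
{
	assert(a != NULL);
	a->on_since_ns = now_ns;
	a->on_cpu = 1;
}

/* The thread leaves the CPU: fold the finished interval into the total. */
static void
mon_leave(mon_acct_t *a, int64_t now_ns)
{
	assert(a != NULL);
	if (a->on_cpu) {
		a->total_ns += now_ns - a->on_since_ns;
		a->on_cpu = 0;
	}
}

/*
 * Usage as of now_ns. A thread that is still on a CPU also gets the
 * interval in progress, which is the per-tick check of running threads.
 */
static int64_t
mon_usage(const mon_acct_t *a, int64_t now_ns)
{
	assert(a != NULL);
	return (a->total_ns + (a->on_cpu ? now_ns - a->on_since_ns : 0));
}
```

For instance, a thread that boards at 100 ns, leaves at 250 ns and boards again at 400 ns has a usage of 200 ns when queried at 450 ns.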

Solaris kindly provides a convenient tool for this purpose, called micro-state accounting. It takes very accurate nanosecond-granularity timestamps whenever a thread changes its state. The CPU Caps code uses this facility to calculate the CPU usage of each thread. This is done by the mstate_thread_onproc_time() routine:


hrtime_t
mstate_thread_onproc_time(kthread_t *t)
{
	hrtime_t aggr_time;
	hrtime_t now;
	hrtime_t state_start;
	struct mstate *ms;
	klwp_t *lwp;
	int	mstate;

	/* Ignore kernel threads (they have no LWP) */
	if ((lwp = ttolwp(t)) == NULL)
		return (0);

	/* Get the current thread state */
	mstate = t->t_mstate;
	ms = &lwp->lwp_mstate;
	/* Time when the thread entered this state */
	state_start = ms->ms_state_start;

	/* Thread's user + system + trap time */
	aggr_time = ms->ms_acct[LMS_USER] +
	    ms->ms_acct[LMS_SYSTEM] + ms->ms_acct[LMS_TRAP];

	/* Current time */
	now = gethrtime_unscaled();

	/*
	 * NOTE: gethrtime_unscaled on X86 taken on different CPUs is
	 * inconsistent, so it is possible that now < state_start.
	 */
	if ((mstate == LMS_USER || mstate == LMS_SYSTEM ||
	    mstate == LMS_TRAP) && (now > state_start)) {
		/* Add time spent on CPU in the current state */
		aggr_time += now - state_start;
	}

	/* Convert the unscaled value to nanoseconds */
	scalehrtime(&aggr_time);
	return (aggr_time);
}

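This function returns the time spent on CPU by a user-land thread since its birth. The t->t_lwp->lwp_mstate.ms_acct array contains the aggregate time spent by the thread in each of the possible states; the function adds up the three on-CPU ones (LMS_USER, LMS_SYSTEM and LMS_TRAP) and, if the thread is currently in one of them, also the time elapsed since it entered that state.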
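To make the arithmetic concrete, here is a small user-space model of the same calculation that can be compiled and played with outside the kernel. It is a simplification under stated assumptions: the MS_* states, ms_model_t and ms_onproc_time are invented for this example, there is a single made-up off-CPU state (MS_SLEEP), and timestamps are plain integers with no scaling step.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical microstates for the model; only the first three are on-CPU. */
enum { MS_USER, MS_SYSTEM, MS_TRAP, MS_SLEEP, MS_NSTATES };

typedef struct {
	int     cur_state;         /* state the thread is in right now */
	int64_t state_start;       /* timestamp when it entered cur_state */
	int64_t acct[MS_NSTATES];  /* time accumulated in completed states */
} ms_model_t;

static int
ms_is_onproc(int state)
{
	return (state == MS_USER || state == MS_SYSTEM || state == MS_TRAP);
}

/* On-CPU time as of 'now', mirroring the structure of the kernel routine. */
static int64_t
ms_onproc_time(const ms_model_t *ms, int64_t now)
{
	int64_t total;

	assert(ms->cur_state >= 0 && ms->cur_state < MS_NSTATES);

	total = ms->acct[MS_USER] + ms->acct[MS_SYSTEM] + ms->acct[MS_TRAP];

	/*
	 * Add the interval in progress only if the thread is on CPU and
	 * the clock has not gone backwards relative to state_start.
	 */
	if (ms_is_onproc(ms->cur_state) && now > ms->state_start)
		total += now - ms->state_start;

	return (total);
}
```

With 300, 200 and 50 units accumulated in the three on-CPU states and a thread that entered MS_USER at time 1000, a query at time 1400 yields 950. A query at time 900, modelling a skewed timestamp, falls back to the accumulated 550.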

The function above is the foundation of the thread accounting done by CPU Caps. The CPU Caps-specific monitoring is implemented by each scheduling class that supports CPU Caps. For each thread, the scheduling class keeps a small structure containing the total time the thread has spent on CPU during its lifetime. Every time a thread leaves a CPU, and also on every tick, the scheduling class code calls the caps_charge_adjust function via cpucaps_charge. The caps_charge_adjust function calculates the time spent on CPU since the thread was last checked and updates its total on-CPU time. We will take a closer look at it next time.
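In the meantime, the bookkeeping described above can be approximated by the sketch below. It is not the actual caps_charge_adjust code: the types proj_usage_t and thread_charge_t, the function charge_adjust, and the choice to add the delta to a per-project counter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-project usage counter. */
typedef struct {
	int64_t usage_ns;
} proj_usage_t;

/* Hypothetical per-thread structure kept by the scheduling class. */
typedef struct {
	int64_t       prev_onproc_ns;  /* on-CPU total at the last check */
	proj_usage_t *proj;            /* project the thread is charged to */
} thread_charge_t;

/*
 * Given the thread's lifetime on-CPU time (for example the value
 * returned by an mstate_thread_onproc_time()-style routine), charge
 * only the part accrued since the previous check. Returns the delta.
 */
static int64_t
charge_adjust(thread_charge_t *tc, int64_t onproc_ns)
{
	int64_t delta;

	assert(tc != NULL && tc->proj != NULL);

	delta = onproc_ns - tc->prev_onproc_ns;
	if (delta <= 0)
		return (0);  /* nothing new, or an inconsistent reading */

	tc->prev_onproc_ns = onproc_ns;
	tc->proj->usage_ns += delta;
	return (delta);
}
```

Calling charge_adjust() both when a thread leaves a CPU and on every tick means that no interval is counted twice: each call only sees what accumulated since the previous one.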



( Jun 19 2007, 06:16:28 PM PDT )
