http://blogs.sun.com/csg/date/20070623 Saturday June 23, 2007

Processor Sets, and Pools, and Partitions! Oh My!

Background

Solaris has two user-visible ways to divide up the processors (or cores, or hardware threads...) on a system among different workloads: processor sets and processor pools.

Processor sets are a pretty straightforward way to manage processors and tasks assigned to those processors. A user creates processor sets manually with the psrset command or the pset syscalls, and gets a set ID back. A processor can be assigned to a set, and a process (or thread within a process) can be bound to a set. The sets dissolve when you reboot.

Processor pools have more flexibility. They are not necessarily bound (from the administrative point of view) to specific processors. You create and name a pool, to which you can bind processes, tasks, projects, or whole zones (though not individual threads). Tasks and projects, by the way, in this context, refer to some abstractions Solaris has for managing sets of processes. Pools can also persist across reboots, because they are exported to a configuration file read and regenerated by libpool. And multiple semantically separate pools can map to the same sets of processors if desired. Together, these things give a greater degree of flexibility in managing processors as a resource on a machine.
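To make the pool side of that concrete, here is a toy Python model of the relationships described above. It is only a sketch: `Pool`, `ProcessorGroup`, `BINDABLE`, and the pool names are made up for illustration, and nothing here reflects how libpool actually stores its configuration.

```python
# Toy model only: Pool, ProcessorGroup and BINDABLE are hypothetical names,
# not libpool or kernel data structures.
from dataclasses import dataclass, field

# The kinds of entity the post says can be bound to a pool.
BINDABLE = {"process", "task", "project", "zone"}


@dataclass(frozen=True)
class ProcessorGroup:
    cpus: frozenset


@dataclass
class Pool:
    name: str
    processors: ProcessorGroup
    bound: list = field(default_factory=list)

    def bind(self, kind, ident):
        # Pools do not bind individual threads; that is a processor set feature.
        if kind not in BINDABLE:
            raise ValueError(f"pools cannot bind a {kind}")
        self.bound.append((kind, ident))


shared = ProcessorGroup(frozenset({2, 3}))
web = Pool("web", shared)
batch = Pool("batch", shared)  # a separate pool backed by the same processors

web.bind("zone", "webzone")
batch.bind("project", "nightly")
```

The two pools are administratively separate (different names, different bound entities) but share one group of processors, which is the kind of mapping a plain processor set cannot express.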

So this is great: we have a very flexible tool for restricting resource usage, and a fine-grained, simple, very precise way to manage resources. What's wrong? Well, at the moment you cannot have both at once. You can use either sets or pools, and there is a big switch to control which is enabled. Not the most desirable. Processor sets are active by default for legacy support reasons. Even worse, there's a feature of Solaris Zones / Containers which relies on being able to use the pools facility - so by default, on a fresh install of Solaris, you can't use the dedicated-cpu property of zones until you enable pools (thereby disabling sets).

I'll recap the history of the two systems as I understand it from Steve Lawrence. In the beginning, there was nothing. Then processor sets were added, to make it possible to control how much processing power processes could use. With the addition of zones to Solaris, a more flexible way to manage these was desired - and hence pools were born. They were originally implemented entirely in userland, with a library making calls into the processor set API. Eventually, to add more flexibility, some of the implementation was moved into the kernel. /dev/pool and /dev/poolctl were added - administration of pools is now done from command line tools making calls into libpool, which performs ioctls on the devices.

With the addition of the dedicated-cpu feature of zones, a notion of a temporary pool was introduced. These pools have a property set such that they are specifically not saved across reboots. Instead, when a zone configured to have one or more dedicated cpus is booted, a temporary pool is created. This way the pool is created automatically by booting the zone (and destroyed by halting it), independently of the normal pools administration interface - admins only need to worry about the zone.
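The temporary pool lifecycle can be sketched as a small Python model. Everything in it is an assumption made for illustration: `PoolRegistry`, `Zone`, the `temporary` field, and the `tmp_<zone>` naming scheme are invented here and are not taken from the zones or libpool code.

```python
# Toy model only: PoolRegistry, Zone and the "temporary" field are hypothetical
# names for illustration, not the zones or libpool implementation.

class PoolRegistry:
    def __init__(self):
        self.pools = {}  # name -> {"ncpus": int, "temporary": bool}

    def create(self, name, ncpus, temporary=False):
        self.pools[name] = {"ncpus": ncpus, "temporary": temporary}

    def destroy(self, name):
        del self.pools[name]

    def saved_config(self):
        # Only non-temporary pools would be written out and survive a reboot.
        return {n: p for n, p in self.pools.items() if not p["temporary"]}


class Zone:
    def __init__(self, name, registry, dedicated_cpus=0):
        self.name = name
        self.registry = registry
        self.dedicated_cpus = dedicated_cpus
        self.pool_name = None

    def boot(self):
        # A zone with dedicated cpus gets its own temporary pool at boot.
        if self.dedicated_cpus > 0:
            self.pool_name = f"tmp_{self.name}"  # made-up naming scheme
            self.registry.create(self.pool_name, self.dedicated_cpus,
                                 temporary=True)

    def halt(self):
        # Halting the zone tears the temporary pool down again.
        if self.pool_name is not None:
            self.registry.destroy(self.pool_name)
            self.pool_name = None
```

Booting a `Zone` with `dedicated_cpus=2` adds a pool to the registry that never appears in `saved_config()`, and halting the zone removes it, so the administrator only ever touches the zone.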

Because these temporary pools now exist, it's reasonable to reimplement processor sets as just another interface for manual creation of temporary pools, and to make pools the default (and only) processor grouping system. Processor sets allow you to do a couple of things at a finer level of granularity than pools, but it turns out this isn't a problem because both systems already have a common base - CPU partitions. Yes, a third grouping. But unlike sets and pools, which are administrative abstractions, partitions are actually what results in threads executing where they should - they are what the scheduler enforces. It's fortunate that this is a common base, because the psets framework does all of its manipulation by directly operating on partitions - so with a little extra notification support, the existing code in the sets implementation for things pools don't support (such as binding individual threads to processor groups) can be reused.

And so, my summer project is combining the implementations, and resolving the fuzzy policy questions, such as whether or not the sets API should be able to muck with the groups created by the pools API (the answer to this, and most of these questions, is no, don't grant permission).

Interesting Side-Effects

So I began hacking up the psets implementation, a bit at a time, to see how feasible this project was. Initially I only modified the create and destroy operations in the pset API, just to get the basics going. I left the other code either untouched or knowingly broken, figuring that I'd initially test creation and destruction, leaving the rest contingent upon that. So I built a new kernel, installed it on a test machine, and rebooted. It booted just fine, and just after reaching the login prompt - panicked. The stack trace printed before it dumped core indicated that something had called pset_unbind().

After rebooting to single user mode, running savecore and opening the corefile in mdb, I found something interesting. The process that made the call causing the panic was a Java process, started some time between the multi-user:default and multi-user:server milestones in the boot process. It was a child of one of the processes started by init. But nothing started at boot time should be messing with processor sets - they're not persistent! Suspecting Java, I DTraced an execution of a demo in /usr/demo/java/, and sure enough - running a 'Hello World' Java app made a call into pset_unbind(). It seems that the JVM on Solaris just does this automatically.

But it gets more interesting. Psets actually permit bound processes to unbind themselves when conditions are right. Specifically, processes may do this as long as the PSET_NOESCAPE flag is not set. Which, by default, it is not. So if you use processor sets, and run the JVM inside a bound set of processes - the first thing the JVM will do is jump out of those bindings, free to roam over every processor in the system. Steve and I suspect that this flag was added after the initial implementation, because processes shouldn't always be able to do this, and it was left off by default because some app depended on the behavior... the JVM?

Of course, if the PSET_NOESCAPE flag is set on a processor set before Java is executed, it will be stuck bound to that set.
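The escape behaviour is easy to lose track of, so here is a toy Python model of it. This is a sketch of the semantics as described in this post, not the Solaris API: `ProcessorSet`, `Process`, and `run_jvm_like` are hypothetical names, the `noescape` attribute merely stands in for PSET_NOESCAPE, and treating an unbound process as free to use every CPU is a simplification.

```python
# Toy model only: ProcessorSet and Process are hypothetical classes that mimic
# the behaviour described in the post; they are not the Solaris pset API.

class ProcessorSet:
    def __init__(self, cpus, noescape=False):
        self.cpus = frozenset(cpus)
        self.noescape = noescape  # stands in for the PSET_NOESCAPE flag


class Process:
    def __init__(self, name, system_cpus):
        self.name = name
        self.system_cpus = frozenset(system_cpus)
        self.pset = None  # None means "not bound to any set"

    def bind(self, pset):
        self.pset = pset

    def unbind(self):
        """Try to leave the current set; return True if now unbound."""
        if self.pset is not None and self.pset.noescape:
            return False  # flag set: the binding stays in place
        self.pset = None
        return True

    def allowed_cpus(self):
        # Simplification: an unbound process may run on any CPU.
        return self.system_cpus if self.pset is None else self.pset.cpus


def run_jvm_like(pset, system_cpus=range(8)):
    """Bind a process to pset, then have it immediately try to unbind itself."""
    proc = Process("java", system_cpus)
    proc.bind(pset)
    proc.unbind()
    return proc.allowed_cpus()
```

With a default set, `run_jvm_like(ProcessorSet({0, 1}))` returns all eight CPUs of the modelled system; with `noescape=True` it returns only CPUs 0 and 1.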

And more oddities to come...



Posted by csg [Sun] (June 23, 2007 12:10 AM)