Following NUMA.2 and NUMA.1, here's one about how OpenSolaris represents NUMA architectures.
Topology Representation
Quick recap.
A NUMA machine is composed of nodes that contain hardware resources: processors, memory, devices. These nodes are connected through an interconnect that allows each node to access every other node's memory transparently, forming a single shared memory space. Every node can access the entire memory space, but because remote memory has to be reached through the interconnect, accessing local memory is faster than accessing remote memory.
The OS needs to be aware of this situation and know exactly where, in physical memory, one node ends and another begins. OpenSolaris uses a kernel abstraction called locality groups - or simply lgroups - to represent sets of resources within some distance of each other. Lgroups are created during system boot and form a latency topology used by the scheduler/dispatcher and the VM subsystems to allocate resources properly. This topology is hierarchical, with the lowest access latencies at the leaves and the highest at the root. Let's look at an example:
Figures: (a) 4 node machine (ring); (b) leaf lgroups
The example above shows that a four node machine with a ring topology will have four lgroups, one for each node. These are leaf lgroups; they sit at the lowest level of the hierarchy because they represent only local accesses.
Remote accesses increase latency depending on how far you're going. Reaching a node one hop away costs one thing; reaching the furthest node costs more. The system creates intermediate lgroups to represent these intermediate distances (in this case, a node together with the nodes one hop away). It also creates a root lgroup, which contains all the resources in the system and represents the highest level of latency.
Figures: (c) intermediate lgroup around node 0; (d) root lgroup
Figure (c) shows the intermediate lgroup 5 formed around node 0; it contains lgroups 1, 2 and 4.
This might seem a bit confusing without looking at the whole topology, and how the system uses it.
Figure: (e) lgroup topology
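To make the grouping concrete, here is a minimal Python sketch that derives the same three levels from hop counts on a ring. It is only a model, not how the kernel builds lgroups: the function names and the numbering scheme (root is 0, node i maps to leaf lgroup i+1, intermediates follow the leaves) are assumptions chosen so that the result lines up with the lgroup 5 example above.

```python
def ring_hops(a, b, n):
    """Number of hops between nodes a and b on an n-node ring."""
    d = abs(a - b) % n
    return min(d, n - d)


def build_lgroups(n):
    """Map each lgroup id to the set of leaf lgroups it covers.

    Numbering is an assumption for illustration only:
    root = 0, leaves = 1..n (node i -> lgroup i + 1),
    intermediates = n+1..2n (one per node).
    """
    lgroups = {0: set(range(1, n + 1))}      # root: every resource
    for node in range(n):
        lgroups[node + 1] = {node + 1}       # leaf: local resources only
    for node in range(n):
        # intermediate: the node itself plus everything one hop away
        lgroups[n + 1 + node] = {
            other + 1 for other in range(n)
            if ring_hops(node, other, n) <= 1
        }
    return lgroups


topology = build_lgroups(4)
print(sorted(topology[5]))  # [1, 2, 4]
print(sorted(topology[0]))  # [1, 2, 3, 4]
```

Under these assumptions the intermediate lgroup around node 0 covers leaf lgroups 1, 2 and 4, and only the root reaches lgroup 3, the node two hops away.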
This topology is hierarchical, as mentioned earlier. We have the lowest latencies at the bottom, and the highest at the top. The scheduler/dispatcher and the VM subsystems consult this topology to move threads around and to allocate memory, respectively.
The system will try to allocate resources (CPU, memory) in the lgroup where the thread is located. If it can't get the resources there, it will move up the hierarchy and consider the next closest resources. So it will first consider the "neighboring" nodes and, if that still doesn't work, it will consider the entire system's resources - represented by the root lgroup.
The idea behind this course of action is to keep threads and their resources as close together as possible. This results in lower access times and takes advantage of cache warmth when possible.
Next time, I'll write about load balancing and processor partitions, and how they fit into all of this.
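The walk up the hierarchy can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the kernel's allocator: `allocate`, the page counts and the hard-coded search path are all hypothetical, and the path uses the example numbering from earlier (a thread in leaf lgroup 1, whose intermediate lgroup covers leaves 1, 2 and 4).

```python
def allocate(search_path, free_pages, want):
    """Try each level from the home leaf up to the root.

    Returns the leaf lgroup that satisfied the request, or None.
    search_path is a list of levels, closest first; each level
    lists the leaf lgroups it covers.
    """
    tried = set()
    for level in search_path:
        for leaf in level:
            if leaf in tried:
                continue              # already checked at a closer level
            tried.add(leaf)
            if free_pages[leaf] >= want:
                free_pages[leaf] -= want
                return leaf
    return None


# Hypothetical search path for a thread whose home is leaf lgroup 1.
PATH_FROM_LGROUP_1 = [
    [1],           # leaf: local resources
    [1, 2, 4],     # intermediate lgroup: one hop away
    [1, 2, 3, 4],  # root: the whole system
]

# Made-up free memory per leaf lgroup; the home lgroup is exhausted.
free_pages = {1: 0, 2: 0, 3: 64, 4: 16}

first = allocate(PATH_FROM_LGROUP_1, free_pages, 8)
second = allocate(PATH_FROM_LGROUP_1, free_pages, 16)
print(first, second)  # 4 3
```

The first request is satisfied one hop away (leaf 4), while the second no longer fits there and falls through to the root level, landing on the furthest node (leaf 3).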






