NUMA.2 Node Affinity
So, following NUMA.1, here's the second part of the series about NUMA architectures.
Node Affinity
NUMA systems have a single shared memory space, and the total physical memory is the sum of each node's individual memory. The operating system consumes that memory as it allocates space for itself and for user applications.
Every process has its own address space, composed of virtual addresses that are mapped to physical ones. This means, among other things, that a process has no idea of where in physical memory it is allocated. Its memory could sit entirely on a single node or be spread out across several nodes.
In the latter case, a process would see different latency times when accessing its own memory positions. For instance, suppose a process is created at the first node and, as it grows, starts to allocate memory at the second node. Accessing this newly allocated memory means going through the interconnect, which takes longer than accessing local memory positions and causes app performance to decrease.
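To put a rough number on that slowdown, here's a minimal sketch of the average latency of one memory access when some fraction of a process' memory sits on a remote node. The function name and the 100 ns / 150 ns figures are made up for illustration; real latencies depend on the hardware and the interconnect.

```python
def average_access_latency(local_ns, remote_ns, remote_fraction):
    """Expected latency of one memory access, in nanoseconds.

    remote_fraction is the share of accesses that cross the interconnect.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical latencies, chosen only to make the arithmetic easy to follow.
LOCAL_NS = 100.0
REMOTE_NS = 150.0

# Process fully contained in its home node.
all_local = average_access_latency(LOCAL_NS, REMOTE_NS, 0.0)

# Process that grew into the second node: a quarter of its accesses are remote.
quarter_remote = average_access_latency(LOCAL_NS, REMOTE_NS, 0.25)

print(all_local, quarter_remote)
```

With these assumed numbers, having a quarter of the accesses go remote raises the average from 100 ns to 112.5 ns per access, and the gap widens as more of the address space ends up away from the home node.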
Ideally, we'd like to avoid remote memory accesses. In other words, we'd like to allocate memory at the node in which the process was originally created, so that its entire address space sits at a single node. If that's not possible, we should try to allocate memory at the nearest node to minimize access times.
By doing this, the system is respecting the process' affinity to its original node.
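The "home node first, nearest node otherwise" idea can be sketched in a few lines. This is a toy model under my own assumptions, not how any particular kernel does it: the distance matrix, the free page counts and the name pick_allocation_node are all hypothetical.

```python
def pick_allocation_node(home_node, distance, free_pages, pages_needed):
    """Return the node closest to home_node that can hold pages_needed.

    distance[a][b] is the relative cost of node a accessing node b's memory.
    Returns None if no node has enough free pages.
    """
    candidates = [
        node for node in range(len(free_pages))
        if free_pages[node] >= pages_needed
    ]
    if not candidates:
        return None
    # The home node has the smallest distance to itself, so it wins whenever
    # it has room. Ties are broken by the lowest node number.
    return min(candidates, key=lambda node: (distance[home_node][node], node))


# Hypothetical three-node system: lower numbers mean closer.
DISTANCE = [
    [10, 20, 30],
    [20, 10, 20],
    [30, 20, 10],
]

# The home node (0) still has room, so the allocation stays local.
local_choice = pick_allocation_node(0, DISTANCE, [1024, 512, 4096], 256)

# The home node is full, so the nearest node with room (1) is chosen,
# even though node 2 has more free memory.
fallback_choice = pick_allocation_node(0, DISTANCE, [0, 512, 4096], 256)

print(local_choice, fallback_choice)
```

Note that the fallback prefers the closer node over the emptier one. A real allocator would also have to weigh memory pressure on each node, which this sketch ignores.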
Node affinity is important for both single-threaded and multithreaded processes. The earlier example used a process with a single execution flow (thread). For multithreaded apps, the story is a little bit different.
If your multithreaded app creates n threads that execute independently, we would like each thread to run on a different processing unit so that the app fully utilizes the system's nodes, as long as the current system load allows it.
But if those threads rely heavily on synchronization or data sharing, spreading them around the system will increase remote accesses. In such a case, we'd like to strike a balance between local accesses and available CPU power. Again, it's important to maintain node affinity, but load balancing is also something to be on the lookout for.
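Here's a toy comparison of that trade-off for four threads sharing data that lives on node 0. Everything in it is an assumption made for illustration: the placements, the 1000 accesses per thread, and the two helper functions. It only counts remote accesses and how crowded the busiest node gets.

```python
from collections import Counter


def remote_access_count(thread_nodes, data_node, accesses_per_thread):
    """Total accesses to the shared data made by threads running off data_node."""
    return sum(
        accesses_per_thread for node in thread_nodes if node != data_node
    )


def busiest_node_load(thread_nodes):
    """Number of threads competing for CPU on the most crowded node."""
    return max(Counter(thread_nodes).values())


DATA_NODE = 0                # hypothetical: the shared data lives on node 0
ACCESSES_PER_THREAD = 1000   # hypothetical workload

# Each list gives the node on which each of the four threads runs.
placements = {
    "packed": [0, 0, 0, 0],    # best locality, one crowded node
    "balanced": [0, 0, 1, 1],  # a middle ground
    "spread": [0, 1, 2, 3],    # most CPU power, most remote traffic
}

results = {
    name: (
        remote_access_count(nodes, DATA_NODE, ACCESSES_PER_THREAD),
        busiest_node_load(nodes),
    )
    for name, nodes in placements.items()
}

for name, (remote, load) in results.items():
    print(name, "remote accesses:", remote, "threads on busiest node:", load)
```

In this model, packing the threads removes all remote accesses but leaves four threads fighting over one node, while spreading them gives every thread its own node at the cost of 3000 remote accesses. Which side wins depends on how much the threads actually share and how loaded the system is.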
Next time, I'll write about how OpenSolaris and Linux represent the system topology according to latency times and how the scheduler and the VM subsystems optimize for node latency.