Saurabh Mishra's Weblog

Friday June 09, 2006

Latency group (lgroup) in Solaris on NUMA aware machines

Most of you will have heard of NUMA (non-uniform memory access) machines. I'm going to describe how the memory latency groups (called lgroups in Solaris) are laid out. While working on the Multi-CPU binding project, I had to learn these aspects in order to choose, for a thread, the lgroup with the least latency from its earlier home lgroup.

The figure below describes how the lgroup structures are laid out on SPARC-based NUMA-aware machines. The root lgroup (0) is the topmost level of the hierarchy and contains all the resource sets in the system. lgroup ids 1, 2 and 3 have four CPUs each (one system board each) and are leaf nodes in this case. On SPARC, the remote latency from lgroup 2 to lgroup 1 or 3 is the same, i.e. the leaves are equidistant, so there are only two latencies: local and remote. In Solaris, we have something called the lgroup partition load (lpl_t), which represents the leaf nodes having CPUs and memory. Each cpu_t (CPU structure) has a cpu_lpl field. lpls are also used when CPU partitions are created (processor sets are the best example). There's a global table of lgroups called lgrp_table[]. Each partition keeps its lpls in cp_lgrploads[] (in cpupart_t). Both tables are indexed by lgroup id. A thread is homed to an lpl within its CPU partition.


On a 4-way amd64 machine, the lgroup representation is more interesting, because in addition to local memory, remote memory can be either one or two hops away. For example, psrinfo(1M) showed this:
0 on-line since 06/09/2006 06:49:25
1 on-line since 06/09/2006 06:49:31
2 on-line since 06/09/2006 06:49:33
3 on-line since 06/09/2006 06:49:35

Each CPU is a leaf lgroup. The diagram below explains this very well. In this kind of configuration, we have non-leaf nodes 5, 6, 7 and 8 representing resource sets that are at most one hop away. For example, lgroup id 5 contains 1, 2 and 3 (local and one hop away from lgroup 1). The root lgroup (id 0) contains everything.

SPARC has two levels of memory hierarchy, whereas a 4-way amd64 has three. An 8-way amd64 should have four. The scheduling of a thread starts from its home lgroup and goes up the hierarchy. For example, if the home of a thread (t->t_lpl) is lgroup 1 (with CPU 0 as its resource set), then we first look at CPU 0, and if the thread can't run there, we look at the parent of lgroup 1 (lpl_parent), which is lgroup 5 with 1, 2 and 3 as its resource sets. The same is true when an idle thread steals work from other CPUs: locality is kept in mind.

The lgroup hierarchical representation is even more interesting when there are three hops (for example, on an 8-way amd64 box). I'll leave that for next time. Thanks to Jonathan Chew for taking the time to explain all this. I thought it would be worth blogging about, since it's a fairly complex design. (2006-06-09 09:20:01.0)

Comments:

Thank you ! I hope that you will continue this topic as it is of considerable interest to me. I have been making some very trivial modifications in the lgrpplat.c source file in uts/$platform/os in order to see some messages at boot time during the NUMA discovery process. My hope is to have some useful messages at boot time that inform the sysadmin of nodes discovered and of the latency discovered. I do worry that the high resolution timer may not be functional at such an early stage of boot however.

Posted by Dennis Clarke on June 09, 2006 at 09:54 AM PDT #

gethrtime() comes alive in cbe_init(). These things get called while the system is being initialized during boot, with interrupts disabled. The discovery process for determining latencies differs from platform to platform. On AMD boxes, we use either hardware pagecopy or bcopy() to determine latencies; on SPARC, since we only have local and remote latencies, they are fixed. -- Saurabh Mishra

Posted by Saurabh Mishra on June 09, 2006 at 10:18 AM PDT #

Dennis - take a look at the lgrp utilities at http://www.opensolaris.org/os/community/performance/numa/observability/tools/

Posted by Derek Morr on June 09, 2006 at 10:57 AM PDT #

