Shared Context and other UltraSPARC T2 processor MMU Features
This is a brief tour of some of the new memory management features in the UltraSPARC T2 processor. There are a number of interesting changes in this area; here is a brief summary:
Hardware Table Walk
Enables the Memory Management Unit (MMU) hardware to search the Translation Storage Buffers (TSBs), which are the software-maintained address translation caches, in the event of a miss in the Translation Lookaside Buffer (TLB), which is the hardware address translation cache.
Shared Context
This is an innovative feature which allows the TLB to be used more efficiently by processes using shared memory. We had to make substantial software changes to take advantage of this.
Improved TSB support
There is also better hardware support for the TSBs: up to 4 TSBs (that is, 4 user and 4 kernel) can be configured. Larger TSBs, up to 256MB, were supported in the UltraSPARC T1 processor, but software support for them has now been added. Previously 2 TSBs were used, and mappings were stored in them according to page size: mappings with a page size of less than 4MB went in the first TSB, and those with a page size of 4MB or more went in the second. The 4 TSB support allows the 2 additional TSBs to be used to support shared contexts.
Hashed Cache Index
The hashed cache index feature eliminates one of the disadvantages of using 4MB and 256MB pages: a reduction in the number of page colours available, which in turn leads to increased contention in the L2 cache.
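To make the Hardware Table Walk item from the summary above a little more concrete, here is a toy model in Python. It is only a sketch under my own assumptions: the names (Tsb, translate, PAGE_SHIFT), the direct-mapped indexing, the single 8KB page size and the "trap to software" fallback are illustrative choices, not the actual hardware or Solaris implementation.

```python
PAGE_SHIFT = 13  # assume a single 8KB page size to keep the model small


class Tsb:
    """Toy direct-mapped, software-maintained translation cache."""

    def __init__(self, n_entries):
        self.n_entries = n_entries
        self.slots = [None] * n_entries  # each slot holds (vpn, ctx, pfn)

    def insert(self, vpn, ctx, pfn):
        self.slots[vpn % self.n_entries] = (vpn, ctx, pfn)

    def lookup(self, vpn, ctx):
        entry = self.slots[vpn % self.n_entries]
        if entry is not None and entry[0] == vpn and entry[1] == ctx:
            return entry[2]
        return None


def translate(tlb, tsbs, va, ctx):
    """Return (pfn, how). tlb is a dict keyed by (vpn, ctx)."""
    vpn = va >> PAGE_SHIFT
    if (vpn, ctx) in tlb:
        return tlb[(vpn, ctx)], "tlb hit"
    # TLB miss: the "hardware" searches each configured TSB in turn.
    for tsb in tsbs:
        pfn = tsb.lookup(vpn, ctx)
        if pfn is not None:
            tlb[(vpn, ctx)] = pfn  # reload the TLB without software help
            return pfn, "hardware table walk"
    # Assumed fallback: nothing found, so software has to handle the miss.
    return None, "trap to software"


tlb = {}
tsb = Tsb(8)
tsb.insert(vpn=5, ctx=3, pfn=42)
first = translate(tlb, [tsb], 5 << PAGE_SHIFT, 3)   # found by the table walk
second = translate(tlb, [tsb], 5 << PAGE_SHIFT, 3)  # now cached in the TLB
```

The point of the model is just the ordering: TLB first, then the TSBs, with software only getting involved when both miss.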
I've been working mainly as part of the team implementing the shared context software, so I'll describe this in a little more detail.
Shared Context – more detailed description
Well, why do we need contexts to begin with? Imagine multiple processes executing the same program. Each will map the same program segments to the same virtual addresses, so the question is how we find a match for these addresses in the TLB. Obviously, a straight lookup on the virtual address doesn't work, because the same virtual address in different processes can map to different physical addresses. So the MMU designers came up with the idea of a context: each process has a unique context number, and when we look up an address in the TLB we search for a match on the <va, ctx> pair. So that's fine, surely we can put our feet up and watch daytime TV now? Unfortunately, this doesn't work well when multiple processes actually share memory which is mapped at the same virtual address within each process, a scenario that is pretty common with Oracle and other databases. In this case the underlying physical addresses are the same, but the TLB will not notice: each process has a different context, so each process has to load its own copy of the shared mapping into the TLB. Bad news: this TLB real estate is expensive!
The MMU designers went back to the drawing board and came up with the idea of having more than one context. How is this going to help anybody? Well, suppose the MMU hardware searches the TLB for a match on either <va, ctx0> or <va, ctx1>, and all the processes which share memory segments attached at the same virtual address load these shared mappings into the TLB using ctx1. Then each of those processes gets a hit on the same <va, ctx1> entry, and the TLB entries are shared rather than being per-process.
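Here is a small Python sketch of that idea. It is a toy model, not the real hardware: the Tlb class, the context numbers and the page numbers are all made up for illustration. It counts how many TLB entries four processes need for one shared page, first with a single context per process and then with a second, shared context.

```python
class Tlb:
    """Toy TLB: a dict keyed by (vpn, ctx)."""

    def __init__(self):
        self.entries = {}

    def load(self, vpn, ctx, pfn):
        self.entries[(vpn, ctx)] = pfn

    def lookup(self, vpn, ctx0, ctx1=None):
        # Match on <vpn, ctx0>, or on <vpn, ctx1> if a second context is set.
        for ctx in (ctx0, ctx1):
            if ctx is not None and (vpn, ctx) in self.entries:
                return self.entries[(vpn, ctx)]
        return None


SHARED_VPN, SHARED_PFN = 0x1000, 0x77  # one shared page (made-up numbers)
PRIVATE_CTXS = [10, 11, 12, 13]        # four processes, one context each
SHARED_CTX = 100                       # hypothetical shared context number

# One context per process: every process misses and loads its own copy.
single = Tlb()
for ctx in PRIVATE_CTXS:
    if single.lookup(SHARED_VPN, ctx) is None:
        single.load(SHARED_VPN, ctx, SHARED_PFN)

# Two contexts: the shared mapping is loaded once under the shared context,
# and every later process hits on <vpn, SHARED_CTX>.
dual = Tlb()
for ctx in PRIVATE_CTXS:
    if dual.lookup(SHARED_VPN, ctx, SHARED_CTX) is None:
        dual.load(SHARED_VPN, SHARED_CTX, SHARED_PFN)

print(len(single.entries), len(dual.entries))
```

In this model the single-context TLB ends up holding one entry per process for the same physical page, while the two-context TLB holds just one.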
All that is left for the software team to do now is manage this new context register, which turns out to be pretty tricky. How do we identify processes which can use shared contexts? There isn't an existing mechanism in Solaris to identify sharing at this level. We found an interesting white paper which considered this problem: "Improving the Address Translation Performance of Widely Shared Pages" by Yousef A. Khalidi and Madhusudhan Talluri. We defined a Region as a segment attached to an address space with a fixed set of attributes, which included the virtual address it was attached at, its size, its access permissions and the underlying object it represented. The idea was to represent each of these regions by an integer, the region identifier, which could be used as an index into a bitmap. Basically, we end up with two bitmaps: one per process to represent the regions attached to it, and one for the Shared Context Domain, which represents the regions that are common to a number of processes and can be loaded with the shared context. In order to limit the size of the bitmaps required, we restricted the scope of the region ids to processes which share the same executable file. Looking at common applications, most sharing took place in this situation. This gave rise to the Shared Region Domain, which is the data structure used to manage the regions associated with a particular executable file.
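The bitmap bookkeeping can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the Solaris data structures: the class and function names are invented, the Shared Context Domain bitmap is simply taken as the intersection of two process bitmaps, and the subset test for whether a process can use the shared context is an illustrative rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    vaddr: int   # virtual address the segment is attached at
    size: int
    perms: str   # access permissions
    obj: str     # stand-in for the underlying object


class SharedRegionDomain:
    """Hands out region ids, scoped to one executable file (toy model)."""

    def __init__(self):
        self._ids = {}

    def region_id(self, region):
        # Regions with identical attributes get the same small integer.
        return self._ids.setdefault(region, len(self._ids))


def region_bitmap(srd, regions):
    bitmap = 0
    for region in regions:
        bitmap |= 1 << srd.region_id(region)
    return bitmap


def can_use_shared_context(proc_map, scd_map):
    # Assumed rule: the process must have every region in the domain attached.
    return scd_map != 0 and (proc_map & scd_map) == scd_map


srd = SharedRegionDomain()
text = Region(0x10000, 0x400000, "r-x", "a.out text")
shm = Region(0x80000000, 0x10000000, "rw-", "shm segment")
lib_a = Region(0xFF000000, 0x100000, "r-x", "libfoo text")
lib_b = Region(0xFE000000, 0x100000, "r-x", "libfoo text")  # other address

proc_a = region_bitmap(srd, [text, shm, lib_a])
proc_b = region_bitmap(srd, [text, shm, lib_b])
scd_map = proc_a & proc_b  # regions common to both processes
```

Note that lib_a and lib_b are the same object attached at different virtual addresses, so they get different region ids and drop out of the common bitmap. Only the text and the shared memory segment remain in scd_map.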
One of the ideas in the white paper which proved particularly useful was shared hme entries. Intimate Shared Memory (ISM) already supports shared hme entries, but it has its own specific mechanism to do this; other methods of sharing memory duplicate hme entries, so each process has its own copy. The implementation of Regions and Shared Region Domains allows the sharing of hme entries to be used more widely. We also implemented shared mappings in TSBs for mappings which are part of a Shared Context Domain. So a process which is part of a Shared Context Domain will have 2 private TSBs and 2 TSBs which are shared with all the other processes in the Domain.
( Oct 09 2007, 01:10:06 PM PDT )

