Sun Sensible
Innovative Performance Ideas from Nicolai Kosche

Thursday September 15, 2005

Dataspace Profiling Knowledge – Look Inside the Machine!

With Insight and Experience, Knowledge drives Change. For peak productivity, you need Action through Knowledge.

Let’s Look Inside the Machine!

For Dataspace Profiling, this table describes the Machine, both application and hardware:

Software View                                                 Hardware View
Software Execution  Program Source  Virtual Memory            Physical Memory            Cache Hierarchy       Execution Units  Time
-                   -               Process                   Memory Board               -                     Processor Board  Hours
-                   Load Object     Segments                  Memory Bank*               Cache Bank            Processor        Minutes
-                   Function        Virtual Page              Physical Page              TLB Page Entry        Core             Seconds
Thread              Instruction     Virtual ECache Line       Physical ECache Line       External Cache Line   Strand           -
-                   -               Virtual ECache Sub-Block  Physical ECache Sub-Block  ECache Sub-Block      -                -
-                   Data Object     Virtual L1 Cache Line     Physical L1 Cache Line     Level One Cache Line  -                -

The first row shows the Software View and the Hardware View groupings of various Categories of Machine Resources: Software Execution, Program Source, Virtual Memory allocated by the Software, Physical Memory of the Hardware caching portions of the Virtual Memory, the Cache Hierarchy caching portions of the Physical Memory, the Execution Units, and Time.

Stacked Cells in Columns represent nested Collections of Structures. For example, Load Objects comprise Functions that comprise Instructions. Similarly, Processor Boards comprise Processors that comprise Cores that comprise Strands.

The Cache Hierarchy contains the MMU that contains TLB Translation Entries that map a Virtual Page to a Physical Page. The External Cache comprises External Cache Lines that may contain External Cache Sub-Blocks. The Level One Cache comprises Level One Cache Lines. To simplify this explanation, assume that the machine has only one size for every one of the Cache Hierarchy Components; however, Dataspace Profiling handles an arbitrary number of sizes.
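To make the nesting concrete, here is a minimal Python sketch that maps one address to the Page, External Cache Line, External Cache Sub-Block and Level One Cache Line that contain it. It follows the single-size simplification above. Every size is an assumption chosen for illustration, not a figure for any real processor, and the function name `decompose` is hypothetical. The numbers it returns are those of the aligned blocks holding the address, not cache set indices.

```python
# Hypothetical sizes in bytes, chosen only for illustration.
PAGE_SIZE = 8 * 1024         # one page (assumption)
ECACHE_LINE_SIZE = 512       # one External Cache line (assumption)
ECACHE_SUBBLOCK_SIZE = 64    # one External Cache sub-block (assumption)
L1_LINE_SIZE = 32            # one Level One cache line (assumption)


def decompose(address: int) -> dict:
    """Return the number of each aligned block that contains `address`."""
    return {
        "page": address // PAGE_SIZE,
        "ecache_line": address // ECACHE_LINE_SIZE,
        "ecache_subblock": address // ECACHE_SUBBLOCK_SIZE,
        "l1_line": address // L1_LINE_SIZE,
    }


# Two neighboring addresses can share every enclosing structure.
first = decompose(0x2000)
second = decompose(0x2010)
print(first)
print(second)
```

The same arithmetic applies to a Virtual or a Physical address. Only the translation between the two, held in the TLB Entry, differs.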

Each Color shows an association among Categories of Machine Resources. For example, a TLB Entry maps Virtual Memory Pages to Physical Memory Pages. Dataspace Profiling manages this association for you with the Object Definitions. When a cost occurs in filling the TLB Entry, Dataspace Profiling automatically tracks which Virtual Memory Page and which Physical Memory Page were affected.

The Colors let you gain Insight from one Machine Category to another. Filter on one Category, and change Perspective to the corresponding Category of the same Color. Filtering among Categories enables you to narrow down a bottleneck.

In my example, one Category is the Program View of Memory (Virtual Page), another is the Hardware View of Memory (Physical Page), and the third is the Hardware View for the Cache Hierarchy, the TLB Entry. Which Virtual Pages use this TLB Entry? (Filter by TLB Entry, view by Virtual Pages.) Which Physical Page is mapped by this Virtual Page? (AND Filter by the Virtual Page, view by Physical Page.)

For another example, observe your Application from the Perspective of Processor Boards. For the Processor Board using the most time, which Memory Boards does this Processor Board use? Following the Colors above, we can find out by filtering on the Processor Board and changing Perspective to the Memory Boards. Dataspace Profiling provides you the answer!

You can continue to gain Knowledge about your Application by repeating this process, drilling deeper. By Filtering by a Memory Board AND a Processor Board, you gain Knowledge of which other objects in the machine are using both.
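The filter-then-change-Perspective loop can be sketched in a few lines of Python. This is only an illustration of the idea, not how the tool is implemented. The sample records, the field names and the helpers `apply_filter` and `view_by` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: each ties one cost to the machine objects it touched.
SAMPLES = [
    {"processor_board": 0, "memory_board": 0, "process": 101, "stall_sec": 1.5},
    {"processor_board": 0, "memory_board": 1, "process": 101, "stall_sec": 4.0},
    {"processor_board": 0, "memory_board": 1, "process": 102, "stall_sec": 2.5},
    {"processor_board": 1, "memory_board": 1, "process": 103, "stall_sec": 0.5},
]


def apply_filter(samples, **criteria):
    """Keep only samples that match every criterion (an AND filter)."""
    return [s for s in samples if all(s[k] == v for k, v in criteria.items())]


def view_by(samples, category):
    """Total the cost per object of `category`, most expensive first."""
    totals = defaultdict(float)
    for s in samples:
        totals[s[category]] += s["stall_sec"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Which Processor Board costs the most?
hottest_board = view_by(SAMPLES, "processor_board")[0][0]

# Filter on that board, then change Perspective to Memory Boards.
memory_view = view_by(apply_filter(SAMPLES, processor_board=hottest_board), "memory_board")

# AND a Memory Board into the filter, then view by Process.
process_view = view_by(
    apply_filter(SAMPLES, processor_board=hottest_board, memory_board=memory_view[0][0]),
    "process",
)
print(hottest_board, memory_view, process_view)
```

Each extra criterion passed to `apply_filter` plays the role of one more AND Filter, and `view_by` plays the role of changing Perspective.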

For example, repeated AND Filtering allows powerful observations: which Threads inside which Processes, running on which Processors, used which Type Definitions and which Virtual Memory Addresses that were placed on which remote Memory Board, and at what Time during the Application's execution.

In my previous blog entry, I walked you through a series of Insights gained through Dataspace Profiling Technology. Now you gain the Knowledge of whether the Virtual Addresses are heavily shared (many Processes with one Virtual Memory per Physical Memory or Cache Structure), or whether they are falsely shared and we have a conflict (many Virtual Memories to one Physical Memory, or many Physical Memories to one Cache Structure). We know which Virtual Addresses, Virtual Pages, Processes, Threads, User-Defined Structures, and Functions are affected. You now have the Knowledge. You can Act through Knowledge.
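One way to read the shared-versus-conflict distinction is as a count of distinct Virtual Addresses and distinct Processes per Cache Structure. The Python sketch below encodes that reading. It is an assumption-laden illustration, not the tool's actual rule. The sample data, the function name `classify` and the label "private" are hypothetical.

```python
from collections import defaultdict

# Hypothetical samples: (cache_line, process_id, virtual_address).
SAMPLES = [
    (7, 101, 0x2000), (7, 102, 0x2000), (7, 103, 0x2000),  # one address, many processes
    (9, 101, 0x4000), (9, 101, 0x4008), (9, 102, 0x4010),  # many addresses, one line
    (3, 101, 0x6000),                                      # one process, one address
]


def classify(samples):
    """Label each cache line by how many addresses and processes touch it."""
    processes = defaultdict(set)
    addresses = defaultdict(set)
    for line, pid, vaddr in samples:
        processes[line].add(pid)
        addresses[line].add(vaddr)

    labels = {}
    for line in processes:
        if len(addresses[line]) > 1:
            labels[line] = "conflict"   # many Virtual Memories to one Cache Structure
        elif len(processes[line]) > 1:
            labels[line] = "shared"     # many Processes, one Virtual Memory
        else:
            labels[line] = "private"    # hypothetical label for everything else
    return labels


print(classify(SAMPLES))
```

The same counting could be applied per Physical Page or per Memory Bank by swapping the first field of each sample.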

Dataspace Profiling – Look Inside the Machine!

( Sep 15 2005, 01:26:42 PM PDT )

Tuesday September 13, 2005

Dataspace Profiling Insight – Look Inside the Machine!

Gain Insight from Multiple Perspectives. Dataspace Profiling Insight dissects performance bottlenecks within the Perspective in which they are visible. Then, the bottleneck in its entirety can be observed from any other Perspective. The correlation between one Perspective and another offers Insight.

Let’s walk through an example of an Application with a scaling issue.

From the Perspective of L2 Cache, Dataspace Profiling observes:

Select the offending L2 Cache Line entry, and Filter to “Look Inside the Machine” just from the Perspective of this one L2 Cache Line.

To gain Insight, we observe the L2 Cache Line in its entirety from the Perspective of the Virtual Addresses used by the Application:

We gain the Insight that just four neighboring Virtual Addresses incur the most cost to that one L2 Cache Line.

To gain more Insight, we observe these four Virtual Addresses when they occupy the L2 Cache Line in their entirety:

Now observed from the Perspective of the Processes used by the Application:

We have more Insight. One L2 Cache Line is used by many Processes of the Application and by just four neighboring Virtual Addresses.

We observe the Segment Profile and see that the Virtual Memory Segment uses 64k-sized Pages. More Insight!

One L2 Cache Line is used by:

  • Many Processes of the Application;
  • Four neighboring Virtual Addresses;
  • One 64k-sized Page.

As I’ve shown you in my previous blog entry, you can have any Perspective you want.

With Dataspace Profiling, you gain Insight into your Application from the Program View and Hardware View. In the Program View, you gain Insight into the Functions, Threads, Type Definitions and Virtual Memory Allocations. In the Hardware View, you gain Insight into the Physical Memory Placement, the Cache Hierarchy Utilization, Memory Management Unit Utilization, Execution Unit Utilization, and even Time.

With Dataspace Profiling Insight, scaling issues are self-evident.

Dataspace Profiling – Look Inside the Machine!

( Sep 13 2005, 05:45:06 PM PDT )

Thursday September 08, 2005

Dataspace Profiling Perspectives – Look Inside the Machine!

Dataspace Profiling enhances observability by providing additional perspectives into the costs associated with your application. Traditional profiling tools usually provide a function view of your program cost:

 
Excl. User CPU  Excl. Max.       Name  
                Mem. Stall             
   sec.      %     sec.      %    
849.494 100.00  799.249 100.00   
813.539  95.77  794.826  99.45   test
 31.762   3.74    3.643   0.46   find_free_slot
  3.342   0.39    0.280   0.04   go_test
  0.460   0.05    0.460   0.06   foo_fval
  0.180   0.02    0.040   0.01   main
  0.120   0.01    0.      0.     linkcnt
  0.030   0.00    0.      0.     memset
  0.030   0.00    0.      0.     rand
  0.020   0.00    0.      0.     use_free_slot
  0.010   0.00    0.      0.     ok

and also an instruction view of costs associated with your application:

 
Excl. User CPU  Excl. Max. 
                Mem. Stall             
   sec.      %     sec.      %
   3.522   0.4   0.      0.             [ 43] 100003b4c:  sllx        %o3, 2, %g2
   0.390   0.0   0.      0.             [ 43] 100003b50:  ld          [%g3 + %g2], %o3
## 3.623   0.4   3.623   0.5            [ 43] 100003b54:  add         %g2, %g3, %o4
## 6.545   0.8   0.      0.             [ 43] 100003b58:  xnorcc      %o3, 0, %g0
   0.      0.    0.      0.             [ 43] 100003b5c:  be,pn       %icc,0x100003bf0
   0.010   0.0   0.      0.             [ 43] 100003b60:  cmp         %o1, 32
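The percentage columns in the function view appear to be each function's share of the first (total) row. As a quick arithmetic check, this Python sketch recomputes them from the seconds columns for the first three functions. The helper name `percent` is hypothetical.

```python
# Totals and rows copied from the function view:
# (name, Excl. User CPU seconds, Excl. Max. Mem. Stall seconds).
TOTAL_CPU, TOTAL_STALL = 849.494, 799.249
ROWS = [
    ("test", 813.539, 794.826),
    ("find_free_slot", 31.762, 3.643),
    ("go_test", 3.342, 0.280),
]


def percent(part: float, whole: float) -> float:
    """Share of `whole` taken by `part`, as a percentage with two decimals."""
    return round(100.0 * part / whole, 2)


for name, cpu, stall in ROWS:
    print(f"{name:15s} {percent(cpu, TOTAL_CPU):6.2f}% CPU  "
          f"{percent(stall, TOTAL_STALL):6.2f}% stall")
```

The results agree with the table: `test` accounts for 95.77% of User CPU time and 99.45% of Memory Stall time.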

As computer systems become more dependent on the Memory Subsystem, additional perspectives of costs enhance the diagnostic ability of Dataspace Profiling. I group these perspectives into two large categories: Hardware View and Program View.

Hardware View

Hardware View Perspectives include the costs from the Memory Subsystem and the Execution Units.

Costs from the components of the Memory Subsystem itself are costs from the Cache Hierarchy and costs from the Memory Topology of your computer system. For example, the Memory Subsystem costs from the L2 Cache Line perspective:

In this view, we observe the distribution of costs within the lines of the L2 Cache.

Additional Hardware View Perspectives include the Memory Subsystem costs from the Execution Unit Perspective. For example, costs from each Physical Processor:

In these cases, we observe how the operating system scheduled our application on the underlying hardware.

Dataspace Profiling also observes your application from the perspective of time, showing how Memory Subsystem costs changed over time:

Program View

Dataspace Profiling observes an application Program View from the perspective of both Program Source and Program Address Space.

Program Source includes the function source profiled in current tools, and Dataspace Profiling includes the costs of program type definitions and their constituents:

and the profile of a user-defined type by definition order:

Program Address Space profiling includes all the allocations in your application. Dataspace Profiling reports costs from the perspective of Address Segments, Virtual and Physical Pages, etc.

Dataspace Profiling provides you detailed observation of your application from every perspective.

Look Inside the Machine!

( Sep 08 2005, 02:48:03 PM PDT )

Tuesday September 06, 2005

Dataspace Profiling - Look Inside the Machine!

Computer systems originally contained a central processing unit encompassing many boards (and sometimes cabinets), and random access memory that responded in the same cycle time as the central processing unit. This central processing unit (or CPU, as we know it today) was very costly.

Initially, bulbs attached to wires within the CPU aided programmer deduction in the identification of program behavior, in order to save precious processor time. These were the early profiling tools.

Computer languages, such as FORTRAN and COBOL, improved programmer productivity. Profiling libraries followed to break down the cost of the most precious resource on the system: the processor. Profiling associated processor costs with processor instructions and the source representation of those instructions: functions and line numbers. Programmer productivity climbed, as critical central processor unit bottlenecks were uncovered and resolved in program source.

Computers continued to evolve: the central processor unit shrank down to a single board, the mainframe led to the minicomputer, and multiple processor systems appeared. A disruptive technology, the microprocessor, appeared around this time. These cheap microprocessors were mass-produced with large-scale integration (LSI) and later VLSI.

Initially, microprocessors were inept compared to central processor units, but mass production sounded the death knell for the discrete-logic central processor unit.

The "killer micro" debate raged at this time. Large numbers of cheap commodity microprocessors were grouped to solve large problems previously possible only with mainframes. Sun offered a wide array of microprocessor-based systems, some more powerful than the largest mainframes of the day.

We are now in the mid-1990s. The acquisition cost of microprocessors made up a small fraction of overall system cost. The bulk of system cost was the memory subsystem (the cabinet, the interconnect, the controllers, the DRAM chips), and the peripherals.

In software, Solaris engineers were solving complex operating system scaling issues through deduction. In one case, through inference, an engineer found that one hot cache line bottlenecked all the microprocessors in the system. This brilliance in deduction was not missed; I realized we needed a tool to identify the now-new critical resource: the Memory Subsystem. This inflection point in technology drove me to invent Dataspace Profiling.

UltraSPARC-III was the first Sun processor that added support to monitor Memory Subsystem behavior. I worked with every processor team since then to include adequate support for Dataspace Profiling: the now-defunct Millennium processor; Niagara processors; UltraSPARC-III, IIIi and IV-based processors; and ROCK processors.

Computers evolved further: chip-multithreaded (CMT) processors have many Cores driving even more virtual processor Strands of instruction execution. These CMT processors offer fewer Memory Subsystem components than Strands of instruction execution. The performance-critical component in these systems is often the Memory Subsystem and not the Strands of execution.

Traditional profiling tools fail to detect these bottlenecks. Traditional profiling tools persist in monitoring the Processor Core, when the bottleneck is in the Memory Subsystem.

Dataspace Profiling monitors both the Processor Core and the Memory Subsystem to identify machine bottlenecks and relates the solution back to Program Source and Program Address Space. All machine components are profiled with low intrusion, on the fly, and related back visually to any program source and any program memory object.

Today, thanks to Performance Teams, Processor Teams, Compiler Teams, Solaris Teams, and the Sun Studio Analyzer Team, we are ready. Sun has incorporated Dataspace Profiling in Sun Studio.

Welcome aboard the experience of Dataspace Profiling: Look Inside the Machine!

( Sep 06 2005, 09:01:35 AM PDT )






(c) 2005 Sun Microsystems, Inc. All rights reserved.