One clue to the complexity of using the many counters on the various CPUs is that each of the Solaris counter-based performance measurement tools (for example, collect/analyzer, cputrack, and cpustat) includes the following footnote in its help message:
See Chapter 10 of the "BIOS and Kernel Developer's Guide for the
Athlon 64 and AMD Opteron Processors", AMD publication #26094.
This document (and its revision, AMD publication #25759) explains that some of the performance counters use a unit mask, which further qualifies the event being counted. For data cache refills, the unit mask specifies exactly which kinds of refill are counted, as described in the following table:
| Unit mask | Refill source |
|---|---|
| 0x01 | Refill from system memory |
| 0x02 | Refill from Shared-state line from L2 cache |
| 0x04 | Refill from Exclusive-state line from L2 cache |
| 0x08 | Refill from Owned-state line from L2 cache |
| 0x10 | Refill from Modified-state line from L2 cache |
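Each unit mask value in the table is a single bit, so a mask that selects several refill sources is simply the bitwise OR of those bits. The following minimal Python sketch, which assumes nothing beyond the table itself, shows where the value 0x1e used with cputrack in this post comes from. The `DCRefillSource` class and the `describe` helper are hypothetical names chosen for illustration.

```python
from enum import IntFlag


class DCRefillSource(IntFlag):
    """Unit-mask bits for the data cache refill event (values from the table)."""
    SYSTEM_MEMORY = 0x01
    L2_SHARED = 0x02
    L2_EXCLUSIVE = 0x04
    L2_OWNED = 0x08
    L2_MODIFIED = 0x10


# Union of every "refill from L2" bit: all line states, but not system memory.
L2_ANY = (DCRefillSource.L2_SHARED | DCRefillSource.L2_EXCLUSIVE
          | DCRefillSource.L2_OWNED | DCRefillSource.L2_MODIFIED)

# Every refill source, including system memory.
ALL_SOURCES = L2_ANY | DCRefillSource.SYSTEM_MEMORY


def describe(mask: int) -> list:
    """List the refill sources a unit mask selects (empty for 0x0)."""
    return [src.name for src in DCRefillSource if mask & src.value]


if __name__ == "__main__":
    print(f"umask=0x{int(L2_ANY):x}")       # umask=0x1e
    print(f"umask=0x{int(ALL_SOURCES):x}")  # umask=0x1f
    print(describe(0x0))                    # [] (the default mask selects nothing)
```

An empty result from `describe(0x0)` is the whole problem in miniature: with no bits set, the counter has nothing to count.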
The problem for the naive user is that the default unit mask is 0x0, which selects no events, so the counts will always be zero. The performance tools would be more user-friendly if they warned when a counter is being monitored that cannot possibly return useful data because its unit mask is clear. I presume they don't attempt this because of the complexity of tracking the quirks of the many different supported CPUs.
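The kind of warning suggested above need not be elaborate. The Python sketch below is hypothetical: `check_spec` and the `NEEDS_UMASK` set are illustrative names, not part of any Solaris tool. The parser also assumes a spec that names a single event followed by comma-separated `attr=value` pairs, in the form used by the cputrack command in this post. Under those assumptions, it flags an event whose unit mask would be left at the default of 0x0.

```python
from typing import Optional

# Hypothetical table of events whose counts depend on a non-zero unit mask.
# A real tool would need one such table per supported CPU.
NEEDS_UMASK = {"DC_refill_from_L2"}


def check_spec(spec: str) -> Optional[str]:
    """Return a warning if `spec` counts an event whose unit mask is clear.

    `spec` is assumed to be a single event name optionally followed by
    comma-separated attr=value pairs, e.g. "DC_refill_from_L2,umask=0x1e".
    """
    event, *attrs = spec.split(",")
    umask = 0  # default unit mask when none is given
    for attr in attrs:
        key, _, value = attr.partition("=")
        if key == "umask":
            umask = int(value, 0)  # base 0 accepts "0x1e" as well as "30"
    if event in NEEDS_UMASK and umask == 0:
        return f"warning: {event} has umask=0x0, so it will always count zero"
    return None


print(check_spec("DC_refill_from_L2"))             # prints the warning
print(check_spec("DC_refill_from_L2,umask=0x1e"))  # None
```

The hard part, as noted above, is not the check itself but keeping the per-CPU table of such events accurate.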
To see the problem, consider the following command and output:
    $ cputrack -c DC_refill_from_L2 application
       time lwp      event      pic0
      1.015   1       tick         0
      2.015   1       tick         0
      2.178   1       exit         0

However, by specifying the unit mask (in this example, the union of all the "refill from L2" flags), the output becomes:
    $ cputrack -c DC_refill_from_L2,umask=0x1e application
       time lwp      event      pic0
      1.028   1       tick     47981
      2.018   1       tick     47225
      2.144   1       exit    101299
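When measurements are scripted, a counter column that is zero on every row is a cheap hint that a unit mask may be missing. This Python sketch parses output in the layout shown above: a header line followed by `time lwp event pic0` rows. It assumes a single counter column. The `pic0_counts` helper is a hypothetical name, and the sample text is the two runs from this post.

```python
def pic0_counts(output: str) -> list:
    """Extract the pic0 column from cputrack-style output, skipping the header."""
    counts = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like: time lwp event pic0 (the header's "pic0" is not a digit)
        if len(fields) == 4 and fields[3].isdigit():
            counts.append(int(fields[3]))
    return counts


# The two runs shown above, without the command line.
WITHOUT_UMASK = """\
   time lwp      event      pic0
  1.015   1       tick         0
  2.015   1       tick         0
  2.178   1       exit         0
"""

WITH_UMASK = """\
   time lwp      event      pic0
  1.028   1       tick     47981
  2.018   1       tick     47225
  2.144   1       exit    101299
"""

for label, text in (("no umask", WITHOUT_UMASK), ("umask=0x1e", WITH_UMASK)):
    counts = pic0_counts(text)
    if not any(counts):
        print(f"{label}: all counts are zero; check the unit mask")
    else:
        print(f"{label}: {counts}")
```

An all-zero column does not prove the mask is wrong, since some events legitimately never occur in a given run. It is only a prompt to check.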
The problem is the same for collect/analyzer, but the syntax for specifying the unit mask is slightly different. As the documentation explains, it uses the hardware counter syntax:
    counter_name[~attribute=value]

For example:

    collect -h DC_refill_from_L2~umask=0x1e,hi application

This isn't a problem for most uses of collect, which rely on well-known profiling counters such as cycles, insts, and icm. However, you need to pay attention when using the CPU-specific counters.
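The only difference between the cputrack and collect spellings is the separator in front of each attribute: a comma for cputrack, a tilde for collect. The following Python sketch shows the translation. It assumes a cputrack spec that names a single event followed by its attributes, and it defaults to the `hi` interval from the collect example above. `to_collect_counter` is a hypothetical helper, not part of either tool.

```python
def to_collect_counter(cputrack_spec: str, interval: str = "hi") -> str:
    """Rewrite "event,attr=value,..." as collect's "event~attr=value...,interval".

    Assumes the cputrack spec names one event followed by its attributes.
    """
    event, *attrs = cputrack_spec.split(",")
    counter = event + "".join(f"~{attr}" for attr in attrs)
    return f"{counter},{interval}"


spec = "DC_refill_from_L2,umask=0x1e"
argv = ["collect", "-h", to_collect_counter(spec), "application"]
print(" ".join(argv))
# collect -h DC_refill_from_L2~umask=0x1e,hi application
```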
Thanks for the explanation. Neither the problem nor the syntax to fix it was the least bit obvious. -jj
Posted by J.J. Hillis on September 04, 2008 at 04:28 PM PDT #
The tools should tell you if the flags are set so that you can't possibly get any useful information.
Posted by Heidi Miller on September 26, 2008 at 09:00 AM PDT #
Just as a late follow-up on this: Solaris (as of Nevada build 101) now has what we've called "generic events". Each platform defines a set of well-known events, such as total instructions, total cycles, L1/L2 data cache misses, and DTLB misses. We used the PAPI project's "preset events" naming scheme because it seems to be fairly well accepted and to have had a lot of thought put into it. See the generic_events(3CPC) man page for the full set of events and for the events each platform implements.
Posted by Jon Haslam on January 09, 2009 at 11:37 AM PST #