One clue to the complexity of using the many counters on the various CPUs is that the Solaris counter-based performance measurement tools (for example collect/analyzer, cputrack, and cpustat) all include the following footnote in their help messages:
    See Chapter 10 of the "BIOS and Kernel Developer's Guide for the
    Athlon 64 and AMD Opteron Processors", AMD publication #26094.
This document (and its revision, AMD publication #25759) explains that some of the performance counters use a unit mask that further qualifies the event. For data cache refills, the unit mask specifies exactly which kinds of refill are counted, as described in the following table:
| Unit mask | Refill type counted |
|------|------|
| 0x01 | Refill from system memory |
| 0x02 | Refill from Shared-state line from L2 cache |
| 0x04 | Refill from Exclusive-state line from L2 |
| 0x08 | Refill from Owned-state line from L2 |
| 0x10 | Refill from Modified-state line from L2 |
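Because each value in the table is a single bit, several kinds of refill can be selected at once by OR-ing their masks together. A minimal shell sketch of that arithmetic follows; the variable names are chosen here for illustration and are not defined by any Solaris tool.

```shell
#!/bin/sh
# Unit-mask bits copied from the table above; the variable names are
# illustrative only and are not defined by any Solaris tool.
REFILL_L2_SHARED=0x02
REFILL_L2_EXCLUSIVE=0x04
REFILL_L2_OWNED=0x08
REFILL_L2_MODIFIED=0x10

# OR the four L2 bits together to select every kind of refill from L2.
l2_mask=$(( $REFILL_L2_SHARED | $REFILL_L2_EXCLUSIVE | $REFILL_L2_OWNED | $REFILL_L2_MODIFIED ))

# Format the combined mask as a hexadecimal value.
l2_umask=$(printf '0x%x' "$l2_mask")
echo "umask=$l2_umask"   # prints umask=0x1e
```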
The problem for the naive user is that the default mask is 0x0, which selects no events, so the counts are always zero. The performance tools would be more user-friendly if they warned when a monitored counter could not possibly return useful data because its unit mask is clear. I presume they don't attempt this because of the complexity of tracking the quirks of the many supported CPUs.
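The kind of check described above could be approximated outside the tools. The sketch below is a hypothetical wrapper function, not something cputrack provides. It assumes it is only called for events that take a unit mask, and that the spec uses the `NAME,umask=VALUE` form.

```shell
#!/bin/sh
# Hypothetical helper, not part of cputrack: succeed (exit status 0) when a
# counter spec of the form NAME[,umask=VALUE] would count nothing because
# its unit mask is missing or zero.
umask_is_clear() {
    spec=$1
    case $spec in
        *,umask=*)
            mask=${spec##*,umask=}   # keep the text after ",umask="
            mask=${mask%%,*}         # drop any attributes that follow
            ;;
        *)
            mask=0                   # no umask given: the default of 0x0 applies
            ;;
    esac
    [ $(( $mask )) -eq 0 ]
}

for spec in DC_refill_from_L2 DC_refill_from_L2,umask=0x1e; do
    if umask_is_clear "$spec"; then
        echo "warning: $spec has a clear unit mask and will always count 0" >&2
    fi
done
```

Only the first spec in the loop triggers the warning. A real implementation would also need to know which events ignore the unit mask, which is exactly the per-CPU bookkeeping the tools avoid.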
To see the problem, consider the following command and output:
    $ cputrack -c DC_refill_from_L2 application
       time lwp      event      pic0
      1.015   1       tick         0
      2.015   1       tick         0
      2.178   1       exit         0

However, by specifying the unit mask (in this example, the union of all of the "refill from L2" flags), it becomes:

    $ cputrack -c DC_refill_from_L2,umask=0x1e application
       time lwp      event      pic0
      1.028   1       tick     47981
      2.018   1       tick     47225
      2.144   1       exit    101299
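The same option can select a single bit instead of the union, for instance to see which L2 line state accounts for most of the refills. As a rough sketch, the hypothetical function below only prints one command per L2 bit rather than running anything; `application` is the same placeholder used in the examples here.

```shell
#!/bin/sh
# Print one cputrack command per L2 refill bit so each line state can be
# measured on its own. The function name is illustrative, and
# "application" is a placeholder for the program under test.
l2_breakdown_commands() {
    for mask in 0x02 0x04 0x08 0x10; do
        echo "cputrack -c DC_refill_from_L2,umask=$mask application"
    done
}

l2_breakdown_commands
```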
The problem is the same for collect/analyzer, but the syntax for specifying the unit mask is slightly different. As the documentation explains, it uses the hardware counter syntax:
    counter_name[~attribute=value]

For example:

    collect -h DC_refill_from_L2~umask=0x1e,hi application

This isn't a problem for most uses of collect, which rely on well-known profiling counters such as cycles, insts, and icm; however, you need to pay attention when using the CPU-specific counters.
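Since the two tools differ only in how the attribute is attached to the counter name, one event/mask pair can be formatted for either. The helpers below are hypothetical and simply reproduce the two spellings shown in this post; the `hi` default for the collect interval is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helpers that format one event/mask pair for each tool.
cputrack_spec() {   # usage: cputrack_spec EVENT MASK
    printf '%s,umask=%s\n' "$1" "$2"
}

collect_spec() {    # usage: collect_spec EVENT MASK [INTERVAL]
    printf '%s~umask=%s,%s\n' "$1" "$2" "${3:-hi}"
}

# Print (not run) the equivalent command for each tool.
echo "cputrack -c $(cputrack_spec DC_refill_from_L2 0x1e) application"
echo "collect -h $(collect_spec DC_refill_from_L2 0x1e) application"
```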