Darryl Gove's blog
Reading the %tick counter
It's often necessary to time the duration of a piece of code. To do this I often use gethrtime(), which is typically a quick call. Obviously, the faster the call, the more accurate I can expect my timing to be. Sometimes the call to gethrtime() takes too long (or is perhaps too costly, because it's done so frequently). The next thing to try is to read the %tick register.
The %tick register is a 64-bit register that is incremented on every cycle. Since it is stored on the processor, reading its value is a low-cost operation. The only complexity is that, being a 64-bit value, it needs special handling in 32-bit code, where a 64-bit return value from a function is passed in the %o0 (upper bits) and %o1 (lower bits) registers.
The inline template to read the %tick register in 64-bit code is very simple:
.inline tk,0
   rd %tick,%o0
.end
The 32-bit version requires two more instructions to get the return value into the appropriate registers:
.inline tk,0
   rd %tick,%o2
   srlx %o2,32,%o0
   sra %o2,0,%o1
.end
Here's an example code which uses the %tick register to get the current value plus an estimate of the cost of reading the %tick register:
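#include <stdio.h>

long long tk();

int main()
{
  /* Prints the current %tick value and the difference between two further reads. */
  printf("value=%llu duration=%llu\n",tk(),-tk()+tk());
  return 0;
}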
The compile line is:
$ cc -O tk.c tk.il
The output should be something like:
$ a.out
value=4974674895272956 duration=32
This indicates that 32 cycles (in this instance) elapsed between the two reads of the %tick register. Looking at the disassembly, there are certainly a number of cycles of overhead:
10bbc: 95 41 00 00 rd %tick, %o2
10bc0: 91 32 b0 20 srlx %o2, 32, %o0
10bc4: 93 3a a0 00 sra %o2, 0, %o1
10bc8: 97 32 60 00 srl %o1, 0, %o3
10bcc: 99 2a 30 20 sllx %o0, 32, %o4
10bd0: 88 12 c0 0c or %o3, %o4, %g4
10bd4: c8 73 a0 60 stx %g4, [%sp + 96]
10bd8: 95 41 00 00 rd %tick, %o2
This overhead can be reduced by treating the %tick register as a 32-bit read, effectively ignoring the upper bits. For very short duration codes this is probably acceptable, but it is unsuitable for longer running code blocks. With this (inelegant) hack, the following code is generated:
10644: 91 41 00 00 rd %tick, %o0
10648: b0 07 62 7c add %i5, 636, %i0
1064c: b8 10 00 08 mov %o0, %i4
10650: 91 41 00 00 rd %tick, %o0
This usually returns a value of 8 cycles on the same platform.
Posted at 03:56PM Jul 11, 2008 by Darryl Gove in Sun


