CPU Performance Counters Library Functions cpc_bind_event(3CPC)
NAME
cpc_bind_event, cpc_take_sample, cpc_rele - use CPU performance
counters on lwps
SYNOPSIS
cc [ flag... ] file... -lcpc [ library... ]
#include <libcpc.h>

int cpc_bind_event(cpc_event_t *event, int flags);
int cpc_take_sample(cpc_event_t *event);
int cpc_rele(void);
DESCRIPTION
Once the events to be sampled have been selected using, for
example, cpc_strtoevent(3CPC), the event selections can be
bound to the calling LWP using cpc_bind_event(). If
cpc_bind_event() returns successfully, the system has
associated a performance counter context with the calling LWP.
The context allows the system to virtualize the hardware
counters to that specific LWP, and the counters are enabled.
Two flags, CPC_BIND_EMT_OVF and CPC_BIND_LWP_INHERIT, can be
passed in flags to modify the behavior of the interface, as
described below.
Counter values can be sampled at any time by calling
cpc_take_sample() and reading the ce_pic[] array of the
returned event. The ce_hrt field contains the timestamp at
which the kernel last sampled the counters.
To remove the performance counter context from an LWP
immediately, call cpc_rele(). Otherwise, the context is
destroyed when the LWP or process exits.
The caller should take steps to ensure that the counters are
sampled often enough to avoid the 32-bit counters wrapping.
The events most prone to wrap are those that count processor
clock cycles. If such an event is of interest, sampling
should occur often enough that fewer than 4 billion (2^32)
clock cycles elapse between samples. In practice, this is
likely to be a problem only on otherwise idle systems, or
when processes are bound to processors, since normal context
switching behavior otherwise hides the problem.
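The arithmetic behind this advice can be sketched in plain C. The helpers below are hypothetical and are not part of libcpc. counter_delta32() shows that unsigned subtraction of two raw 32-bit counter readings stays correct across at most one wrap. max_sample_interval() turns the 2^32-cycle limit into a sampling deadline for a clock rate that the caller supplies.

```c
#include <stdint.h>

/*
 * Hypothetical helper: difference between two readings of a raw
 * 32-bit counter. Unsigned arithmetic is modulo 2^32, so the result
 * is correct only if fewer than 2^32 events occurred between the
 * two readings (that is, the counter wrapped at most once).
 */
static uint32_t
counter_delta32(uint32_t before, uint32_t after)
{
	return (after - before);
}

/*
 * Hypothetical helper: longest interval, in seconds, between samples
 * of a cycle counter before it can wrap, for a processor clocked at
 * cpu_hz. The clock rate is an input here, not something queried
 * from libcpc.
 */
static double
max_sample_interval(double cpu_hz)
{
	return (4294967296.0 / cpu_hz);	/* 2^32 cycles */
}
```

For example, at an assumed 1 GHz clock, max_sample_interval() gives about 4.3 seconds between samples.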
SunOS 5.11 Last change: 02 Mar 2007 1
RETURN VALUES
Upon successful completion, cpc_bind_event() and
cpc_take_sample() return 0. Otherwise, these functions
return -1 and set errno to indicate the error.
ERRORS
The cpc_bind_event() and cpc_take_sample() functions will
fail if:
EACCES For cpc_bind_event(), access to the requested
hypervisor event was denied.
EAGAIN Another process may be sampling system-wide CPU
statistics. For cpc_bind_event(), this implies
that no new contexts can be created. For
cpc_take_sample(), this implies that the performance
counter context has been invalidated and
must be released with cpc_rele(). Robust programs
should be coded to expect this behavior and
recover from it by releasing the now invalid context
with cpc_rele(), sleeping for a while,
then attempting to bind and sample the event once
more.
EINVAL The cpc_take_sample() function was invoked
before the context was bound.
ENOTSUP The caller has attempted an operation that is
illegal or not supported on the current platform,
such as attempting to specify signal delivery on
counter overflow on a CPU that doesn't generate
an interrupt on counter overflow.
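The recovery sequence described under EAGAIN can be sketched as a retry loop. These obsolete interfaces no longer compile, so the sketch calls through a hypothetical struct counter_ops whose members stand in for cpc_bind_event(), cpc_take_sample(), and cpc_rele(). The fake_* functions are hypothetical test doubles that fail once with EAGAIN. The retry count and pause length are arbitrary parameters and do not come from libcpc.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical stand-ins for cpc_bind_event(), cpc_take_sample()
 * and cpc_rele(); each returns 0 on success or -1 with errno set.
 */
struct counter_ops {
	int (*bind)(void *ctx);
	int (*sample)(void *ctx);
	int (*rele)(void *ctx);
};

/*
 * Take one sample from an already-bound context. On EAGAIN, release
 * the invalidated context, sleep, rebind and try again, giving up
 * after max_tries attempts. Other errors are returned immediately.
 */
static int
sample_with_recovery(const struct counter_ops *ops, void *ctx,
    int max_tries, unsigned int pause_sec)
{
	int bound = 1;		/* the caller bound the context first */
	int tries;

	for (tries = 0; tries < max_tries; tries++) {
		if (!bound) {
			if (ops->bind(ctx) == -1) {
				if (errno != EAGAIN)
					return (-1);
				(void) sleep(pause_sec); /* still busy */
				continue;
			}
			bound = 1;
		}
		if (ops->sample(ctx) == 0)
			return (0);
		if (errno != EAGAIN)
			return (-1);
		(void) ops->rele(ctx);	/* context was invalidated */
		bound = 0;
		(void) sleep(pause_sec);
	}
	errno = EAGAIN;
	return (-1);
}

/* Hypothetical test doubles: the first sample fails with EAGAIN. */
static int fake_fail_left = 1;
static int fake_binds = 0;

static int
fake_bind(void *ctx)
{
	(void) ctx;
	fake_binds++;
	return (0);
}

static int
fake_sample(void *ctx)
{
	(void) ctx;
	if (fake_fail_left > 0) {
		fake_fail_left--;
		errno = EAGAIN;
		return (-1);
	}
	return (0);
}

static int
fake_rele(void *ctx)
{
	(void) ctx;
	return (0);
}

static const struct counter_ops fake_ops = {
	fake_bind, fake_sample, fake_rele
};
```

With fake_ops, the first sample fails with EAGAIN, the context is released and rebound once, and the second sample succeeds.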
USAGE
Prior to calling cpc_bind_event(), applications should call
cpc_access(3CPC) to determine whether the counters are
accessible on the system.
EXAMPLES
Example 1 Use hardware performance counters to measure
events in a process.
The example below shows how a standalone program can be
instrumented with the libcpc routines to use hardware per-
formance counters to measure events in a process. The pro-
gram performs 20 iterations of a computation, measuring the
counter values for each iteration. By default, the example
makes the counters measure external cache references and
external cache hits; these options are only appropriate for
UltraSPARC processors. By setting the PERFEVENTS environment
variable to other strings (a list of which can be gleaned
from the -h flag of the cpustat or cputrack utilities),
other events can be counted. The error() routine used below
is assumed to be a user-provided routine, analogous to the
familiar printf(3C) routine from the C library, that also
calls exit(2) after printing the message.
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/lwp.h>
#include <libcpc.h>

/* user-provided: prints like printf(3C), then calls exit(2) */
extern void error(const char *fmt, ...);

int
main(int argc, char *argv[])
{
	int cpuver, iter;
	char *setting = NULL;
	cpc_event_t event;

	if (cpc_version(CPC_VER_CURRENT) != CPC_VER_CURRENT)
		error("application:library cpc version mismatch!");

	if ((cpuver = cpc_getcpuver()) == -1)
		error("no performance counter hardware!");

	if ((setting = getenv("PERFEVENTS")) == NULL)
		setting = "pic0=EC_ref,pic1=EC_hit";

	if (cpc_strtoevent(cpuver, setting, &event) != 0)
		error("can't measure '%s' on this processor", setting);
	setting = cpc_eventtostr(&event);

	if (cpc_access() == -1)
		error("can't access perf counters: %s", strerror(errno));

	if (cpc_bind_event(&event, 0) == -1)
		error("can't bind lwp%d: %s", _lwp_self(), strerror(errno));

	for (iter = 1; iter <= 20; iter++) {
		cpc_event_t before, after;

		if (cpc_take_sample(&before) == -1)
			break;

		/* ==> Computation to be measured goes here <== */

		if (cpc_take_sample(&after) == -1)
			break;
		(void) printf("%3d: %" PRIu64 " %" PRIu64 "\n", iter,
		    after.ce_pic[0] - before.ce_pic[0],
		    after.ce_pic[1] - before.ce_pic[1]);
	}

	/* a completed loop leaves iter at 21 */
	if (iter != 21)
		error("can't sample '%s': %s", setting, strerror(errno));

	free(setting);
	return (0);
}
Example 2 Write a signal handler to catch overflow signals.
This example builds on Example 1, but demonstrates how to
write the signal handler to catch overflow signals. The
counters are preset so that counter zero is 1000 counts
short of overflowing, while counter one is set to zero.
After 1000 counts on counter zero, the signal handler will
be invoked.
First the signal handler:
#define	PRESET0	(UINT64_MAX - UINT64_C(999))
#define	PRESET1	0

void
emt_handler(int sig, siginfo_t *sip, void *arg)
{
	ucontext_t *uap = arg;
	cpc_event_t sample;

	if (sig != SIGEMT || sip->si_code != EMT_CPC_OVF) {
		psignal(sig, "example");
		psiginfo(sip, "example");
		return;
	}

	(void) printf("lwp%d - si_addr %p ucontext: %%pc %p %%sp %p\n",
	    _lwp_self(), (void *)sip->si_addr,
	    (void *)uap->uc_mcontext.gregs[PC],
	    (void *)uap->uc_mcontext.gregs[USP]);

	if (cpc_take_sample(&sample) == -1)
		error("can't sample: %s", strerror(errno));

	(void) printf("0x%" PRIx64 " 0x%" PRIx64 "\n",
	    sample.ce_pic[0], sample.ce_pic[1]);
	(void) fflush(stdout);

	sample.ce_pic[0] = PRESET0;
	sample.ce_pic[1] = PRESET1;
	if (cpc_bind_event(&sample, CPC_BIND_EMT_OVF) == -1)
		error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));
}
Second, the setup code (which can be placed after the code
that selects the event to be measured):
struct sigaction act;
cpc_event_t event;
...
act.sa_sigaction = emt_handler;
bzero(&act.sa_mask, sizeof (act.sa_mask));
act.sa_flags = SA_RESTART|SA_SIGINFO;
if (sigaction(SIGEMT, &act, NULL) == -1)
	error("sigaction: %s", strerror(errno));

event.ce_pic[0] = PRESET0;
event.ce_pic[1] = PRESET1;
if (cpc_bind_event(&event, CPC_BIND_EMT_OVF) == -1)
	error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));

for (iter = 1; iter <= 20; iter++) {
	/* ==> Computation to be measured goes here <== */
}

(void) cpc_bind_event(NULL, 0);	/* done */
Note that a more general version of the signal handler would
use write(2) directly instead of depending on the signal-
unsafe semantics of stderr and stdout. Most real signal
handlers will probably do more with the samples than just
print them out.
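The write(2)-based approach can be sketched as follows. fmt_hex64() and emit_counter() are hypothetical helpers. They format a 64-bit counter value without stdio and emit it with write(2), which is async-signal-safe. Using them in place of the printf() and fflush() calls in the Example 2 handler is a suggested substitution, not part of the original example.

```c
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: store "0x", 16 hex digits and a newline for v
 * in buf, which must hold at least 19 bytes. No stdio is used, so
 * this is safe to call from a signal handler. Returns bytes stored.
 */
static size_t
fmt_hex64(char *buf, uint64_t v)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	buf[0] = '0';
	buf[1] = 'x';
	for (i = 0; i < 16; i++)	/* most significant nibble first */
		buf[2 + i] = digits[(v >> (60 - 4 * i)) & 0xf];
	buf[18] = '\n';
	return (19);
}

/* Hypothetical helper: write one counter value to stdout via write(2). */
static void
emit_counter(uint64_t v)
{
	char buf[19];

	(void) write(STDOUT_FILENO, buf, fmt_hex64(buf, v));
}
```

A handler could then call emit_counter(sample.ce_pic[0]) and emit_counter(sample.ce_pic[1]) instead of the two stdio calls.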
ATTRIBUTES
See attributes(5) for descriptions of the following
attributes:
ATTRIBUTE TYPE          ATTRIBUTE VALUE
MT-Level                MT-Safe
Interface Stability     Obsolete
SEE ALSO
cpustat(1M), cputrack(1), write(2), cpc(3CPC),
cpc_access(3CPC), cpc_bind_curlwp(3CPC),
cpc_set_sample(3CPC), cpc_strtoevent(3CPC),
cpc_unbind(3CPC), libcpc(3LIB), attributes(5)
NOTES
The cpc_bind_event(), cpc_take_sample(), and cpc_rele()
functions exist for binary compatibility only. Source
containing these functions will not compile. These functions
are obsolete and might be removed in a future release.
Applications should use cpc_bind_curlwp(3CPC),
cpc_set_sample(3CPC), and cpc_unbind(3CPC) instead.
Sometimes, even the overhead of performing a system call
will be too disruptive to the events being measured. Once a
call to cpc_bind_event() has been issued, it is possible to
directly access the performance hardware registers from
within the application. If the performance counter context
is active, then the counters will count on behalf of the
current LWP.
SPARC
rd %pic, %rN ! All UltraSPARC
wr %rN, %pic ! (ditto, but see text)
x86
rdpmc ! Pentium II only
If the counter context is not active or has been
invalidated, the %pic register (SPARC) and the rdpmc
instruction (Pentium) become unavailable.
Note that the two 32-bit UltraSPARC performance counters are
kept in the single 64-bit %pic register so a couple of addi-
tional instructions are required to separate the values.
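The extra instructions amount to a mask and a shift. The helper below is a hypothetical C illustration of that step, applied to a value already read with rd %pic. It assumes that pic0 occupies the low 32 bits of %pic and pic1 the high 32 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: split a raw 64-bit %pic value into its two
 * 32-bit counters, assuming pic0 is the low half and pic1 the high
 * half of the register.
 */
static void
split_pic(uint64_t pic, uint32_t *pic0, uint32_t *pic1)
{
	*pic0 = (uint32_t)(pic & 0xffffffffu);	/* low 32 bits */
	*pic1 = (uint32_t)(pic >> 32);		/* high 32 bits */
}
```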
Also note that when the %pcr register bit has been set that
configures the %pic register as readable by an application,
it is also writable. Any values written will be preserved by
the context switching mechanism.
Pentium II processors support the non-privileged rdpmc
instruction, which requires [5] that the counter of interest
be specified in %ecx and returns a 40-bit value in the
%edx:%eax register pair. There is no non-privileged access
mechanism for Pentium I processors.
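Assembling the rdpmc result works the same way. The hypothetical helpers below assume that only the low 8 bits of %edx carry counter data. They combine the %edx:%eax pair into one 40-bit value and take the difference of two readings modulo 2^40.

```c
#include <stdint.h>

/*
 * Hypothetical helper: combine the %edx:%eax pair returned by rdpmc
 * into one value, keeping only the low 8 bits of %edx (40 bits total).
 */
static uint64_t
rdpmc_value40(uint32_t edx, uint32_t eax)
{
	return (((uint64_t)(edx & 0xffu) << 32) | eax);
}

/*
 * Hypothetical helper: difference between two 40-bit readings,
 * computed modulo 2^40 so one wrap between readings is tolerated.
 */
static uint64_t
delta40(uint64_t before, uint64_t after)
{
	return ((after - before) & ((UINT64_C(1) << 40) - 1));
}
```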
Handling counter overflow
As described above, when counting events, some processors
allow their counter registers to silently overflow. More
recent CPUs such as UltraSPARC III and Pentium II, however,
are capable of generating an interrupt when the hardware
counter overflows. Some processors offer more control over
when interrupts will actually be generated. For example,
they might allow the interrupt to be programmed to occur
when only one of the counters overflows. See
cpc_strtoevent(3CPC) for the syntax.
The most obvious use for this facility is to ensure that the
full 64-bit counter values are maintained without repeated
sampling. However, current hardware does not record which
counter overflowed. A more subtle use for this facility is
to preset the counter to a value a little less than the
maximum value, then use the resulting interrupt to catch the
counter overflow associated with that event. The overflow
can then be used as an indication of the frequency of the
occurrence of that event.
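The preset-and-overflow technique can be modelled without hardware. The simulation below is a hypothetical sketch. It presets a 64-bit counter to 2^64 - period, the same form as PRESET0 in Example 2 (where period is 1000), counts each wrap to zero as one overflow, and re-presets the counter, as the Example 2 signal handler does. The number of overflows multiplied by period then approximates the number of events.

```c
#include <stdint.h>

/*
 * Hypothetical simulation: feed `events` events into a 64-bit counter
 * preset to 2^64 - period (period must be at least 1). Each wrap to
 * zero is one overflow, after which the counter is re-preset, as a
 * SIGEMT handler would do. Returns the number of overflows observed.
 */
static uint64_t
simulate_overflows(uint64_t events, uint64_t period)
{
	uint64_t preset = UINT64_MAX - (period - 1);	/* 2^64 - period */
	uint64_t counter = preset;
	uint64_t overflows = 0;
	uint64_t i;

	for (i = 0; i < events; i++) {
		counter++;
		if (counter == 0) {		/* wrapped past UINT64_MAX */
			overflows++;
			counter = preset;	/* re-arm, as the handler does */
		}
	}
	return (overflows);
}
```

For instance, 3500 events with a period of 1000 produce 3 overflows, so the estimate of 3000 events undercounts by the 500 events still pending in the counter.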
Note that the interrupt generated by the processor may not
be particularly precise. That is, the particular instruc-
tion that caused the counter overflow may be earlier in the
instruction stream than is indicated by the program counter
value in the ucontext.
When cpc_bind_event() is called with the CPC_BIND_EMT_OVF
flag set, then, as before, the control registers and
counters are preset from the 64-bit values contained in
event. However, when the flag is set, the kernel arranges to
send the calling process a SIGEMT signal when the overflow
occurs, with the si_code field of the corresponding siginfo
structure set to EMT_CPC_OVF and the si_addr field set to
the program counter value at the time the overflow interrupt
was delivered. Counting is disabled until the next call to
cpc_bind_event(). Even in a multithreaded process, during
execution of the signal handler, the thread behaves as if it
is temporarily bound to the running LWP.
Different processors have different counter ranges
available, though all processors supported by Solaris allow
at least 31 bits to be specified as a counter preset value;
thus, portable preset values lie in the range UINT64_MAX to
UINT64_MAX - INT32_MAX.
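Choosing a portable preset for a desired overflow period reduces to the check below. portable_preset() is a hypothetical helper. A counter preset to 2^64 - period overflows after period events, and that preset lies in the portable range only when period - 1 does not exceed INT32_MAX.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the preset that makes a counter
 * overflow after `period` events (2^64 - period). Returns 0 and
 * stores the preset if it lies within the portable range
 * [UINT64_MAX - INT32_MAX, UINT64_MAX]; returns -1 if period is 0
 * or too large for that range.
 */
static int
portable_preset(uint64_t period, uint64_t *preset)
{
	if (period == 0 || period - 1 > (uint64_t)INT32_MAX)
		return (-1);
	*preset = UINT64_MAX - (period - 1);
	return (0);
}
```

A period of 1000 yields UINT64_MAX - 999, the PRESET0 value from Example 2. The longest portable period, 2^31 events, yields UINT64_MAX - INT32_MAX.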
The appropriate preset value will often need to be deter-
mined experimentally. Typically, it will depend on the event
being measured, as well as the desire to minimize the impact
of the act of measurement on the event being measured; less
frequent interrupts and samples lead to less perturbation of
the system.
If the processor cannot detect counter overflow, this call
will fail (ENOTSUP). Specifying a null event unbinds the
context from the underlying LWP and disables signal
delivery. Currently, only user events can be measured using
this technique. See Example 2, above.
Inheriting events onto multiple LWPs
By default, the library binds the performance counter
context to the current LWP only. If the CPC_BIND_LWP_INHERIT
flag is set, then any subsequent LWPs created by that LWP
will automatically inherit the same performance counter
context. The counters will be initialized to 0 as if a
cpc_bind_event() had just been issued. This automatic
inheritance behavior can be useful when dealing with
multithreaded programs to determine aggregate statistics for
the program as a whole.
If the CPC_BIND_EMT_OVF flag is also set, the process will
immediately dispatch a SIGEMT signal to the freshly created
LWP so that it can preset its counters appropriately on the
new LWP. This initialization condition can be detected using
cpc_take_sample() to check that both ce_pic[] values are set
to UINT64_MAX.
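A handler could make this check as sketched below. is_new_lwp_signal() is hypothetical. Its pic argument stands for the ce_pic[] array filled in by cpc_take_sample(), modelled here as a plain array of two uint64_t values so that the sketch compiles without libcpc.

```c
#include <stdint.h>

/*
 * Hypothetical helper: nonzero if a sample taken in the SIGEMT
 * handler shows the initial state of a newly inherited context,
 * that is, both counters read UINT64_MAX.
 */
static int
is_new_lwp_signal(const uint64_t pic[2])
{
	return (pic[0] == UINT64_MAX && pic[1] == UINT64_MAX);
}
```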