Performance Metrics
Metrics combine multiple hardware events into a single derived value. Knowing that your code produced 1 million cache misses is less useful than knowing they amount to a 5% miss rate.
Tip
See the examples: metric.cpp (custom metrics) and rapl.cpp (RAPL power counters). For inspiration when creating custom metrics, explore the Likwid project.
Available Built-in Metrics
perf-cpp ships with metrics for the most common analysis questions.
They require no setup; add them to your EventCounter by name.
| Metric | What It Tells You | Formula |
|---|---|---|
| gigahertz | CPU frequency during measurement | cycles / (seconds × 10⁹) |
| cycles-per-instruction | How many cycles each instruction takes (lower is better) | cycles / instructions |
| instructions-per-cycle | How many instructions complete per cycle (higher is better) | instructions / cycles |
| cache-hit-ratio | Fraction of cache accesses served from cache | (cache-references − cache-misses) / cache-references |
| cache-miss-ratio | Fraction of cache accesses that missed | cache-misses / cache-references |
| dTLB-miss-ratio | How often data address translation misses | dTLB-load-misses / dTLB-loads |
| iTLB-miss-ratio | How often instruction address translation misses | iTLB-load-misses / iTLB-loads |
| L1-data-miss-ratio | L1 data cache miss rate | L1-dcache-load-misses / L1-dcache-loads |
| branch-miss-ratio | Branch prediction failure rate | branch-misses / branches |
| watts-pkg | CPU package power consumption in Watts (requires RAPL) | energy-pkg / seconds |
| watts-cores | CPU core power consumption in Watts (requires RAPL) | energy-cores / seconds |
| watts-ram | RAM power consumption in Watts (requires RAPL) | energy-ram / seconds |
All *-ratio metrics return values between 0 and 1; a 5% miss rate is reported as 0.05.
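To make the formulas in the table concrete, the following sketch recomputes three of them from raw counts in plain C++. It does not use perf-cpp; the `RawCounts` struct and the function names are hypothetical and exist only for illustration, and the sketch assumes non-zero denominators.

```cpp
#include <cassert>
#include <cstdint>

/// Hypothetical container for raw counter values (not a perf-cpp type).
struct RawCounts
{
  std::uint64_t cycles;
  std::uint64_t instructions;
  std::uint64_t cache_references;
  std::uint64_t cache_misses;
};

/// cycles / instructions
double cycles_per_instruction(const RawCounts& counts)
{
  assert(counts.instructions > 0);
  return static_cast<double>(counts.cycles) / static_cast<double>(counts.instructions);
}

/// cache-misses / cache-references, a value between 0 and 1
double cache_miss_ratio(const RawCounts& counts)
{
  assert(counts.cache_references > 0);
  return static_cast<double>(counts.cache_misses) / static_cast<double>(counts.cache_references);
}

/// cycles / (seconds × 10⁹)
double gigahertz(const RawCounts& counts, const double seconds)
{
  assert(seconds > 0.0);
  return static_cast<double>(counts.cycles) / (seconds * 1e9);
}
```

With 20,000,000 cache references and 1,000,000 cache misses, `cache_miss_ratio` yields 0.05, which corresponds to a 5% miss rate.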
Note
The watts-* metrics require RAPL (Running Average Power Limit) support, which is available on most modern Intel and AMD processors.
Available RAPL domains vary by hardware: energy-pkg is widely supported, energy-cores and energy-ram depend on the processor model.
Reading RAPL counters may require perf_event_paranoid <= 0 or CAP_SYS_ADMIN.
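Before adding a watts-* metric, it can help to check the current paranoid level. The sketch below reads it from `/proc/sys/kernel/perf_event_paranoid` (Linux only) using the standard library alone. The function names are hypothetical and not part of perf-cpp, and a level above 0 does not rule out access if the process holds CAP_SYS_ADMIN.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

/// Parse the paranoid level from text; std::nullopt if no integer can be read.
std::optional<int> parse_paranoid_level(const std::string& text)
{
  std::istringstream stream{ text };
  int level;
  if (stream >> level)
  {
    return level;
  }
  return std::nullopt;
}

/// True if the level satisfies the "perf_event_paranoid <= 0" condition.
bool paranoid_level_permits_rapl(const std::optional<int> level)
{
  return level.has_value() && level.value() <= 0;
}

/// Read the current level from procfs; std::nullopt if the file is missing or unreadable.
std::optional<int> current_paranoid_level()
{
  std::ifstream file{ "/proc/sys/kernel/perf_event_paranoid" };
  std::string line;
  if (!std::getline(file, line))
  {
    return std::nullopt;
  }
  return parse_paranoid_level(line);
}
```

Calling `paranoid_level_permits_rapl(current_paranoid_level())` before starting a measurement lets a program print a helpful hint instead of failing when the counter cannot be opened.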
Working with Metrics
Metrics behave like regular events: add them by name, measure, and read the result.
#include <perfcpp/event_counter.hpp>
auto event_counter = perf::EventCounter{};
/// Add metrics just like regular events.
event_counter.add("cycles-per-instruction");
/// Measure your code.
event_counter.start();
/// ... your code being measured ...
event_counter.stop();
/// Get the calculated metric value; std::nullopt if the required events were not measured.
const auto result = event_counter.result();
const auto cpi = result.get("cycles-per-instruction");
/// Release resources explicitly, or let the destructor handle it.
event_counter.close();
perf-cpp configures the required hardware events automatically (e.g., cycles and instructions for CPI) if not already being measured.
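Because the metric value may be empty, callers should check it before use. The helper below is a minimal sketch of that check. `describe_metric` is a hypothetical name, and the sketch assumes the value arrives as a `std::optional<double>`, as the snippet above suggests.

```cpp
#include <optional>
#include <sstream>
#include <string>

/// Hypothetical helper: turn an optional metric value into a printable line.
std::string describe_metric(const std::string& name, const std::optional<double> value)
{
  /// The metric could not be calculated, e.g., because a required event was not measured.
  if (!value.has_value())
  {
    return name + ": not available";
  }

  std::ostringstream line;
  line << name << ": " << value.value();
  return line.str();
}
```

Under that assumption, it could be called as `describe_metric("cycles-per-instruction", result.get("cycles-per-instruction"))`.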
Creating Custom Metrics
perf-cpp supports two approaches for defining custom metrics: formula-based and class-based.
Custom metrics are registered via the perf::CounterDefinition passed to the EventCounter (→ read more about adding custom events and metrics).
Formula-Based Metrics
For straightforward calculations, express your metric as a mathematical formula:
auto counter_definition = perf::CounterDefinition{};
/// Define a metric showing what percentage of stalls come from memory loads
counter_definition.add("stalls-by-mem-loads",
                       "(CYCLE_ACTIVITY_STALLS_LDM_PENDING / CYCLE_ACTIVITY_STALLS_TOTAL) * 100");
auto event_counter = perf::EventCounter{ counter_definition };
event_counter.add("stalls-by-mem-loads");
This example uses Intel SkylakeX events to identify memory bottlenecks, adapted from Likwid's cycle stalls metrics.
Every event referenced in a formula must be known to the CounterDefinition, either as a built-in event or added beforehand (→ adding custom events).
Supported Operations
Formulas can use:
- Basic arithmetic: +, -, *, /
- Constants, including scientific notation: 1E9, 2.5e-6
- Parentheses for grouping: (a + b) / c
Built-in Functions
In addition, the following functions are available:
| Function | Purpose | Example |
|---|---|---|
| ratio(a, b) | Division that returns 0 if the denominator is 0 | ratio('branch-misses', 'branches') |
| d_ratio(a, b) | Alias for ratio(), named after Linux perf's d_ratio | d_ratio('misses', 'attempts') |
| sum(a, b, ...) | Add two or more values | sum('l1_hits', 'l2_hits', 'l3_hits') |
You can nest functions for complex calculations:
/// Calculate the ratio of misses to hits, summed across all cache levels
counter_definition.add("total-cache-miss-ratio",
  "ratio("
  "  sum('mem_load_retired.l1_miss', 'mem_load_retired.l2_miss', 'mem_load_retired.l3_miss'),"
  "  sum('mem_load_retired.l1_hit', 'mem_load_retired.l2_hit', 'mem_load_retired.l3_hit')"
  ")"
);
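The behavior of ratio() and sum() described in the table can be mirrored in plain C++. This is only a sketch of the documented semantics, not perf-cpp's implementation, and the names `formula_ratio` and `formula_sum` are hypothetical.

```cpp
#include <initializer_list>

/// Mirrors ratio(a, b) and d_ratio(a, b): division that yields 0 for a zero denominator.
double formula_ratio(const double numerator, const double denominator)
{
  return denominator == 0.0 ? 0.0 : numerator / denominator;
}

/// Mirrors sum(a, b, ...): adds all given values.
double formula_sum(const std::initializer_list<double> values)
{
  double total = 0.0;
  for (const auto value : values)
  {
    total += value;
  }
  return total;
}
```

Nesting works the same way as in a formula: `formula_ratio(formula_sum({10.0, 20.0, 30.0}), formula_sum({90.0, 80.0, 70.0}))` divides 60 by 240 and yields 0.25.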
Important
Event names containing operators (like the hyphen in L1-dcache-misses) must be wrapped in single quotes or backticks: 'L1-dcache-misses' or `L1-dcache-misses`.
This prevents the parser from interpreting the hyphen as subtraction.
Underscores and dots are regular identifier characters, so names like mem_load_retired.l1_miss work without quotes.
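When formula strings are assembled programmatically, the quoting rule can be applied by a small helper. The following is a sketch, not perf-cpp functionality. `quote_event_name` is a hypothetical name, and it checks only for the four arithmetic operators listed under the supported operations.

```cpp
#include <string>

/// Hypothetical helper: wrap an event name in single quotes if it contains
/// an arithmetic operator that the formula parser would otherwise interpret.
std::string quote_event_name(const std::string& name)
{
  const bool has_operator = name.find_first_of("+-*/") != std::string::npos;
  return has_operator ? "'" + name + "'" : name;
}
```

For example, `"ratio(" + quote_event_name("branch-misses") + ", " + quote_event_name("branches") + ")"` produces `ratio('branch-misses', branches)`.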
Class-Based Metrics
Formulas cannot express branching or architecture-specific logic. For such cases, implement the metric as a class derived from perf::Metric:
#include <optional>
#include <string>
#include <vector>
#include <perfcpp/metric/metric.hpp>

class StallsPerCacheMiss final : public perf::Metric
{
public:
  /// Define the metric's identifier.
  [[nodiscard]] std::string name() const override
  {
    return "stalls-per-cache-miss";
  }

  /// Declare which events this metric needs.
  [[nodiscard]] std::vector<std::string> required_counter_names() const override
  {
    return {"stalls", "cache-misses"};
  }

  /// Perform the calculation after measurement completes.
  [[nodiscard]] std::optional<double> calculate(const perf::CounterResult& result) const override
  {
    const auto stalls = result.get("stalls");
    const auto cache_misses = result.get("cache-misses");

    /// Both events must have been measured.
    if (stalls.has_value() && cache_misses.has_value())
    {
      /// Avoid division by zero.
      if (cache_misses.value() > 0)
      {
        return stalls.value() / cache_misses.value();
      }
    }

    /// Return empty if the calculation is not possible.
    return std::nullopt;
  }
};
Register your custom metric class with the counter definition:
auto counter_definition = perf::CounterDefinition{};
/// Register using the name reported by the metric itself.
counter_definition.add(std::make_unique<StallsPerCacheMiss>());
/// Or register under a custom name.
counter_definition.add("SPCM", std::make_unique<StallsPerCacheMiss>());
/// Use it like any other metric.
auto event_counter = perf::EventCounter{ counter_definition };
event_counter.add("stalls-per-cache-miss"); /// Or "SPCM" if registered under a custom name.
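One possible design choice is to keep the arithmetic of a class-based metric in a free function, so it can be unit-tested without hardware counters. The sketch below applies the same checks as the calculate() method above to plain optionals. The function name is hypothetical, and nothing here depends on perf-cpp.

```cpp
#include <optional>

/// Hypothetical free function holding the calculation of the metric:
/// empty if an input is missing or the denominator is zero.
std::optional<double> stalls_per_cache_miss(const std::optional<double> stalls,
                                            const std::optional<double> cache_misses)
{
  if (stalls.has_value() && cache_misses.has_value() && cache_misses.value() > 0)
  {
    return stalls.value() / cache_misses.value();
  }

  return std::nullopt;
}
```

The metric's calculate() method could then forward the two values it reads from the result to this function.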