Performance Metrics

Metrics combine multiple hardware events into a single derived value. Knowing that your code produced 1 million cache misses is less useful than knowing they amount to a 5% miss rate.
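To make the distinction concrete, the following minimal sketch derives a miss ratio from two raw counts. It does not use perf-cpp; the function name `miss_ratio` and the sample numbers are hypothetical and only illustrate the arithmetic behind a derived metric.

```cpp
#include <cstdint>

/// Hypothetical helper: derives a miss ratio from two raw event counts.
/// Returns 0 when no references were counted, to avoid dividing by zero.
double miss_ratio(const std::uint64_t misses, const std::uint64_t references)
{
    if (references == 0U) {
        return 0.0;
    }
    return static_cast<double>(misses) / static_cast<double>(references);
}
```

With 1,000,000 misses out of 20,000,000 references, `miss_ratio(1000000, 20000000)` evaluates to 0.05, which is the 5% miss rate mentioned above.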

Tip

See the examples: metric.cpp (custom metrics) and rapl.cpp (RAPL power counters). For inspiration when creating custom metrics, explore the Likwid project.

Available Built-in Metrics

perf-cpp ships with metrics for the most common analysis questions. They require no setup; add them to your EventCounter by name.

Metric	What It Tells You	Formula
`gigahertz`	CPU frequency during measurement	`cycles / (seconds × 10⁹)`
`cycles-per-instruction`	How many cycles each instruction takes (lower is better)	`cycles / instructions`
`instructions-per-cycle`	How many instructions complete per cycle (higher is better)	`instructions / cycles`
`cache-hit-ratio`	Fraction of cache accesses served from cache	`(cache-references − cache-misses) / cache-references`
`cache-miss-ratio`	Fraction of cache accesses that missed	`cache-misses / cache-references`
`dTLB-miss-ratio`	How often data address translation misses	`dTLB-load-misses / dTLB-loads`
`iTLB-miss-ratio`	How often instruction address translation misses	`iTLB-load-misses / iTLB-loads`
`L1-data-miss-ratio`	L1 data cache miss rate	`L1-dcache-load-misses / L1-dcache-loads`
`branch-miss-ratio`	Branch prediction failure rate	`branch-misses / branches`
`watts-pkg`	CPU package power consumption in Watts (requires RAPL)	`energy-pkg / seconds`
`watts-cores`	CPU core power consumption in Watts (requires RAPL)	`energy-cores / seconds`
`watts-ram`	RAM power consumption in Watts (requires RAPL)	`energy-ram / seconds`

All *-ratio metrics return values between 0 and 1; a 5% miss rate is reported as 0.05.
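The formulas in the table are plain arithmetic over counter values and elapsed time. As a rough sketch with hypothetical function names (none of them are part of perf-cpp), and assuming energy is given in Joules and the measured time is greater than zero:

```cpp
#include <cstdint>

/// cycles / (seconds × 10⁹), as in the `gigahertz` row.
/// Assumes seconds > 0.
double gigahertz(const std::uint64_t cycles, const double seconds)
{
    return static_cast<double>(cycles) / (seconds * 1e9);
}

/// energy / seconds, as in the watts-* rows.
/// Assumes the energy value is in Joules and seconds > 0.
double watts(const double joules, const double seconds)
{
    return joules / seconds;
}

/// Converts a *-ratio value in the range 0..1 into a percentage.
double to_percent(const double ratio)
{
    return ratio * 100.0;
}
```

For example, 3,000,000,000 cycles in one second correspond to 3 GHz, and 30 Joules consumed over two seconds correspond to 15 Watts.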

Note

The watts-* metrics require RAPL (Running Average Power Limit) support, which is available on most modern Intel and AMD processors. Available RAPL domains vary by hardware: energy-pkg is widely supported, energy-cores and energy-ram depend on the processor model. Reading RAPL counters may require perf_event_paranoid <= 0 or CAP_SYS_ADMIN.

Working with Metrics

Metrics behave like regular events: add them by name, measure, and read the result.

#include <perfcpp/event_counter.hpp>

auto event_counter = perf::EventCounter{};

/// Add metrics just like regular events.
event_counter.add("cycles-per-instruction");

/// Measure your code.
event_counter.start();
/// ... your code being measured ...
event_counter.stop();

/// Get the calculated metric value; std::nullopt if the required events were not measured.
const auto result = event_counter.result();
const auto cpi = result.get("cycles-per-instruction");

/// Release resources explicitly, or let the destructor handle it.
event_counter.close();

perf-cpp automatically configures the hardware events a metric requires (e.g., cycles and instructions for CPI) if they are not already being measured.
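As the comment in the snippet above notes, the metric value is empty (std::nullopt) when the required events were not measured, so calling code should check before using it. A minimal sketch, independent of perf-cpp; `describe_metric` is a hypothetical helper that takes the optional value as a parameter:

```cpp
#include <optional>
#include <string>

/// Hypothetical helper: turns an optional metric value into printable text.
/// An empty optional means the metric could not be calculated.
std::string describe_metric(const std::string& name, const std::optional<double>& value)
{
    if (!value.has_value()) {
        return name + ": not available";
    }
    return name + ": " + std::to_string(value.value());
}
```

Passing the optional through a helper like this keeps the "not measured" case explicit instead of silently printing a default value.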

Creating Custom Metrics

perf-cpp supports two approaches for defining custom metrics: formula-based and class-based.

Custom metrics are registered via the perf::CounterDefinition passed to the EventCounter (→ read more about adding custom events and metrics).

Formula-Based Metrics

For straightforward calculations, express your metric as a mathematical formula:

auto counter_definition = perf::CounterDefinition{};

/// Define a metric showing what percentage of stalls come from memory loads
counter_definition.add("stalls-by-mem-loads",
                       "(CYCLE_ACTIVITY_STALLS_LDM_PENDING / CYCLE_ACTIVITY_STALLS_TOTAL) * 100");

auto event_counter = perf::EventCounter{ counter_definition };
event_counter.add("stalls-by-mem-loads");

This example uses Intel SkylakeX events to identify memory bottlenecks, adapted from Likwid's cycle stalls metrics.

Every event referenced in a formula must be known to the CounterDefinition, either as a built-in event or added beforehand (→ adding custom events).

Supported Operations

Formulas can use:

- Basic arithmetic: +, -, *, /
- Constants, including scientific notation: 1E9, 2.5e-6
- Parentheses for grouping: (a + b) / c

Built-in Functions

In addition, the following functions are available:

Function	Purpose	Example
`ratio(a, b)`	Division that returns `0` if the denominator is `0`	`ratio('branch-misses', 'branches')`
`d_ratio(a, b)`	Alias for `ratio()`, named after Linux perf's `d_ratio`	`d_ratio('misses', 'attempts')`
`sum(a, b, ...)`	Add two or more values	`sum('l1_hits', 'l2_hits', 'l3_hits')`

You can nest functions for complex calculations:

/// Relate misses to hits across all cache levels (total misses divided by total hits)
counter_definition.add("total-cache-miss-ratio",
    "ratio("
    "  sum('mem_load_retired.l1_miss', 'mem_load_retired.l2_miss', 'mem_load_retired.l3_miss'),"
    "  sum('mem_load_retired.l1_hit', 'mem_load_retired.l2_hit', 'mem_load_retired.l3_hit')"
    ")"
);
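The behavior of ratio() and sum() described above can be modeled in plain C++ to reason about what a nested formula evaluates to. This is only a sketch of the documented semantics, not perf-cpp's evaluator; the names `safe_ratio` and `sum_of` are hypothetical.

```cpp
#include <initializer_list>

/// Models the documented ratio(a, b): a division that yields 0 when the
/// denominator is 0 (assumption: perf-cpp's evaluator may differ in detail).
double safe_ratio(const double numerator, const double denominator)
{
    if (denominator == 0.0) {
        return 0.0;
    }
    return numerator / denominator;
}

/// Models the documented sum(a, b, ...): adds all given values.
double sum_of(const std::initializer_list<double> values)
{
    double total = 0.0;
    for (const double value : values) {
        total += value;
    }
    return total;
}
```

Nesting works the same way as in a formula: `safe_ratio(sum_of({1.0, 2.0, 3.0}), sum_of({4.0, 4.0, 4.0}))` evaluates to 0.5, while `safe_ratio(5.0, 0.0)` evaluates to 0 instead of dividing by zero.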

Important

Event names containing operator characters (such as the hyphens in L1-dcache-load-misses) must be wrapped in single quotes or backticks: 'L1-dcache-load-misses' or `L1-dcache-load-misses`. Otherwise the parser interprets each hyphen as a subtraction. Underscores and dots are regular identifier characters, so names like mem_load_retired.l1_miss work without quotes.
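When formulas are assembled programmatically, the quoting rule can be applied with a small helper. The sketch below is hypothetical and not part of perf-cpp; it assumes that letters, digits, underscores, and dots are the only characters that are safe without quotes, which goes slightly beyond what the note above states about the parser.

```cpp
#include <cctype>
#include <string>

/// Hypothetical helper: wraps an event name in single quotes unless it consists
/// only of letters, digits, underscores, and dots (assumed identifier characters).
std::string quote_if_needed(const std::string& event_name)
{
    for (const char character : event_name) {
        const auto code = static_cast<unsigned char>(character);
        if (std::isalnum(code) == 0 && character != '_' && character != '.') {
            return "'" + event_name + "'";
        }
    }
    return event_name;
}
```

With this helper, `quote_if_needed("L1-dcache-load-misses")` yields the quoted form `'L1-dcache-load-misses'`, whereas `quote_if_needed("mem_load_retired.l1_miss")` returns the name unchanged.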

Class-Based Metrics

Formulas cannot express branching or architecture-specific logic. For such cases, implement the metric as a class derived from perf::Metric:

#include <optional>
#include <string>
#include <vector>

#include <perfcpp/metric/metric.hpp>

class StallsPerCacheMiss final : public perf::Metric
{
public:
    /// Define the metric's identifier.
    [[nodiscard]] std::string name() const override
    {
        return "stalls-per-cache-miss";
    }

    /// Declare which events this metric needs.
    [[nodiscard]] std::vector<std::string> required_counter_names() const override
    {
        return {"stalls", "cache-misses"};
    }

    /// Perform the calculation after measurement completes.
    [[nodiscard]] std::optional<double> calculate(const perf::CounterResult& result) const override
    {
        const auto stalls = result.get("stalls");
        const auto cache_misses = result.get("cache-misses");

        /// Both events must have been measured.
        if (stalls.has_value() && cache_misses.has_value())
        {
            /// Avoid division by zero.
            if (cache_misses.value() > 0)
            {
                return stalls.value() / cache_misses.value();
            }
        }

        /// Return empty if the calculation is not possible.
        return std::nullopt;
    }
};

Register your custom metric class with the counter definition:

#include <memory> /// for std::make_unique

auto counter_definition = perf::CounterDefinition{};

/// Register using the name reported by the metric itself.
counter_definition.add(std::make_unique<StallsPerCacheMiss>());

/// Or register under a custom name.
counter_definition.add("SPCM", std::make_unique<StallsPerCacheMiss>());

/// Use it like any other metric.
auto event_counter = perf::EventCounter{ counter_definition };
event_counter.add("stalls-per-cache-miss");  /// Or "SPCM" if registered under a custom name.