Accessing Live Event Counts

Read hardware counter values without stopping the counters, using the rdpmc instruction on x86 systems. This is useful for measuring individual iterations or phases within a running computation.

For standard (non-live) recording, see recording basics.

Tip

See live_events.cpp for a full working example.


Basic Lifecycle

Add events with add_live() instead of add(), then use start() / stop() as usual:

#include <perfcpp/event_counter.hpp>

auto event_counter = perf::EventCounter{};
event_counter.add_live({"cache-misses", "cache-references", "branches"});

event_counter.start();

/// ... read live values during computation (see below) ...

event_counter.stop();

/// Release resources explicitly, or let the destructor handle it.
event_counter.close();

Important

Avoid mixing live events with regular events; using only live events leads to more consistent results.

Note

add_live() accepts hardware events only; adding a metric or time event throws an exception.

Reading Live Values

There are two ways to read counter values during computation.

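Using the LiveEventCounter wrapper

The LiveEventCounter wrapper reads all live counters in its own start() and stop() calls and computes the differences; the reads themselves do not allocate memory. Start the wrapped EventCounter first, as in the example: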

#include <iostream>

/// `runs` is the number of iterations to measure, defined by the caller.
auto live_event_counter = perf::LiveEventCounter{ event_counter };

event_counter.start();

for (auto i = 0U; i < runs; ++i) {
    live_event_counter.start();
    /// ... computation here ...
    live_event_counter.stop();

    std::cout
        << "cache-misses: " << live_event_counter.get("cache-misses")
        << ", cache-references: " << live_event_counter.get("cache-references")
        << std::endl;
}

event_counter.stop();

get() returns the difference between the stop and start values, or 0 if the event name is unknown. An optional second argument normalizes the result: get("cache-misses", num_iterations).
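As a rough illustration of the normalization, the sketch below assumes that the second argument simply divides the stop-minus-start difference. normalized_difference is a hypothetical stand-alone helper written for this example; it is not part of perf-cpp, and the library's actual implementation may differ.

```cpp
#include <cassert>
#include <cstdint>

/// Hypothetical helper (not part of perf-cpp): mirrors the arithmetic that
/// get(name, normalization) is assumed to perform on one counter.
double normalized_difference(const double start_value,
                             const double stop_value,
                             const std::uint64_t normalization)
{
    /// Dividing by zero would be meaningless here, so reject it early.
    assert(normalization > 0U);

    /// Events counted between start and stop, per unit of work.
    return (stop_value - start_value) / static_cast<double>(normalization);
}
```

For instance, if a measured region processes 1,000 items while a counter advances from 1,000 to 6,000, this helper yields 5 events per item.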

Direct access via EventCounter

For maximum performance, pre-allocate result vectors (one entry per event) and compute differences yourself:

#include <iostream>
#include <vector>

/// One entry per event, in the order passed to add_live():
/// cache-misses, cache-references, branches.
auto start_values = std::vector<double>{.0, .0, .0};
auto end_values = std::vector<double>{.0, .0, .0};

event_counter.start();

for (auto i = 0U; i < runs; ++i) {
    event_counter.live_result(start_values);
    /// ... computation here ...
    event_counter.live_result(end_values);

    std::cout
        << "cache-misses: " << end_values[0] - start_values[0]
        << ", cache-references: " << end_values[1] - start_values[1]
        << std::endl;
}

event_counter.stop();

Values are returned in the order events were added. The vector size must match the number of live events exactly; otherwise live_result() throws. This approach avoids per-read allocations but requires you to track the index-to-event mapping.
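One way to keep the index-to-event mapping manageable is to record the event names once, in the order they were added, and look up differences by name after the measured region. The sketch below is a minimal illustration under that assumption; EventIndex is a hypothetical helper written for this example, not part of perf-cpp, and it assumes event names are unique.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

/// Hypothetical helper (not part of perf-cpp): maps event names to their
/// position in the vectors filled by live_result().
class EventIndex
{
public:
    /// Names must be given in the same order as they were passed to add_live().
    explicit EventIndex(const std::vector<std::string>& names_in_add_order)
    {
        for (auto i = std::size_t{ 0U }; i < names_in_add_order.size(); ++i) {
            index_.emplace(names_in_add_order[i], i);
        }
    }

    /// Number of known events; result vectors need exactly this many entries.
    std::size_t size() const noexcept { return index_.size(); }

    /// Returns end - start for the named event, or 0 if the name is unknown.
    double delta(const std::string& name,
                 const std::vector<double>& start_values,
                 const std::vector<double>& end_values) const
    {
        assert(start_values.size() == index_.size());
        assert(end_values.size() == index_.size());

        const auto position = index_.find(name);
        if (position == index_.end()) {
            return 0.0;
        }
        return end_values[position->second] - start_values[position->second];
    }

private:
    std::unordered_map<std::string, std::size_t> index_;
};
```

Building the index before event_counter.start() and calling delta() only after the second read keeps the name lookup, which may allocate for temporary strings, out of the measured region.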