Recording Hardware Events in Parallel

Performance counters can be recorded per thread, across inherited child threads, per CPU core, or per process:

  1. Per-thread counters: Each thread gets its own counter; results are combined afterward.
  2. Inherited counters: A single counter automatically covers all child threads.
  3. Per-CPU-core counters: Monitor specific CPU cores regardless of which process runs on them.
  4. Per-process counters: Monitor specific processes by PID.

Per-Thread Counters

perf::MultiThreadEventCounter creates one counter per thread and combines the results:

#include <perfcpp/event_counter.hpp>
#include <iostream>
#include <thread>
#include <vector>

const auto count_threads = 4U;
auto multi_event_counter = perf::MultiThreadEventCounter{count_threads};
multi_event_counter.add({"instructions", "cycles", "cache-misses"});

/// Start/stop per thread.
auto threads = std::vector<std::thread>{};
for (auto thread_id = 0U; thread_id < count_threads; ++thread_id) {
    threads.emplace_back([thread_id, &multi_event_counter]() {
        multi_event_counter.start(thread_id);
        /// ... computation here ...
        multi_event_counter.stop(thread_id);
    });
}

for (auto& thread : threads) {
    thread.join();
}

/// Combined results across all threads.
const auto result = multi_event_counter.result();
for (const auto [name, value] : result)
{
    std::cout << name << " = " << value << std::endl;
}

/// Result for a specific thread.
const auto thread_result = multi_event_counter.result_of_thread(0U);

/// Release resources explicitly, or let the destructor handle it.
multi_event_counter.close();
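
To illustrate what "combining" per-thread results means, the following sketch sums per-thread event values into one total per event. It is not perf-cpp's implementation: the `ThreadCounts` alias and the `combine_thread_counts` function are hypothetical names used only for illustration, and the sketch assumes that values are simply added up.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

/// Hypothetical stand-in for one thread's results: event name -> counted value.
using ThreadCounts = std::map<std::string, std::uint64_t>;

/// Sums the value of every event across all threads (assumed aggregation rule).
ThreadCounts combine_thread_counts(const std::vector<ThreadCounts>& per_thread)
{
    auto combined = ThreadCounts{};
    for (const auto& counts : per_thread) {
        for (const auto& [name, value] : counts) {
            combined[name] += value;  // operator[] starts missing events at zero
        }
    }
    return combined;
}
```

Under this assumption, two threads reporting 100 and 50 instructions would yield a combined value of 150 instructions.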

Inherited Counters

A single perf::EventCounter with include_child_threads(true) automatically monitors all threads spawned after start():

#include <perfcpp/event_counter.hpp>
#include <iostream>
#include <thread>
#include <vector>

auto config = perf::Config{};
config.include_child_threads(true);

auto event_counter = perf::EventCounter{ config };
event_counter.add({"instructions", "cycles", "cache-misses"});

/// Start before spawning threads.
event_counter.start();

const auto count_threads = 4U;
auto threads = std::vector<std::thread>{};
for (auto i = 0U; i < count_threads; ++i) {
    threads.emplace_back([]() {
        /// ... computation here ...
    });
}

for (auto& thread : threads) {
    thread.join();
}

event_counter.stop();

/// Aggregated result across all threads.
const auto result = event_counter.result();
for (const auto [name, value] : result)
{
    std::cout << name << " = " << value << std::endl;
}

/// Release resources explicitly, or let the destructor handle it.
event_counter.close();
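
For background, the Linux `perf_event_open` interface offers an `inherit` bit in `perf_event_attr` that extends counting to child tasks. Whether and how `include_child_threads(true)` maps onto that bit is an assumption here, not something this page states. The sketch below only builds such an attribute block; the function name `make_inherited_instruction_attr` is hypothetical, and no counter is opened.

```cpp
#include <cstring>
#include <linux/perf_event.h>

/// Builds an attribute block for counting retired instructions with the
/// kernel's `inherit` bit set, so that child tasks are counted as well.
perf_event_attr make_inherited_instruction_attr()
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;  // created disabled; would be enabled explicitly later
    attr.inherit = 1;   // extend counting to child tasks
    return attr;
}
```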

Per-CPU-Core Counters

perf::MultiCoreEventCounter records events on specified CPU cores, capturing activity from all processes running there.

Note

This requires perf_event_paranoid < 1. See the perf paranoid setting.

#include <perfcpp/event_counter.hpp>
#include <cstdint>
#include <iostream>
#include <vector>

const auto cpu_core_ids = std::vector<std::uint16_t>{0U, 4U, 8U, 12U};
auto multi_cpu_counter = perf::MultiCoreEventCounter{ cpu_core_ids };
multi_cpu_counter.add({"instructions", "cycles", "cache-misses"});

multi_cpu_counter.start();
/// ... computation runs on the monitored cores ...
multi_cpu_counter.stop();

/// Combined results across all monitored cores.
const auto result = multi_cpu_counter.result();
for (const auto [name, value] : result)
{
    std::cout << name << " = " << value << std::endl;
}

/// Result for a specific core (returns std::optional).
const auto core_result = multi_cpu_counter.result_of_core(12U);

/// Release resources explicitly, or let the destructor handle it.
multi_cpu_counter.close();
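
Per-core counters only capture a workload if it actually runs on one of the monitored cores. One way to arrange that on Linux is to pin the computing thread with `sched_setaffinity`, as in the sketch below. The helper names are hypothetical and not part of perf-cpp; the sketch assumes a glibc-based Linux toolchain where the `CPU_*` macros are available.

```cpp
#include <sched.h>

/// Pins the calling thread to the given CPU core. Returns false on failure,
/// for example if the core is not in the set this process may use.
bool pin_current_thread_to_core(const int core_id)
{
    if (core_id < 0 || core_id >= CPU_SETSIZE) {
        return false;
    }
    cpu_set_t cpu_set;
    CPU_ZERO(&cpu_set);
    CPU_SET(core_id, &cpu_set);
    return sched_setaffinity(0, sizeof(cpu_set), &cpu_set) == 0;  // 0 = calling thread
}

/// Returns the first core the calling thread is allowed to run on, or -1.
int first_allowed_core()
{
    cpu_set_t cpu_set;
    CPU_ZERO(&cpu_set);
    if (sched_getaffinity(0, sizeof(cpu_set), &cpu_set) != 0) {
        return -1;
    }
    for (auto core_id = 0; core_id < CPU_SETSIZE; ++core_id) {
        if (CPU_ISSET(core_id, &cpu_set)) {
            return core_id;
        }
    }
    return -1;
}
```

A worker thread would call `pin_current_thread_to_core` with one of the monitored core IDs before starting its computation.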

Per-Process Counters

perf::MultiProcessEventCounter records events for specific processes by PID and combines the results.

Note

Monitoring other processes may require elevated privileges. See the perf paranoid setting.

#include <perfcpp/event_counter.hpp>
#include <sys/types.h>
#include <iostream>
#include <vector>

const auto process_ids = std::vector<pid_t>{1234, 5678};
auto multi_process_counter = perf::MultiProcessEventCounter{ process_ids };
multi_process_counter.add({"instructions", "cycles", "cache-misses"});

multi_process_counter.start();
/// ... processes are running ...
multi_process_counter.stop();

/// Combined results across all monitored processes.
const auto result = multi_process_counter.result();
for (const auto [name, value] : result)
{
    std::cout << name << " = " << value << std::endl;
}

/// Result for a specific process (returns std::optional).
const auto process_result = multi_process_counter.result_of_process(1234);

/// Release resources explicitly, or let the destructor handle it.
multi_process_counter.close();
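
Because the PIDs are supplied by the caller, it can help to verify that each process exists before monitoring it. The sketch below uses the POSIX convention of sending signal 0 with `kill`, which performs only the existence and permission checks. The `process_exists` helper is hypothetical and not part of perf-cpp, and a PID that exists at check time may still exit before the counter starts.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>

/// True if a process with this PID exists. Signal 0 delivers no signal;
/// it only performs the existence and permission checks.
bool process_exists(const pid_t process_id)
{
    if (process_id <= 0) {
        return false;  // zero and negative values address process groups, not one process
    }
    if (kill(process_id, 0) == 0) {
        return true;
    }
    return errno == EPERM;  // the process exists but belongs to another user
}
```

An `EPERM` result is a hint that monitoring this process will likely need the elevated privileges mentioned in the note above.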