Latest post

How Small Can a Measured Region Be Before perf Counters Lie?

Profiling code with hardware performance counters introduces overhead that can completely dwarf the actual measurement. We measured the cost on AMD Zen 4 and two Intel generations: rdpmc becomes reliable above ~12K instructions on every platform, while the kernel-mediated ioctl path needs ~100K on AMD and ~400K on Intel. Below those thresholds, you are mostly measuring the instrumentation.

All posts