Latest post
How Small Can a Measured Region Be Before perf Counters Lie?
Profiling code with hardware performance counters introduces overhead that can completely dwarf the actual measurement. We measured the cost on AMD Zen 4 and two Intel generations: rdpmc becomes reliable above ~12K instructions on every platform, while the kernel-mediated ioctl path needs ~100K on AMD and ~400K on Intel. Below those thresholds, you are mostly measuring the instrumentation.
All posts
Profiling Specific Code Segments of Applications
Performance profiling plays a critical role in optimizing applications. Popular tools such as Linux perf typically profile entire applications, which makes it hard to focus on specific code segments. The Linux perf subsystem, however, also exposes hardware performance counters directly to applications. In this article, we show how to control those counters from a C++ application, using a practical example.
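To give a flavor of what controlling a counter from inside an application can look like, here is a minimal sketch, not the code from the article. It assumes Linux with the `perf_event_open` system call available and a `perf_event_paranoid` setting that permits user-space counting. The helper name `count_instructions` is hypothetical. It uses the kernel-mediated ioctl path, so the enable and disable calls themselves add instructions to the reading.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>
#include <utility>

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (an illustration, not the article's implementation):
// counts user-space instructions retired on the calling thread while
// `region` runs. Returns std::nullopt if the counter cannot be opened or
// read, e.g. in a container or VM without PMU access.
template <typename F>
std::optional<std::uint64_t> count_instructions(F&& region) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        // start stopped; enabled explicitly below
    attr.exclude_kernel = 1;  // count user-space instructions only
    attr.exclude_hv = 1;

    // glibc provides no wrapper for perf_event_open, so use syscall().
    // pid = 0 and cpu = -1: measure the calling thread on any CPU.
    const long ret = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (ret < 0) {
        return std::nullopt;
    }
    const int fd = static_cast<int>(ret);

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    std::forward<F>(region)();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    // With the default read_format, read() yields a single 64-bit count.
    std::uint64_t count = 0;
    const ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    if (n != static_cast<ssize_t>(sizeof(count))) {
        return std::nullopt;
    }
    return count;
}
```

A caller would wrap the segment of interest in a lambda, for example `count_instructions([&] { hot_loop(); })` with `hot_loop` standing in for the code under test, and treat an empty result as "counters unavailable". The returned count includes the instructions executed on the way out of the enable call and into the disable call, which is why very small regions give misleading numbers on this path.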