Support for the Performance Monitoring Unit
We support PMU version 2. Each successive PMU version includes the features provided by the previous versions.
Version 1 Support: To configure an architectural performance monitoring event, we program the performance event select registers (IA32_PERFEVTSELx MSRs). The result of the performance monitoring event is reported in a general purpose Performance Monitoring Counter (PMC) (IA32_PMCx MSR). There is one PMC for each performance event select register, and one PMU per logical core.
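As a minimal sketch of the version 1 programming model, the helper below packs an event select code and unit mask into an IA32_PERFEVTSELx value, using the control bit positions documented in the Intel SDM (USR = bit 16, OS = bit 17, INT = bit 20, EN = bit 22), and computes the MSR addresses of select register n and its paired PMC (IA32_PERFEVTSEL0 = 0x186, IA32_PMC0 = 0xC1). The helper names perfevtsel and msr_pair are hypothetical and not part of pmu_x86; the sketch only computes values and does not execute WRMSR.

```rust
// Control bits of IA32_PERFEVTSELx, per the Intel SDM's description of
// architectural performance monitoring version 1.
const USR: u64 = 1 << 16; // count while CPL > 0
const OS: u64 = 1 << 17; // count while CPL == 0
const INT: u64 = 1 << 20; // signal a PMI through the local APIC on overflow
const EN: u64 = 1 << 22; // enable the counter

// Base MSR addresses; register n lives at base + n.
const IA32_PERFEVTSEL0: u32 = 0x186;
const IA32_PMC0: u32 = 0xC1;

/// Hypothetical helper: builds the value to write into IA32_PERFEVTSELx.
/// Bits 7:0 hold the event select code and bits 15:8 the unit mask.
fn perfevtsel(event_select: u8, umask: u8, interrupt_on_overflow: bool) -> u64 {
    let mut value = (event_select as u64) | ((umask as u64) << 8) | USR | OS | EN;
    if interrupt_on_overflow {
        value |= INT;
    }
    value
}

/// Hypothetical helper: MSR addresses of event select register n and the PMC it drives.
fn msr_pair(n: u32) -> (u32, u32) {
    (IA32_PERFEVTSEL0 + n, IA32_PMC0 + n)
}
```

For example, UnHalted Reference Cycles (event 0x3C, umask 0x01) counted in all rings yields 0x43013C; setting INT as well, as a sampling setup would, yields 0x53013C.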
Version 2 Support: Three of the architectural events are counted using fixed-function MSRs (IA32_FIXED_CTR0 through IA32_FIXED_CTR2), which are configured through bit fields in a shared control register, IA32_FIXED_CTR_CTRL. Three more MSRs are provided to simplify event programming. They are:
- IA32_PERF_GLOBAL_CTRL: allows software to enable/disable event counting on any combination of fixed-function PMCs or general-purpose PMCs via a single WRMSR.
- IA32_PERF_GLOBAL_STATUS: allows software to query counter overflow conditions on any combination of fixed-function PMCs or general-purpose PMCs via a single RDMSR.
- IA32_PERF_GLOBAL_OVF_CTRL: allows software to clear counter overflow conditions on any combination of fixed-function PMCs or general-purpose PMCs via a single WRMSR.
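The bit layouts behind these global registers can be sketched as follows, assuming the Intel SDM encoding: in IA32_PERF_GLOBAL_CTRL, bit i controls general-purpose PMC i and bit 32 + j controls fixed counter j (IA32_PERF_GLOBAL_STATUS and IA32_PERF_GLOBAL_OVF_CTRL use the same positions for overflow flags), while IA32_FIXED_CTR_CTRL gives each fixed counter a 4-bit field. The helpers global_ctrl_mask, fixed_ctr_ctrl, and gp_overflowed are hypothetical and only compute register values.

```rust
// MSR addresses from the Intel SDM (architectural performance monitoring version 2).
const IA32_FIXED_CTR_CTRL: u32 = 0x38D;
const IA32_PERF_GLOBAL_STATUS: u32 = 0x38E;
const IA32_PERF_GLOBAL_CTRL: u32 = 0x38F;
const IA32_PERF_GLOBAL_OVF_CTRL: u32 = 0x390;

/// Hypothetical helper: IA32_PERF_GLOBAL_CTRL value that enables `num_gp`
/// general-purpose PMCs (bits 0..num_gp) and `num_fixed` fixed counters
/// (bits 32..32 + num_fixed) with a single WRMSR.
fn global_ctrl_mask(num_gp: u32, num_fixed: u32) -> u64 {
    assert!(num_gp <= 32 && num_fixed <= 32);
    let gp_bits = (1u64 << num_gp) - 1;
    let fixed_bits = ((1u64 << num_fixed) - 1) << 32;
    gp_bits | fixed_bits
}

/// Hypothetical helper: IA32_FIXED_CTR_CTRL value. Fixed counter j owns bits
/// 4j..4j+3: bit 0 = count in ring 0 (OS), bit 1 = count in rings > 0 (USR),
/// bit 3 = raise a PMI on overflow. Bit 2 is left clear.
fn fixed_ctr_ctrl(num_fixed: u32, pmi_on_overflow: bool) -> u64 {
    assert!(num_fixed <= 16);
    let field: u64 = if pmi_on_overflow { 0b1011 } else { 0b0011 };
    (0..num_fixed).fold(0, |acc, j| acc | (field << (4 * j)))
}

/// Hypothetical helper: true if general-purpose PMC `i` has overflowed,
/// given a value read from IA32_PERF_GLOBAL_STATUS.
fn gp_overflowed(global_status: u64, i: u32) -> bool {
    (global_status & (1u64 << i)) != 0
}
```

With 8 general-purpose and 3 fixed counters, the global enable mask is 0x7_0000_00FF, and enabling all three fixed counters in both rings without interrupts writes 0x333 to IA32_FIXED_CTR_CTRL.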
We support two ways to use the PMU. The first is to count the number of events that occur while a section of code runs. The second is Event Based Sampling: after a specified number of events occur, an interrupt is triggered, and we store the instruction pointer and the ID of the task running at that point.
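The arithmetic behind the two modes can be sketched as follows. Counting takes the difference of two PMC reads modulo the counter width; sampling preloads the PMC with 2^width - N so that it overflows, and raises the interrupt, after N events. The 48-bit width is an assumption (the real width is reported by CPUID leaf 0AH in EAX[23:16]), and events_between and sampling_start_value are hypothetical helpers, not pmu_x86 functions.

```rust
/// Assumed counter width; 48 bits is common but not guaranteed.
const COUNTER_WIDTH: u32 = 48;
const COUNTER_MASK: u64 = (1u64 << COUNTER_WIDTH) - 1;

/// Counting mode (hypothetical helper): events that occurred between two reads
/// of the same PMC, still correct if the counter wrapped around once.
fn events_between(start: u64, end: u64) -> u64 {
    end.wrapping_sub(start) & COUNTER_MASK
}

/// Sampling mode (hypothetical helper): value to preload into the PMC so that
/// it overflows after exactly `events_per_sample` further events.
fn sampling_start_value(events_per_sample: u64) -> u64 {
    assert!(events_per_sample >= 1 && events_per_sample <= COUNTER_MASK);
    (COUNTER_MASK + 1) - events_per_sample
}
```

On processors without full-width counter writes, a WRMSR to IA32_PMCx only takes bits 31:0 and sign-extends bit 31 into the upper bits; a preload of 2^48 - N survives that as long as N is at most 2^31, which covers sample periods such as 0xFFFFF.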
Currently, we support a maximum core ID of 255 and up to 8 general purpose counters per core. Core IDs greater than 255 are not supported anywhere in Theseus, since a core ID must fit within a u8.
If the core ID limit is raised and the PMU data structures need to support more cores, then:
- Increase WORDS_IN_BITMAP and CORES_SUPPORTED_BY_PMU as required. For example, 256 supported cores need four 64-bit words in the bitmap, one bit per core.
- Add additional AtomicU64 variables to the initialization of the CORES_SAMPLING and RESULTS_READY bitmaps.
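A per-core bitmap of this shape can be sketched with a fixed array of AtomicU64 words. This is an assumption about the layout, not the crate's code: the static below is a hypothetical stand-in for CORES_SAMPLING, and word_and_mask, set_core, clear_core, and is_core_set are hypothetical helpers. Core ID c lives in word c / 64 at bit c % 64.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const CORES_SUPPORTED_BY_PMU: usize = 256;
// One bit per core: 256 cores / 64 bits per word = 4 words.
const WORDS_IN_BITMAP: usize = CORES_SUPPORTED_BY_PMU / 64;

/// Hypothetical stand-in for a bitmap such as CORES_SAMPLING. Supporting more
/// cores means raising the constants above and adding more entries here.
static CORES_SAMPLING: [AtomicU64; WORDS_IN_BITMAP] = [
    AtomicU64::new(0),
    AtomicU64::new(0),
    AtomicU64::new(0),
    AtomicU64::new(0),
];

/// Word index and bit mask for a core ID.
fn word_and_mask(core_id: u8) -> (usize, u64) {
    let id = core_id as usize;
    (id / 64, 1u64 << (id % 64))
}

/// Sets the core's bit; returns true if it was previously clear.
fn set_core(bitmap: &[AtomicU64; WORDS_IN_BITMAP], core_id: u8) -> bool {
    let (word, mask) = word_and_mask(core_id);
    (bitmap[word].fetch_or(mask, Ordering::SeqCst) & mask) == 0
}

/// Clears the core's bit.
fn clear_core(bitmap: &[AtomicU64; WORDS_IN_BITMAP], core_id: u8) {
    let (word, mask) = word_and_mask(core_id);
    bitmap[word].fetch_and(!mask, Ordering::SeqCst);
}

/// Returns true if the core's bit is set.
fn is_core_set(bitmap: &[AtomicU64; WORDS_IN_BITMAP], core_id: u8) -> bool {
    let (word, mask) = word_and_mask(core_id);
    (bitmap[word].load(Ordering::SeqCst) & mask) != 0
}
```

Because the static's type is [AtomicU64; WORDS_IN_BITMAP], forgetting to add initializer entries after raising WORDS_IN_BITMAP is a compile-time error rather than a silent bug.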
If more general purpose PMCs must be supported than the current limit allows, then:
- Update PMCS_SUPPORTED_BY_PMU to the new PMC limit.
- Change the element type in the PMCS_AVAILABLE vector to an atomic type wider than AtomicU8, so that it still holds one bit per counter.
- Update INIT_PMCS_AVAILABLE to the new maximum value for the per core bitmap.
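One way a per-core AtomicU8 availability bitmap can hand out counters is sketched below, assuming a set bit means "free" and that INIT_PMCS_AVAILABLE is therefore u8::MAX for 8 counters (both are assumptions). claim_pmc and release_pmc are hypothetical helpers; claim_pmc atomically clears the lowest set bit with AtomicU8::fetch_update, so two cores racing on the same bitmap can never claim the same counter.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const PMCS_SUPPORTED_BY_PMU: u8 = 8;
/// Assumed meaning of INIT_PMCS_AVAILABLE: all 8 bits set, every counter free.
const INIT_PMCS_AVAILABLE: u8 = u8::MAX;

/// Hypothetical helper: claims the lowest-numbered free PMC in one core's
/// bitmap and returns its index, or None if every counter is in use.
fn claim_pmc(available: &AtomicU8) -> Option<u8> {
    available
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |bits| {
            // `bits & (bits - 1)` clears the lowest set bit; fail if none are set.
            if bits == 0 { None } else { Some(bits & (bits - 1)) }
        })
        .ok()
        .map(|previous| previous.trailing_zeros() as u8)
}

/// Hypothetical helper: marks PMC `index` as free again.
fn release_pmc(available: &AtomicU8, index: u8) {
    assert!(index < PMCS_SUPPORTED_BY_PMU);
    available.fetch_or(1 << index, Ordering::SeqCst);
}
```

Raising the limit to 16 counters would, under the same assumptions, mean switching to AtomicU16 and u16::MAX; the claim logic itself does not change.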
Monitoring without interrupts is almost free (around 0.3% performance penalty) - source: “These are Not Your Grand Daddy’s CPU Performance Counters” Blackhat USA, 2015
Example
pmu_x86::init();
// Take one sample every 0xFFFFF occurrences of the selected event.
let counter_freq = 0xFFFFF;
// Stop after this many samples have been collected.
let num_samples = 500;
let sampler = pmu_x86::start_samples(pmu_x86::EventType::UnhaltedReferenceCycles, counter_freq, None, num_samples);
if sampler.is_ok() {
    // wait some time here
    if let Ok(mut samples) = pmu_x86::retrieve_samples() {
        pmu_x86::print_samples(&mut samples);
    }
}
Note
Currently, the PMU-based sampler only captures samples on the same core on which it was initialized and started. So, if you run pmu_x86::init() and pmu_x86::start_samples() on CPU core 2, it will only sample events on core 2.
Modules
- This module implements the equivalent of “perf stat”. Currently only 7 events are recorded.
Structs
- A logical counter object to correspond to a physical PMC
- Stores the instruction pointers and corresponding task IDs from the samples
Enums
- Used to select the event type to count. Event types are described in the Intel SDM 18.2.1 for PMU Version 1. The discriminant value for each event type is the value written to the event select register for a general purpose PMC.
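To illustrate how an enum discriminant can double as an event select value, here is a hypothetical mirror of EventType built from the architectural event encodings in the Intel SDM, with each discriminant packing the unit mask (bits 15:8) and event select code (bits 7:0). Apart from UnhaltedReferenceCycles, the variant names are assumptions, and the real enum's discriminants may already include control bits such as EN and USR; event_select_value is a hypothetical helper.

```rust
/// Hypothetical mirror of EventType: discriminant = (umask << 8) | event select.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u64)]
#[allow(dead_code)]
enum EventType {
    UnhaltedCoreCycles = 0x003C,
    InstructionsRetired = 0x00C0,
    UnhaltedReferenceCycles = 0x013C,
    LastLevelCacheReferences = 0x4F2E,
    LastLevelCacheMisses = 0x412E,
    BranchInstructionsRetired = 0x00C4,
    BranchMissesRetired = 0x00C5,
}

// Control bits of IA32_PERFEVTSELx (Intel SDM).
const USR: u64 = 1 << 16; // count while CPL > 0
const OS: u64 = 1 << 17; // count while CPL == 0
const EN: u64 = 1 << 22; // enable the counter

/// Hypothetical helper: event select register value that counts `event` in all rings.
fn event_select_value(event: EventType) -> u64 {
    (event as u64) | USR | OS | EN
}
```

Keeping the encoding in the discriminant means a single cast produces the register value, with no lookup table to keep in sync with the enum.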
Functions
- Finds the corresponding function for each instruction pointer and calculates the percentage of samples in which each function occurred.
- This function is designed to be invoked from an interrupt handler when a sampling interrupt has (or may have) occurred.
- Initialization function that enables the PMU if one is available. We initialize the 3 fixed PMCs and general purpose PMCs. Calling this initialization function again on a core that has already been initialized will do nothing.
- Simple function to print values from SampleResults in a form that the script “post-mortem pmu analysis.py” can parse.
- Frees all counters and makes them available to be used again. Essentially resets the PMU to its initial state.
- Returns the samples that were stored during sampling in the form of a SampleResults object. If sampling has not yet finished, it is forced to stop.
- Starts the interrupt process in order to take samples using the PMU. It loads the starting value such that an overflow will occur after “event_per_sample” events. That overflow triggers an interrupt in which information about the currently running task is sampled.
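The per-function percentage calculation described in the function list can be sketched as below. This assumes a symbol lookup is available as a closure mapping an instruction pointer to a function name; function_percentages and the lookup closure are hypothetical and stand in for however the crate actually resolves symbols.

```rust
use std::collections::HashMap;

/// Hypothetical helper: resolves each sampled instruction pointer with `lookup`
/// and returns (function name, percentage of samples), most frequent first.
fn function_percentages<F>(instruction_pointers: &[usize], lookup: F) -> Vec<(String, f64)>
where
    F: Fn(usize) -> Option<String>,
{
    let mut counts: HashMap<String, usize> = HashMap::new();
    for &ip in instruction_pointers {
        // Instruction pointers without a known symbol are grouped together.
        let name = lookup(ip).unwrap_or_else(|| String::from("<unknown>"));
        *counts.entry(name).or_insert(0) += 1;
    }
    let total = instruction_pointers.len() as f64;
    let mut result: Vec<(String, f64)> = counts
        .into_iter()
        .map(|(name, count)| (name, 100.0 * count as f64 / total))
        .collect();
    // Highest percentage first; ties broken by name so the output is deterministic.
    result.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    result
}
```

An empty sample set produces an empty result, so the division by the total is never reached with zero samples.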