Skip to content

zero copy perf event#1452

Draft
bjmask wants to merge 1 commit intoaya-rs:mainfrom
bjmask:feature/zero-copy-perf-buffer
Draft

zero copy perf event#1452
bjmask wants to merge 1 commit intoaya-rs:mainfrom
bjmask:feature/zero-copy-perf-buffer

Conversation

@bjmask
Copy link

@bjmask bjmask commented Jan 21, 2026

This change is Reviewable

Copilot AI review requested due to automatic review settings January 21, 2026 16:57
@bjmask bjmask marked this pull request as draft January 21, 2026 16:57
@netlify
Copy link

netlify bot commented Jan 21, 2026

Deploy Preview for aya-rs-docs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 41aaf58
🔍 Latest deploy log https://app.netlify.com/projects/aya-rs-docs/deploys/6971056d5853170008eedfd0
😎 Deploy Preview https://deploy-preview-1452--aya-rs-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a zero-copy API for reading perf event samples to avoid copying sample data from the ring buffer, improving performance for high-throughput scenarios. It introduces new types (RawEvents, RawSample, RawSampleData) that provide direct access to the mmap'd buffer, and adds a benchmark feature with helpers for microbenchmarking.

Changes:

  • Adds read_events_raw() method that returns an iterator yielding zero-copy samples
  • Introduces benchmark infrastructure gated behind a "bench" feature flag
  • Extends test/mock syscall infrastructure to support benchmarks

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
aya/src/maps/perf/perf_buffer.rs Core zero-copy implementation with RawEvents iterator, RawSample/RawSampleData types, bench module with BenchBuffer helper, and tests
aya/src/maps/perf/perf_event_array.rs Public API wrapper exposing read_events_raw() on PerfEventArrayBuffer
aya/src/sys/mod.rs Extends test-only syscall mocking to also work with bench feature
aya/src/sys/fake.rs Extends fake syscall infrastructure to support bench feature
aya/src/lib.rs Extends MockableFd test infrastructure to support bench feature
aya/benches/perf_buffer.rs New benchmark comparing copy vs zero-copy performance
aya/Cargo.toml Adds criterion dev-dependency and bench feature configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pub(crate) fn read_events_raw(&mut self) -> Result<RawEvents<'_>, PerfBufferError> {
let header = self.buf().as_ptr();
let base = unsafe { header.byte_add(self.page_size) };
let head = unsafe { (*header).data_head } as usize;
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing memory fence after reading data_head in read_events_raw. The existing read_events method at line 396 reads data_head, and the code at line 431 uses a SeqCst fence before writing data_tail. The newly added readable method at line 318 correctly uses an Acquire fence after reading data_head. The read_events_raw method should also include an acquire fence after reading data_head at line 440 to ensure proper memory ordering with the kernel writer, similar to how it's done in readable.

Suggested change
let head = unsafe { (*header).data_head } as usize;
let head = unsafe { (*header).data_head } as usize;
atomic::fence(Ordering::Acquire);

Copilot uses AI. Check for mistakes.
Comment on lines +92 to +114
/// Zero-copy view into a sample payload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawSampleData<'a> {
first: &'a [u8],
second: &'a [u8],
}

impl<'a> RawSampleData<'a> {
/// Returns the two slices that make up the sample data.
pub fn as_slices(&self) -> (&'a [u8], &'a [u8]) {
(self.first, self.second)
}

/// Returns the total length of the sample data.
pub fn len(&self) -> usize {
self.first.len() + self.second.len()
}

/// Returns true if the sample data is contiguous.
pub fn is_contiguous(&self) -> bool {
self.second.is_empty()
}
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_empty method is missing from RawSampleData. The type provides len() and is_contiguous(), but lacks the standard is_empty() method that would be expected on a collection-like type. This would be more consistent with Rust conventions.

Copilot uses AI. Check for mistakes.
Comment on lines +235 to +242
x if x == PERF_RECORD_LOST as u32 => {
if event_size < header_size + mem::size_of::<u64>() * 2 {
self.tail = self.head;
return None;
}
let count = self.read_u64(event_start + header_size + mem::size_of::<u64>());
self.events.lost += count as usize;
self.tail += event_size;
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for PERF_RECORD_LOST events in the raw API. While the implementation handles lost events at lines 235-242, and there's a test for lost events using the regular read_events API (test_read_first_lost at line 675), there's no corresponding test that exercises the lost event handling path for read_events_raw.

Copilot uses AI. Check for mistakes.
use super::{Events, PerfBuffer, PerfBufferError, RawEvents};
use crate::sys::{Syscall, TEST_MMAP_RET, override_syscall};

/// Test-only perf buffer backed by a fake mmap region.
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module and type are named "bench" and "BenchBuffer" but the documentation says "Test-only perf buffer". The naming suggests this is for benchmarks (not tests), while the documentation suggests it's for tests. Either the name should be TestBuffer or the documentation should say "Benchmark-only perf buffer".

Suggested change
/// Test-only perf buffer backed by a fake mmap region.
/// Benchmark-only perf buffer backed by a fake mmap region.

Copilot uses AI. Check for mistakes.
Comment on lines +100 to +103
/// Returns the two slices that make up the sample data.
pub fn as_slices(&self) -> (&'a [u8], &'a [u8]) {
(self.first, self.second)
}
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for RawSampleData and as_slices() doesn't explain why the sample data is split into two slices. This is due to the ring buffer wrapping, and documenting this would help users understand when the second slice would be non-empty. Consider adding documentation like "Returns two slices making up the sample data. When the sample wraps around the ring buffer, the first slice contains the data before wrapping and the second contains the data after wrapping. Use is_contiguous() to check if the data is in a single slice."

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants