Replace libbpf-rs/libbpf-cargo + BPF C code with the Aya pure-Rust eBPF toolchain. The BPF program is now written entirely in Rust targeting bpfel-unknown-none, compiled via aya-build in build.rs.

Key changes:
- rsched-common: shared #[repr(C)] types (Hist, HistData, TimesliceData, etc.) with Default/Copy/Clone impls and aya::Pod impls behind the "user" feature gate
- rsched-ebpf: pure-Rust eBPF program with all 6 tracepoint handlers (sched_wakeup, sched_wakeup_new, sched_waking, sched_switch, sched_migrate_task, sched_process_exit) in both tp_btf and raw_tp variants; uses vmlinux struct stubs with pahole-verified offsets
- Userspace: aya::EbpfLoader replaces the libbpf-rs skeleton, with set_global() for rodata patching and typed HashMap map access; BTF tracepoints are tried first, with raw_tp fallback

Removes dependencies: libbpf-rs, libbpf-cargo, plain
Adds dependencies: aya, aya-build, rsched-common
These files are no longer used after the Aya port:
- src/bpf/rsched.bpf.c (replaced by rsched-ebpf/src/main.rs)
- src/bpf/core-helpers.h (replaced by rsched-ebpf/src/vmlinux.rs)
- src/bpf/vmlinux.h (kernel types now via minimal stubs)
- src/bpf/.clang-format
The original C BPF program collected nr_running via the runqueues ksym and bpf_per_cpu_ptr. Since aya-ebpf lacks ksym support, take an alternative path: read nr_running through task->se.cfs_rq->nr_running (offsets 192 + 160 to reach the cfs_rq pointer, then offset 16 for nr_running). Also adds thread_info and rq struct stubs to vmlinux.rs and a ZERO_NR_RUNNING scratch map for initializing map entries.
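A host-runnable sketch of the two-hop pointer chase described above. The offsets (192 + 160 for task->se.cfs_rq, 16 for cfs_rq->nr_running) come from the commit; the `probe_read` stand-in and the fake memory layout are illustrative — in the real program the reads go through bpf_probe_read_kernel against live kernel memory.

```rust
/// Stand-in for bpf_probe_read_kernel: on the host, a plain unaligned copy.
unsafe fn probe_read<T: Copy>(src: *const T) -> T {
    core::ptr::read_unaligned(src)
}

const SE_CFS_RQ_OFF: usize = 192 + 160; // task_struct -> se.cfs_rq
const NR_RUNNING_OFF: usize = 16;       // cfs_rq -> nr_running

/// Chase task->se.cfs_rq->nr_running from a raw task_struct pointer.
unsafe fn cfs_rq_nr_running(task: *const u8) -> u32 {
    let cfs_rq: *const u8 = probe_read(task.add(SE_CFS_RQ_OFF) as *const *const u8);
    probe_read(cfs_rq.add(NR_RUNNING_OFF) as *const u32)
}

fn main() {
    // Fake "kernel memory": a cfs_rq blob with nr_running = 3 at offset 16,
    // and a task_struct blob whose se.cfs_rq slot points at it (64-bit host).
    let mut cfs_rq = [0u8; 32];
    cfs_rq[NR_RUNNING_OFF..NR_RUNNING_OFF + 4].copy_from_slice(&3u32.to_ne_bytes());

    let mut task = [0u8; 512];
    task[SE_CFS_RQ_OFF..SE_CFS_RQ_OFF + 8]
        .copy_from_slice(&(cfs_rq.as_ptr() as usize).to_ne_bytes());

    println!("{}", unsafe { cfs_rq_nr_running(task.as_ptr()) });
}
```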
Two bugs fixed:

1. Perf event array MapData objects were dropped after setup, closing their kernel FDs. While the BPF program still holds references to the maps, the perf event FDs attached via BPF_MAP_UPDATE_ELEM could become stale. Fix: keep all perf MapData alive in a _perf_map_data vec for the program's lifetime. This also fixes IPC counters showing zeros after the first collection interval.

2. The userspace record_generic_tick() computed deltas between ticks, but the BPF program already accumulates deltas in the per-PID counter values between map drains. This double-delta computation made all values appear as zero. Fix: treat the collected counter values as already-computed deltas and record them directly into histograms. Also removes the now-unused last_generic_counters field from CpuMetrics.
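A minimal model of the second fix (function names hypothetical): each drained counter value is already a per-interval delta, so differencing consecutive drained values again collapses a steady workload toward zero, while recording them directly preserves the signal.

```rust
/// Buggy path: re-difference values that are already per-interval deltas.
fn buggy_record(drained: &[u64]) -> Vec<u64> {
    let mut last = 0u64;
    drained
        .iter()
        .map(|&v| {
            let d = v.saturating_sub(last);
            last = v;
            d
        })
        .collect()
}

/// Fixed path: the drained values are the deltas; record them as-is.
fn fixed_record(drained: &[u64]) -> Vec<u64> {
    drained.to_vec()
}

fn main() {
    // Steady workload: the BPF side hands back the same delta each interval.
    let drained = [100, 100, 100];
    println!("{:?}", buggy_record(&drained)); // later intervals collapse to 0
    println!("{:?}", fixed_record(&drained));
}
```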
Create bpf_helpers.rs with safe wrappers for BPF helper functions (probe_read, ktime_ns, current_cpu, read_perf_counter, volatile read/write helpers). Add safe accessor methods to task_struct in vmlinux.rs (pid, state, cpu, wake_cpu, read_comm, cgroup_id, cfs_rq_nr_running). Remove 12 standalone unsafe helper functions from main.rs and update all call sites. Reduces unsafe occurrences in main.rs from 46 to 34.
These functions take raw pointers whose validity cannot be verified at compile time — marking them safe was unsound. The task_struct accessor methods remain the safe public boundary: &self guarantees a valid kernel address for field offset computation, and bpf_probe_read_kernel handles faults gracefully.
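A sketch of that safe-accessor boundary. The struct, the field offset, and the `probe_read` stand-in are all hypothetical; the point is the shape of the API: `&self` supplies a valid base address for the offset computation, and the fault-tolerant probe does the actual read.

```rust
/// Stand-in for bpf_probe_read_kernel: on the host, a plain unaligned read.
unsafe fn probe_read<T: Copy>(src: *const T) -> T {
    core::ptr::read_unaligned(src)
}

const PID_OFF: usize = 8; // hypothetical field offset

/// Opaque stand-in for the kernel task_struct stub in vmlinux.rs.
#[allow(non_camel_case_types)]
#[repr(C)]
struct task_struct([u8; 64]);

impl task_struct {
    /// Safe accessor: &self guarantees a valid base address for the offset
    /// computation, and the probe tolerates faults on the actual read.
    fn pid(&self) -> i32 {
        unsafe {
            probe_read((self as *const task_struct as *const u8).add(PID_OFF) as *const i32)
        }
    }
}

fn main() {
    let mut raw = [0u8; 64];
    raw[PID_OFF..PID_OFF + 4].copy_from_slice(&1234i32.to_ne_bytes());
    let t = task_struct(raw);
    println!("{}", t.pid());
}
```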
Create map_ops.rs with safe wrappers for BPF map lookups: get(),
get_mut(), contains_key(), pca_get(), pca_get_mut(). Raw map
pointers are converted to Rust references at the module boundary,
with a documented safety model based on BPF's pre-allocated entries
and non-preemptible per-CPU execution.
Move add_to_u32/add_to_u64 from bpf_helpers to map_ops, changing
signatures from *mut to &mut — genuinely safe since the reference
guarantees validity.
All do_* handler functions are now safe fn. The only remaining
unsafe in main.rs is:
- 8 #[unsafe(...)] attributes on rodata statics
- 6 one-line unsafe { &*t } blocks to convert tracepoint task
pointers to references
- 15 unsafe { ctx.arg() / rarg_ptr() } blocks in tracepoint
entry points (inherent to BPF context extraction)
- 2 unsafe fn (rarg_ptr, rarg_i32) for raw tracepoint arg parsing
- 1 panic handler
Reduces main.rs unsafe from 34 to 29 occurrences, but more
importantly: zero unsafe fn in the handler logic — all 11 do_*
functions are now safe fn.
Replace &V/&mut V references to hash map memory with MapPtr<V>, a volatile pointer wrapper that never creates Rust references. This prevents LLVM from assuming "dereferenceable" semantics on map memory that may be concurrently modified by other CPUs.

- MapPtr<V> wraps *mut V with read()/write() volatile access
- field_ptr! macro projects to struct fields via addr_of_mut!
- add_to_u32/add_to_u64 now take *mut (unsafe fn, honestly marked)
- PerCpuArray keeps returning &V/&mut V (genuinely safe, single CPU)
- task_struct gains read_comm_ptr() for volatile-compatible comm writes
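A host-runnable sketch of the MapPtr<V> idea (names follow the commit, implementation details are a guess): a thin wrapper over *mut V whose only access paths are volatile, so LLVM never sees a Rust reference it could treat as dereferenceable/noalias, plus a field_ptr!-style macro that projects to fields without materializing a &mut.

```rust
use core::ptr;

/// Volatile pointer wrapper: never hands out &V or &mut V.
struct MapPtr<V>(*mut V);

impl<V: Copy> MapPtr<V> {
    /// Safety: `p` must point to a live map entry for the wrapper's lifetime.
    unsafe fn new(p: *mut V) -> Self {
        MapPtr(p)
    }

    fn read(&self) -> V {
        // Volatile: the entry may be rewritten concurrently by another CPU.
        unsafe { ptr::read_volatile(self.0) }
    }

    fn write(&self, v: V) {
        unsafe { ptr::write_volatile(self.0, v) }
    }
}

/// Project to a struct field via addr_of_mut!, without creating a reference.
macro_rules! field_ptr {
    ($map_ptr:expr, $field:ident) => {
        unsafe { MapPtr::new(ptr::addr_of_mut!((*$map_ptr.0).$field)) }
    };
}

#[repr(C)]
#[derive(Copy, Clone)]
struct HistData {
    count: u64,
}

fn main() {
    let mut entry = HistData { count: 0 };
    let p = unsafe { MapPtr::new(&mut entry as *mut HistData) };
    let count = field_ptr!(p, count);
    count.write(count.read() + 1);
    println!("{}", count.read());
}
```

Note that PerCpuArray entries can keep returning plain references, as the commit says: only one CPU ever touches a given per-CPU slot, so the aliasing assumptions hold there.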
Hello Chris,
This was just a little experiment to:
(1) check in on the status of aya-rs for pure-Rust BPF. I think it's coming along!
(2) play with the safe/unsafe boundary in Rust BPF code. But this still isn't right yet for maps.
LMK what you think.
P.S. Part of the long-term motivation here is that if we eventually get to having SCX schedulers in pure Rust, I strongly believe our ability to factor components into clean libraries shared between schedulers will greatly improve (traits, polymorphism, etc.).