Pure Rust port of rsched#10

Open
rrnewton wants to merge 8 commits into masoncl:main from rrnewton:pure-rust2

Conversation


@rrnewton rrnewton commented Feb 25, 2026

Hello Chris,

This was just a little experiment to:
(1) check in on the status of aya-rs for pure-Rust BPF. I think it's coming along!
(2) play with the safe/unsafe boundary in Rust BPF code. But this still isn't right yet for maps.

LMK what you think.

P.S. Part of the long-term motivation here is that if we eventually get SCX schedulers in pure Rust, I strongly believe our ability to factor components into clean libraries shared between schedulers will greatly improve (traits, polymorphism, etc.).

Replace libbpf-rs/libbpf-cargo + BPF C code with the Aya pure-Rust eBPF
toolchain. The BPF program is now written entirely in Rust targeting
bpfel-unknown-none, compiled via aya-build in build.rs.

Key changes:
- rsched-common: shared #[repr(C)] types (Hist, HistData, TimesliceData,
  etc.) with Default/Copy/Clone impls and aya::Pod impls behind "user"
  feature gate
- rsched-ebpf: pure Rust eBPF program with all 6 tracepoint handlers
  (sched_wakeup, sched_wakeup_new, sched_waking, sched_switch,
  sched_migrate_task, sched_process_exit) in both tp_btf and raw_tp
  variants. Uses vmlinux struct stubs with pahole-verified offsets.
- Userspace: aya::EbpfLoader replaces libbpf-rs skeleton, with
  set_global() for rodata patching and typed HashMap map access.
  BTF tracepoints tried first with raw_tp fallback.

Removes dependencies: libbpf-rs, libbpf-cargo, plain
Adds dependencies: aya, aya-build, rsched-common
These files are no longer used after the Aya port:
- src/bpf/rsched.bpf.c (replaced by rsched-ebpf/src/main.rs)
- src/bpf/core-helpers.h (replaced by rsched-ebpf/src/vmlinux.rs)
- src/bpf/vmlinux.h (kernel types now via minimal stubs)
- src/bpf/.clang-format
The original C BPF program collected nr_running via the runqueues ksym
and bpf_per_cpu_ptr. Since aya-ebpf lacks ksym support, use an
alternative path: read nr_running through task->se.cfs_rq->nr_running
(offsets 192+160 for cfs_rq pointer, then offset 16 for nr_running).

Also adds thread_info and rq struct stubs to vmlinux.rs and a
ZERO_NR_RUNNING scratch map for initializing map entries.
Two bugs fixed:

1. Perf event array MapData objects were dropped after setup, closing
   their kernel FDs. While the BPF program still holds references to
   the maps, the perf event FDs attached via BPF_MAP_UPDATE_ELEM could
   become stale. Fix: keep all perf MapData alive in _perf_map_data vec
   for the program's lifetime. This also fixes IPC counters showing
   zeros after the first collection interval.

2. The userspace record_generic_tick() was computing deltas between
   ticks, but the BPF program already accumulates deltas in the
   per-PID counter values between map drains. The double-delta
   computation caused all values to appear as zero. Fix: treat
   collected counter values as already-computed deltas and record
   them directly into histograms.

Also removes the now-unused last_generic_counters field from CpuMetrics.
Create bpf_helpers.rs with safe wrappers for BPF helper functions
(probe_read, ktime_ns, current_cpu, read_perf_counter, volatile
read/write helpers). Add safe accessor methods to task_struct in
vmlinux.rs (pid, state, cpu, wake_cpu, read_comm, cgroup_id,
cfs_rq_nr_running). Remove 12 standalone unsafe helper functions
from main.rs and update all call sites. Reduces unsafe occurrences
in main.rs from 46 to 34.
These functions take raw pointers whose validity cannot be verified
at compile time — marking them safe was unsound. The task_struct
accessor methods remain the safe public boundary: &self guarantees
a valid kernel address for field offset computation, and
bpf_probe_read_kernel handles faults gracefully.
Create map_ops.rs with safe wrappers for BPF map lookups: get(),
get_mut(), contains_key(), pca_get(), pca_get_mut(). Raw map
pointers are converted to Rust references at the module boundary,
with a documented safety model based on BPF's pre-allocated entries
and non-preemptible per-CPU execution.

Move add_to_u32/add_to_u64 from bpf_helpers to map_ops, changing
signatures from *mut to &mut — genuinely safe since the reference
guarantees validity.

All do_* handler functions are now safe fn. The only remaining
unsafe in main.rs is:
- 8 #[unsafe(...)] attributes on rodata statics
- 6 one-line unsafe { &*t } blocks to convert tracepoint task
  pointers to references
- 15 unsafe { ctx.arg() / rarg_ptr() } blocks in tracepoint
  entry points (inherent to BPF context extraction)
- 2 unsafe fn (rarg_ptr, rarg_i32) for raw tracepoint arg parsing
- 1 panic handler

Reduces main.rs unsafe from 34 to 29 occurrences, but more
importantly: zero unsafe fn in the handler logic — all 11 do_*
functions are now safe fn.
@rrnewton rrnewton changed the title Pure rust port of rsched Pure Rust port of rsched Feb 25, 2026
Replace &V/&mut V references to hash map memory with MapPtr<V>, a
volatile pointer wrapper that never creates Rust references. This
prevents LLVM from assuming "dereferenceable" semantics on map memory
that may be concurrently modified by other CPUs.

- MapPtr<V> wraps *mut V with read()/write() volatile access
- field_ptr! macro projects to struct fields via addr_of_mut!
- add_to_u32/add_to_u64 now take *mut (unsafe fn, honestly marked)
- PerCpuArray keeps returning &V/&mut V (genuinely safe, single CPU)
- task_struct gains read_comm_ptr() for volatile-compatible comm writes