This project demonstrates various Rust memory layout optimization techniques and benchmarks their performance using Criterion.
- `src/lib.rs`: Implements all optimization techniques
- `src/main.rs`: Demonstrates the optimizations in action
- `benches/allocation_bench.rs`: Benchmarks comparing allocation strategies
- Direct Allocation: Allocate memory using `Box::new()`
- Memory Pool Allocation: Use a custom `ObjectPool` to reuse memory and reduce allocation/deallocation overhead
- AoS (Array of Structures): Traditional struct array layout
- SoA (Structure of Arrays): Separate struct fields into individual arrays for better cache efficiency
- Cache-Friendly Access: Sequential memory access leveraging spatial locality
- Cache-Unfriendly Access: Random memory access that disrupts cache locality
- Dynamic Growth: Let `Vec` grow automatically as elements are pushed
- Preallocated Capacity: Use `Vec::with_capacity()` to allocate memory ahead of time
```bash
cargo run
cargo bench
```

After the benchmarks complete, open `target/criterion/report/index.html` in your browser to view the detailed performance report.
Object pooling reduces dynamic allocations by reusing preallocated memory blocks:
- Reduces calls into the global allocator (and any system calls it would make)
- Can reduce memory fragmentation
- Speeds up allocation/deallocation
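A minimal pool sketch, assuming objects are boxed and `T: Default`. The `ObjectPool` type and its `with_capacity`, `acquire`, `release`, and `available` methods are hypothetical and may not match the `ObjectPool` in `src/lib.rs`; the point is only to show a free list replacing repeated `Box::new` calls.

```rust
/// Hypothetical object pool: released boxes go onto a free list and are
/// handed out again, so steady-state acquire/release avoids the allocator.
pub struct ObjectPool<T: Default> {
    free: Vec<Box<T>>,
}

impl<T: Default> ObjectPool<T> {
    /// Preallocates `capacity` boxed objects up front.
    pub fn with_capacity(capacity: usize) -> Self {
        let free = (0..capacity).map(|_| Box::new(T::default())).collect();
        ObjectPool { free }
    }

    /// Reuses a pooled box if one is free; falls back to `Box::new` otherwise.
    pub fn acquire(&mut self) -> Box<T> {
        match self.free.pop() {
            Some(obj) => obj,
            None => Box::new(T::default()),
        }
    }

    /// Returns a box to the pool. Its contents are NOT reset here.
    pub fn release(&mut self, obj: Box<T>) {
        self.free.push(obj);
    }

    /// Number of boxes currently waiting for reuse.
    pub fn available(&self) -> usize {
        self.free.len()
    }
}
```

Because `release` pushes and `acquire` pops, the most recently released box is handed out next, which tends to be warm in cache. Released objects keep their old contents, so a real pool would usually reset them before reuse.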
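AoS (Array of Structures):

```rust
// Each node keeps its fields together; a Vec<DataNode> stores whole nodes back to back.
struct DataNode {
    id: u64,
    value: f64,
    data: [u8; 512],
}

let nodes: Vec<DataNode> = vec![/* ... */];
```

SoA (Structure of Arrays):

```rust
// Each field lives in its own contiguous Vec; index i across the Vecs is one logical node.
struct SoADataStore {
    ids: Vec<u64>,
    values: Vec<f64>,
    data: Vec<[u8; 512]>,
}
```

SoA is more cache-friendly when accessing specific fields, because each field's values are stored contiguously in memory.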
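To make the difference concrete, here is a sketch that sums only the `value` field in both layouts. It repeats the two structs so it compiles on its own; `from_nodes`, `sum_values_aos`, and `sum_values_soa` are hypothetical helpers, not necessarily what `src/lib.rs` provides, and the cache-line figures in the comments assume a typical 64-byte line.

```rust
pub struct DataNode {
    pub id: u64,
    pub value: f64,
    pub data: [u8; 512],
}

pub struct SoADataStore {
    pub ids: Vec<u64>,
    pub values: Vec<f64>,
    pub data: Vec<[u8; 512]>,
}

impl SoADataStore {
    /// Converts AoS nodes into the SoA layout, one field Vec at a time.
    pub fn from_nodes(nodes: &[DataNode]) -> Self {
        let mut store = SoADataStore {
            ids: Vec::with_capacity(nodes.len()),
            values: Vec::with_capacity(nodes.len()),
            data: Vec::with_capacity(nodes.len()),
        };
        for n in nodes {
            store.ids.push(n.id);
            store.values.push(n.value);
            store.data.push(n.data);
        }
        store
    }
}

/// AoS: consecutive `value`s are a whole node (528 bytes) apart, so each
/// fetched cache line is mostly payload bytes that this loop never reads.
pub fn sum_values_aos(nodes: &[DataNode]) -> f64 {
    nodes.iter().map(|n| n.value).sum()
}

/// SoA: `values` are packed back to back, eight f64s per 64-byte line.
pub fn sum_values_soa(store: &SoADataStore) -> f64 {
    store.values.iter().sum()
}
```

Both functions compute the same result; only the amount of memory pulled through the cache differs. When code usually touches all fields of a node together, AoS can be the better fit.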
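- Spatial locality: accessing data at adjacent memory addresses
- Temporal locality: accessing the same data repeatedly within a short time

Sequential access patterns make better use of the CPU cache and can significantly improve performance.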
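A sketch of the two access patterns over the same data: one walks indices in order, the other follows a shuffled permutation. The helper names are hypothetical, and the shuffle uses a small xorshift generator with Fisher-Yates in place of the `rand` crate so the example has no dependencies. The timing gap only shows up once the slice is much larger than the CPU caches.

```rust
/// Sequential walk: consecutive reads share cache lines and the hardware
/// prefetcher can run ahead (spatial locality).
pub fn sum_sequential(data: &[u64]) -> u64 {
    data.iter().sum()
}

/// Reads the same elements in the order given by `order`. With a shuffled
/// `order`, most reads land on a different cache line than the previous one.
pub fn sum_in_order(data: &[u64], order: &[usize]) -> u64 {
    order.iter().map(|&i| data[i]).sum()
}

/// Pseudo-random permutation of 0..n (Fisher-Yates driven by xorshift64).
/// `seed` should be nonzero; a zero seed leaves the order unshuffled.
pub fn shuffled_indices(n: usize, mut seed: u64) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..n).collect();
    for i in (1..n).rev() {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        let j = (seed % (i as u64 + 1)) as usize;
        idx.swap(i, j);
    }
    idx
}
```

Because both functions visit every element exactly once, they return the same sum; any difference in run time comes from the access order alone.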
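Preallocation avoids frequent memory reallocation:
- Reduces `realloc` calls
- Avoids copying existing elements into a larger buffer each time the `Vec` grows
- Provides more predictable performance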
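One way to see the effect is to count how often a `Vec`'s capacity changes while pushing. The `count_growths` helper below is hypothetical. With `preallocate = false`, the first change is the initial allocation and every later one is a reallocation that may copy all existing elements. `Vec::with_capacity(n)` guarantees room for at least `n` elements, so the preallocated run never grows.

```rust
/// Pushes `n` values and counts capacity changes along the way.
pub fn count_growths(n: usize, preallocate: bool) -> usize {
    let mut v: Vec<u64> = if preallocate {
        Vec::with_capacity(n) // reserve the whole buffer up front
    } else {
        Vec::new() // starts with no buffer; grows as elements are pushed
    };
    let mut growths = 0;
    let mut cap = v.capacity();
    for i in 0..n as u64 {
        v.push(i);
        if v.capacity() != cap {
            growths += 1;
            cap = v.capacity();
        }
    }
    growths
}
```

The exact growth count for the dynamic case depends on the standard library's growth strategy, which is not specified, so only its existence (not its value) should be relied on.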
Based on benchmark results, you can expect:
- 10-30% speedup with SoA compared to AoS
- 20-50% speedup with memory pool vs direct allocation
- 2-5× speedup with cache-friendly vs cache-unfriendly access
- 15-40% speedup with preallocated capacity vs dynamic growth
Exact figures depend on hardware configuration and data size.
- Choose data layout based on access patterns (AoS vs SoA)
- Use memory pooling for frequently allocated/freed objects
- Preallocate memory when you know the element count
- Optimize access patterns (favor sequential over random)
- Measure performance using benchmarks (e.g., Criterion)
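Criterion handles warm-up, repeated sampling, and statistical analysis, so it should produce the real numbers. For a quick, dependency-free sanity check while experimenting, a rough timing helper can be sketched with `std::time::Instant` and `std::hint::black_box`; the `time_it` name is hypothetical and its results are far noisier than Criterion's.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// `black_box` discourages the optimizer from removing work whose result is unused.
pub fn time_it<R>(iters: u32, mut f: impl FnMut() -> R) -> Duration {
    assert!(iters > 0, "iters must be positive");
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}
```

Build with `--release` before reading anything into the output; debug builds can distort the relative cost of the techniques above.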