Open
Labels
area/pipeline (Pipeline processing), priority/medium (Medium priority), size/L (Large: 1-2 weeks), type/feature (New feature or functionality)
Description
Summary
Add NVIDIA NVENC hardware video encoding support for faster H.264 encoding. Includes ring buffer for smoothing I/O jitter.
Priority: MEDIUM ⏳
Performance Optimization - Can be done in parallel with deployment work.
Dependencies
- ✅ Pipeline Integration ([Phase 1] Integrate LerobotWriter with Worker.process_job() #72, [Phase 1] Add checkpoint save during pipeline processing #73) - COMPLETE
- Optional enhancement (fallback to software encoding)
Design
Encoder Abstraction
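```rust
trait VideoEncoder {
    /// Feed one raw frame; returns a chunk once the encoder has output ready.
    fn encode_frame(&mut self, frame: &[u8]) -> Option<EncodedChunk>;
    /// Drain any frames still buffered inside the encoder.
    fn flush(&mut self) -> Vec<EncodedChunk>;
}
```
Backend Priority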
- h264_nvenc (NVIDIA GPU)
- h264_qsv (Intel QuickSync)
- h264_videotoolbox (macOS)
- libx264 (software fallback)
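The priority list above could be captured in code roughly as follows. This is a minimal sketch: the issue names an `EncoderType` but does not define it, so the variant names, the `PRIORITY` constant, and the helper methods are assumptions for illustration.

```rust
/// Hypothetical enum naming the four backends from the priority list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncoderType {
    Nvenc,
    Qsv,
    VideoToolbox,
    X264,
}

impl EncoderType {
    /// Backends in the priority order given above, best first.
    pub const PRIORITY: [EncoderType; 4] = [
        EncoderType::Nvenc,
        EncoderType::Qsv,
        EncoderType::VideoToolbox,
        EncoderType::X264,
    ];

    /// The ffmpeg encoder name that would be passed to `-c:v`.
    pub fn ffmpeg_name(self) -> &'static str {
        match self {
            EncoderType::Nvenc => "h264_nvenc",
            EncoderType::Qsv => "h264_qsv",
            EncoderType::VideoToolbox => "h264_videotoolbox",
            EncoderType::X264 => "libx264",
        }
    }

    /// True for backends that depend on GPU or platform hardware.
    pub fn is_hardware(self) -> bool {
        !matches!(self, EncoderType::X264)
    }
}
```

Keeping the order in one constant means detection and the fallback chain can iterate over the same list.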
Ring Buffer
- GPU memory ring buffer
- Producer: Pipeline pushes frames
- Consumer: Encoder pulls frames
- Smooths network I/O jitter
Tasks
1. Define Encoder Abstraction
- Create src/pipeline/gpu/encoder.rs
- Define VideoEncoder trait
- Define EncoderConfig:
  - codec, preset, crf, gop_size
  - pixel_format, resolution
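One possible shape for `EncoderConfig`, using the field names listed above. The field types, the default values, and the `frame_bytes` helper are assumptions made for this sketch; the issue only gives the field names.

```rust
/// Hypothetical layout for `EncoderConfig`; types and defaults are
/// illustrative assumptions, not values taken from the issue.
#[derive(Debug, Clone, PartialEq)]
pub struct EncoderConfig {
    pub codec: String,          // e.g. "h264"
    pub preset: String,         // backend-specific preset name
    pub crf: u8,                // constant rate factor (quality target)
    pub gop_size: u32,          // frames between keyframes
    pub pixel_format: String,   // e.g. "yuv420p"
    pub resolution: (u32, u32), // (width, height) in pixels
}

impl Default for EncoderConfig {
    fn default() -> Self {
        Self {
            codec: "h264".to_string(),
            preset: "medium".to_string(),
            crf: 23,
            gop_size: 30,
            pixel_format: "yuv420p".to_string(),
            resolution: (640, 480),
        }
    }
}

impl EncoderConfig {
    /// Size in bytes of one raw frame for the pixel formats this sketch
    /// knows about (assumes even width and height for 4:2:0 formats).
    pub fn frame_bytes(&self) -> Option<usize> {
        let (w, h) = self.resolution;
        let pixels = w as usize * h as usize;
        match self.pixel_format.as_str() {
            "yuv420p" | "nv12" => Some(pixels * 3 / 2),
            "rgb24" => Some(pixels * 3),
            _ => None,
        }
    }
}
```

A helper like `frame_bytes` is handy when sizing ring buffer slots or validating that a frame passed to `encode_frame` has the expected length.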
2. Implement NVENC Backend
- Create src/pipeline/gpu/nvenc.rs
- Options:
  - Direct NVENC API (complex, best performance)
  - ffmpeg subprocess with -c:v h264_nvenc (simpler)
- Initial: Use ffmpeg subprocess
- Future: Direct API for lower latency
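A rough sketch of the ffmpeg-subprocess approach: raw frames go in on stdin and an H.264 elementary stream comes out on stdout. The function names `nvenc_ffmpeg_args` and `spawn_nvenc` are hypothetical, and the choice of `yuv420p` input and a raw `h264` output container is an assumption; the preset and tune values are the ones listed under Configuration.

```rust
use std::process::{Child, Command, Stdio};

/// Build the ffmpeg argument list for encoding raw yuv420p frames from
/// stdin with h264_nvenc and writing an H.264 stream to stdout.
pub fn nvenc_ffmpeg_args(
    width: u32,
    height: u32,
    fps: u32,
    preset: &str, // e.g. "p4"
    tune: &str,   // e.g. "hq"
    gop_size: u32,
) -> Vec<String> {
    vec![
        "-hide_banner".into(),
        "-loglevel".into(), "error".into(),
        // Input: headerless raw video, so size, format and rate are explicit.
        "-f".into(), "rawvideo".into(),
        "-pix_fmt".into(), "yuv420p".into(),
        "-s".into(), format!("{width}x{height}"),
        "-r".into(), fps.to_string(),
        "-i".into(), "pipe:0".into(),
        // Output: NVENC-encoded H.264.
        "-c:v".into(), "h264_nvenc".into(),
        "-preset".into(), preset.into(),
        "-tune".into(), tune.into(),
        "-g".into(), gop_size.to_string(),
        "-f".into(), "h264".into(),
        "pipe:1".into(),
    ]
}

/// Spawn ffmpeg with piped stdin/stdout. Fails if ffmpeg is not on PATH.
pub fn spawn_nvenc(args: &[String]) -> std::io::Result<Child> {
    Command::new("ffmpeg")
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::inherit())
        .spawn()
}
```

Separating argument construction from spawning keeps the command line unit-testable on machines without a GPU. Note that writes to stdin and reads from stdout would need to happen on separate threads or tasks to avoid a pipe deadlock.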
3. Implement Ring Buffer
- Create src/pipeline/gpu/ring_buffer.rs
- Fixed-size circular buffer
- CUDA pinned memory (optional)
- Block producer when full
- Block consumer when empty
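The blocking behaviour described above can be sketched with a `Mutex` and two `Condvar`s from the standard library. This version lives in ordinary host memory; CUDA pinned memory is left out, and the type and method names are assumptions for illustration.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// Fixed-capacity blocking queue of frames (host memory only in this sketch).
pub struct RingBuffer<T> {
    slots: Mutex<VecDeque<T>>,
    capacity: usize,
    not_full: Condvar,
    not_empty: Condvar,
}

impl<T> RingBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            slots: Mutex::new(VecDeque::with_capacity(capacity)),
            capacity,
            not_full: Condvar::new(),
            not_empty: Condvar::new(),
        }
    }

    /// Producer side: blocks while the buffer is full.
    pub fn push(&self, item: T) {
        let mut slots = self.slots.lock().unwrap();
        while slots.len() == self.capacity {
            slots = self.not_full.wait(slots).unwrap();
        }
        slots.push_back(item);
        self.not_empty.notify_one();
    }

    /// Consumer side: blocks while the buffer is empty.
    pub fn pop(&self) -> T {
        let mut slots = self.slots.lock().unwrap();
        loop {
            if let Some(item) = slots.pop_front() {
                self.not_full.notify_one();
                return item;
            }
            slots = self.not_empty.wait(slots).unwrap();
        }
    }

    /// Current number of queued items, e.g. for a queue-depth metric.
    pub fn len(&self) -> usize {
        self.slots.lock().unwrap().len()
    }
}
```

Shared between threads via `Arc<RingBuffer<Vec<u8>>>`, the pipeline would call `push` and the encoder thread `pop`. A real implementation would also need a close or shutdown signal so the consumer does not block forever once the producer finishes.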
4. Implement Auto-Detection
- detect_available_encoders() -> Vec<EncoderType>
- Check for:
  - nvidia-smi presence
  - ffmpeg encoder availability
  - CUDA runtime
- Select best available
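The ffmpeg part of detection might look like the sketch below, which parses the text printed by `ffmpeg -encoders`. To stay self-contained it returns encoder names as strings rather than the `Vec<EncoderType>` named in the task, and `encoders_in_listing` is a hypothetical helper. An encoder appearing in the listing only shows that ffmpeg was built with it, so the `nvidia-smi` and CUDA checks from the list above would still be needed before trusting NVENC.

```rust
use std::process::Command;

/// Encoder names in the priority order from the design section.
const PRIORITY: [&str; 4] = ["h264_nvenc", "h264_qsv", "h264_videotoolbox", "libx264"];

/// Scan the text printed by `ffmpeg -encoders` and return the known
/// encoders it lists, in priority order. Each listing line has the form
/// `<flags> <name> <description>`, so the name is the second token.
pub fn encoders_in_listing(listing: &str) -> Vec<&'static str> {
    PRIORITY
        .iter()
        .copied()
        .filter(|name| {
            listing
                .lines()
                .any(|line| line.split_whitespace().nth(1) == Some(*name))
        })
        .collect()
}

/// Run ffmpeg and parse its encoder listing. An empty Vec means ffmpeg
/// is missing, failed to run, or lists none of the known encoders.
pub fn detect_available_encoders() -> Vec<&'static str> {
    match Command::new("ffmpeg").args(["-hide_banner", "-encoders"]).output() {
        Ok(out) if out.status.success() => {
            encoders_in_listing(&String::from_utf8_lossy(&out.stdout))
        }
        _ => Vec::new(),
    }
}
```

The first element of the returned list is the best available candidate, which the fallback chain can then try to initialise.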
5. Implement Fallback Chain
- Try encoders in priority order
- On failure: try next
- Log which encoder selected
- Always have software fallback
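The chain above amounts to "first constructor that succeeds wins". A minimal generic sketch, where `select_first_working` and the boxed-closure candidate list are assumptions for illustration and logging goes to stderr instead of whatever logging crate the project uses:

```rust
use std::fmt::Display;

/// Try each candidate constructor in order and return the first one that
/// succeeds, together with its name. Returns None only if every candidate
/// fails, which should not happen when a software encoder is last in line.
pub fn select_first_working<T, E: Display>(
    candidates: Vec<(&'static str, Box<dyn Fn() -> Result<T, E>>)>,
) -> Option<(&'static str, T)> {
    for (name, init) in candidates {
        match init() {
            Ok(encoder) => {
                eprintln!("encoder selected: {name}");
                return Some((name, encoder));
            }
            Err(err) => eprintln!("encoder {name} unavailable: {err}; trying next"),
        }
    }
    None
}
```

Each constructor should do a real initialisation (for example, spawn ffmpeg and encode a probe frame), since an encoder can be listed as available and still fail at startup.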
6. Configuration
```rust
struct GpuEncoderConfig {
    preferred_encoder: Option<EncoderType>,
    enable_ring_buffer: bool,
    ring_buffer_size: usize, // frames
    nvenc_preset: String,    // p1-p7
    nvenc_tune: String,      // hq, ll, ull
}
```
7. Integrate with Pipeline
- Replace current ffmpeg encoder stage
- Use encoder abstraction
- Ring buffer between network and encoder
- Async frame feeding
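To illustrate how a pipeline stage might use the encoder abstraction, the sketch below drives any `VideoEncoder` over a sequence of frames and then flushes it. The trait is repeated from the Design section so the block stands alone; `EncodedChunk`'s single `data` field, `encode_all`, and the `DelayedPassthrough` stand-in encoder are assumptions for illustration.

```rust
/// Hypothetical chunk type; the issue does not define its fields.
#[derive(Debug, Clone, PartialEq)]
pub struct EncodedChunk {
    pub data: Vec<u8>,
}

pub trait VideoEncoder {
    fn encode_frame(&mut self, frame: &[u8]) -> Option<EncodedChunk>;
    fn flush(&mut self) -> Vec<EncodedChunk>;
}

/// Feed every frame to the encoder, then drain whatever it still buffers.
pub fn encode_all<'a>(
    encoder: &mut dyn VideoEncoder,
    frames: impl IntoIterator<Item = &'a [u8]>,
) -> Vec<EncodedChunk> {
    let mut chunks = Vec::new();
    for frame in frames {
        if let Some(chunk) = encoder.encode_frame(frame) {
            chunks.push(chunk);
        }
    }
    chunks.extend(encoder.flush());
    chunks
}

/// Stand-in encoder that holds back one frame to mimic encoder latency.
pub struct DelayedPassthrough {
    pending: Option<Vec<u8>>,
}

impl DelayedPassthrough {
    pub fn new() -> Self {
        Self { pending: None }
    }
}

impl VideoEncoder for DelayedPassthrough {
    fn encode_frame(&mut self, frame: &[u8]) -> Option<EncodedChunk> {
        // Emit the previously buffered frame and hold the new one.
        self.pending
            .replace(frame.to_vec())
            .map(|data| EncodedChunk { data })
    }

    fn flush(&mut self) -> Vec<EncodedChunk> {
        self.pending
            .take()
            .map(|data| EncodedChunk { data })
            .into_iter()
            .collect()
    }
}
```

Because the stage only depends on the trait, swapping the ffmpeg-backed NVENC encoder for the software fallback does not change the calling code, and a stand-in like this one can be used in tests on machines without a GPU.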
8. Metrics
- encoder_type (label on all metrics)
- encoder_frames_total
- encoder_fps
- encoder_queue_depth (ring buffer)
- encoder_gpu_utilization (if available)
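A minimal in-process sketch of these counters using atomics. The `EncoderMetrics` type and its methods are assumptions; exporting to a metrics system and reading GPU utilization are out of scope here.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::time::Duration;

/// Hypothetical holder for the encoder metrics listed above.
pub struct EncoderMetrics {
    /// Value for the `encoder_type` label, e.g. "h264_nvenc".
    pub encoder_type: &'static str,
    frames_total: AtomicU64,
    queue_depth: AtomicUsize,
}

impl EncoderMetrics {
    pub fn new(encoder_type: &'static str) -> Self {
        Self {
            encoder_type,
            frames_total: AtomicU64::new(0),
            queue_depth: AtomicUsize::new(0),
        }
    }

    /// Call once per encoded frame (encoder_frames_total).
    pub fn record_frame(&self) {
        self.frames_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Record the current ring buffer occupancy (encoder_queue_depth).
    pub fn set_queue_depth(&self, depth: usize) {
        self.queue_depth.store(depth, Ordering::Relaxed);
    }

    pub fn frames_total(&self) -> u64 {
        self.frames_total.load(Ordering::Relaxed)
    }

    pub fn queue_depth(&self) -> usize {
        self.queue_depth.load(Ordering::Relaxed)
    }

    /// Average frames per second over `elapsed` (encoder_fps);
    /// returns 0.0 when no time has elapsed.
    pub fn fps(&self, elapsed: Duration) -> f64 {
        let secs = elapsed.as_secs_f64();
        if secs > 0.0 {
            self.frames_total() as f64 / secs
        } else {
            0.0
        }
    }
}
```

Recording `encoder_type` alongside every value makes it possible to compare NVENC and software runs directly when measuring the performance target.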
Performance Target
| Metric | Software Encoding | NVENC | Improvement |
|---|---|---|---|
| Encoding Time | 30-60 sec/job | 5-10 sec/job | 5-10x |
| Throughput/Worker | ~67 MB/s | ~400 MB/s | 6x |
Files to Create
- src/pipeline/gpu/encoder.rs
- src/pipeline/gpu/nvenc.rs
- src/pipeline/gpu/ring_buffer.rs
Files to Modify
- src/pipeline/gpu/mod.rs
- src/dataset/kps/video_encoder.rs (optional: share abstraction)
Hardware Requirements
- NVIDIA GPU (Turing or newer recommended)
- NVIDIA drivers 450+
- CUDA runtime (optional for direct API)
- ffmpeg compiled with --enable-nvenc
Acceptance Criteria
- VideoEncoder trait defined
- NVENC backend works (via ffmpeg)
- Ring buffer smooths jitter
- Auto-detection selects best encoder
- Fallback to software works
- Configuration options work
- Performance improvement measured
- Works on GPU instances