Skip to content

[Phase 8] Add NVENC GPU video encoding [PARALLEL ⏳] #49

@zhexuany

Description

@zhexuany

Summary

Add NVIDIA NVENC hardware video encoding support for faster H.264 encoding. Includes ring buffer for smoothing I/O jitter.

Priority: MEDIUM ⏳

Performance Optimization - Can be done in parallel with deployment work.

Dependencies

Design

Encoder Abstraction

trait VideoEncoder {
    fn encode_frame(&mut self, frame: &[u8]) -> Option<EncodedChunk>;
    fn flush(&mut self) -> Vec<EncodedChunk>;
}

Backend Priority

  1. h264_nvenc (NVIDIA GPU)
  2. h264_qsv (Intel QuickSync)
  3. h264_videotoolbox (macOS)
  4. libx264 (software fallback)

Ring Buffer

  • GPU memory ring buffer
  • Producer: Pipeline pushes frames
  • Consumer: Encoder pulls frames
  • Smooths network I/O jitter

Tasks

1. Define Encoder Abstraction

  1. Create src/pipeline/gpu/encoder.rs
  2. Define VideoEncoder trait
  3. Define EncoderConfig:
    • codec, preset, crf, gop_size
    • pixel_format, resolution

2. Implement NVENC Backend

  1. Create src/pipeline/gpu/nvenc.rs
  2. Options:
    • Direct NVENC API (complex, best performance)
    • ffmpeg subprocess with -c:v h264_nvenc (simpler)
  3. Initial: Use ffmpeg subprocess
  4. Future: Direct API for lower latency

3. Implement Ring Buffer

  1. Create src/pipeline/gpu/ring_buffer.rs
  2. Fixed-size circular buffer
  3. CUDA pinned memory (optional)
  4. Block producer when full
  5. Block consumer when empty

4. Implement Auto-Detection

  1. detect_available_encoders() -> Vec<EncoderType>
  2. Check for:
    • nvidia-smi presence
    • ffmpeg encoder availability
    • CUDA runtime
  3. Select best available

5. Implement Fallback Chain

  1. Try encoders in priority order
  2. On failure: try next
  3. Log which encoder selected
  4. Always have software fallback

6. Configuration

struct GpuEncoderConfig {
    preferred_encoder: Option<EncoderType>,
    enable_ring_buffer: bool,
    ring_buffer_size: usize,  // frames
    nvenc_preset: String,     // p1-p7
    nvenc_tune: String,       // hq, ll, ull
}

7. Integrate with Pipeline

  1. Replace current ffmpeg encoder stage
  2. Use encoder abstraction
  3. Ring buffer between network and encoder
  4. Async frame feeding

8. Metrics

  • encoder_type (label on all metrics)
  • encoder_frames_total
  • encoder_fps
  • encoder_queue_depth (ring buffer)
  • encoder_gpu_utilization (if available)

Performance Target

Metric Software Encoding NVENC Improvement
Encoding Time 30-60 sec/job 5-10 sec/job 5-10x
Throughput/Worker ~67 MB/s ~400 MB/s 6x

Files to Create

  • src/pipeline/gpu/encoder.rs
  • src/pipeline/gpu/nvenc.rs
  • src/pipeline/gpu/ring_buffer.rs

Files to Modify

  • src/pipeline/gpu/mod.rs
  • src/dataset/kps/video_encoder.rs (optional: share abstraction)

Hardware Requirements

  • NVIDIA GPU (Turing or newer recommended)
  • NVIDIA drivers 450+
  • CUDA runtime (optional for direct API)
  • ffmpeg compiled with --enable-nvenc

Acceptance Criteria

  • VideoEncoder trait defined
  • NVENC backend works (via ffmpeg)
  • Ring buffer smooths jitter
  • Auto-detection selects best encoder
  • Fallback to software works
  • Configuration options work
  • Performance improvement measured
  • Works on GPU instances

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions