[Phase 8] Add NVENC GPU video encoding [PARALLEL ⏳]

## Summary

Add NVIDIA NVENC hardware encoding for faster H.264 output, behind an encoder abstraction that falls back to software encoding. Includes a ring buffer between the network stage and the encoder to smooth I/O jitter.

## Priority: MEDIUM ⏳

**Performance Optimization** - Can be done in parallel with deployment work.

## Dependencies

- ✅ Pipeline Integration (#72, #73) - COMPLETE
- No other blockers: this is an optional enhancement, and software encoding remains the fallback

## Design

### Encoder Abstraction

```rust
trait VideoEncoder {
    fn encode_frame(&mut self, frame: &[u8]) -> Option<EncodedChunk>;
    fn flush(&mut self) -> Vec<EncodedChunk>;
}
```
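
The trait above refers to `EncodedChunk` without defining it. The sketch below assumes a minimal shape for it (the `data` and `is_keyframe` fields are hypothetical) and adds a `PassthroughEncoder` test double, also hypothetical, that holds back a configurable number of frames. It illustrates why `encode_frame` returns an `Option` and why `flush` is needed: an encoder may buffer input before emitting output.

```rust
use std::collections::VecDeque;

/// Hypothetical chunk layout; the issue does not specify the fields.
#[derive(Debug, Clone, PartialEq)]
pub struct EncodedChunk {
    pub data: Vec<u8>,
    pub is_keyframe: bool,
}

pub trait VideoEncoder {
    fn encode_frame(&mut self, frame: &[u8]) -> Option<EncodedChunk>;
    fn flush(&mut self) -> Vec<EncodedChunk>;
}

/// Test double that copies frames through unchanged after a fixed delay,
/// mimicking the latency of a real encoder.
pub struct PassthroughEncoder {
    pending: VecDeque<Vec<u8>>,
    delay: usize,
    gop_size: usize,
    emitted: usize,
}

impl PassthroughEncoder {
    pub fn new(delay: usize, gop_size: usize) -> Self {
        Self {
            pending: VecDeque::new(),
            delay,
            gop_size: gop_size.max(1), // avoid a modulo by zero
            emitted: 0,
        }
    }

    /// Marks every `gop_size`-th emitted chunk as a keyframe.
    fn make_chunk(&mut self, data: Vec<u8>) -> EncodedChunk {
        let is_keyframe = self.emitted % self.gop_size == 0;
        self.emitted += 1;
        EncodedChunk { data, is_keyframe }
    }
}

impl VideoEncoder for PassthroughEncoder {
    fn encode_frame(&mut self, frame: &[u8]) -> Option<EncodedChunk> {
        self.pending.push_back(frame.to_vec());
        if self.pending.len() > self.delay {
            let data = self.pending.pop_front()?;
            Some(self.make_chunk(data))
        } else {
            None // still filling the internal delay line
        }
    }

    fn flush(&mut self) -> Vec<EncodedChunk> {
        let mut out = Vec::new();
        while let Some(data) = self.pending.pop_front() {
            out.push(self.make_chunk(data));
        }
        out
    }
}
```

A double like this lets the pipeline integration be tested on machines without a GPU or ffmpeg.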

### Backend Priority

1. h264_nvenc (NVIDIA GPU)
2. h264_qsv (Intel QuickSync)
3. h264_videotoolbox (macOS)
4. libx264 (software fallback)
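
One way to encode this ordering is an enum with a constant priority table. The variant names, `PRIORITY`, `ffmpeg_name`, and `is_hardware` below are illustrative assumptions; only the four ffmpeg encoder names and their order come from the list above.

```rust
/// Encoder backends in the order listed above (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncoderType {
    Nvenc,
    Qsv,
    VideoToolbox,
    X264,
}

impl EncoderType {
    /// Highest priority first; software encoding is always last.
    pub const PRIORITY: [EncoderType; 4] = [
        EncoderType::Nvenc,
        EncoderType::Qsv,
        EncoderType::VideoToolbox,
        EncoderType::X264,
    ];

    /// Name passed to ffmpeg's `-c:v` option.
    pub fn ffmpeg_name(self) -> &'static str {
        match self {
            EncoderType::Nvenc => "h264_nvenc",
            EncoderType::Qsv => "h264_qsv",
            EncoderType::VideoToolbox => "h264_videotoolbox",
            EncoderType::X264 => "libx264",
        }
    }

    /// True for every backend except the software fallback.
    pub fn is_hardware(self) -> bool {
        self != EncoderType::X264
    }
}
```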

### Ring Buffer

- Fixed-size ring buffer of frames (host memory; CUDA pinned memory is optional)
- Producer: Pipeline pushes frames
- Consumer: Encoder pulls frames
- Smooths network I/O jitter

## Tasks

### 1. Define Encoder Abstraction

1. Create `src/pipeline/gpu/encoder.rs`
2. Define `VideoEncoder` trait
3. Define `EncoderConfig`:
   - codec, preset, crf (quality target; `h264_nvenc` takes `-cq` instead of `-crf`), gop_size
   - pixel_format, resolution

### 2. Implement NVENC Backend

1. Create `src/pipeline/gpu/nvenc.rs`
2. Options:
   - Direct NVENC API (complex, best performance)
   - ffmpeg subprocess with `-c:v h264_nvenc` (simpler)
3. Initial implementation: ffmpeg subprocess
4. Future: direct API for lower latency
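
A minimal sketch of the subprocess approach, assuming raw `yuv420p` frames are written to ffmpeg's stdin and an H.264 elementary stream is read from its stdout. The function names and the choice of pixel format and output container are assumptions, not decisions recorded in this issue.

```rust
use std::io;
use std::process::{Child, Command, Stdio};

/// Builds the ffmpeg argument list for a raw-frames-in, H.264-out NVENC encode.
pub fn nvenc_ffmpeg_args(width: u32, height: u32, fps: u32, preset: &str, tune: &str) -> Vec<String> {
    let size = format!("{}x{}", width, height);
    let rate = fps.to_string();
    [
        "-hide_banner", "-loglevel", "error",
        // Input: headerless raw video, so size, rate and format must be given.
        "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-video_size", size.as_str(), "-framerate", rate.as_str(),
        "-i", "pipe:0",
        // Encoder selection and NVENC-specific options.
        "-c:v", "h264_nvenc", "-preset", preset, "-tune", tune,
        // Output: raw H.264 bitstream on stdout.
        "-f", "h264", "pipe:1",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Spawns ffmpeg with piped stdin/stdout; assumes `ffmpeg` is on PATH.
pub fn spawn_nvenc_ffmpeg(args: &[String]) -> io::Result<Child> {
    Command::new("ffmpeg")
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::inherit())
        .spawn()
}
```

Writing frames to stdin and reading encoded output from stdout on the same thread can deadlock once either pipe fills, so the two ends are best serviced by separate threads or tasks.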

### 3. Implement Ring Buffer

1. Create `src/pipeline/gpu/ring_buffer.rs`
2. Fixed-size circular buffer
3. CUDA pinned memory (optional)
4. Block producer when full
5. Block consumer when empty
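
The blocking behaviour in steps 2, 4 and 5 can be sketched with a `Mutex` and two `Condvar`s from the standard library. This version keeps frames in ordinary host memory; CUDA pinned memory is not covered, and there is no shutdown signal, so a real implementation would also need a way to wake a blocked consumer at end of stream. The type and method names are assumptions.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// Fixed-capacity FIFO that blocks the producer when full and the
/// consumer when empty.
pub struct RingBuffer<T> {
    queue: Mutex<VecDeque<T>>,
    capacity: usize,
    not_full: Condvar,
    not_empty: Condvar,
}

impl<T> RingBuffer<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "ring buffer capacity must be non-zero");
        Self {
            queue: Mutex::new(VecDeque::with_capacity(capacity)),
            capacity,
            not_full: Condvar::new(),
            not_empty: Condvar::new(),
        }
    }

    /// Blocks while the buffer is full, then appends the item.
    pub fn push(&self, item: T) {
        let mut q = self.queue.lock().unwrap();
        while q.len() == self.capacity {
            q = self.not_full.wait(q).unwrap();
        }
        q.push_back(item);
        self.not_empty.notify_one();
    }

    /// Blocks while the buffer is empty, then removes the oldest item.
    pub fn pop(&self) -> T {
        let mut q = self.queue.lock().unwrap();
        loop {
            if let Some(item) = q.pop_front() {
                self.not_full.notify_one();
                return item;
            }
            q = self.not_empty.wait(q).unwrap();
        }
    }

    /// Current number of queued items (useful for a queue-depth metric).
    pub fn len(&self) -> usize {
        self.queue.lock().unwrap().len()
    }
}
```

Shared between threads through an `Arc`, the pipeline thread calls `push` and the encoder thread calls `pop`; items come out in the order they went in.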

### 4. Implement Auto-Detection

1. `detect_available_encoders() -> Vec<EncoderType>`
2. Check for:
   - nvidia-smi presence
   - ffmpeg encoder availability
   - CUDA runtime
3. Select best available
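
The ffmpeg part of detection can be sketched by parsing the output of `ffmpeg -encoders`, where each encoder line carries a flags column followed by the encoder name. The `parse_ffmpeg_encoders` helper and the `PRIORITY` table are assumptions; the `nvidia-smi` and CUDA runtime checks are not shown.

```rust
use std::process::Command;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncoderType {
    Nvenc,
    Qsv,
    VideoToolbox,
    X264,
}

/// Priority order paired with the ffmpeg encoder name.
const PRIORITY: [(EncoderType, &str); 4] = [
    (EncoderType::Nvenc, "h264_nvenc"),
    (EncoderType::Qsv, "h264_qsv"),
    (EncoderType::VideoToolbox, "h264_videotoolbox"),
    (EncoderType::X264, "libx264"),
];

/// Extracts known encoders from `ffmpeg -encoders` output, best first.
pub fn parse_ffmpeg_encoders(listing: &str) -> Vec<EncoderType> {
    // The second whitespace-separated token on each encoder line is its name.
    let names: Vec<&str> = listing
        .lines()
        .filter_map(|line| line.split_whitespace().nth(1))
        .collect();
    PRIORITY
        .iter()
        .filter(|(_, name)| names.contains(name))
        .map(|(ty, _)| *ty)
        .collect()
}

/// Runs ffmpeg and returns the encoders it was built with; empty if
/// ffmpeg is missing or exits with an error.
pub fn detect_available_encoders() -> Vec<EncoderType> {
    match Command::new("ffmpeg").args(&["-hide_banner", "-encoders"]).output() {
        Ok(out) if out.status.success() => {
            parse_ffmpeg_encoders(&String::from_utf8_lossy(&out.stdout))
        }
        _ => Vec::new(),
    }
}
```

An encoder appearing in this listing only shows that ffmpeg was compiled with it. It does not prove a usable GPU or driver is present, so a short probe encode (or the fallback chain) is still needed before committing to a backend.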

### 5. Implement Fallback Chain

1. Try encoders in priority order
2. On failure: try next
3. Log which encoder was selected
4. Always keep `libx264` as the final software fallback
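
The chain above can be sketched as a loop over candidate names with a caller-supplied initializer. `select_encoder` and its closure signature are hypothetical, and logging goes to stderr here only to keep the example free of dependencies.

```rust
/// Tries each candidate in order and returns the first that initializes,
/// together with its name. Returns `None` if every candidate fails.
pub fn select_encoder<E, F>(
    candidates: &[&'static str],
    mut try_init: F,
) -> Option<(&'static str, E)>
where
    F: FnMut(&str) -> Result<E, String>,
{
    for name in candidates {
        match try_init(name) {
            Ok(encoder) => {
                eprintln!("selected encoder: {}", name);
                return Some((*name, encoder));
            }
            // Record why this backend was skipped, then move to the next one.
            Err(reason) => eprintln!("encoder {} unavailable: {}", name, reason),
        }
    }
    None
}
```

Placing `libx264` last in `candidates` gives the software fallback, as long as its initializer cannot fail for GPU-related reasons.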

### 6. Configuration

```rust
struct GpuEncoderConfig {
    preferred_encoder: Option<EncoderType>,
    enable_ring_buffer: bool,
    ring_buffer_size: usize,  // frames
    nvenc_preset: String,     // p1-p7
    nvenc_tune: String,       // hq, ll, ull
}
```
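
A sketch of defaults and validation for this struct. The default values (`p4`, `hq`, 64 frames) and the `validate` method are assumptions chosen for illustration; the issue does not fix any of them. The accepted preset and tune values mirror the comments in the struct above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EncoderType {
    Nvenc,
    Qsv,
    VideoToolbox,
    X264,
}

#[derive(Debug, Clone)]
pub struct GpuEncoderConfig {
    pub preferred_encoder: Option<EncoderType>,
    pub enable_ring_buffer: bool,
    pub ring_buffer_size: usize, // frames
    pub nvenc_preset: String,    // p1-p7
    pub nvenc_tune: String,      // hq, ll, ull
}

impl Default for GpuEncoderConfig {
    fn default() -> Self {
        // Assumed defaults; tune these once real measurements exist.
        Self {
            preferred_encoder: None, // None means auto-detect
            enable_ring_buffer: true,
            ring_buffer_size: 64,
            nvenc_preset: "p4".to_string(),
            nvenc_tune: "hq".to_string(),
        }
    }
}

impl GpuEncoderConfig {
    /// Rejects values that would only fail later, when ffmpeg starts.
    pub fn validate(&self) -> Result<(), String> {
        const PRESETS: [&str; 7] = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"];
        const TUNES: [&str; 3] = ["hq", "ll", "ull"];
        if !PRESETS.contains(&self.nvenc_preset.as_str()) {
            return Err(format!("unknown nvenc_preset: {}", self.nvenc_preset));
        }
        if !TUNES.contains(&self.nvenc_tune.as_str()) {
            return Err(format!("unknown nvenc_tune: {}", self.nvenc_tune));
        }
        if self.enable_ring_buffer && self.ring_buffer_size == 0 {
            return Err("ring_buffer_size must be > 0 when enabled".to_string());
        }
        Ok(())
    }
}
```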

### 7. Integrate with Pipeline

1. Replace current ffmpeg encoder stage
2. Use encoder abstraction
3. Ring buffer between network and encoder
4. Async frame feeding

### 8. Metrics

- `encoder_type` (label on all metrics)
- `encoder_frames_total`
- `encoder_fps`
- `encoder_queue_depth` (ring buffer)
- `encoder_gpu_utilization` (if available)
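
A dependency-free sketch of how three of these metrics might be tracked and rendered. It assumes a Prometheus-style text format with `encoder_type` as a label, which this issue does not specify; `EncoderMetrics` and its methods are hypothetical, and `encoder_gpu_utilization` is omitted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Thread-safe counters for one encoder instance.
pub struct EncoderMetrics {
    encoder_type: &'static str,
    frames_total: AtomicU64,
    queue_depth: AtomicU64,
}

impl EncoderMetrics {
    pub fn new(encoder_type: &'static str) -> Self {
        Self {
            encoder_type,
            frames_total: AtomicU64::new(0),
            queue_depth: AtomicU64::new(0),
        }
    }

    /// Call once per frame handed to the encoder.
    pub fn record_frame(&self) {
        self.frames_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Call with the ring buffer's current length.
    pub fn set_queue_depth(&self, depth: u64) {
        self.queue_depth.store(depth, Ordering::Relaxed);
    }

    /// Average frames per second over `elapsed`; 0.0 for a zero duration.
    pub fn fps_over(&self, elapsed: Duration) -> f64 {
        let secs = elapsed.as_secs_f64();
        if secs == 0.0 {
            return 0.0;
        }
        self.frames_total.load(Ordering::Relaxed) as f64 / secs
    }

    /// Renders the counters as labelled text lines.
    pub fn render(&self) -> String {
        format!(
            "encoder_frames_total{{encoder_type=\"{ty}\"}} {frames}\n\
             encoder_queue_depth{{encoder_type=\"{ty}\"}} {depth}\n",
            ty = self.encoder_type,
            frames = self.frames_total.load(Ordering::Relaxed),
            depth = self.queue_depth.load(Ordering::Relaxed),
        )
    }
}
```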

## Performance Target

| Metric | Software Encoding | NVENC | Improvement |
|--------|------------------|-------|-------------|
| Encoding Time | 30-60 sec/job | 5-10 sec/job | 5-10x |
| Throughput/Worker | ~67 MB/s | ~400 MB/s | 6x |

## Files to Create

- `src/pipeline/gpu/encoder.rs`
- `src/pipeline/gpu/nvenc.rs`
- `src/pipeline/gpu/ring_buffer.rs`

## Files to Modify

- `src/pipeline/gpu/mod.rs`
- `src/dataset/kps/video_encoder.rs` (optional: share abstraction)

## Hardware Requirements

- NVIDIA GPU (Turing or newer recommended)
- NVIDIA drivers 450+
- CUDA runtime (optional for direct API)
- ffmpeg compiled with `--enable-nvenc`

## Acceptance Criteria

- [ ] VideoEncoder trait defined
- [ ] NVENC backend works (via ffmpeg)
- [ ] Ring buffer smooths jitter
- [ ] Auto-detection selects best encoder
- [ ] Fallback to software works
- [ ] Configuration options work
- [ ] Performance improvement measured against the Performance Target table
- [ ] Works on GPU instances
