@justincasher

Summary

  • Reduced default GPU batch_size from 128 to 16 for the reranker
  • Fixes CUDA OOM errors on GPUs with 8GB VRAM (like RTX 3070) when reranking 50 documents
  • With batch_size=128, all 50 documents were processed in a single forward pass (since 50 <= 128), so no batching took place
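
To illustrate the arithmetic behind the bullets above, here is a minimal sketch of how a batch size splits 50 documents into forward passes. It is not the reranker's actual code: `iter_batches` and the `docs` list are hypothetical names made up for this example, and the real implementation may chunk its inputs differently.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 50 placeholder documents, matching the case described in the summary.
docs = [f"doc-{i}" for i in range(50)]

# Each chunk corresponds to one forward pass through the model.
sizes_128 = [len(batch) for batch in iter_batches(docs, 128)]
sizes_16 = [len(batch) for batch in iter_batches(docs, 16)]

print(sizes_128)  # [50]
print(sizes_16)   # [16, 16, 16, 2]
```

With a batch size of 128 the whole set goes through in one pass of 50 documents, while a batch size of 16 caps each pass at 16 documents, which is what lowers peak activation memory on the GPU.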

🤖 Generated with Claude Code

The default batch_size of 128 caused CUDA OOM errors on GPUs with
8GB VRAM (like RTX 3070) when reranking 50 documents. Reduced to 16
so batching actually kicks in instead of processing all documents
in a single forward pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@justincasher justincasher merged commit f5d4ccc into main Jan 28, 2026
2 checks passed