Hybrid search pipeline with OpenAI embeddings and Cohere reranking #15

Open
Chaithra-S-Nayak-2 wants to merge 2 commits into main from hybrid-search-pipeline

Conversation

@Chaithra-S-Nayak-2
Member

Overview

Hybrid retrieval pipeline for the DevRev Search challenge.

Pipeline

Dense (OpenAI text-embedding-3-large, 3072-dim) + BM25 + HyDE + Multi-Query → Weighted RRF (BM25 2x weight) → Article-sibling expansion → Cohere Rerank v3.5 (top-200 candidates)
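
For reviewers, the fusion step can be sketched roughly as follows. This is a minimal illustration and not the code in `hybrid_search.py`: the function name `weighted_rrf`, the smoothing constant `k=60` (a common default), the retriever names, and the toy rankings are all assumptions. Only the 2x BM25 weight is taken from the pipeline description.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of doc IDs with weighted Reciprocal Rank Fusion.

    rankings: dict mapping retriever name -> list of doc IDs, best first.
    weights:  dict mapping retriever name -> weight (missing names use 1.0).
    k:        smoothing constant; 60 is assumed here, not taken from the PR.
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        w = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes weight / (k + rank) for a document.
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy rankings (hypothetical doc IDs), with BM25 weighted 2x as described.
rankings = {
    "dense": ["d1", "d2", "d3"],
    "bm25": ["d3", "d1", "d4"],
    "hyde": ["d2", "d1", "d5"],
}
fused = weighted_rrf(rankings, weights={"bm25": 2.0})
print(fused)
```

With the doubled weight, a document that BM25 ranks first gets roughly the same fused score as one ranked first by two unweighted retrievers, which is presumably the intent of the 2x setting.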

System Details

  • Embedding: OpenAI text-embedding-3-large (3072-dim) via LiteLLM proxy
  • Sparse Retrieval: BM25 Okapi
  • Query Expansion: HyDE + Multi-Query rewriting (Gemini 2.5 Flash)
  • Fusion: Weighted Reciprocal Rank Fusion
  • Reranker: Cohere Rerank v3.5
  • Vector Index: FAISS IndexFlatIP
  • Open Source: No (embeddings and reranker are closed-source APIs)
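
As a note on the vector index: FAISS `IndexFlatIP` performs exact, exhaustive inner-product search, so if the embeddings are L2-normalized the scores are cosine similarities. The pure-Python sketch below mirrors that behaviour on toy 2-dimensional vectors. It does not call FAISS, and the helper names `l2_normalize` and `flat_ip_search` are hypothetical. Whether this PR normalizes its embeddings before indexing is an assumption.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def flat_ip_search(index_vectors, query, top_k=3):
    """Score every stored vector by inner product; return top_k (score, position)."""
    scored = [
        (sum(q * v for q, v in zip(query, vec)), pos)
        for pos, vec in enumerate(index_vectors)
    ]
    # Largest inner product first; ties broken by position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


# Toy corpus; real embeddings in this PR are 3072-dimensional.
corpus = [l2_normalize(v) for v in [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]]
query = l2_normalize([2.0, 0.0])
hits = flat_ip_search(corpus, query, top_k=2)
print(hits)
```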

Work item: https://app.devrev.ai/devrev/works/ISS-269621/

- Introduced `embed_corpus.py` for generating document embeddings using OpenAI's text-embedding-3-large model.
- Added `hybrid_search.py` for implementing a hybrid search pipeline combining BM25 and dense embeddings.
- Updated `.gitignore` to exclude additional cache and binary files.
- Included new dependencies in `requirements.txt` for rank-bm25 and cohere integration.

@Chaithra-S-Nayak-2 marked this pull request as ready for review on April 8, 2026, 04:41

@prakhar7651
Contributor

Hey! These are your scores.
Recall: 0.2932
Precision: 0.2293

@Chaithra-S-Nayak-2
Member Author

> Hey! These are your scores. Recall: 0.2932 Precision: 0.2293

Thank you!

@nimit2801
Collaborator

Hey @Chaithra-S-Nayak-2, can you please share your LinkedIn profile? We're going to post about the winners soon!

@Chaithra-S-Nayak-2
Member Author

> Hey @Chaithra-S-Nayak-2, can you please share your LinkedIn profile? We're going to post about the winners soon!

Sure, here is my LinkedIn profile: https://www.linkedin.com/in/chaithra-s-nayak/

@nimit2801
Collaborator

> Hey @Chaithra-S-Nayak-2, can you please share your LinkedIn profile? We're going to post about the winners soon!

3 participants