Evaluate search pipeline #81

@dantetemplar

Description

Describe the feature

To evaluate the quality of the search, we need to collect or generate a dataset. For the retrieval part, each entry consists of a query and its relevant documents (and may also contain negative examples). Beyond retrieval, we can also assess the full RAG pipeline; that requires a dataset of question-answer pairs and an LLM judge.
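
A possible shape for the dataset entries (a sketch only; field names and values are illustrative, not a fixed schema):

```python
# Illustrative retrieval-evaluation entry (hypothetical fields and ids).
retrieval_entry = {
    "query": "How do I reset my campus Wi-Fi password?",
    "relevant_doc_ids": ["doc_123", "doc_456"],    # ground-truth positives
    "negative_doc_ids": ["doc_789"],               # optional hard negatives
    "corpus_subset": ["doc_123", "doc_456", "doc_789", "doc_901"],  # optional fixed search pool
}

# Illustrative end-to-end RAG entry: the generated answer would be scored
# against the reference answer by an LLM judge.
rag_entry = {
    "question": "How do I reset my campus Wi-Fi password?",
    "reference_answer": "Open the account portal and choose 'Reset Wi-Fi password'.",
}
```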

  • Collect a dataset of queries and their relevant documents. It may also contain the subset of documents involved in the search run, to keep scores stable across runs.
  • Calculate offline metrics: HitRate@10 and MeanAveragePrecision@10 (a sketch follows this list).
  • Record the metrics somewhere with a description of the setup, e.g. in repository issues, releases, or discussions.
  • Iterate on improving the search pipeline and extending the collected dataset.
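
A minimal sketch of how the two proposed metrics could be computed, assuming the search pipeline returns a ranked list of document ids per query (function and variable names here are illustrative, not part of the existing codebase):

```python
from typing import Dict, List, Set


def hit_rate_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """1.0 if at least one relevant document appears in the top-k results, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in ranked[:k]) else 0.0


def average_precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision@i over ranks i <= k where a relevant document appears.

    Normalized by min(|relevant|, k), one common convention for AP@k.
    """
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            score += hits / i
    return score / min(len(relevant), k)


def evaluate(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> Dict[str, float]:
    """run: query_id -> ranked doc ids from the pipeline; qrels: query_id -> relevant doc ids."""
    queries = list(qrels)
    return {
        f"HitRate@{k}": sum(hit_rate_at_k(run.get(q, []), qrels[q], k) for q in queries) / len(queries),
        f"MAP@{k}": sum(average_precision_at_k(run.get(q, []), qrels[q], k) for q in queries) / len(queries),
    }
```

Reporting the metric values together with the dataset and pipeline version keeps the numbers comparable across iterations.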

Suggested solution

Wiki: Relevance metrics
Weaviate: article about metrics
HuggingFace: RAG evaluation cookbook

Additional context

No response

Metadata

Assignees

No one assigned

Labels

No labels

Projects

Status

📋 Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
