Evaluation system for RAG submissions. Validates your search results and calculates recall metrics.
```bash
uv venv && uv sync
direnv allow  # or source .envrc
```
Evaluate your search results against the ground truth dataset.
```bash
uv run python -m moulinette evaluate_student_search_results \
    <student_results_path> \
    <dataset_path> \
    [--k K] \
    [--max_context_length MAX_LENGTH]
```
Arguments:
| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| student_results_path | str | required | Path to student search results JSON |
| dataset_path | str | required | Path to ground truth dataset JSON |
| --k | int | 10 | Maximum number of sources per question |
| --max_context_length | int | 2000 | Maximum context length per source |
Example:
```bash
uv run python -m moulinette evaluate_student_search_results \
    ../data/output/search_results/dataset_code_public.json \
    ../data/datasets/AnsweredQuestions/dataset_code_public.json \
    --k 10 \
    --max_context_length 2000
```
Output:
- Validates student data format
- Calculates Recall@1, Recall@3, Recall@5, Recall@10 (see the sketch below)
- Returns `True` if Recall@5 >= 50%, `False` otherwise
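The matching rule is internal to the evaluator, but as a rough mental model Recall@k is the fraction of questions for which at least one of the top-k retrieved sources matches a ground-truth source. A minimal sketch, assuming a match means an overlapping character span in the same file (the moulinette's actual criterion may differ):

```python
# Sketch of Recall@k. Assumption: a "hit" is a retrieved span that overlaps
# a ground-truth span in the same file; the evaluator's real rule may differ.
def overlaps(retrieved: dict, truth: dict) -> bool:
    return (
        retrieved["file_path"] == truth["file_path"]
        and retrieved["first_character_index"] < truth["last_character_index"]
        and truth["first_character_index"] < retrieved["last_character_index"]
    )

def recall_at_k(results: list[dict], dataset: list[dict], k: int) -> float:
    # results: entries from "search_results"; dataset: entries from "rag_questions".
    truth_by_id = {q["question_id"]: q["sources"] for q in dataset}
    hits = 0
    for entry in results:
        truths = truth_by_id[entry["question_id"]]
        top_k = entry["retrieved_sources"][:k]
        if any(overlaps(r, t) for r in top_k for t in truths):
            hits += 1
    return hits / len(results) if results else 0.0
```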
Evaluate your generated answers (not yet implemented).
```bash
uv run python -m moulinette evaluate_student_answers <student_answer_path>
```
Passing thresholds:

| Dataset | Metric | Threshold |
|---------|--------|-----------|
| Code | Recall@5 | >= 50% |
| Docs | Recall@5 | >= 80% |
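A minimal sketch of the pass/fail rule these thresholds imply; the dataset keys are illustrative, not the evaluator's own identifiers:

```python
# Thresholds as listed in the table above; keys are illustrative names.
THRESHOLDS = {"code": 0.50, "docs": 0.80}

def passes(dataset_name: str, recall_at_5: float) -> bool:
    # True when Recall@5 meets the dataset's threshold.
    return recall_at_5 >= THRESHOLDS[dataset_name]
```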
Student search results format (the file passed as `student_results_path`):

```json
{
  "search_results": [
    {
      "question_id": "uuid",
      "question_str": "What is the question text?",
      "retrieved_sources": [
        {
          "file_path": "path/to/file",
          "first_character_index": 0,
          "last_character_index": 500
        }
      ]
    }
  ],
  "k": 10
}
```
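A minimal sketch of producing a file in this shape from your own retrieval output; `my_retrieved` and its layout are placeholders, not part of the moulinette:

```python
import json

# my_retrieved is a placeholder: it maps question_id -> (question_text,
# list of (file_path, start, end) spans ranked by relevance).
def write_search_results(my_retrieved: dict, out_path: str, k: int = 10) -> None:
    payload = {
        "search_results": [
            {
                "question_id": qid,
                "question_str": question,
                "retrieved_sources": [
                    {
                        "file_path": path,
                        "first_character_index": start,
                        "last_character_index": end,
                    }
                    for path, start, end in spans[:k]
                ],
            }
            for qid, (question, spans) in my_retrieved.items()
        ],
        "k": k,
    }
    with open(out_path, "w") as f:
        json.dump(payload, f, indent=2)
```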
Ground truth dataset format (the file passed as `dataset_path`):

```json
{
  "rag_questions": [
    {
      "question_id": "uuid",
      "question": "What is...",
      "answer": "The answer is...",
      "sources": [
        {
          "file_path": "path/to/file",
          "first_character_index": 0,
          "last_character_index": 500
        }
      ]
    }
  ]
}
```
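For driving your own retriever, the dataset can be read as plain JSON; a minimal sketch, assuming you only need the question id and text on the student side:

```python
import json

def load_questions(dataset_path: str):
    # Yields (question_id, question) pairs from the ground-truth dataset.
    # The answers and sources stay on the evaluator's side.
    with open(dataset_path) as f:
        dataset = json.load(f)
    for q in dataset["rag_questions"]:
        yield q["question_id"], q["question"]
```

These pairs can then be passed to your retriever and written out with a helper like the writer sketched after the search results format above.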