This toolkit processes MMLU questions, merges them with DOVE robustness scores, and generates model weakness profiles for deeper performance analysis.
It is designed to work alongside the EvalTree framework, replacing accuracy-based rankings with robustness-based rankings.
- `extract_mmlu_questions.py`: Extracts questions from MMLU and produces a JSON file mapping each MMLU question to its index.
- `extract_dove_scores_Llama.py`, `extract_dove_scores_OLMoE.py`: Extract per-question DOVE robustness scores for the LLaMA and OLMoE models.
- `merge_dove_score_with_mmlu_accuracy_score.py`: Combines a model's DOVE scores with the corresponding MMLU question indices.
- `replace_accuracy_ranking_in_DOVE.py`: Updates EvalTree rankings by replacing accuracy scores with DOVE robustness scores.
- `weakness_question_generator.py`: Produces a weakness profile summarizing model performance across different question types.

Illustrative sketches of these scripts follow.
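A minimal sketch of what `extract_mmlu_questions.py` does, assuming MMLU is loaded through the Hugging Face `datasets` library; the dataset config, split, and output file name are assumptions, not the script's actual settings:

```python
import json

from datasets import load_dataset

# Load the MMLU test split; the "all" config merges every subject.
mmlu = load_dataset("cais/mmlu", "all", split="test")

# Map each question's text to its index so downstream scripts can join on it.
question_to_index = {row["question"]: i for i, row in enumerate(mmlu)}

with open("mmlu_questions.json", "w") as f:
    json.dump(question_to_index, f, indent=2)
```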
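A sketch of the per-model extraction in `extract_dove_scores_Llama.py` / `extract_dove_scores_OLMoE.py`. The file path and the `question` / `score` column names are assumptions about the DOVE data layout; the idea is that DOVE records one result per prompt perturbation, so a question's robustness score is its mean success rate over perturbations:

```python
import json

import pandas as pd

# Hypothetical path: one row per prompt perturbation of an MMLU question.
DOVE_FILE = "dove/llama_mmlu.parquet"  # assumed file and layout

df = pd.read_parquet(DOVE_FILE)

# Robustness per question: mean correctness over all prompt perturbations.
# The "question" and "score" column names are assumptions.
scores = df.groupby("question")["score"].mean()

with open("dove_scores_Llama.json", "w") as f:
    json.dump(scores.to_dict(), f, indent=2)
```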
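A sketch of the join performed by `merge_dove_score_with_mmlu_accuracy_score.py`, assuming the two JSON outputs sketched above (the file names are illustrative): DOVE scores keyed by question text are re-keyed by MMLU question index.

```python
import json

# File names are illustrative; they match the outputs sketched above.
with open("mmlu_questions.json") as f:
    question_to_index = json.load(f)  # question text -> MMLU index
with open("dove_scores_Llama.json") as f:
    dove_scores = json.load(f)        # question text -> robustness score

# Re-key the DOVE scores by MMLU index, skipping questions without a score.
merged = {
    question_to_index[q]: s for q, s in dove_scores.items() if q in question_to_index
}

with open("dove_scores_by_index.json", "w") as f:
    json.dump(merged, f, indent=2)
```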
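A sketch of `replace_accuracy_ranking_in_DOVE.py`. The node layout used here (nested `subtrees` with leaf-level `index` and `score` fields) is an assumption about the EvalTree JSON; the idea is a recursive walk that swaps each leaf's accuracy for its DOVE robustness score:

```python
import json

with open("dove_scores_by_index.json") as f:
    dove = json.load(f)  # MMLU index (string keys in JSON) -> robustness score

def replace_scores(node):
    """Recursively swap each leaf's accuracy for its DOVE robustness score."""
    if "subtrees" in node:                 # internal node (assumed key)
        for child in node["subtrees"]:
            replace_scores(child)
    elif str(node.get("index")) in dove:   # leaf holding one MMLU question
        node["score"] = dove[str(node["index"])]

with open("eval_tree.json") as f:          # illustrative input path
    tree = json.load(f)
replace_scores(tree)

with open("eval_tree_dove.json", "w") as f:
    json.dump(tree, f, indent=2)
```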
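A sketch of the aggregation behind `weakness_question_generator.py`: mean robustness is computed per top-level category of the DOVE-scored tree, and categories falling below a cutoff are flagged as weaknesses. The `capability` key and the 0.5 threshold are illustrative assumptions:

```python
import json
from statistics import mean

WEAKNESS_THRESHOLD = 0.5  # assumed cutoff for flagging a weakness

def leaf_scores(node, out):
    """Collect every leaf robustness score under a node."""
    if "subtrees" in node:
        for child in node["subtrees"]:
            leaf_scores(child, out)
    elif "score" in node:
        out.append(node["score"])

with open("eval_tree_dove.json") as f:
    tree = json.load(f)

# Average robustness per top-level category; low averages mark weaknesses.
profile = {}
for category in tree.get("subtrees", []):
    scores = []
    leaf_scores(category, scores)
    if scores and mean(scores) < WEAKNESS_THRESHOLD:
        profile[category.get("capability", "unnamed")] = round(mean(scores), 3)

with open("weakness_profile.json", "w") as f:
    json.dump(profile, f, indent=2)
```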
The repository also contains the following directories:

- `EvalTree/`: The original EvalTree repository (unmodified baseline).
- `Question Generation/`: Scripts for generating and evaluating new questions based on the generated weakness profile.
- `Replace Accuracy For DOVE Ranking/`: Modified EvalTree trees in which accuracy scores are replaced with DOVE robustness scores.
- `plots/`: Generated visualizations illustrating the results.
- `data/`: The trees and tables used in the analysis, including input data for processing and evaluation.