An interactive, web-based leaderboard for evaluating Large Language Models (LLMs) on the FABLE benchmark, a comprehensive data-flow analysis benchmark built on procedural text.
FABLE measures LLMs' ability to perform data-flow analysis across multiple procedural domains: travel routes, plans, and recipes. This leaderboard provides an interactive interface for exploring model performance across analysis types and domains.
This leaderboard is based on the research presented in:
FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this leaderboard or the FABLE benchmark in your research, please cite:
@article{pallagani2025fable,
  title={FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation},
  author={Pallagani, Vishal and Gupta, Nitin and Aydin, John and Srivastava, Biplav},
  journal={arXiv preprint arXiv:2505.24258},
  year={2025}
}