* [COMET](https://github.com/Unbabel/COMET) - COMET is an open-source framework for machine translation evaluation (usage sketch below).
* [Deepchecks](https://github.com/deepchecks/deepchecks) - Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production (usage sketch below).
* [DeepEval](https://github.com/confident-ai/deepeval) - DeepEval is a simple-to-use, open-source evaluation framework for LLM applications (usage sketch below).
* [DomainBed](https://github.com/facebookresearch/DomainBed) - DomainBed is a test suite containing benchmark datasets and algorithms for domain generalization.
* [EvalAI](https://github.com/Cloud-CV/EvalAI) - EvalAI is an open-source platform for evaluating and comparing AI algorithms at scale.
* [EvalPlus](https://github.com/evalplus/evalplus) - EvalPlus is a robust evaluation framework for LLM4Code, featuring expanded HumanEval+ and MBPP+ benchmarks, efficiency assessment (EvalPerf), and a secure, extensible evaluation toolkit.
* [Evals](https://github.com/openai/evals) - Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.
* [LLMonitor](https://github.com/lunary-ai/lunary) - LLMonitor is an observability & analytics platform for AI apps and agents.
* [LLMPerf](https://github.com/ray-project/llmperf) - LLMPerf is a tool for evaluating the performance of LLM APIs.
* [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) - lmms-eval is an evaluation framework meticulously crafted for consistent and efficient evaluation of large multimodal models (LMMs).
* [Melting Pot](https://github.com/google-deepmind/meltingpot) - Melting Pot is a suite of test scenarios for multi-agent reinforcement learning.
* [Meta-World](https://github.com/Farama-Foundation/Metaworld) - Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks.
* [mir_eval](https://github.com/mir-evaluation/mir_eval) - mir_eval is a Python library which provides a transparent, standardized, and straightforward way to evaluate Music Information Retrieval systems (usage sketch below).
* [MLPerf Inference](https://github.com/mlcommons/inference) - MLPerf Inference is a benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios.
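
A minimal sketch of scoring a translation with [COMET](https://github.com/Unbabel/COMET), following the upstream README; the checkpoint name `Unbabel/wmt22-comet-da` and the toy segment are assumptions for illustration, and the model is downloaded on first use:

```python
# Hedged sketch: score one source/translation/reference triple with COMET.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")  # reference-based checkpoint (assumed name)
model = load_from_checkpoint(model_path)

data = [{
    "src": "Dem Feuer konnte Einhalt geboten werden",
    "mt": "The fire could be stopped",
    "ref": "They were able to control the fire.",
}]
output = model.predict(data, batch_size=8, gpus=0)  # gpus=0 keeps it on CPU
print(output.scores)        # per-segment quality scores
print(output.system_score)  # corpus-level score
```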
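A minimal sketch of running a Deepchecks train/test validation suite on tabular data; the toy DataFrames and the `target` column are assumptions, while `Dataset` and `train_test_validation` follow the library's documented API:

```python
# Hedged sketch: run a Deepchecks suite comparing a train split against a test split.
import pandas as pd
from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import train_test_validation

# Toy data purely for illustration (assumption).
train_df = pd.DataFrame({"feature": [1, 2, 3, 4], "target": [0, 1, 0, 1]})
test_df = pd.DataFrame({"feature": [2, 3, 4, 5], "target": [1, 0, 1, 0]})

train_ds = Dataset(train_df, label="target", cat_features=[])
test_ds = Dataset(test_df, label="target", cat_features=[])

result = train_test_validation().run(train_dataset=train_ds, test_dataset=test_ds)
result.save_as_html("validation_report.html")  # writes an interactive HTML report
```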
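A minimal sketch of a DeepEval test case, modeled on the project's README; the prompt/answer pair is an assumption, and the metric needs an LLM backend configured (for example an OpenAI API key) to actually run:

```python
# Hedged sketch: a pytest-style DeepEval test using an answer-relevancy metric.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",                    # user query (assumed)
        actual_output="We offer a 30-day full refund at no cost.", # application output (assumed)
    )
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

The file can be run with plain `pytest` or with the `deepeval test run` CLI.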
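A minimal sketch of evaluating onset detection with mir_eval; the annotation file names are placeholders (assumptions), while `mir_eval.io.load_events` and `mir_eval.onset.evaluate` are the library's documented entry points:

```python
# Hedged sketch: compare estimated onset times against reference annotations.
import mir_eval

reference_onsets = mir_eval.io.load_events("reference_onsets.txt")  # placeholder path
estimated_onsets = mir_eval.io.load_events("estimated_onsets.txt")  # placeholder path

scores = mir_eval.onset.evaluate(reference_onsets, estimated_onsets)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")  # e.g. F-measure, Precision, Recall
```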