A comprehensive benchmark suite to evaluate multi-agent AI frameworks across collaborative task scenarios. This project compares the performance of popular frameworks such as MetaGPT, CrewAI, and LangGraph using models served via Ollama.
This benchmark evaluates how well different multi-agent orchestration frameworks execute structured collaborative tasks. Key metrics include:
- Task planning and decomposition
- Collaboration fluidity between agents
- Output structure and readability
- Originality and insight
- Execution flow and consistency
- Hardware: RTX 3060 (12GB VRAM), 64GB RAM, Ubuntu
- Frameworks Tested: MetaGPT, CrewAI, LangGraph
- Models Used: Local models served via Ollama (see the query sketch after this list)
- Evaluation Criteria:
  - Scored on a 1.0–5.0 scale per dimension, with decimal precision (a small averaging sketch follows this list)
  - Manual review of structure, flow, and collaboration
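As a point of reference, here is a minimal sketch of how a task prompt can be sent to a locally served Ollama model over its default HTTP API. The model name and prompt are placeholders, not the exact configuration used in these runs.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def query_model(model: str, prompt: str, timeout: int = 120) -> str:
    """Send one prompt to a locally served Ollama model and return its full reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    resp = requests.post(OLLAMA_URL, json=payload, timeout=timeout)
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    # Placeholder model name -- use whichever model you have pulled with `ollama pull`.
    print(query_model("llama3", "Draft a three-step plan for a collaborative research task."))
```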
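For clarity, a tiny sketch of how per-dimension scores roll up into a framework's average. The dimension names mirror the metric list above; the numbers are purely illustrative, not the benchmark's actual results.

```python
from statistics import mean

# The five scored dimensions (each rated 1.0-5.0 with decimal precision).
DIMENSIONS = [
    "task_planning",
    "collaboration_fluidity",
    "output_structure",
    "originality_and_insight",
    "execution_consistency",
]

def average_score(scores: dict[str, float]) -> float:
    """Average one framework's per-dimension scores into a single figure."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return round(mean(scores[d] for d in DIMENSIONS), 2)

# Illustrative placeholder values only -- not the actual reported results.
print(average_score({d: 4.0 for d in DIMENSIONS}))  # 4.0
```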
| Framework | Overall Rating | Highlights |
|---|---|---|
| MetaGPT | ⭐ Highest | Excellent planning, modularity, and collaboration flow |
| CrewAI | High | Fast and readable with solid performance |
| LangGraph | Moderate | Accurate and consistent, slightly rigid in tone |
Note: Framework performance can vary by task type and prompt design. Always validate against your use case.
- MetaGPT is ideal for modular, logic-heavy workflows
- CrewAI is great for rapid prototyping and simple chains
- LangGraph excels in complex flow and stateful task orchestration
- Model choice significantly influences quality and coherence
```bash
git clone https://github.com/yourusername/agent-benchmark-suite.git
cd agent-benchmark-suite
```