Advanced Research Agent is a Python-based autonomous research system that uses Large Language Models (LLMs), the Model Context Protocol (MCP), and Firecrawl to perform intelligent web research.
The project demonstrates the evolution from a simple interactive agent into a structured, workflow-driven research agent capable of crawling websites, extracting information, and returning structured research results.
- Build a practical autonomous research agent
- Integrate Firecrawl MCP tools for real-world web data extraction
- Apply ReAct-style LLM reasoning for tool usage
- Transition from a single-agent prototype to a scalable workflow architecture
- Produce structured, comparable research outputs
The system operates in two stages:
The initial agent is a single ReAct-based LLM agent that:
- Connects to Firecrawl via MCP
- Dynamically loads available crawling and scraping tools
- Uses step-by-step reasoning to decide which tools to call
- Runs in an interactive command-line interface
This stage focuses on experimentation, tool discovery, and validating MCP-based tool execution.
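The reason/act/observe cycle this stage relies on can be illustrated with a toy loop. This is a minimal sketch, not the code in `simple_agent.py`: `scripted_model` stands in for the LLM, `fake_scrape` stands in for a Firecrawl tool loaded over MCP, and every name below is hypothetical.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-in for a tool that would normally be loaded from the MCP server.
def fake_scrape(url: str) -> str:
    return f"<contents of {url}>"


TOOLS: Dict[str, Callable[[str], str]] = {"scrape": fake_scrape}


def scripted_model(history: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns (action, argument).

    A real agent would send the history to the model and parse its reply.
    """
    if not any(line.startswith("Observation:") for line in history):
        return "scrape", "https://example.com"
    return "finish", "Summary based on: " + history[-1]


def run_react(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        action, arg = scripted_model(history)  # reason: pick the next step
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)  # act: call the chosen tool
        history.append(f"Observation: {observation}")  # observe: feed the result back
    return "Stopped: step limit reached"
```

The step limit is a design choice worth keeping in a real agent as well, since it bounds cost when the model never decides to finish.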
The advanced agent introduces a Workflow layer that orchestrates research execution.
Key responsibilities of the workflow include:
- Accepting user research queries
- Guiding the LLM through structured research steps
- Executing Firecrawl-powered crawling and extraction
- Normalizing raw data into structured entities
- Returning organized research results (e.g., companies and tools)
This design separates reasoning, execution, and output processing, making the system easier to extend and maintain.
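The normalization step can be sketched as a mapping from loosely structured extraction output onto a fixed schema. The `ResearchEntity` class, the `normalize` helper, and the raw dictionary keys are assumptions made for illustration; the models actually used in `src/workflow.py` may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ResearchEntity:
    name: str
    website: Optional[str] = None
    pricing: Optional[str] = None
    open_source: Optional[bool] = None  # None means "not determined"
    tech_stack: List[str] = field(default_factory=list)


def normalize(raw: Dict[str, Any]) -> ResearchEntity:
    """Map a loosely structured extraction result onto a fixed schema."""
    open_source = raw.get("open_source")
    if isinstance(open_source, str):
        # Accept textual answers such as "Yes" / "No"
        open_source = open_source.strip().lower() in {"yes", "true"}
    stack = raw.get("tech_stack") or []
    if isinstance(stack, str):
        # Accept a comma-separated string as well as a list
        stack = [part.strip() for part in stack.split(",") if part.strip()]
    return ResearchEntity(
        name=str(raw.get("name", "")).strip(),
        website=raw.get("website"),
        pricing=raw.get("pricing"),
        open_source=open_source,
        tech_stack=list(stack),
    )
```

Keeping normalization in one function means new research domains only need a new schema and mapping, not changes to the reasoning or crawling code.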
At a high level, the system consists of:
- A CLI-based user interface
- A workflow orchestration layer
- An LLM reasoning agent (LangGraph ReAct)
- MCP client communication
- Firecrawl MCP server for web crawling and scraping
- Result processing and structured output
- The sequence diagram illustrates the runtime behavior of the Advanced Research Agent. After the user submits a query through the CLI, the workflow orchestrates the research task by delegating reasoning to the LLM agent. The agent interacts with Firecrawl through the MCP client to crawl and extract web data. The extracted information is then processed and returned as structured research results.
- The workflow diagram represents the internal states of the Advanced Research Agent workflow. The agent moves from idle into active research states as it plans, executes, and synthesizes information. Error states let the system recover safely and return to idle, so that multiple research queries can be run one after another.
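The state transitions described above can be sketched as a small table-driven state machine. The state names and the exact set of allowed transitions are assumptions inferred from the description, not taken from the project's code.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PLANNING = "planning"
    EXECUTING = "executing"
    SYNTHESIZING = "synthesizing"
    ERROR = "error"


# Allowed transitions; every active state may also fail into ERROR,
# and ERROR can only recover to IDLE.
TRANSITIONS = {
    State.IDLE: {State.PLANNING},
    State.PLANNING: {State.EXECUTING, State.ERROR},
    State.EXECUTING: {State.SYNTHESIZING, State.ERROR},
    State.SYNTHESIZING: {State.IDLE, State.ERROR},
    State.ERROR: {State.IDLE},
}


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.value} -> {target.value}")
    return target
```

Because every path ends in `IDLE`, the same workflow instance can accept the next query after either a success or a failure.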
- Python 3.10+
- LangChain
- LangGraph
- Model Context Protocol (MCP)
- Firecrawl API
- OpenAI GPT-4o-mini
- AsyncIO
- python-dotenv
.
├── simple_agent.py # Prototype MCP research agent
├── main.py # Advanced research agent entry point
├── src/
│ └── workflow.py # Research workflow orchestration
├── .env.example
├── requirements.txt
└── README.md
Clone the repository:
- git clone https://github.com/IT21314742/Advanced-Research-Agent.git
- cd Advanced-Research-Agent
Install the dependencies:
pip install -r requirements.txt
Create a .env file (see .env.example) containing your API keys:
OPENAI_API_KEY=your_openai_key
FIRECRAWL_API_KEY=your_firecrawl_key
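Assuming both keys are required at start-up, a small check can fail fast when one is missing. This sketch uses only the standard library; in the project, python-dotenv's `load_dotenv()` would typically be called first to copy values from `.env` into the process environment. The `missing_keys` helper is hypothetical.

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY", "FIRECRAWL_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_keys()` before creating any client lets the program exit with a clear message instead of failing on the first API request.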
Run the prototype agent:
python simple_agent.py
- Interactive research session
- Direct tool usage via MCP
- Ideal for experimentation
Run the advanced research agent:
python main.py
You will be prompted to enter a research query, for example:
AI developer tools for code review
- Tool or company name
- Official website
- Pricing model
- Open-source availability
- Technology stack (when available)
- ExampleTool
- Website: https://example.com
- Pricing: Freemium
- Open Source: No
- Tech Stack: Python, FastAPI, React
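Rendering one result in the layout of the example above could look like the following. The `format_result` helper and the dictionary keys are hypothetical, and unknown values are shown as "Unknown" by assumption.

```python
from typing import Any, Dict, List


def format_result(item: Dict[str, Any]) -> str:
    """Render one research result as a bulleted text block."""
    stack: List[str] = item.get("tech_stack") or []
    # Distinguish an explicit True/False from a missing value
    open_source = {True: "Yes", False: "No"}.get(item.get("open_source"), "Unknown")
    lines = [
        f"- {item['name']}",
        f"- Website: {item.get('website') or 'Unknown'}",
        f"- Pricing: {item.get('pricing') or 'Unknown'}",
        f"- Open Source: {open_source}",
    ]
    if stack:  # the tech stack is only reported when available
        lines.append(f"- Tech Stack: {', '.join(stack)}")
    return "\n".join(lines)
```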
- Adding additional workflow stages
- Supporting new research domains
- Exporting results to JSON, CSV, or Markdown
- Adding memory or caching layers
- Parallelizing research tasks
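As one example of such an extension, exporting results to JSON or CSV needs only the standard library. This is a sketch under the assumption that results are available as a list of flat dictionaries; `to_json` and `to_csv` are hypothetical helpers, not part of the project.

```python
import csv
import io
import json
from typing import Dict, List


def to_json(results: List[Dict[str, str]]) -> str:
    """Serialize results to pretty-printed JSON text."""
    return json.dumps(results, indent=2)


def to_csv(results: List[Dict[str, str]]) -> str:
    """Serialize results to CSV text; columns come from the first row.

    Rows with keys not present in the first row raise ValueError
    (the default behaviour of csv.DictWriter).
    """
    if not results:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(results[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()
```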
- Persistent memory
- Citation and source tracking
- Parallel research execution
- Report generation
- Web or API interface
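Parallel research execution could build on AsyncIO, which the project already uses. The sketch below runs several queries concurrently with a cap on how many are in flight; `research_one` is a hypothetical placeholder for a single research run, and the concurrency limit is an arbitrary assumption.

```python
import asyncio
from typing import List


async def research_one(query: str) -> str:
    """Hypothetical placeholder for a single research run."""
    await asyncio.sleep(0)  # stands in for network-bound crawling and LLM calls
    return f"results for {query!r}"


async def research_many(queries: List[str], limit: int = 3) -> List[str]:
    semaphore = asyncio.Semaphore(limit)  # cap concurrent runs to respect rate limits

    async def bounded(query: str) -> str:
        async with semaphore:
            return await research_one(query)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(q) for q in queries))
```

From synchronous code this would be driven with `asyncio.run(research_many([...]))`.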
- Contributions, issues, and ideas are welcome. This project is intended for developers interested in LLM agents, autonomous research systems, and MCP-based tooling.


