This project provides a conversational agent powered by LangChain, LangGraph, and OpenAI that leverages BrightData's web scraping and data extraction tools. You can interact with the agent in your terminal to perform various web data tasks.
- Conversational Interface: Interact with web scraping tools using natural language.
- Powered by LangChain & OpenAI: Uses a ReAct agent built with LangChain and models from OpenAI.
- BrightData Integration: Connects to BrightData through MCP (Model Context Protocol) to access a wide range of data extraction tools.
- Secure: Manages API keys and sensitive credentials using environment variables.
- Python 3.9+
- Node.js and npm
- A BrightData account and API credentials
- An OpenAI API key
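
The local prerequisites above can be checked before installing anything. The following is a minimal sketch and is not part of the repository: `missing_prerequisites` is a hypothetical helper that compares the interpreter version against 3.9 and looks for `node`, `npm`, and `npx` on the `PATH`. It does not verify the BrightData or OpenAI credentials.

```python
import shutil
import sys


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # The prerequisites list asks for Python 3.9 or newer.
    if tuple(version_info[:2]) < (3, 9):
        problems.append("Python 3.9+ is required")
    # node and npm come with the Node.js install; npx ships with npm and is
    # what the agent uses to run the BrightData MCP client.
    for tool in ("node", "npm", "npx"):
        if which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments checks the current interpreter and `PATH`; the two parameters exist only so the checks can be exercised with fake values.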
- Clone the repository:

      git clone https://github.com/mverab/BrightData-painextractor.git
      cd BrightData-painextractor

- Install Node.js dependencies. The agent uses npx to run the BrightData MCP client.

      npm install

- Set up a Python virtual environment:

      python3 -m venv venv
      source venv/bin/activate

- Install Python dependencies:

      pip install -r requirements.txt

- Configure your environment variables. Create a .env file in the root of the project by copying the example:

      cp .env.example .env

  Now, edit the .env file and add your credentials:

      # .env
      OPENAI_API_KEY="your_openai_api_key"

      # BrightData Credentials
      API_TOKEN="your_brightdata_api_token"
      WEB_UNLOCKER_ZONE="your_web_unlocker_zone"
      BROWSER_ZONE="your_browser_zone"
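
A missing credential is easier to diagnose at startup than in the middle of a conversation. The sketch below shows one way to check for the four variables named in the .env example. `missing_credentials` is a hypothetical helper, not code from the repository, and it assumes the .env values have already been loaded into the process environment (for example by a dotenv loader; how main.py does this is not shown here).

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("OPENAI_API_KEY", "API_TOKEN", "WEB_UNLOCKER_ZONE", "BROWSER_ZONE")


def missing_credentials(env=None):
    """Return the required variable names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

A caller could stop early when the list is non-empty, for instance by printing the missing names and exiting before the agent starts.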
To start the conversational agent, run the main.py script:

    python main.py

Once the agent is connected, you can start making requests. Here are a few examples:

- Extract specs for Amazon ASIN B07NJG12GB
- Scrape product data from amazon.mx for laptops
- Get the latest news from reuters.com on technology

To exit the agent, type exit or quit.
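
The exit behavior can be pictured as a small read-respond loop. This is a sketch under assumptions, not the loop in main.py: `run_chat_loop`, `read_line`, `respond`, and `write` are hypothetical names, and treating the commands case-insensitively and stopping at end of input are choices made for this example.

```python
EXIT_COMMANDS = {"exit", "quit"}


def run_chat_loop(read_line, respond, write):
    """Answer prompts until the user types exit or quit; return how many were answered."""
    answered = 0
    while True:
        try:
            text = read_line().strip()
        except EOFError:
            break  # end of input (for example Ctrl-D) is treated like an exit command
        if text.lower() in EXIT_COMMANDS:
            break
        if not text:
            continue  # skip empty lines instead of sending them to the agent
        write(respond(text))
        answered += 1
    return answered
```

In a terminal this could be driven with `run_chat_loop(lambda: input("You: "), agent_reply, print)`, where `agent_reply` stands in for whatever function forwards the prompt to the agent.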
The agent uses a StdioServer to communicate with the BrightData MCP client, which runs as a separate Node.js process. main.py orchestrates the setup and runs a LangGraph agent that decides which BrightData tool to use based on the user's prompt.
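
Communicating over stdio means the parent process writes requests to the child's standard input and reads replies from its standard output. The sketch below illustrates only that transport idea. It is heavily simplified: the child is a tiny Python stand-in rather than the Node.js MCP client, the message shape is a bare JSON object per line rather than the full MCP handshake, and `stdio_round_trip` and `CHILD_SOURCE` are hypothetical names.

```python
import json
import subprocess
import sys

# Stand-in child process: reads one JSON object per line from stdin and
# answers on stdout. In the project the child is the Node.js MCP client.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "result": "handled " + request["method"]}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


def stdio_round_trip(method, request_id=1):
    """Start the child, send one request over its stdin, and return the parsed reply."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        child.stdin.write(json.dumps({"id": request_id, "method": method}) + "\n")
        child.stdin.flush()
        return json.loads(child.stdout.readline())
    finally:
        child.stdin.close()  # EOF on stdin lets the child's read loop end
        child.wait()
```

Because both sides flush after every line, each request gets its reply without waiting for the pipe buffer to fill, which is what makes an interactive session over stdio workable.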
This project is licensed under the MIT License. See the LICENSE file for details.
