Agentic Web Scraper And Automation (LangGraph + Playwright + Ollama + RAG)

A local-first agentic web automation project that scrapes a website, builds a small retrieval knowledge base (Chroma), plans actions using an Ollama chat model, and executes the plan with Playwright. It is designed for simple tasks like navigating pages, filling forms, clicking buttons, and taking screenshots.

Features

Website scraping to collect page/form context
RAG over scraped content using Chroma + local embeddings
Action planning using a local LLM (Ollama)
Playwright-based execution: navigate, fill, click, wait, screenshot
Validation layer with heuristics (and optional LLM check)
Logs + screenshots for debugging and proof

Tech stack

Python
LangGraph (workflow orchestration)
LangChain Ollama (LLM interface)
Playwright (browser automation)
ChromaDB (vector database)
Sentence-Transformers / local embeddings (all-MiniLM-L6-v2)

Project structure (typical)

src/main.py: entry point
src/agent/: LangGraph agent (planner / executor / validator / graph)
src/browser/: Playwright adapter
src/scraper/: website scraper
src/knowledge_base/: embeddings + chroma store + retriever
requirements.txt: dependencies
.env: local secrets (do not commit)

Setup

1) Create and activate a virtual environment

Windows (PowerShell):

python -m venv venv
.\venv\Scripts\activate

macOS / Linux:

python3 -m venv venv
source venv/bin/activate

2) Install dependencies

pip install -r requirements.txt
playwright install chromium

3) Install and run Ollama

Install Ollama: https://ollama.com/

Pull a model (example):

ollama pull llama3.2

Configuration (`.env`)

Create a .env file in the project root (never commit this file).

Example:

# Ollama
CHAT_MODEL=llama3.2
OLLAMA_BASE_URL=http://localhost:

# Optional: LangSmith tracing (only if you use it)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=YOUR_LANGSMITH_KEY
LANGCHAIN_PROJECT=agentic-web-automation

# Optional: tuning
BROWSER_TIMEOUT=30000
MAX_RETRIES=2

Notes:

If you do not use LangSmith, remove the LANGCHAIN_* variables or set tracing to false.

Run

From the project root:

python -m src.main

Using it on a new website

Change the domain and task in src/main.py (or wherever your entry point defines them).
Run once to scrape and rebuild the knowledge base for that domain.
Ensure the planner is dynamic (no hardcoded selectors) and the executor supports the actions produced by the planner.
For logins, ensure your task includes credentials clearly, for example:
- "Log in. Username is user123. Password is pass123. Then take a screenshot."

Important limitations:

CAPTCHA / Cloudflare checks are not supported.
2FA flows require extra handling (pause/wait for code entry).

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
configs		configs
examples		examples
screenshots		screenshots
src		src
tests		tests
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agentic Web Scraper And Automation (LangGraph + Playwright + Ollama + RAG)

Features

Tech stack

Project structure (typical)

Setup

1) Create and activate a virtual environment

2) Install dependencies

3) Install and run Ollama

Configuration (`.env`)

Run

Using it on a new website

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agentic Web Scraper And Automation (LangGraph + Playwright + Ollama + RAG)

Features

Tech stack

Project structure (typical)

Setup

1) Create and activate a virtual environment

2) Install dependencies

3) Install and run Ollama

Configuration (.env)

Run

Using it on a new website

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Configuration (`.env`)

Packages