
chheplo/llm-agent-scraper


LLM Agent Scraper

Automated Google search → visit results → extract cleaned page content locally.

Powered by Playwright for browsing and llm-scraper for LLM-guided extraction. The LLM provider (OpenAI, Anthropic, or Google) is chosen per run.

Features

  • Google search via Playwright, with ads and Google-internal links filtered out
  • Iterates over the top N organic results
  • Uses llm-scraper to extract each page's title, description, and main content
  • Saves results as JSON under output/
  • Local web UI to pick the provider, API key, query, and headless mode
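The link-filtering step in the feature list can be illustrated with a minimal sketch. It assumes the raw `href` values have already been collected from the results page, and that "Google-internal" means any `google.*` host while ads come from `googleadservices.com`; `filterOrganicLinks`, `resolveHref`, and both host patterns are hypothetical, not the project's actual code.

```typescript
// Hypothetical helper: reduce raw hrefs from a Google results page to the
// first `maxLinks` unique organic result URLs. The host rules below are
// assumptions, not the project's actual filter.

// Matches google.com, google.de, google.co.uk, maps.google.com, etc.
const GOOGLE_HOST = /(^|\.)google\.(com|co|[a-z]{2})(\.[a-z]{2})?$/i;
// Assumed ad host; this list is illustrative, not exhaustive.
const AD_HOST = /(^|\.)googleadservices\.com$/i;

function resolveHref(raw: string, base: string): URL | null {
  let url: URL;
  try {
    url = new URL(raw, base); // resolves relative links like "/url?q=..."
  } catch {
    return null;
  }
  // Unwrap Google's "/url?q=<target>" redirect links.
  if (GOOGLE_HOST.test(url.hostname) && url.pathname === "/url") {
    const target = url.searchParams.get("q");
    if (!target) return null;
    try {
      url = new URL(target);
    } catch {
      return null;
    }
  }
  return url;
}

function filterOrganicLinks(
  hrefs: string[],
  maxLinks: number,
  base = "https://www.google.com/",
): string[] {
  const seen = new Set<string>();
  const results: string[] = [];
  for (const raw of hrefs) {
    if (results.length >= maxLinks) break;
    const url = resolveHref(raw, base);
    if (!url || (url.protocol !== "http:" && url.protocol !== "https:")) continue;
    // Drop Google-internal pages (search, maps, ...) and ad-service links.
    if (GOOGLE_HOST.test(url.hostname) || AD_HOST.test(url.hostname)) continue;
    url.hash = ""; // treat page#a and page#b as the same result
    const key = url.toString();
    if (seen.has(key)) continue;
    seen.add(key);
    results.push(key);
  }
  return results;
}
```

In a Playwright script, the input array could come from something like `page.$$eval("a", (els) => els.map((e) => e.getAttribute("href") ?? ""))`. Keeping the filter as a pure function makes it easy to test without launching a browser.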

Setup

npm install
npm run playwright:install

CLI usage

# export the API key env var that matches the chosen provider
export PROVIDER=openai
export OPENAI_API_KEY=sk-...

npm start -- "best laptops 2025"

# Limit links and run headed
MAX_LINKS=3 HEADLESS=false npm start -- "vector databases"
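The environment variables used above could be resolved into a run configuration roughly as follows. This is a minimal sketch: `PROVIDER`, `OPENAI_API_KEY`, `MAX_LINKS`, and `HEADLESS` come from this README, but the Anthropic and Google key names, the default provider, the default link count, and `resolveConfig` itself are assumptions.

```typescript
// Hypothetical sketch of resolving the CLI's environment variables.
type Provider = "openai" | "anthropic" | "google";
type Env = Record<string, string | undefined>;

interface RunConfig {
  provider: Provider;
  apiKey: string;
  maxLinks: number;
  headless: boolean;
}

const PROVIDERS: readonly Provider[] = ["openai", "anthropic", "google"];

// Only OPENAI_API_KEY appears in the README; the other two names are assumptions.
const KEY_VARS: Record<Provider, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  google: "GOOGLE_GENERATIVE_AI_API_KEY",
};

const DEFAULT_PROVIDER: Provider = "openai"; // assumed default
const DEFAULT_MAX_LINKS = 5; // hypothetical default

function isProvider(value: string): value is Provider {
  return (PROVIDERS as readonly string[]).includes(value);
}

function resolveConfig(env: Env): RunConfig {
  const provider = (env.PROVIDER ?? DEFAULT_PROVIDER).toLowerCase();
  if (!isProvider(provider)) {
    throw new Error(`Unknown PROVIDER "${provider}"; expected ${PROVIDERS.join(", ")}`);
  }
  const keyVar = KEY_VARS[provider];
  const apiKey = env[keyVar];
  if (!apiKey) {
    throw new Error(`Missing ${keyVar} for PROVIDER=${provider}`);
  }
  // Fall back to the default when MAX_LINKS is unset or not a positive integer.
  const parsed = Number.parseInt(env.MAX_LINKS ?? "", 10);
  const maxLinks = Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_LINKS;
  // Headless unless HEADLESS is explicitly "false" (headless is the default).
  const headless = (env.HEADLESS ?? "true").toLowerCase() !== "false";
  return { provider, apiKey, maxLinks, headless };
}
```

Called as `resolveConfig(process.env)`, this would turn `MAX_LINKS=3 HEADLESS=false` into `{ maxLinks: 3, headless: false }`, and it fails fast if the key for the chosen provider is missing.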

Web UI

npm run serve
# open http://localhost:3000

Pick a provider, paste the API key, enter a query, and press Start. Results are displayed on the page and saved to output/.
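Saving results to output/ could look like the sketch below. The per-page fields mirror the title, description, and main content that the README says llm-scraper extracts; the `RunRecord` shape, the `slugify`/`outputPath`/`serializeRun` helpers, and the slug-plus-timestamp file naming are all hypothetical.

```typescript
// Hypothetical sketch of naming and serializing a run's JSON output.
interface PageResult {
  url: string;
  title: string;
  description: string;
  content: string;
}

interface RunRecord {
  query: string;
  provider: string;
  createdAt: string; // ISO 8601 timestamp
  results: PageResult[];
}

// Turn a free-text query into a filesystem-safe slug (max 60 chars).
function slugify(query: string): string {
  const slug = query
    .toLowerCase()
    .normalize("NFKD") // split accented letters so "é" becomes "e" + accent
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+/, "")
    .slice(0, 60)
    .replace(/-+$/, "");
  return slug || "query";
}

// Slug plus a colon-free UTC timestamp, so repeated runs don't overwrite each other.
function outputPath(query: string, date: Date): string {
  const stamp = date.toISOString().replace(/[:.]/g, "-");
  return `output/${slugify(query)}-${stamp}.json`;
}

function serializeRun(record: RunRecord): string {
  return JSON.stringify(record, null, 2) + "\n";
}
```

In Node, the file could then be written with `fs.mkdirSync("output", { recursive: true })` followed by `fs.writeFileSync(outputPath(query, new Date()), serializeRun(record))`. Keeping the naming and serialization pure leaves the filesystem calls as the only side effect.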

Notes

  • Respect websites' terms of service and robots.txt.
  • This tool is intended for research and consent-based scraping.
  • Playwright runs headless by default; toggle this in the UI, or set HEADLESS=false on the CLI.

Acknowledgments
