Paste a URL. Describe your research. Let AI do the rest.
CrawlMind combines Cloudflare's crawl infrastructure with AI-powered URL discovery and multi-hop research synthesis, turning any query into structured, crawled knowledge.
Getting Started · Features · Architecture · Deploy
- Smart Input: Auto-detects URLs vs. natural language; just paste or type
- Cloudflare-Powered: Fast, reliable crawling via Cloudflare's Browser Rendering API
- Multi-Format Output: Markdown, HTML, plaintext, or cleaned readable HTML
- JS Rendering: Crawl JavaScript-heavy SPAs with headless rendering
- Advanced Controls: Depth, page limits, subdomain inclusion, URL patterns, date filters
- AI URL Discovery: Describe what you need; Groq finds the best sources to crawl
- Depth Tiers: Quick (~30s), Deep Dive (~2min), or Multi-hop Research (~5min)
- Multi-Hop Research: Crawl → analyze gaps → discover follow-up sources → repeat (up to 3 rounds)
- AI Synthesis: NVIDIA NIM generates a comprehensive research report from all crawled data
- Parent-Child Jobs: Research jobs manage multiple sub-crawls independently, without interfering with normal crawls
- AI Chat: Ask questions about crawl results with full context awareness
- Soft-Delete Library: Archive, restore, and manage past crawls
- Analytics Dashboard: Track crawl usage, search patterns, and AI queries
- Plan-Based Limits: Tiered pricing with Stripe integration
- Auth: GitHub, Google, and email sign-in via Better Auth
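
The Smart Input behavior listed above can be pictured with a small classifier. This is only a sketch of one plausible heuristic, not the project's actual detection logic: `detectInputKind` and `InputKind` are hypothetical names, and the rule "no whitespace plus a dotted hostname means URL" is an assumption.

```typescript
// Hypothetical helper: classify dashboard input as a URL or a natural-language query.
type InputKind = "url" | "query";

function detectInputKind(raw: string): InputKind {
  const text = raw.trim();
  // Empty input or anything containing whitespace is treated as a description.
  if (text === "" || /\s/.test(text)) return "query";

  // Accept bare domains such as "example.com/docs" by assuming https (assumption).
  const candidate = /^https?:\/\//i.test(text) ? text : `https://${text}`;
  try {
    const { hostname } = new URL(candidate);
    // Require a dot-separated hostname so single words like "pricing" stay queries.
    return /^[a-z0-9-]+(\.[a-z0-9-]+)+$/i.test(hostname) ? "url" : "query";
  } catch {
    // new URL() throws on input it cannot parse; fall back to a query.
    return "query";
  }
}
```

A heuristic like this errs toward treating ambiguous input as a query, so hosts without a dot (for example `localhost`) would need an explicit `http://` rule if they should count as URLs.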
┌─────────────────────────────────────────────────────────────────┐
│                           USER INPUT                            │
│          URL / Natural Language / AI Discovery Toggle           │
└──────────────┬──────────────────────────┬───────────────────────┘
               │                          │
         URL detected              AI Discovery ON
               │                          │
               ▼                          ▼
     ┌───────────────────┐  ┌───────────────────────────┐
     │ POST /api/crawl   │  │ POST /api/research        │
     │ Normal Pipeline   │  │ AI Research Pipeline      │
     └─────────┬─────────┘  └─────────────┬─────────────┘
               │                          │
               ▼                          ▼
     ┌───────────────────┐  ┌───────────────────────────┐
     │ Cloudflare Crawl  │  │ Groq: Discover URLs       │
     │ Single Job        │  │ (llama-3.3-70b-versatile) │
     └─────────┬─────────┘  └─────────────┬─────────────┘
               │                          │
               │                          ▼
               │            ┌───────────────────────────┐
               │            │ Spawn Parallel Sub-Crawls │
               │            │ via Cloudflare Crawl API  │
               │            └─────────────┬─────────────┘
               │                          │
               │            ┌─────────────┼─────────────┐
               │            │ RESEARCH tier only:       │
               │            │ NIM Gap Analysis →        │
               │            │ Follow-up Crawls (×3)     │
               │            └─────────────┬─────────────┘
               │                          │
               │                          ▼
               │            ┌───────────────────────────┐
               │            │ NIM: Synthesis Report     │
               │            │ (nemotron-super-49b)      │
               ▼            └─────────────┬─────────────┘
     ┌───────────────────┐                │
     │ Neon PostgreSQL   │◄───────────────┘
     │ (Prisma ORM)      │
     └───────────────────┘
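
To make the right-hand branch of the diagram concrete, here is a rough sketch of the research loop: discover, crawl, look for gaps, crawl follow-ups, then synthesize. Everything named here (`ResearchDeps`, `runResearch`, the function signatures) is hypothetical; the real integration lives in `src/lib/research.ts` and may be shaped quite differently. The Groq, Cloudflare, and NIM calls are injected as plain functions so the control flow can be read and tested without network access.

```typescript
// Hypothetical dependency interface: each function stands in for one box in the diagram.
interface ResearchDeps {
  discoverUrls(query: string): Promise<string[]>; // Groq URL discovery
  crawl(urls: string[]): Promise<Record<string, string>>; // sub-crawls, content keyed by URL
  findGaps(query: string, pages: Record<string, string>): Promise<string[]>; // NIM gap analysis
  synthesize(query: string, pages: Record<string, string>): Promise<string>; // NIM report
}

interface ResearchResult {
  report: string;
  sources: string[];
  followUpRounds: number;
}

async function runResearch(
  query: string,
  deps: ResearchDeps,
  maxFollowUpRounds = 3, // assumed to be 0 for tiers without gap analysis
): Promise<ResearchResult> {
  const pages: Record<string, string> = {};
  const seen = new Set<string>();

  // Initial round: discover sources and crawl them.
  const initial = Array.from(new Set(await deps.discoverUrls(query)));
  initial.forEach((url) => seen.add(url));
  Object.assign(pages, await deps.crawl(initial));

  let followUpRounds = 0;
  while (followUpRounds < maxFollowUpRounds) {
    // Ask for follow-up URLs and skip any that were already crawled.
    const gaps = await deps.findGaps(query, pages);
    const fresh = Array.from(new Set(gaps)).filter((url) => !seen.has(url));
    if (fresh.length === 0) break; // no new sources, stop early
    fresh.forEach((url) => seen.add(url));
    Object.assign(pages, await deps.crawl(fresh));
    followUpRounds++;
  }

  const report = await deps.synthesize(query, pages);
  return { report, sources: Object.keys(pages), followUpRounds };
}
```

Injecting the four steps keeps the loop independent of any particular SDK, which is a design choice of this sketch rather than something the project documents.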
| Layer | Technology | Purpose |
|---|---|---|
| Framework | Next.js 15 (App Router) | Full-stack React with server components |
| Database | Neon PostgreSQL + Prisma | Serverless Postgres with type-safe ORM |
| Auth | Better Auth | GitHub, Google, email authentication |
| Crawling | Cloudflare Crawl API | Browser rendering + web crawling at scale |
| AI (fast) | Groq (llama-3.3-70b) | URL discovery (~200ms responses) |
| AI (deep) | NVIDIA NIM (nemotron-super-49b) | Gap analysis + synthesis reports |
| AI Chat | Vercel AI SDK | Streaming chat over crawl results |
| Payments | Stripe | Subscription billing + webhooks |
| Styling | Tailwind CSS + shadcn/ui | Utility-first CSS + accessible components |
| Deployment | Vercel | Edge-optimized serverless hosting |
- Bun v1.0+
- Neon PostgreSQL database
- Cloudflare account with Crawl API access
- Groq API key (for AI URL discovery)
- NVIDIA NIM API key (for synthesis)
# Clone
git clone https://github.com/pantha704/CrawlMind.git
cd CrawlMind
# Install
bun install
# Configure
cp .env.example .env.local
# Edit .env.local with your keys (see below)
# Database
bunx prisma db push
bunx prisma generate
# Run
bun run dev

# Database (Neon)
DATABASE_URL=postgresql://...
# Auth
BETTER_AUTH_SECRET=your-secret
BETTER_AUTH_URL=http://localhost:3001
GITHUB_CLIENT_ID=...
GITHUB_CLIENT_SECRET=...
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
# Cloudflare
CLOUDFLARE_API_TOKEN=...
CLOUDFLARE_ACCOUNT_ID=...
# AI
GROQ_API_KEY=... # For URL discovery (Groq)
NVIDIA_NIM_API_KEY=... # For synthesis (NVIDIA NIM)
# Stripe
STRIPE_SECRET_KEY=...
STRIPE_WEBHOOK_SECRET=...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=...
# App
NEXT_PUBLIC_APP_URL=http://localhost:3001

src/
├── app/
│   ├── api/
│   │   ├── crawl/        # Crawl CRUD, results proxy, cancel
│   │   ├── research/     # AI Discovery: create, poll, active
│   │   ├── chat/         # AI chat endpoint
│   │   ├── stripe/       # Payment webhooks
│   │   └── user/         # Usage tracking & settings
│   ├── dashboard/
│   │   ├── page.tsx      # Main dashboard
│   │   ├── jobs/         # Crawl job list + detail
│   │   ├── research/     # AI research detail page
│   │   ├── chat/         # AI chat interface
│   │   ├── library/      # Archived results
│   │   └── analytics/    # Usage analytics
│   ├── pricing/          # Pricing page
│   └── (auth)/           # Sign in / sign up
├── components/
│   ├── dashboard/        # Dashboard UI (crawl-input, active-jobs, etc.)
│   ├── landing/          # Landing page components
│   └── ui/               # shadcn/ui primitives
└── lib/
    ├── auth.ts           # Better Auth config
    ├── cloudflare.ts     # Cloudflare Crawl API client
    ├── research.ts       # AI Discovery: Groq + NIM integration
    ├── ai.ts             # AI model configuration
    ├── prisma.ts         # Prisma client
    └── stripe.ts         # Stripe client
| Tier | What Happens | Sources | Time |
|---|---|---|---|
| Quick | AI finds 3-5 relevant sources, crawls them | 3-5 | ~30s |
| Deep Dive | AI discovers 10-15 categorized sources | 10-15 | ~2min |
| Research | Multi-hop: crawl → gap analysis → follow-up crawls (×3 rounds) → synthesis | 15-30+ | ~5min |
Models used:
- Groq (`llama-3.3-70b-versatile`): Fast URL discovery (~200ms)
- NVIDIA NIM (`nemotron-super-49b-v1.5`): Deep analysis and comprehensive synthesis
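
The tier table can also be read as configuration. The sketch below encodes the documented source counts, follow-up rounds, and rough timings in a typed constant. The names `Tier`, `TIERS`, and `selectSources` are hypothetical, and treating 30 as the cap for the Research tier is an assumption, since the table lists it as "15-30+".

```typescript
type Tier = "quick" | "deep" | "research";

interface TierConfig {
  label: string;
  minSources: number;
  maxSources: number; // for "research" this is a soft target (table says "15-30+")
  followUpRounds: number; // gap-analysis rounds; only the Research tier uses them
  approxSeconds: number; // rough duration from the table
}

// Values copied from the tier table; field names are illustrative only.
const TIERS: Record<Tier, TierConfig> = {
  quick: { label: "Quick", minSources: 3, maxSources: 5, followUpRounds: 0, approxSeconds: 30 },
  deep: { label: "Deep Dive", minSources: 10, maxSources: 15, followUpRounds: 0, approxSeconds: 120 },
  research: { label: "Research", minSources: 15, maxSources: 30, followUpRounds: 3, approxSeconds: 300 },
};

// Deduplicate discovered URLs (keeping first-seen order) and cap them at the tier's maximum.
function selectSources(tier: Tier, discovered: string[]): string[] {
  const unique = Array.from(new Set(discovered));
  return unique.slice(0, TIERS[tier].maxSources);
}
```

For example, `selectSources("quick", urls)` would keep at most five distinct URLs from whatever the discovery step returned.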
| Plan | Price | Crawls/day | Pages/crawl | AI Chat | JS Render |
|---|---|---|---|---|---|
| Spark | Free | 2 | 30 | 3 queries | β |
| Pro | $12/mo | 25 | 500 | Unlimited | β |
| Pro+ | $24/mo | 75 | 1,000 | Unlimited | β |
| Scale | $39/mo | 150 | 5,000 | Unlimited | β |
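
As an illustration of how the plan limits above might be enforced before a crawl starts, here is a minimal sketch. The names `PLAN_LIMITS` and `checkCrawlAllowed` are hypothetical, the JS Render column is left out, and the table does not say over what period the Spark plan's 3 chat queries are counted, so that field is recorded without a period.

```typescript
type Plan = "spark" | "pro" | "proPlus" | "scale";

interface PlanLimits {
  crawlsPerDay: number;
  pagesPerCrawl: number;
  chatQueries: number; // Infinity stands for "Unlimited"
}

// Numbers copied from the pricing table; the structure itself is an assumption.
const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  spark: { crawlsPerDay: 2, pagesPerCrawl: 30, chatQueries: 3 },
  pro: { crawlsPerDay: 25, pagesPerCrawl: 500, chatQueries: Infinity },
  proPlus: { crawlsPerDay: 75, pagesPerCrawl: 1000, chatQueries: Infinity },
  scale: { crawlsPerDay: 150, pagesPerCrawl: 5000, chatQueries: Infinity },
};

interface CrawlDecision {
  allowed: boolean;
  pageLimit: number; // requested pages clamped to the plan's per-crawl maximum
  reason?: string;
}

function checkCrawlAllowed(plan: Plan, crawlsToday: number, requestedPages: number): CrawlDecision {
  const limits = PLAN_LIMITS[plan];
  if (crawlsToday >= limits.crawlsPerDay) {
    return { allowed: false, pageLimit: 0, reason: `Daily limit of ${limits.crawlsPerDay} crawls reached` };
  }
  return { allowed: true, pageLimit: Math.min(requestedPages, limits.pagesPerCrawl) };
}
```

Clamping the page count instead of rejecting the request is one possible policy; rejecting oversized requests outright would be equally reasonable.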
- Push to GitHub
- Import in Vercel
- Add all environment variables
- Set `NEXT_PUBLIC_APP_URL` to your Vercel domain
- Deploy
Note: Ensure `NEXT_PUBLIC_APP_URL` points to your deployed domain (not `localhost`) for webhooks and auth callbacks.
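
A pre-deploy check along these lines could catch a missing variable or a leftover `localhost` URL. It is only a sketch: `checkDeployEnv` is a hypothetical helper, and treating every variable from the environment listing as required is an assumption (for instance, a deployment using only one OAuth provider may not need both client ID pairs).

```typescript
// Variable names taken from the environment listing; "required" is an assumption.
const REQUIRED_VARS = [
  "DATABASE_URL",
  "BETTER_AUTH_SECRET",
  "BETTER_AUTH_URL",
  "GITHUB_CLIENT_ID",
  "GITHUB_CLIENT_SECRET",
  "GOOGLE_CLIENT_ID",
  "GOOGLE_CLIENT_SECRET",
  "CLOUDFLARE_API_TOKEN",
  "CLOUDFLARE_ACCOUNT_ID",
  "GROQ_API_KEY",
  "NVIDIA_NIM_API_KEY",
  "STRIPE_SECRET_KEY",
  "STRIPE_WEBHOOK_SECRET",
  "NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY",
  "NEXT_PUBLIC_APP_URL",
];

// Returns a list of human-readable problems; an empty list means the check passed.
function checkDeployEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`${name} is not set`);
  }

  const appUrl = env["NEXT_PUBLIC_APP_URL"];
  if (appUrl) {
    try {
      const host = new URL(appUrl).hostname;
      if (host === "localhost" || host === "127.0.0.1") {
        problems.push("NEXT_PUBLIC_APP_URL points to localhost");
      }
    } catch {
      problems.push("NEXT_PUBLIC_APP_URL is not a valid URL");
    }
  }
  return problems;
}
```

In a Node or Bun script this could be called as `checkDeployEnv(process.env)` and the build failed when the returned list is non-empty.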
MIT. See LICENSE for details.