Automated llms.txt Generator
An automated llms.txt generator that crawls websites, extracts key pages and metadata, and generates spec-compliant llms.txt files.
This is a monorepo containing two services:
| Service | Description | Deployment |
|---|---|---|
| profound-frontend/ | Next.js web app with dashboard and API | Vercel |
| crawler-service/ | Playwright-based crawler microservice | Railway |

    ┌─────────────────┐         ┌──────────────────┐
    │                 │  HTTP   │                  │
    │     Vercel      │────────▶│     Railway      │
    │   (Frontend)    │         │    (Crawler)     │
    │                 │         │                  │
    └────────┬────────┘         └────────┬─────────┘
             │                           │
             │ Prisma                    │ Prisma
             ▼                           ▼
    ┌───────────────────────────────────────────────┐
    │             Supabase (PostgreSQL)             │
    └───────────────────────────────────────────────┘

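
As a rough illustration of the HTTP hop in the diagram, the sketch below builds a request the frontend might send to the crawler service. The `/crawl` path, the `x-api-key` header, the JSON body shape, and the `CRAWLER_URL` variable are all assumptions made for this example; only `API_KEY` appears in the setup notes, and the actual endpoints are documented in the Crawler Service README.

```typescript
// Hypothetical request shape; the real crawler API may differ.
interface CrawlRequest {
  url: string;
  init: {
    method: string;
    headers: { [name: string]: string };
    body: string;
  };
}

// Build (but do not send) a crawl request for the crawler service.
// The "/crawl" path and "x-api-key" header are illustrative assumptions.
function buildCrawlRequest(
  crawlerBaseUrl: string,
  apiKey: string,
  targetUrl: string
): CrawlRequest {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const base = crawlerBaseUrl.replace(/\/+$/, "");
  return {
    url: base + "/crawl",
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
      },
      body: JSON.stringify({ url: targetUrl }),
    },
  };
}

// Usage sketch (CRAWLER_URL is a hypothetical variable name):
//   const req = buildCrawlRequest(process.env.CRAWLER_URL!, process.env.API_KEY!, "https://example.com");
//   const res = await fetch(req.url, req.init);
```

Separating request construction from the network call keeps the assumed contract easy to inspect and unit-test without a running crawler.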
Clone the repository:

    git clone https://github.com/anikrish05/ProfoundTakeHome.git
    cd ProfoundTakeHome

Set up and start the crawler service:

    cd crawler-service
    npm install
    npx playwright install chromium
    cp .env.example .env # Configure DATABASE_URL and API_KEY
    npm run dev

Set up and start the frontend (from the repository root):

    cd profound-frontend
    npm install
    cp .env.example .env # Configure your environment variables
    npx prisma generate
    npm run dev

- Frontend README: detailed frontend documentation
- Crawler Service README: crawler service API docs
- Public Generation: Generate llms.txt for any website without signing up
- Site Management: Add and manage multiple websites to monitor
- Automated Crawling: Scheduled crawls detect changes and regenerate automatically
- Bot Protection Bypass: Playwright-based crawler handles JavaScript-heavy and protected sites
- Spec Compliant: Output follows the llmstxt.org standard
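
To make the "Spec Compliant" point concrete, here is a minimal sketch of rendering crawled page metadata into the layout described at llmstxt.org: an H1 title, an optional blockquote summary, and H2 sections containing link lists. The `PageLink` and `SiteSummary` types and the `renderLlmsTxt` function are hypothetical names for this example and are not taken from the project's code.

```typescript
// Hypothetical shapes for crawled data; not taken from the project's schema.
interface PageLink {
  title: string;
  url: string;
  notes?: string; // optional description placed after the link
}

interface SiteSummary {
  name: string; // becomes the required H1
  summary?: string; // becomes the blockquote under the title
  sections: { heading: string; links: PageLink[] }[];
}

// Render a site summary as llms.txt-style Markdown.
function renderLlmsTxt(site: SiteSummary): string {
  const lines: string[] = ["# " + site.name, ""];
  if (site.summary) {
    lines.push("> " + site.summary, "");
  }
  for (const section of site.sections) {
    lines.push("## " + section.heading, "");
    for (const link of section.links) {
      const notes = link.notes ? ": " + link.notes : "";
      lines.push("- [" + link.title + "](" + link.url + ")" + notes);
    }
    lines.push("");
  }
  // Collapse trailing blank lines into a single final newline.
  return lines.join("\n").replace(/\s+$/, "") + "\n";
}
```

Calling `renderLlmsTxt` with a name, a summary, and one "Docs" section yields a file that starts with `# <name>`, followed by `> <summary>` and a `## Docs` list of `- [title](url): notes` entries.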
- Frontend: Next.js 15, React 19, Tailwind CSS, shadcn/ui
- Database: PostgreSQL with Prisma ORM
- Auth: Supabase Auth
- Crawler: Playwright, Express.js, Cheerio
- Deployment: Vercel + Railway
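
The automated crawling feature listed above relies on detecting changes between scheduled crawls. One possible approach, sketched here under assumptions, is to store a fingerprint per page and compare it on the next crawl. The `fingerprint` and `diffCrawl` names are hypothetical, and the FNV-1a-style hash is used only to keep the example dependency-free; a cryptographic hash such as SHA-256 would be a safer choice in practice.

```typescript
// Map from page URL to the fingerprint recorded at the previous crawl.
type Fingerprints = { [url: string]: string };

interface CrawledPage {
  url: string;
  content: string;
}

// 32-bit FNV-1a-style hash over UTF-16 code units, returned as 8 hex digits.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 (2^24 + 2^8 + 0x93) using shifts,
    // then truncate to an unsigned 32-bit value.
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Compare a new crawl against stored fingerprints.
// Returns URLs that are new or modified, URLs that disappeared,
// and the fingerprints to store for the next run.
function diffCrawl(
  previous: Fingerprints,
  pages: CrawledPage[]
): { changed: string[]; removed: string[]; fingerprints: Fingerprints } {
  const next: Fingerprints = {};
  const changed: string[] = [];
  for (const page of pages) {
    const fp = fingerprint(page.content);
    next[page.url] = fp;
    if (previous[page.url] !== fp) {
      changed.push(page.url);
    }
  }
  const removed = Object.keys(previous).filter(function (url) {
    return !(url in next);
  });
  return { changed: changed, removed: removed, fingerprints: next };
}
```

With this shape, a scheduled job would regenerate llms.txt only when `changed` or `removed` is non-empty.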
Click to watch a short walkthrough of the project:
Automatically generates a compliant llms.txt file from a given website in seconds.
