# Crow

Multi-Tenant Data Ingestion Infrastructure for AI Agents
Crow is a B2B infrastructure platform that enables SaaS companies to build secure, multi-tenant data pipelines for their AI agents. It solves the critical problem of authenticated web scraping and document extraction by providing a secure session vault, visual workflow builder, and enterprise-grade access controls.
Every SaaS is building agents, but those agents die at the login screen or get confused by a PDF table. Crow gives them the infrastructure to handle their customers' messy web and document data.
## Features

### Session Vault

Store and manage encrypted authentication sessions for your tenants. Enable AI agents to scrape behind login walls without exposing credentials.
- Encrypted cookie storage - Sessions are encrypted at rest
- Expiration tracking - Auto-detect cookie expiration and receive alerts
- Rate limiting - Per-session hourly/daily limits to avoid detection
- Browser extension - One-click session capture from any website
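The expiration-tracking feature above can be sketched as follows. This is an illustrative version of the kind of logic that might live in `lib/cookie-parser.ts`; the type names, the 24-hour alert window, and the function signature are assumptions, not Crow's actual implementation.

```typescript
interface CapturedCookie {
  name: string;
  value: string;
  expires?: number; // Unix epoch seconds; undefined means a session cookie
}

interface ExpiryStatus {
  earliestExpiry: number | null; // epoch seconds, null if nothing expires
  expiringSoon: boolean;         // true if an alert should fire
}

// Scan a session's cookies, find the soonest expiry, and flag the session
// when that expiry falls inside the alert window.
function checkExpiry(
  cookies: CapturedCookie[],
  nowSeconds: number,
  alertWindowSeconds = 24 * 60 * 60, // assumed default: alert one day ahead
): ExpiryStatus {
  const expiries = cookies
    .map((c) => c.expires)
    .filter((e): e is number => typeof e === "number");
  if (expiries.length === 0) {
    return { earliestExpiry: null, expiringSoon: false };
  }
  const earliest = Math.min(...expiries);
  return {
    earliestExpiry: earliest,
    expiringSoon: earliest - nowSeconds <= alertWindowSeconds,
  };
}
```

Session cookies (no `expires`) never trigger an alert here, since they only die with the browser session rather than at a predictable time.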
### Visual Workflow Builder

Build complex data extraction pipelines without code using a step-based workflow designer.
- Web Scrape - Extract structured data from a single page
- Web Crawl - Crawl multiple pages from a starting URL
- Site Map - Discover all URLs on a website
- AI Agent (FIRE-1) - Multi-page agentic extraction for complex flows
- Document Extract - Extract from PDFs and images using Reducto
- Webhook - POST extracted data to any endpoint
- Email - Send results via email using Resend
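The step types above could be modeled as a discriminated union, with a simple sanity check that delivery steps come after something has actually been extracted. The type and field names here are illustrative assumptions, not Crow's real workflow model.

```typescript
type WorkflowStep =
  | { kind: "web_scrape"; url: string }
  | { kind: "web_crawl"; url: string; maxPages: number }
  | { kind: "site_map"; url: string }
  | { kind: "agent"; prompt: string }
  | { kind: "document_extract"; documentUrl: string }
  | { kind: "webhook"; endpoint: string }
  | { kind: "email"; to: string };

// A pipeline is an ordered list of steps. Delivery steps (webhook, email)
// only make sense once an earlier step has produced data to deliver.
function validatePipeline(steps: WorkflowStep[]): string[] {
  const errors: string[] = [];
  const deliveryKinds = new Set<string>(["webhook", "email"]);
  let sawExtraction = false;
  steps.forEach((step, i) => {
    if (deliveryKinds.has(step.kind) && !sawExtraction) {
      errors.push(`step ${i}: ${step.kind} before any extraction step`);
    }
    if (!deliveryKinds.has(step.kind)) sawExtraction = true;
  });
  return errors;
}
```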
### Extraction Schemas

Define reusable extraction schemas with typed fields:
- String, Number, Date, Boolean field types
- Schema versioning and activation controls
- Shared across workflows for consistency
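A minimal sketch of validating extracted data against a schema built from the four field types above. The `SchemaField` shape and the validation rules (particularly treating any parseable date string as a valid `date`) are assumptions for illustration.

```typescript
type FieldType = "string" | "number" | "date" | "boolean";

interface SchemaField {
  name: string;
  type: FieldType;
  required?: boolean;
}

// Return a list of human-readable errors; an empty list means the record
// conforms to the schema.
function validateRecord(
  fields: SchemaField[],
  record: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  for (const field of fields) {
    const value = record[field.name];
    if (value === undefined || value === null) {
      if (field.required) errors.push(`${field.name}: missing required field`);
      continue;
    }
    const ok =
      field.type === "string" ? typeof value === "string"
      : field.type === "number" ? typeof value === "number" && Number.isFinite(value)
      : field.type === "boolean" ? typeof value === "boolean"
      : !Number.isNaN(Date.parse(String(value))); // "date": any parseable date
    if (!ok) errors.push(`${field.name}: expected ${field.type}`);
  }
  return errors;
}
```

Because schemas are shared across workflows, a check like this can run once at the extraction boundary instead of being reimplemented per pipeline.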
### Multi-Tenancy & Access Control

Enterprise-ready multi-tenancy with granular access controls:
- Organizations - Top-level grouping for SaaS companies
- Tenants - Individual customers of your SaaS
- Members - Invite team members with role-based permissions
- Row Level Security - Postgres RLS ensures data isolation
### API Access

Programmatic access for your AI agents:
- Per-tenant API keys for secure agent authentication
- RESTful endpoints for all extraction operations
- Webhook integrations for real-time data delivery
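A sketch of how an agent might build an authenticated request against the extraction API. The `Authorization: Bearer` scheme and the `crow_tenant_` key prefix follow the curl example later in this README; the helper itself and its parameter names are illustrative.

```typescript
interface CrowRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a POST to /api/extract-web using a tenant key.
function buildExtractRequest(
  baseUrl: string,
  tenantApiKey: string,
  schemaId: string,
  targetUrl: string,
): CrowRequest {
  if (!tenantApiKey.startsWith("crow_tenant_")) {
    throw new Error("expected a tenant-scoped key (crow_tenant_*)");
  }
  return {
    url: `${baseUrl}/api/extract-web`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${tenantApiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ schemaId, url: targetUrl }),
    },
  };
}

// Usage: pass the result straight to fetch.
// const { url, init } = buildExtractRequest(base, key, schemaId, target);
// const res = await fetch(url, init);
```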
### Browser Extension

Capture authenticated sessions with one click:
- Automatically detects cookies on any site
- Push sessions directly to your tenant's vault
- Manage existing sessions from the extension
## Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Bun |
| Framework | Next.js 16 (App Router) |
| Database | Supabase (Postgres + RLS) |
| Auth | Supabase Auth (GitHub OAuth) |
| UI | shadcn/ui + Tailwind CSS v4 |
| Forms | React Hook Form + Zod validation |
| Web Scraping | Firecrawl |
| Document OCR | Reducto |
| Email | Resend |
| Linting | Biome |
## Project Structure

```
├── app/
│   ├── actions.ts                 # Server actions for mutations
│   ├── page.tsx                   # Landing page with auth redirects
│   ├── layout.tsx                 # Root layout with providers
│   ├── globals.css                # Tailwind CSS + custom styles
│   ├── api/
│   │   ├── scrape-with-session/   # Authenticated web scraping
│   │   ├── extract-web/           # Web extraction with schema
│   │   ├── extract-document/      # Document extraction (Reducto)
│   │   ├── crawl/                 # Multi-page crawling
│   │   ├── map/                   # Site URL discovery
│   │   ├── agent/                 # AI agent orchestration
│   │   ├── send-email/            # Email via Resend
│   │   ├── vault/                 # Vault session management
│   │   └── extension/             # Chrome extension API
│   ├── dashboard/
│   │   ├── page.tsx               # Main dashboard view
│   │   ├── tenants/               # Tenant management pages
│   │   └── monitoring/            # Analytics and monitoring
│   └── auth/
│       └── callback/              # OAuth callback handler
├── components/
│   ├── ui/                        # shadcn/ui components
│   └── dashboard/
│       ├── workflow-builder.tsx   # Visual workflow editor
│       ├── vault-session-list.tsx
│       ├── extraction-schema-list.tsx
│       ├── api-key-manager.tsx
│       ├── notifications-popover.tsx
│       └── ...
├── lib/
│   ├── supabase/
│   │   ├── server.ts              # Server-side Supabase client
│   │   ├── client.ts              # Client-side Supabase client
│   │   └── middleware.ts          # Auth middleware
│   ├── encryption.ts              # API key generation/hashing
│   ├── cookie-parser.ts           # Cookie expiration detection
│   └── utils.ts                   # Utility functions
├── supabase/
│   ├── config.toml                # Supabase local config
│   ├── seed.sql                   # Database seed data
│   └── migrations/                # SQL migrations
├── extension/
│   ├── manifest.json              # Chrome extension manifest
│   ├── popup.html                 # Extension popup UI
│   └── popup.js                   # Extension logic
└── types.ts                       # Generated Supabase types
```
## Database Schema

| Table | Description |
|---|---|
| `organizations` | SaaS companies using Crow |
| `organization_members` | User → Organization membership |
| `tenants` | Customers of the SaaS companies |
| `tenant_members` | User → Tenant membership |
| `tenant_invites` | Pending invitations |
| Table | Description |
|---|---|
| `vault_sessions` | Encrypted session cookies |
| `vault_error_logs` | Session error tracking |
| `vault_rate_limits` | Rate limit windows |
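The per-session limits backed by `vault_rate_limits` could be enforced with fixed-window counting along these lines. The column names, window math, and function shape are assumptions for illustration, not Crow's actual implementation.

```typescript
interface RateWindow {
  windowStart: number; // epoch seconds when the current window opened
  count: number;       // requests seen in the current window
}

// Decide whether one more request is allowed, returning the updated window
// state that would be written back to the vault_rate_limits row.
function takeToken(
  win: RateWindow,
  nowSeconds: number,
  limit: number,
  windowSeconds: number,
): { allowed: boolean; next: RateWindow } {
  // Roll over to a fresh window once the old one has elapsed.
  if (nowSeconds - win.windowStart >= windowSeconds) {
    return { allowed: true, next: { windowStart: nowSeconds, count: 1 } };
  }
  if (win.count >= limit) {
    return { allowed: false, next: win };
  }
  return { allowed: true, next: { ...win, count: win.count + 1 } };
}
```

Hourly and daily limits would each get their own window row; a request is allowed only if both windows grant a token.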
| Table | Description |
|---|---|
| `extraction_schemas` | Field definitions for extraction |
| `document_extractions` | Extracted document data |
| `extractions` | General extraction history |
| Table | Description |
|---|---|
| `scheduled_workflows` | Workflow definitions with scheduling |
| `workflow_runs` | Execution history and results |
| `workflows` | Legacy workflow definitions |
| Table | Description |
|---|---|
| `notifications` | In-app notifications (realtime) |
## Getting Started

### Prerequisites

- Bun (v1.0+)
- Supabase CLI
- GitHub OAuth app (for authentication)
```shell
git clone https://github.com/your-org/crow.git
cd crow
bun install
```

Copy the example environment file and fill in your values:

```shell
cp .env.example .env.local
```

Required environment variables:
```
# Supabase
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key

# External Services
FIRECRAWL_API_KEY=your_firecrawl_key
REDUCTO_API_KEY=your_reducto_key
RESEND_API_KEY=your_resend_key
RESEND_FROM_EMAIL=notifications@yourdomain.com
```

Start the local Supabase instance and apply migrations:
```shell
bunx supabase start
bunx supabase db push
```

Generate TypeScript types:

```shell
bunx supabase gen types typescript --local > types.ts
```

Start the development server:

```shell
bun dev
```

Open http://127.0.0.1:3000 in your browser.
## Commands

| Command | Description |
|---|---|
| `bun dev` | Start development server |
| `bun build` | Build for production |
| `bun start` | Start production server |
| `bun lint` | Run Biome linter |
| `bun format` | Format code with Biome |
Create a new migration:

```shell
bunx supabase migration new migration_name
```

Apply migrations:

```shell
bunx supabase db push
```

Regenerate types after schema changes:

```shell
bunx supabase gen types typescript --local > types.ts
```

Add a shadcn/ui component:

```shell
bunx shadcn@latest add component-name
```

## API Reference

All API endpoints require authentication. For user requests, cookies handle session auth automatically. For programmatic access, use tenant API keys:
```shell
curl -X POST https://your-crow-instance/api/extract-web \
  -H "Authorization: Bearer crow_tenant_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{"schemaId": "...", "url": "..."}'
```

### `POST /api/scrape-with-session`

Scrape a URL using a stored vault session for authentication.

```json
{
  "sessionId": "uuid",
  "url": "https://example.com/protected-page",
  "formats": ["markdown", "html"],
  "onlyMainContent": true
}
```

### `POST /api/extract-web`

Extract structured data from a URL using a schema.
```json
{
  "schemaId": "uuid",
  "url": "https://example.com/data",
  "sessionId": "uuid (optional)"
}
```

### `POST /api/extract-document`

Extract data from PDFs or images using Reducto.
```json
{
  "schemaId": "uuid",
  "documentUrl": "https://example.com/invoice.pdf",
  "filename": "invoice.pdf"
}
```

### `POST /api/crawl`

Crawl multiple pages starting from a URL.
```json
{
  "url": "https://example.com",
  "maxDepth": 3,
  "maxPages": 100,
  "sessionId": "uuid (optional)"
}
```

### `POST /api/map`

Discover all URLs on a website.
```json
{
  "url": "https://example.com",
  "sessionId": "uuid (optional)"
}
```

### `POST /api/send-email`

Send extraction results via email.
```json
{
  "to": "user@example.com",
  "subject": "Extraction Results",
  "data": { "field1": "value1" }
}
```

## Chrome Extension

The Crow Session Vault extension enables one-click capture of authenticated sessions.
### Installation

1. Open Chrome and navigate to `chrome://extensions`
2. Enable "Developer mode"
3. Click "Load unpacked" and select the `/extension` folder
### Capturing a Session

1. Log into the target website
2. Click the Crow extension icon
3. Select a tenant from the dropdown
4. Name your session
5. Click "Capture Session"
The session is encrypted and stored in your tenant's vault, ready for use by AI agents.
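A hypothetical sketch of the payload the extension could assemble before pushing a session to the vault; the field names and domain-scoping rule here are assumptions for illustration, not the extension's real wire format.

```typescript
interface CookieRecord {
  name: string;
  value: string;
  domain: string;   // e.g. ".example.com"
  expires?: number; // epoch seconds
}

interface CapturePayload {
  tenantId: string;
  sessionName: string;
  sourceUrl: string;
  cookies: CookieRecord[];
}

// Keep only cookies whose domain matches the captured site's host, so
// unrelated cookies never leave the browser.
function buildCapturePayload(
  tenantId: string,
  sessionName: string,
  sourceUrl: string,
  cookies: CookieRecord[],
): CapturePayload {
  if (!sessionName.trim()) throw new Error("session name is required");
  const host = new URL(sourceUrl).hostname;
  const scoped = cookies.filter((c) => {
    const d = c.domain.replace(/^\./, "");
    return host === d || host.endsWith("." + d);
  });
  return { tenantId, sessionName, sourceUrl, cookies: scoped };
}
```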
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         Crow Dashboard                          │
│  ┌──────────────────┬──────────────────┬─────────────────────┐  │
│  │  Organizations   │     Tenants      │      Workflows      │  │
│  │  & Members       │ & Vault Sessions │    & Extractions    │  │
│  └──────────────────┴──────────────────┴─────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Next.js API Routes                        │
│  ┌──────────┬───────────┬──────────┬──────────┬─────────────┐   │
│  │ /scrape  │ /extract  │  /crawl  │  /agent  │  /webhook   │   │
│  │ -session │ -document │          │          │             │   │
│  └──────────┴───────────┴──────────┴──────────┴─────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                                │
            ┌───────────────────┼───────────────────┐
            ▼                   ▼                   ▼
    ┌───────────────┐   ┌───────────────┐   ┌───────────────┐
    │   Firecrawl   │   │    Reducto    │   │    Resend     │
    │ Web Scraping  │   │ Document OCR  │   │     Email     │
    └───────────────┘   └───────────────┘   └───────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Supabase (Postgres + RLS)                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Organizations │ Tenants │ Vault Sessions │ Extractions   │  │
│  │    Row Level Security ensures complete data isolation     │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
## Security

- Row Level Security (RLS) - All tables protected by Postgres RLS policies
- Multi-tenant isolation - Users only see data they're authorized to access
- Organization boundaries - Tenants are scoped to organizations
- Encrypted at rest - Vault sessions are encrypted before storage
- Rate limiting - Configurable per-session rate limits
- Expiration tracking - Automatic alerts for expiring sessions
- Hashed API keys - Only the hash is stored; keys cannot be retrieved
- Scoped access - API keys are tenant-specific
- Audit logging - All operations are logged for compliance
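The "hashed API keys" property above can be sketched like this: generate a key once, persist only its hash, and verify later requests by re-hashing. The `crow_tenant_` prefix matches the curl example earlier in this README; the SHA-256 choice and key length are assumptions (the real logic lives in `lib/encryption.ts`).

```typescript
import { createHash, randomBytes } from "node:crypto";

function hashKey(key: string): string {
  return createHash("sha256").update(key).digest("hex");
}

// Generate a tenant key. The plaintext is shown to the user exactly once;
// only the hash is stored, so the key can never be retrieved later.
function generateTenantKey(): { key: string; hash: string } {
  const key = `crow_tenant_${randomBytes(24).toString("hex")}`;
  return { key, hash: hashKey(key) };
}

// Verify a presented key by re-hashing and comparing against storage.
function verifyKey(presented: string, storedHash: string): boolean {
  return hashKey(presented) === storedHash;
}
```

A production version would compare digests with `crypto.timingSafeEqual` rather than `===` to avoid timing side channels.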
## License

Use it however you like. I don't take responsibility for anything you or this software causes, though.