EnesAkyuz/crow

🪶 Crow

Multi-Tenant Data Ingestion Infrastructure for AI Agents

Crow is a B2B infrastructure platform that enables SaaS companies to build secure, multi-tenant data pipelines for their AI agents. It solves the critical problem of authenticated web scraping and document extraction by providing a secure session vault, visual workflow builder, and enterprise-grade access controls.

Every SaaS is building agents, but those agents die at the login screen or get confused by a PDF table. Crow gives them the infrastructure to handle their customers' messy web and document data.


✨ Features

🔐 Session Vault

Store and manage encrypted authentication sessions for your tenants. Enable AI agents to scrape behind login walls without exposing credentials.

  • Encrypted cookie storage - Sessions are encrypted at rest
  • Expiration tracking - Auto-detect cookie expiration and receive alerts
  • Rate limiting - Per-session hourly/daily limits to avoid detection
  • Browser extension - One-click session capture from any website
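As a rough illustration of "encrypted at rest", the sketch below encrypts a serialized cookie payload with AES-256-GCM from Node's built-in crypto module. The README does not say which cipher or record layout Crow uses, so the algorithm choice, the `EncryptedSession` shape, and the function names are all assumptions.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical stored record; the field names are assumptions, not Crow's schema.
interface EncryptedSession {
  iv: string; // base64, 12-byte nonce, unique per encryption
  authTag: string; // base64, 16-byte GCM authentication tag
  ciphertext: string; // base64
}

// Encrypt serialized cookies with AES-256-GCM. `key` must be 32 bytes.
function encryptSession(cookiesJson: string, key: Buffer): EncryptedSession {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(cookiesJson, "utf8"),
    cipher.final(),
  ]);
  return {
    iv: iv.toString("base64"),
    authTag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Decrypt a record; throws if the key is wrong or the data was tampered with.
function decryptSession(record: EncryptedSession, key: Buffer): string {
  const decipher = createDecipheriv(
    "aes-256-gcm",
    key,
    Buffer.from(record.iv, "base64"),
  );
  decipher.setAuthTag(Buffer.from(record.authTag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(record.ciphertext, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```

An authenticated mode such as GCM is used here so that a modified ciphertext fails to decrypt instead of yielding corrupted cookies.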

🔄 Visual Workflow Builder

Build complex data extraction pipelines without code using a step-based workflow designer.

  • Web Scrape - Extract structured data from a single page
  • Web Crawl - Crawl multiple pages from a starting URL
  • Site Map - Discover all URLs on a website
  • AI Agent (FIRE-1) - Multi-page agentic extraction for complex flows
  • Document Extract - Extract from PDFs and images using Reducto
  • Webhook - POST extracted data to any endpoint
  • Email - Send results via email using Resend
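One way to picture a step-based workflow is as an ordered list of typed steps, each handing its output to the next. The sketch below is only a model of that idea: the step names mirror the list above, but the config fields, `StepHandler`, and `runWorkflow` are hypothetical, and real steps would be asynchronous calls to Firecrawl, Reducto, or Resend.

```typescript
// Step kinds mirror the list above; the config fields are illustrative assumptions.
type WorkflowStep =
  | { type: "web_scrape"; url: string }
  | { type: "web_crawl"; url: string; maxPages: number }
  | { type: "site_map"; url: string }
  | { type: "agent"; prompt: string }
  | { type: "document_extract"; documentUrl: string }
  | { type: "webhook"; endpoint: string }
  | { type: "email"; to: string };

// A handler receives the step config and the previous step's output.
type StepHandler = (step: WorkflowStep, input: unknown) => unknown;
type HandlerMap = Partial<Record<WorkflowStep["type"], StepHandler>>;

// Run steps in order, feeding each step's output into the next one.
function runWorkflow(steps: WorkflowStep[], handlers: HandlerMap): unknown {
  let data: unknown = undefined;
  for (const step of steps) {
    const handler = handlers[step.type];
    if (!handler) {
      throw new Error(`No handler registered for step type "${step.type}"`);
    }
    data = handler(step, data);
  }
  return data;
}
```

A scrape step followed by a webhook step would then be `runWorkflow([{ type: "web_scrape", url }, { type: "webhook", endpoint }], handlers)`.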

📊 Extraction Schemas

Define reusable extraction schemas with typed fields:

  • String, Number, Date, Boolean field types
  • Schema versioning and activation controls
  • Shared across workflows for consistency
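To make the typed-field idea concrete, here is a small sketch of a schema definition and a function that coerces raw extracted strings into the four field types. The `ExtractionSchema` and `SchemaField` shapes and the `applySchema` helper are assumptions for illustration; Crow's actual schema format is not shown in this README.

```typescript
type FieldType = "string" | "number" | "date" | "boolean";
type FieldValue = string | number | boolean | Date;

// Hypothetical schema shape; property names are assumptions.
interface SchemaField {
  name: string;
  type: FieldType;
  required: boolean;
}

interface ExtractionSchema {
  id: string;
  version: number;
  isActive: boolean;
  fields: SchemaField[];
}

// Coerce one raw string to its declared type; undefined means "not coercible".
function coerceField(type: FieldType, raw: string): FieldValue | undefined {
  const text = raw.trim();
  switch (type) {
    case "string":
      return text;
    case "number": {
      const n = Number(text);
      return text !== "" && Number.isFinite(n) ? n : undefined;
    }
    case "boolean":
      if (text.toLowerCase() === "true") return true;
      if (text.toLowerCase() === "false") return false;
      return undefined;
    case "date": {
      const d = new Date(text);
      return Number.isNaN(d.getTime()) ? undefined : d;
    }
  }
}

// Apply every field of the schema to a raw record, collecting typed data and errors.
function applySchema(
  schema: ExtractionSchema,
  raw: Record<string, string | undefined>,
): { data: Record<string, FieldValue>; errors: string[] } {
  const data: Record<string, FieldValue> = {};
  const errors: string[] = [];
  for (const field of schema.fields) {
    const rawValue = raw[field.name];
    if (rawValue === undefined) {
      if (field.required) errors.push(`${field.name}: missing required field`);
      continue;
    }
    const value = coerceField(field.type, rawValue);
    if (value === undefined) errors.push(`${field.name}: expected ${field.type}`);
    else data[field.name] = value;
  }
  return { data, errors };
}
```

Because the schema is plain data, the same definition can be reused by several workflows, which is the consistency benefit the list above describes.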

🏢 Multi-Tenant Architecture

Enterprise-ready multi-tenancy with granular access controls:

  • Organizations - Top-level grouping for SaaS companies
  • Tenants - Individual customers of your SaaS
  • Members - Invite team members with role-based permissions
  • Row Level Security - Postgres RLS ensures data isolation

🔑 API Access

Programmatic access for your AI agents:

  • Per-tenant API keys for secure agent authentication
  • RESTful endpoints for all extraction operations
  • Webhook integrations for real-time data delivery

📱 Chrome Extension

Capture authenticated sessions with one click:

  • Automatically detects cookies on any site
  • Push sessions directly to your tenant's vault
  • Manage existing sessions from the extension
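Captured cookies carry their own expiry, which is what makes expiration tracking possible. The sketch below derives a session status from a list of cookies, assuming the cookie objects follow the chrome.cookies shape, where `expirationDate` is in seconds since the Unix epoch and is absent for session cookies. Treating the earliest cookie expiry as the session's expiry is a conservative assumption, and `getSessionExpiry` is a hypothetical helper, not the contents of lib/cookie-parser.ts.

```typescript
// Subset of the chrome.cookies.Cookie shape used by this sketch.
interface CapturedCookie {
  name: string;
  value: string;
  expirationDate?: number; // seconds since the Unix epoch; absent for session cookies
}

type ExpiryStatus = "no_expiry" | "expired" | "expiring_soon" | "valid";

interface SessionExpiry {
  expiresAt: Date | null;
  status: ExpiryStatus;
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: the session is only as long-lived as its earliest-expiring cookie.
function getSessionExpiry(
  cookies: CapturedCookie[],
  nowMs: number = Date.now(),
  warnWindowMs: number = ONE_DAY_MS,
): SessionExpiry {
  const expirySeconds = cookies
    .map((cookie) => cookie.expirationDate)
    .filter((seconds): seconds is number => typeof seconds === "number");
  if (expirySeconds.length === 0) {
    return { expiresAt: null, status: "no_expiry" };
  }
  const expiresAtMs = Math.min(...expirySeconds) * 1000;
  let status: ExpiryStatus = "valid";
  if (expiresAtMs <= nowMs) status = "expired";
  else if (expiresAtMs - nowMs <= warnWindowMs) status = "expiring_soon";
  return { expiresAt: new Date(expiresAtMs), status };
}
```

In practice some short-lived cookies (analytics, CSRF tokens) expire without ending the login, so a real implementation may want to look only at the cookies that matter for authentication.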

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Runtime | Bun |
| Framework | Next.js 16 (App Router) |
| Database | Supabase (Postgres + RLS) |
| Auth | Supabase Auth (GitHub OAuth) |
| UI | shadcn/ui + Tailwind CSS v4 |
| Forms | React Hook Form + Zod validation |
| Web Scraping | Firecrawl |
| Document OCR | Reducto |
| Email | Resend |
| Linting | Biome |

📁 Project Structure

├── app/
│   ├── actions.ts              # Server actions for mutations
│   ├── page.tsx                # Landing page with auth redirects
│   ├── layout.tsx              # Root layout with providers
│   ├── globals.css             # Tailwind CSS + custom styles
│   ├── api/
│   │   ├── scrape-with-session/  # Authenticated web scraping
│   │   ├── extract-web/          # Web extraction with schema
│   │   ├── extract-document/     # Document extraction (Reducto)
│   │   ├── crawl/                # Multi-page crawling
│   │   ├── map/                  # Site URL discovery
│   │   ├── agent/                # AI agent orchestration
│   │   ├── send-email/           # Email via Resend
│   │   ├── vault/                # Vault session management
│   │   └── extension/            # Chrome extension API
│   ├── dashboard/
│   │   ├── page.tsx              # Main dashboard view
│   │   ├── tenants/              # Tenant management pages
│   │   └── monitoring/           # Analytics and monitoring
│   └── auth/
│       └── callback/             # OAuth callback handler
├── components/
│   ├── ui/                       # shadcn/ui components
│   └── dashboard/
│       ├── workflow-builder.tsx  # Visual workflow editor
│       ├── vault-session-list.tsx
│       ├── extraction-schema-list.tsx
│       ├── api-key-manager.tsx
│       ├── notifications-popover.tsx
│       └── ...
├── lib/
│   ├── supabase/
│   │   ├── server.ts             # Server-side Supabase client
│   │   ├── client.ts             # Client-side Supabase client
│   │   └── middleware.ts         # Auth middleware
│   ├── encryption.ts             # API key generation/hashing
│   ├── cookie-parser.ts          # Cookie expiration detection
│   └── utils.ts                  # Utility functions
├── supabase/
│   ├── config.toml               # Supabase local config
│   ├── seed.sql                  # Database seed data
│   └── migrations/               # SQL migrations
├── extension/
│   ├── manifest.json             # Chrome extension manifest
│   ├── popup.html                # Extension popup UI
│   └── popup.js                  # Extension logic
└── types.ts                      # Generated Supabase types

🗄️ Database Schema

Core Tables

| Table | Description |
| --- | --- |
| organizations | SaaS companies using Crow |
| organization_members | User ↔ Organization membership |
| tenants | Customers of the SaaS companies |
| tenant_members | User ↔ Tenant membership |
| tenant_invites | Pending invitations |

Vault & Sessions

| Table | Description |
| --- | --- |
| vault_sessions | Encrypted session cookies |
| vault_error_logs | Session error tracking |
| vault_rate_limits | Rate limit windows |

Extraction

| Table | Description |
| --- | --- |
| extraction_schemas | Field definitions for extraction |
| document_extractions | Extracted document data |
| extractions | General extraction history |

Workflows

| Table | Description |
| --- | --- |
| scheduled_workflows | Workflow definitions with scheduling |
| workflow_runs | Execution history and results |
| workflows | Legacy workflow definitions |

Notifications

| Table | Description |
| --- | --- |
| notifications | In-app notifications (realtime) |

🚀 Getting Started

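Prerequisites

  • Bun (runtime and package manager)
  • Supabase CLI, invoked below as bunx supabase (running Supabase locally also requires Docker)
  • API keys for Firecrawl, Reducto, and Resend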

1. Clone & Install

git clone https://github.com/your-org/crow.git
cd crow
bun install

2. Environment Setup

Copy the example environment file and fill in your values:

cp .env.example .env.local

Required environment variables:

# Supabase
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key

# External Services
FIRECRAWL_API_KEY=your_firecrawl_key
REDUCTO_API_KEY=your_reducto_key
RESEND_API_KEY=your_resend_key
RESEND_FROM_EMAIL=notifications@yourdomain.com

3. Database Setup

Start the local Supabase instance and apply migrations:

bunx supabase start
bunx supabase db push

Generate TypeScript types:

bunx supabase gen types typescript --local > types.ts

4. Run Development Server

bun dev

Open http://127.0.0.1:3000 in your browser.


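🔧 Development

Commands

| Command | Description |
| --- | --- |
| bun dev | Start development server |
| bun run build | Build for production (plain bun build would invoke Bun's bundler instead of the package.json script) |
| bun start | Start production server |
| bun lint | Run Biome linter |
| bun format | Format code with Biome |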

Database Migrations

Create a new migration:

bunx supabase migration new migration_name

Apply migrations:

bunx supabase db push

Regenerate types after schema changes:

bunx supabase gen types typescript --local > types.ts

Adding UI Components

bunx shadcn@latest add component-name

🔌 API Reference

Authentication

All API endpoints require authentication. For user requests, cookies handle session auth automatically. For programmatic access, use tenant API keys:

curl -X POST https://your-crow-instance/api/extract-web \
  -H "Authorization: Bearer crow_tenant_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{"schemaId": "...", "url": "..."}'
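The same call can be made from an agent written in TypeScript. This is a minimal sketch that mirrors the curl example above using the standard Fetch API `Request` class; `buildExtractWebRequest` and the `ExtractWebBody` type are hypothetical names, and the response format is not documented here, so the sketch stops at building the request.

```typescript
// Request body fields taken from the /api/extract-web example in this README.
interface ExtractWebBody {
  schemaId: string;
  url: string;
  sessionId?: string;
}

// Build an authenticated POST request for the extract-web endpoint.
function buildExtractWebRequest(
  baseUrl: string,
  apiKey: string,
  body: ExtractWebBody,
): Request {
  const endpoint = new URL("/api/extract-web", baseUrl);
  return new Request(endpoint.toString(), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
}
```

Sending it is then `await fetch(buildExtractWebRequest(baseUrl, apiKey, body))`. Keeping request construction separate from the network call makes the auth header easy to unit test.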

Endpoints

POST /api/scrape-with-session

Scrape a URL using a stored vault session for authentication.

{
  "sessionId": "uuid",
  "url": "https://example.com/protected-page",
  "formats": ["markdown", "html"],
  "onlyMainContent": true
}

POST /api/extract-web

Extract structured data from a URL using a schema.

{
  "schemaId": "uuid",
  "url": "https://example.com/data",
  "sessionId": "uuid (optional)"
}

POST /api/extract-document

Extract data from PDFs or images using Reducto.

{
  "schemaId": "uuid",
  "documentUrl": "https://example.com/invoice.pdf",
  "filename": "invoice.pdf"
}

POST /api/crawl

Crawl multiple pages starting from a URL.

{
  "url": "https://example.com",
  "maxDepth": 3,
  "maxPages": 100,
  "sessionId": "uuid (optional)"
}

POST /api/map

Discover all URLs on a website.

{
  "url": "https://example.com",
  "sessionId": "uuid (optional)"
}

POST /api/send-email

Send extraction results via email.

{
  "to": "user@example.com",
  "subject": "Extraction Results",
  "data": { "field1": "value1" }
}

🧩 Chrome Extension

The Crow Session Vault extension enables one-click capture of authenticated sessions.

Installation

  1. Open Chrome and navigate to chrome://extensions
  2. Enable "Developer mode"
  3. Click "Load unpacked" and select the /extension folder

Usage

  1. Log into the target website
  2. Click the Crow extension icon
  3. Select a tenant from the dropdown
  4. Name your session
  5. Click "Capture Session"

The session is encrypted and stored in your tenant's vault, ready for use by AI agents.


🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         Crow Dashboard                          │
│  ┌──────────────────┬──────────────────┬──────────────────────┐ │
│  │  Organizations   │     Tenants      │      Workflows       │ │
│  │  & Members       │  & Vault Sessions│   & Extractions      │ │
│  └──────────────────┴──────────────────┴──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Next.js API Routes                         │
│  ┌──────────┬──────────┬──────────┬──────────┬───────────────┐  │
│  │ /scrape  │ /extract │ /crawl   │  /agent  │   /webhook    │  │
│  │ -session │ -document│          │          │               │  │
│  └──────────┴──────────┴──────────┴──────────┴───────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Firecrawl   │    │    Reducto    │    │    Resend     │
│  Web Scraping │    │  Document OCR │    │    Email      │
└───────────────┘    └───────────────┘    └───────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Supabase (Postgres + RLS)                    │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Organizations │ Tenants │ Vault Sessions │ Extractions   │  │
│  │  Row Level Security ensures complete data isolation       │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

🔒 Security

Data Isolation

  • Row Level Security (RLS) - All tables protected by Postgres RLS policies
  • Multi-tenant isolation - Users only see data they're authorized to access
  • Organization boundaries - Tenants are scoped to organizations
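The isolation rule that RLS enforces inside Postgres can be modeled in application code to make it easier to reason about. The sketch below assumes a user may see a tenant if they belong to the tenant directly or to its parent organization; that rule, and the type and function names, are assumptions based on the organization_members and tenant_members tables, not a copy of Crow's actual policies.

```typescript
// Simplified row shapes; column names are assumptions.
interface Tenant {
  id: string;
  organizationId: string;
}

interface OrganizationMember {
  userId: string;
  organizationId: string;
}

interface TenantMember {
  userId: string;
  tenantId: string;
}

// Tenants a user may access: via organization membership or direct tenant membership.
function accessibleTenantIds(
  userId: string,
  tenants: Tenant[],
  orgMembers: OrganizationMember[],
  tenantMembers: TenantMember[],
): Set<string> {
  const orgIds = new Set(
    orgMembers.filter((m) => m.userId === userId).map((m) => m.organizationId),
  );
  const directIds = new Set(
    tenantMembers.filter((m) => m.userId === userId).map((m) => m.tenantId),
  );
  return new Set(
    tenants
      .filter((t) => orgIds.has(t.organizationId) || directIds.has(t.id))
      .map((t) => t.id),
  );
}

// Keep only rows that belong to an accessible tenant.
function filterByTenant<T extends { tenantId: string }>(
  rows: T[],
  allowed: Set<string>,
): T[] {
  return rows.filter((row) => allowed.has(row.tenantId));
}
```

With RLS the equivalent check runs in the database on every query, so a bug in application code cannot leak another tenant's rows.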

Session Security

  • Encrypted at rest - Vault sessions are encrypted before storage
  • Rate limiting - Configurable per-session rate limits
  • Expiration tracking - Automatic alerts for expiring sessions
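Per-session hourly and daily limits can be enforced with a fixed-window counter. The sketch below is an in-memory illustration of that technique; the `SessionRateLimiter` class and `RateLimits` shape are hypothetical, and Crow itself presumably persists its windows in the vault_rate_limits table instead of process memory.

```typescript
interface RateLimits {
  perHour: number;
  perDay: number;
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Fixed-window limiter: one counter per session per hour and per day.
// Counters for old windows are never purged here; a real store would expire them.
class SessionRateLimiter {
  private readonly limits: RateLimits;
  private readonly counts = new Map<string, number>();

  constructor(limits: RateLimits) {
    this.limits = limits;
  }

  // Returns true and records the request if both windows still have capacity.
  tryAcquire(sessionId: string, nowMs: number = Date.now()): boolean {
    const hourKey = `${sessionId}:hour:${Math.floor(nowMs / HOUR_MS)}`;
    const dayKey = `${sessionId}:day:${Math.floor(nowMs / DAY_MS)}`;
    const hourCount = this.counts.get(hourKey) ?? 0;
    const dayCount = this.counts.get(dayKey) ?? 0;
    if (hourCount >= this.limits.perHour || dayCount >= this.limits.perDay) {
      return false;
    }
    this.counts.set(hourKey, hourCount + 1);
    this.counts.set(dayKey, dayCount + 1);
    return true;
  }
}
```

Fixed windows are simple to store as rows keyed by window start, at the cost of allowing short bursts across a window boundary; a sliding window would smooth that out.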

API Security

  • Hashed API keys - Only the hash is stored; keys cannot be retrieved
  • Scoped access - API keys are tenant-specific
  • Audit logging - All operations are logged for compliance
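Storing only a hash means the plaintext key is shown once at creation and later requests are checked by hashing the presented key. The sketch below shows one common way to do this with Node's built-in crypto module; the `crow_tenant_` prefix comes from the API Reference example, while the key length, the choice of SHA-256, and the function names are assumptions, not the contents of lib/encryption.ts.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Prefix as shown in the API Reference example; everything else is an assumption.
const KEY_PREFIX = "crow_tenant_";

// Hash a key for storage. A fast hash is acceptable here because the key
// itself is long and random, unlike a user-chosen password.
function hashApiKey(key: string): string {
  return createHash("sha256").update(key).digest("hex");
}

// Create a new key. Only `hash` would be persisted; `key` is shown to the user once.
function generateApiKey(): { key: string; hash: string } {
  const key = KEY_PREFIX + randomBytes(24).toString("base64url");
  return { key, hash: hashApiKey(key) };
}

// Compare the presented key's hash with the stored hash in constant time.
function verifyApiKey(presentedKey: string, storedHash: string): boolean {
  const presented = Buffer.from(hashApiKey(presentedKey), "hex");
  const stored = Buffer.from(storedHash, "hex");
  return presented.length === stored.length && timingSafeEqual(presented, stored);
}
```

Because the hash cannot be reversed, a lost key has to be replaced with a newly generated one, which matches the "keys cannot be retrieved" behavior described above.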

📄 License

Use it however you want, I don't care. I don't take responsibility for anything you or this software causes, though.

About

YC Stackathon submission
