
meetpandya27/plaidify


Plaidify

The missing infrastructure between AI agents and the authenticated web.
Give any app or agent a REST API to log in, read data, and take actions on any website —
banks, utilities, portals, government sites — without writing a single scraper.

Try the Demo  ·  Quickstart  ·  Agent Integration  ·  API Docs  ·  Roadmap  ·  Report a Bug



The Problem

Bank balances, utility bills, insurance claims, medical records, academic transcripts, government portals — the most useful data on the web sits behind login forms with no APIs.

Services like Plaid cover banking and charge $500+/month. Everything else? You’re writing brittle Selenium scripts or paying per-connection fees to closed-source vendors.

Plaidify is an attempt to fix this: one JSON blueprint per site, one REST API for everything.


How It Works

┌──────────────────┐        ┌──────────────────────────────────────────────┐
│                  │        │                   Plaidify                   │
│   Your App /     │  POST  │                                              │
│   AI Agent /     ├───────►│  1. Rate Limit & Auth (JWT + refresh tokens) │
│   MCP Client     │        │  2. Decrypt credentials (RSA → AES-256-GCM)  │
│                  │◄───────┤  3. Load Blueprint (JSON or Python)          │
│                  │  JSON  │  4. Launch Browser (Playwright)              │
│                  │        │  5. Authenticate & Extract Data              │
│                  │        │  6. Return Structured Response               │
└──────────────────┘        └──────────────────────────────────────────────┘

Security layers:  Rate limiting ─► CORS ─► JWT auth ─► RSA decrypt
                  ─► Per-user DEK envelope encryption ─► Key rotation

One call. Structured JSON out. Any site with a blueprint. Credentials encrypted client-side and at rest.

curl -X POST http://localhost:8000/connect \
  -H "Content-Type: application/json" \
  -d '{"site": "greengrid_energy", "username": "demo_user", "password": "demo_pass"}'
{
  "status": "connected",
  "data": {
    "current_bill": "$142.57",
    "usage_kwh": "1,247 kWh",
    "account_status": "Active",
    "service_address": "742 Evergreen Terrace, Springfield, IL 62704",
    "plan_name": "Green Choice 100",
    "usage_history": [
      { "month": "March 2026", "kwh": "1,247", "cost": "$142.57" },
      { "month": "February 2026", "kwh": "1,389", "cost": "$158.83" }
    ]
  }
}
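The response returns money and energy usage as display strings ("$142.57", "1,247 kWh"). If your app needs numbers, a small client-side helper can normalize them. The sketch below is not part of Plaidify; it assumes the US-style formats shown in the example response above.

```python
import json
import re
from decimal import Decimal

# Sample payload trimmed from the /connect response above.
RESPONSE = """
{
  "status": "connected",
  "data": {
    "current_bill": "$142.57",
    "usage_kwh": "1,247 kWh",
    "usage_history": [
      {"month": "March 2026", "kwh": "1,247", "cost": "$142.57"},
      {"month": "February 2026", "kwh": "1,389", "cost": "$158.83"}
    ]
  }
}
"""


def parse_money(text: str) -> Decimal:
    """Turn "$1,234.56" into Decimal("1234.56"); assumes "$" and "," separators."""
    return Decimal(re.sub(r"[^0-9.\-]", "", text))


def parse_kwh(text: str) -> int:
    """Turn "1,247 kWh" (or "1,247") into the integer 1247."""
    return int(re.sub(r"[^0-9]", "", text))


data = json.loads(RESPONSE)["data"]
bill = parse_money(data["current_bill"])
usage = parse_kwh(data["usage_kwh"])
history = [
    (row["month"], parse_kwh(row["kwh"]), parse_money(row["cost"]))
    for row in data["usage_history"]
]
print(bill, usage)  # → 142.57 1247
```

Decimal is used instead of float so that summing bills does not pick up binary rounding errors.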

Why Not Just Use Plaid?

| | Plaid | Plaidify |
|---|---|---|
| Cost | $500+/mo, per-connection fees | Free forever (MIT) |
| Coverage | Banks & financial only | Any website with a login form |
| Self-hosted | No | Yes: your infra, your data |
| AI Agent Ready | Not designed for agents | MCP server, agent SDK, consent model (Phase 3) |
| Open Source | No | Yes: audit, extend, contribute |
| Custom Sites | Wait for Plaid to support it | Write a JSON blueprint in 5 minutes |
| Data Residency | Their servers | Your servers, your country |

🤖 Built for the AI Agent Era

Plaidify isn't just another Plaid alternative. It's infrastructure for the next generation of AI agents that need to interact with the authenticated web.

For AI Agent Builders

# Coming in Phase 3 — MCP Server
# Your agent connects to any site
# through a standardized protocol

# Claude, GPT, or any MCP client:
# "What's my electricity bill this month?"
# → Plaidify logs into GreenGrid Energy
# → Returns $142.57 bill + 1,247 kWh usage
# → Agent summarizes and responds

Why agents need this:

  • Structured data from any authenticated site
  • User consent & scoped permissions (Phase 3)
  • Credential encryption at rest (AES-256-GCM)
  • Full audit trail per agent action (Phase 3)
  • Built-in rate limiting & error recovery

For App Developers

# Today: works right now
import requests

# Connect to any site with a blueprint
resp = requests.post(
    "http://localhost:8000/connect",
    json={
        "site": "greengrid_energy",
        "username": "demo_user",
        "password": "demo_pass",
    },
    timeout=120,  # a real browser login can take a while
)
resp.raise_for_status()  # surface HTTP errors instead of a KeyError below
print(resp.json()["data"])
# → {"current_bill": "$142.57", "usage_kwh": "1,247 kWh", ...}

Why devs love this:

  • Drop a JSON blueprint → get an API
  • No Selenium/Playwright code to write
  • Credential encryption handled for you (AES-256-GCM)
  • Swagger docs at /docs out of the box
  • Docker-ready, CI included

📖 Full agent integration guide → docs/AGENTS.md


🎮 Try the Demo

See Plaidify in action with our built-in GreenGrid Energy demo — a fully functional utility company portal that showcases the complete extraction pipeline.

git clone https://github.com/meetpandya27/plaidify.git && cd plaidify
pip install -r requirements.txt
python run_demo.py
# → Open http://localhost:8000/ui/demo.html

The demo launches two servers:

  • GreenGrid Energy portal (port 8080) — a realistic utility company site with login, dashboard, billing, and account pages
  • Plaidify API (port 8000) — the extraction engine with an interactive demo UI

Demo credentials:

| Username | Password | Flow |
|---|---|---|
| demo_user | demo_pass | Standard login → full data extraction |
| mfa_user | mfa_pass | MFA challenge (code: 123456) → data extraction |

What gets extracted: Account info, current bill, energy usage (kWh), 6 months of usage history, payment records, service address, meter ID, plan details, and customer profile — all from a single API call.

13 fields · MFA · Zero config


⚡ 30-Second Quickstart

Option A: Docker (recommended)

git clone https://github.com/meetpandya27/plaidify.git && cd plaidify
cp .env.example .env     # Edit and set ENCRYPTION_KEY + JWT_SECRET_KEY
docker compose up --build
# → API live at http://localhost:8000
# → Swagger docs at http://localhost:8000/docs

Option B: Local

git clone https://github.com/meetpandya27/plaidify.git && cd plaidify
pip install -r requirements.txt
cp .env.example .env     # Edit and set ENCRYPTION_KEY + JWT_SECRET_KEY
alembic upgrade head
uvicorn src.main:app --reload
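Both options need ENCRYPTION_KEY and JWT_SECRET_KEY set in .env, and the app refuses to start without them. The exact format Plaidify expects is documented in .env.example; the sketch below assumes a URL-safe base64 encoding of 32 random bytes for the AES-256 key and a long random string for the JWT secret, so check both against .env.example before using the output.

```python
import base64
import secrets


def make_encryption_key() -> str:
    # 32 random bytes = 256 bits, the key size AES-256-GCM needs.
    # Assumption: ENCRYPTION_KEY is read as URL-safe base64; verify in .env.example.
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).decode("ascii")


def make_jwt_secret() -> str:
    # 64 random bytes rendered as URL-safe text, suitable as an HMAC signing secret.
    return secrets.token_urlsafe(64)


if __name__ == "__main__":
    # Paste these lines into .env and never commit them.
    print(f"ENCRYPTION_KEY={make_encryption_key()}")
    print(f"JWT_SECRET_KEY={make_jwt_secret()}")
```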

Try it

# Quickest way — run the interactive demo
python run_demo.py
# → Open http://localhost:8000/ui/demo.html

# Or use the API directly
curl -s http://localhost:8000/connect \
  -H "Content-Type: application/json" \
  -d '{"site": "greengrid_energy", "username": "demo_user", "password": "demo_pass"}' | jq

# Full Plaid-style link flow
TOKEN=$(curl -s -X POST http://localhost:8000/auth/register \
  -H "Content-Type: application/json" \
  -d '{"username":"dev","email":"dev@test.com","password":"securepass123"}' \
  | jq -r '.access_token')

curl -s -X POST "http://localhost:8000/create_link?site=greengrid_energy" \
  -H "Authorization: Bearer $TOKEN" | jq
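The link flow continues after /create_link with /submit_credentials and then /fetch_data. The sketch below strings the three calls together. The request and response field names (link_token, access_token, the JSON body of /submit_credentials) are assumptions, not confirmed from Plaidify's schema; check them in the Swagger docs at /docs first. The function accepts any requests-style session so it can be exercised without a live server.

```python
BASE_URL = "http://localhost:8000"


def run_link_flow(session, jwt: str, site: str, username: str, password: str) -> dict:
    """Hypothetical create_link -> submit_credentials -> fetch_data sequence.

    `session` is any object with requests-style post()/get() methods,
    e.g. requests.Session(). Fields marked "assumed" are not confirmed.
    """
    headers = {"Authorization": f"Bearer {jwt}"}

    # 1. Create a link token for the site (same query parameter as the curl call).
    resp = session.post(f"{BASE_URL}/create_link", params={"site": site},
                        headers=headers, timeout=30)
    resp.raise_for_status()
    link_token = resp.json()["link_token"]  # assumed response field

    # 2. Submit credentials for that link (assumed JSON body shape).
    resp = session.post(
        f"{BASE_URL}/submit_credentials",
        json={"link_token": link_token, "username": username, "password": password},
        headers=headers,
        timeout=30,
    )
    resp.raise_for_status()
    access_token = resp.json()["access_token"]  # assumed response field

    # 3. Fetch the extracted data; browser logins can be slow, so allow more time.
    resp = session.get(f"{BASE_URL}/fetch_data", params={"access_token": access_token},
                       headers=headers, timeout=120)
    resp.raise_for_status()
    return resp.json()
```

With requests installed, a call would look like run_link_flow(requests.Session(), token, "greengrid_energy", "demo_user", "demo_pass"), where token is the JWT returned by /auth/register.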

🧩 Blueprints — The Core Idea

A blueprint is a tiny JSON file that teaches Plaidify how to log into a specific website. No code required.

{
  "name": "My Bank",
  "login_url": "https://mybank.com/login",
  "fields": {
    "username": "#email-input",
    "password": "#password-input",
    "submit": "#login-button"
  },
  "post_login": [
    { "wait": "#dashboard-loaded" },
    {
      "extract": {
        "balance": "#account-balance",
        "last_transaction": "#recent-activity .first"
      }
    }
  ]
}

Drop it in /connectors/ → restart → call the API. That's it.
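Before dropping a blueprint into connectors/, a quick structural check catches typos early. The validator below is a hypothetical helper that checks only the keys used in the example above (name, login_url, fields, and post_login steps with wait or extract). Plaidify's V2 schema supports more (typed extraction, list extractors, cleanup steps), so treat this as a starting point rather than the project's own validation logic.

```python
import json
from urllib.parse import urlparse

REQUIRED_FIELDS = ("username", "password", "submit")
KNOWN_STEPS = ("wait", "extract")


def validate_blueprint(raw: str) -> list:
    """Return a list of problems; an empty list means the basic shape looks right."""
    try:
        bp = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(bp, dict):
        return ["blueprint must be a JSON object"]

    problems = []
    if not bp.get("name"):
        problems.append("missing 'name'")

    url = urlparse(str(bp.get("login_url", "")))
    if url.scheme not in ("http", "https") or not url.netloc:
        problems.append("'login_url' must be an absolute http(s) URL")

    fields = bp.get("fields", {})
    if not isinstance(fields, dict):
        problems.append("'fields' must be an object")
        fields = {}
    for key in REQUIRED_FIELDS:
        if not fields.get(key):
            problems.append(f"missing selector for fields.{key}")

    steps = bp.get("post_login", [])
    if not isinstance(steps, list):
        problems.append("'post_login' must be a list")
        steps = []
    for i, step in enumerate(steps):
        if not isinstance(step, dict) or not any(k in step for k in KNOWN_STEPS):
            problems.append(f"post_login[{i}] has no 'wait' or 'extract' action")
    return problems


EXAMPLE = """{"name": "My Bank", "login_url": "https://mybank.com/login",
  "fields": {"username": "#email-input", "password": "#password-input", "submit": "#login-button"},
  "post_login": [{"wait": "#dashboard-loaded"}, {"extract": {"balance": "#account-balance"}}]}"""
print(validate_blueprint(EXAMPLE))  # → []
```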

Need custom logic? Use a Python connector instead:

from src.core.connector_base import BaseConnector

class MyBankConnector(BaseConnector):
    def connect(self, username: str, password: str) -> dict:
        # Custom Playwright logic, API calls, anything
        return {"status": "connected", "data": {"balance": 4521.30}}

Want to contribute a blueprint? That's the #1 way to help. See CONTRIBUTING.md.


📖 API Reference

Core Endpoints

| Endpoint | Method | Auth | Description |
|---|---|---|---|
| /connect | POST | | One-step connect & extract data |
| /disconnect | POST | | End a session |
| /health | GET | | System health + DB status |

Link Token Flow (Plaid-style)

| Endpoint | Method | Auth | Description |
|---|---|---|---|
| /create_link | POST | JWT | Create a link token for a site |
| /submit_credentials | POST | JWT | Submit credentials (encrypted at rest) |
| /submit_instructions | POST | JWT | Attach processing instructions |
| /fetch_data | GET | JWT | Fetch extracted data |
| /links | GET | JWT | List your links |
| /links/{token} | DELETE | JWT | Delete a link |
| /tokens | GET | JWT | List your access tokens |
| /tokens/{token} | DELETE | JWT | Delete an access token |

Auth

| Endpoint | Method | Description |
|---|---|---|
| /auth/register | POST | Create account |
| /auth/token | POST | Login → JWT (access + refresh tokens) |
| /auth/refresh | POST | Exchange refresh token for new token pair |
| /auth/me | GET | Your profile |
| /auth/oauth2 | POST | OAuth2 login (placeholder) |
| /encryption/session | POST | Create ephemeral RSA keypair for credential encryption |
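/encryption/session gives the client an ephemeral RSA-2048 public key so credentials are encrypted before they leave the client. The sketch below shows that round trip with the cryptography package. It assumes RSA-OAEP with SHA-256 and a PEM-encoded public key; Plaidify's actual padding, encoding, and response fields may differ (src/crypto.py is the place to check), and the SDK client is meant to handle this step for you.

```python
import base64

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Assumed padding scheme: OAEP with SHA-256 for both the mask function and the hash.
OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Server side: stand-in for what /encryption/session creates.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)


def encrypt_credential(pem: bytes, secret: str) -> str:
    """Client side: encrypt one credential with the session's public key."""
    public_key = serialization.load_pem_public_key(pem)
    ciphertext = public_key.encrypt(secret.encode("utf-8"), OAEP)
    return base64.b64encode(ciphertext).decode("ascii")  # JSON-safe text


def decrypt_credential(b64_ciphertext: str) -> str:
    """Server side: recover the plaintext before re-encrypting it at rest."""
    return private_key.decrypt(base64.b64decode(b64_ciphertext), OAEP).decode("utf-8")


token = encrypt_credential(public_pem, "demo_pass")
print(decrypt_credential(token))  # → demo_pass
```

RSA-2048 with OAEP/SHA-256 can encrypt at most 190 bytes per call, which is plenty for a username or password but is why bulk data is encrypted with AES instead.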

Interactive Swagger docs: http://localhost:8000/docs
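/auth/refresh exchanges a refresh token for a new access/refresh pair. A common way to implement rotation is to make every refresh token single-use and revoke the whole token family when an already-rotated token is replayed. The in-memory model below illustrates that pattern with the standard library only; it is not Plaidify's implementation, the opaque strings stand in for signed JWTs, and the 15-minute lifetime is taken from the Security Model section.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

ACCESS_TTL = 15 * 60  # seconds; the 15-minute access-token lifetime from the Security Model


@dataclass
class TokenStore:
    """Hypothetical in-memory store illustrating single-use refresh tokens."""
    active: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # token -> (user, family)
    used: Dict[str, Tuple[str, str]] = field(default_factory=dict)    # already-rotated tokens
    revoked_families: Set[str] = field(default_factory=set)

    def issue(self, user: str, family: Optional[str] = None) -> dict:
        family = family or secrets.token_hex(8)  # one login's chain of refresh tokens
        refresh = secrets.token_urlsafe(32)
        self.active[refresh] = (user, family)
        return {
            "access_token": secrets.token_urlsafe(32),  # stand-in for a signed JWT
            "refresh_token": refresh,
            "expires_at": time.time() + ACCESS_TTL,
        }

    def refresh(self, refresh_token: str) -> dict:
        if refresh_token in self.used:
            # A rotated token came back: assume it leaked and revoke its whole family.
            self.revoked_families.add(self.used[refresh_token][1])
            raise PermissionError("refresh token reuse detected")
        entry = self.active.pop(refresh_token, None)
        if entry is None or entry[1] in self.revoked_families:
            raise PermissionError("invalid refresh token")
        self.used[refresh_token] = entry  # single use: remember it for replay detection
        return self.issue(entry[0], family=entry[1])


store = TokenStore()
pair = store.issue("dev")
pair = store.refresh(pair["refresh_token"])  # rotates; the old refresh token is now spent
```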


🔐 Security Model

We treat credential handling as the #1 priority.

| Practice | Status |
|---|---|
| AES-256-GCM encryption at rest | ✅ |
| Envelope encryption: per-user Data Encryption Keys (DEKs) | ✅ |
| Encryption key rotation with versioning | ✅ |
| Client-side RSA-2048 credential encryption (ephemeral keys) | ✅ |
| No hardcoded secrets: app fails to start without env vars | ✅ |
| JWT auth with 15-min access tokens + refresh token rotation | ✅ |
| Rate limiting on auth & connect endpoints (slowapi) | ✅ |
| CORS enforcement: no wildcard in production | ✅ |
| Security headers (X-Content-Type-Options, X-Frame-Options, CSP, HSTS) | ✅ |
| User data isolation (tested & verified) | ✅ |
| Non-root Docker container | ✅ |
| Dependency auditing in CI (pip-audit) | ✅ |
| Input validation (Pydantic, password min length) | ✅ |
| Credential vaulting (HashiCorp Vault) | 🔜 Phase 3 |
| SOC 2 compliance | 🔜 Phase 5 |
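Envelope encryption means each user's credentials are encrypted with that user's own data encryption key (DEK), and the DEK is stored only in wrapped form, encrypted under a master key. Rotating the master key then means re-wrapping small DEKs, not re-encrypting every stored credential. The sketch below illustrates the pattern with the cryptography package's AESGCM; the class, the in-memory key versions, and the use of the user id as associated data are hypothetical choices, not Plaidify's database.py.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the recommended size for AES-GCM


def seal(key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)  # fresh nonce per encryption; never reuse with a key
    return nonce + AESGCM(key).encrypt(nonce, plaintext, aad)


def unseal(key: bytes, blob: bytes, aad: bytes) -> bytes:
    # Raises cryptography.exceptions.InvalidTag if the key, data, or aad do not match.
    return AESGCM(key).decrypt(blob[:NONCE_LEN], blob[NONCE_LEN:], aad)


class EnvelopeVault:
    """Hypothetical: versioned master keys wrapping per-user DEKs."""

    def __init__(self) -> None:
        # Assumption: real master keys come from ENCRYPTION_KEY or a secret store.
        self.master_keys = {1: AESGCM.generate_key(bit_length=256)}
        self.current = 1

    def new_dek(self, user_id: str) -> tuple:
        dek = AESGCM.generate_key(bit_length=256)
        # Persist (version, wrapped DEK); the raw DEK is never stored.
        return self.current, seal(self.master_keys[self.current], dek, user_id.encode())

    def _unwrap(self, user_id: str, record: tuple) -> bytes:
        version, wrapped = record
        return unseal(self.master_keys[version], wrapped, user_id.encode())

    def encrypt(self, user_id: str, record: tuple, secret: str) -> bytes:
        return seal(self._unwrap(user_id, record), secret.encode(), user_id.encode())

    def decrypt(self, user_id: str, record: tuple, blob: bytes) -> str:
        return unseal(self._unwrap(user_id, record), blob, user_id.encode()).decode()

    def rotate(self) -> None:
        # New master key version; old versions stay until every DEK is re-wrapped.
        self.current += 1
        self.master_keys[self.current] = AESGCM.generate_key(bit_length=256)

    def rewrap(self, user_id: str, record: tuple) -> tuple:
        # Only the DEK is re-encrypted; credential ciphertexts stay untouched.
        dek = self._unwrap(user_id, record)
        return self.current, seal(self.master_keys[self.current], dek, user_id.encode())


vault = EnvelopeVault()
record = vault.new_dek("alice")
blob = vault.encrypt("alice", record, "demo_pass")
vault.rotate()
record = vault.rewrap("alice", record)
print(record[0], vault.decrypt("alice", record, blob))  # → 2 demo_pass
```

Binding the user id as associated data means a wrapped DEK or credential copied into another user's row fails authentication instead of decrypting.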

📊 Current Status — Honest & Transparent

We believe in building in public. Here's exactly what works and what doesn't.

✅ Production-Ready

| Component | What it does |
|---|---|
| REST API | FastAPI with 19 endpoints, full Swagger docs |
| Auth System | Register, login, JWT tokens, OAuth2 placeholder |
| Link Token Flow | Plaid-style multi-step: create_link → submit_credentials → fetch_data |
| Credential Encryption | AES-256-GCM authenticated encryption, no plaintext storage |
| Blueprint System | V2 JSON blueprints with typed extraction, list extractors, cleanup steps |
| Browser Engine | Real Playwright automation: headless Chromium, browser pooling, step execution |
| MFA Handling | Async event-based MFA manager: detects challenges, pauses for user input |
| Data Extraction | Typed field extraction (text, currency, date, sensitive), list/table extraction |
| Database | SQLAlchemy ORM, Alembic migrations, SQLite/PostgreSQL |
| Security | AES-256-GCM, envelope encryption (per-user DEKs), RSA client encryption, key rotation |
| Configuration | Pydantic Settings, env vars, fails fast if misconfigured |
| CI/CD | GitHub Actions: lint, test (3.9–3.12 matrix), security audit, Docker build |
| Test Suite | 136+ tests across 13 suites, covering engine, blueprints, MFA, API, auth, security |
| Docker | Multi-stage build, non-root user, health check |
| Interactive Demo | GreenGrid Energy utility portal + dark-themed demo UI |
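The MFA Handling row describes an async, event-based manager that pauses a login until the user supplies a code. The standard-library sketch below shows that general pattern with asyncio.Event: the browser task registers a challenge and waits, and a separate API handler resolves it. The class and method names are hypothetical; Plaidify's MFA manager may be structured differently.

```python
import asyncio
from typing import Dict, Optional


class MFAManager:
    """Hypothetical: at most one pending MFA challenge per session id."""

    def __init__(self) -> None:
        self._events: Dict[str, asyncio.Event] = {}
        self._codes: Dict[str, str] = {}

    async def wait_for_code(self, session_id: str, timeout: float = 120.0) -> Optional[str]:
        # Called by the browser task when the site shows an MFA prompt.
        event = asyncio.Event()
        self._events[session_id] = event
        try:
            await asyncio.wait_for(event.wait(), timeout)
            return self._codes.pop(session_id)
        except asyncio.TimeoutError:
            return None  # the caller should abort the login
        finally:
            self._events.pop(session_id, None)

    def submit_code(self, session_id: str, code: str) -> bool:
        # Called from an API handler when the user enters the code.
        event = self._events.get(session_id)
        if event is None:
            return False  # no challenge is pending for this session
        self._codes[session_id] = code
        event.set()
        return True


async def demo() -> Optional[str]:
    mfa = MFAManager()
    waiter = asyncio.create_task(mfa.wait_for_code("sess-1", timeout=5))
    await asyncio.sleep(0)  # let the waiter run far enough to register its challenge
    mfa.submit_code("sess-1", "123456")  # the demo's mfa_user code
    return await waiter


print(asyncio.run(demo()))  # → 123456
```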

🚧 In Progress

| Component | What's Missing | Help Wanted? |
|---|---|---|
| Real-World Blueprints | Only demo blueprints exist; community-contributed blueprints for real sites are needed | 🔥 Yes |
| Python SDK + CLI | ✅ Shipped: pip install plaidify, CLI with plaidify connect, plaidify blueprint, plaidify demo | |
| Security Hardening | ✅ Complete: rate limiting, CORS, headers, JWT refresh, RSA encryption, envelope encryption, key rotation | |
| Plaidify Link UI | Embeddable drop-in widget (like Plaid Link) | Yes |
| Blueprint Registry | Searchable catalog of community blueprints | Yes |

🗺️ Planned

| Phase | Focus | Timeline |
|---|---|---|
| 0 | Foundation hardening, security, CI/CD, tests | Complete |
| 1 | Real browser engine (Playwright), MFA, blueprints, demo | Complete |
| 2 | ✅ Python SDK + CLI, security hardening (7 issues closed), Plaidify Link UI | Weeks 1-3 (Mar 15 – Apr 4) |
| 3 | MCP server, AI agent SDK, consent engine, audit trails | Weeks 3-5 (Mar 31 – Apr 18) |
| 4 | Write operations: pay bills, fill forms, action framework | Weeks 5-7 (Apr 14 – May 2) |
| 5 | Enterprise: multi-tenant, K8s, SSO, admin console, v1.0 🚀 | Weeks 7-10 (Apr 28 – May 23) |

📋 Full 10-week execution plan → docs/PRODUCT_PLAN.md


🏗️ Architecture

plaidify/
├── src/
│   ├── main.py              # FastAPI app — all endpoints, auth, security middleware
│   ├── config.py            # Pydantic Settings — env var config
│   ├── database.py          # SQLAlchemy + AES-256-GCM + envelope encryption + key rotation
│   ├── models.py            # Request/response Pydantic schemas
│   ├── exceptions.py        # Custom error hierarchy (15 types)
│   ├── logging_config.py    # JSON (prod) / colored text (dev) logging
│   ├── crypto.py            # Ephemeral RSA-2048 keypair management
│   └── core/
│       ├── engine.py        # Playwright browser engine + blueprint executor
│       └── connector_base.py # Base class for Python connectors
├── sdk/                     # Python SDK + CLI (`pip install plaidify`)
│   └── plaidify/
│       ├── client.py        # Async/sync clients with auto-encryption
│       └── cli.py           # CLI: connect, blueprint, serve, demo, rotate-key
├── connectors/              # Drop JSON blueprints here
│   ├── greengrid_energy.json # GreenGrid Energy demo blueprint
│   └── test_bank.json       # Legacy test blueprint
├── example_site/            # GreenGrid Energy fake utility portal
│   └── server.py            # FastAPI app simulating a utility company
├── frontend/                # Demo UI assets
│   ├── demo.html            # Interactive demo widget
│   ├── demo.css             # Dark theme styles
│   └── demo.js              # Client-side connection flow logic
├── alembic/                 # Database migrations
├── tests/                   # 136+ tests across 13 suites
├── run_demo.py              # One-command demo launcher
├── .github/workflows/       # CI: lint → test → audit → docker
├── Dockerfile               # Multi-stage, non-root
├── docker-compose.yml       # One-command dev environment
└── .env.example             # All config documented here

📖 Full technical docs → docs/README.md


🤝 Contributing

We’re building open-source infrastructure for authenticated web data. Contributions welcome — especially blueprints for real sites.

Highest-Impact Contributions Right Now

| Priority | Task | Difficulty |
|---|---|---|
| 🔥 | Write real-world blueprints: pick a public site, write the JSON | Easy |
| 🔥 | Build the blueprint registry CLI: search, validate, share blueprints | Medium |
| 🟡 | Build Python/JS SDKs: client libraries for easier integration | Medium |
| 🟡 | Add push notification MFA: extend MFA beyond OTP codes | Medium |
| 🟢 | Add unit tests: edge cases, error paths | Easy |
| 🟢 | Improve error messages: make failures actionable | Easy |

# Get started in 60 seconds
git clone https://github.com/YOUR_USERNAME/plaidify.git && cd plaidify
pip install -r requirements.txt
cp .env.example .env               # Set ENCRYPTION_KEY + JWT_SECRET_KEY
alembic upgrade head && pytest -v  # All 136+ tests should pass

📋 Full contributor guide → CONTRIBUTING.md


🌍 Use Cases

💰 Personal Finance App — Aggregate bank data without Plaid

Write blueprints for each bank your users need. Plaidify handles login, session management, and data extraction. You get structured JSON with balances, transactions, and account details.

🤖 AI Financial Assistant — Let your agent check bank balances

Your agent calls the Plaidify API to access the user's bank portal, extract current balances, and answer questions like "Can I afford this purchase?" User consent and per-action audit trails are planned for Phase 3.

⚡ Utility Bill Tracker — Monitor bills across providers

Create blueprints for utility company portals. Schedule periodic data fetches. Get structured billing data without waiting for each company to build an API.

🏥 Insurance & Healthcare Aggregator — Unified patient/policyholder portal

Access insurance claims, EOBs, and coverage details from provider portals. Self-hosted means full data residency compliance.

🎓 Student Data Platform — Transcripts, grades, financial aid

Build integrations with university portals. Pull transcripts, grades, and financial aid information through a unified API.

🏢 Enterprise Data Aggregation — Internal tool integration

Connect to internal portals, vendor dashboards, and legacy systems that lack APIs. Self-host under your own compliance controls; SSO is planned for Phase 5.


Comparison with Other Tools

| Tool | Type | Websites Supported | AI Agent Ready | Self-Hosted | Cost |
|---|---|---|---|---|---|
| Plaidify | Infrastructure layer | Any login-protected site | ✅ (Phase 3) | ✅ | Free |
| Plaid | Managed service | Banks & financial only | ❌ | ❌ | $500+/mo |
| Woob | Python scrapers | ~80 French/EU sites | | ✅ | Free |
| Selenium/Playwright | Raw tools | Any (you write everything) | | ✅ | Free |
| Huginn | Ruby agents | Any (complex setup) | | ✅ | Free |

Plaidify's sweet spot: The abstraction of Plaid + the flexibility of Playwright + the openness of Woob, designed with AI agents in mind.


Star History

If Plaidify is useful, a ⭐ helps others find it.



⚠️ Legal Disclaimer

Plaidify is a general-purpose browser automation infrastructure tool. It is your responsibility to ensure that your use of Plaidify complies with the Terms of Service of any website you interact with, as well as all applicable local, state, and federal laws.

  • Many websites prohibit automated access in their Terms of Service. Using Plaidify with such sites may violate those terms and could result in account suspension or legal action.
  • Plaidify is not a licensed financial data aggregator. If you use it to access banking or financial sites, you do so at your own risk. Your financial institution may not cover losses related to credentials shared with third-party tools.
  • The authors and contributors of Plaidify accept no liability for misuse, data loss, account lockouts, or any other damages arising from use of this software.
  • Always obtain explicit user consent before accessing any account on their behalf.

tl;dr — This is a power tool. Use it responsibly, read the TOS of target sites, and don't do anything you wouldn't want done to your own accounts.


📄 License

MIT: use it in personal projects, startups, or enterprise. The only condition is keeping the copyright and license notice with copies of the software.


Built by @meetpandya27 and contributors
The open-source gateway between AI agents and the authenticated web.


About

Open-source API for authenticated web data extraction. Like Plaid, but for any website. Browser automation + JSON blueprints + MFA support. Try the GreenGrid Energy demo!
