Form Flow AI is an AI Form Filler and automated form-filling agent designed to autonomously navigate, understand, and complete complex web forms through natural voice conversation. Unlike basic autofill extensions, it acts as an intelligent digital proxy, combining the Web Speech API for real-time input, general-purpose LLMs (Gemini/GPT) for semantic reasoning, and Playwright for robust browser automation.
Key Value Proposition:
"Don't just fill forms; delegate them." Form Flow AI turns tedious data entry into a 30-second conversation. It is built to handle edge cases, dynamic routing, validation rules, and even PDF overlay with human-like precision.
| Aspect | Status | Maturity | Details |
|---|---|---|---|
| Backend Core | ✅ | Production | Robust FastAPI architecture with scalable service factories. |
| Frontend UI | ✅ | Polished | Glassmorphism React SPA with real-time voice feedback. |
| PDF Engine | ✅ | Advanced | NEW: Layout-aware parsing, field detection, and text fitting. |
| Voice I/O | | Beta | Web Speech API (moving to Deepgram/ElevenLabs streaming). |
| AI Agent | ✅ | Advanced | LangChain-powered memory, context-aware RAG suggestions. |
| Platform | | Web Only | Transitioning to Chrome Extension (Manifest V3). |
- Voice Intelligence: Upgrading from browser-native speech to Deepgram/ElevenLabs for sub-800ms conversational latency.
- Platform Reach: Migrating core logic to a Browser Extension for seamless "overlay" usage on any site.
Built on FastAPI, leveraging asynchronous patterns for high-concurrency automation.
- Factory Pattern: Dynamically routes URLs to specialized extractors (`GoogleForms`, `Typeform`, `StandardHTML`).
- Shadow DOM Piercing: Recursively traverses open shadow roots to find hidden fields.
- Semantic Mapping: Uses LLMs to infer field intent (e.g., mapping "How long have you lived here?" to `years_at_address`).
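
The factory routing described above can be sketched as a host lookup with a plain-HTML default. This is only an illustration: the host table, the class bodies, and the `extractor_for` helper are assumptions, and only the extractor names come from this README.

```python
from urllib.parse import urlparse


class StandardHTMLExtractor:
    name = "StandardHTML"


class GoogleFormsExtractor:
    name = "GoogleForms"


class TypeformExtractor:
    name = "Typeform"


# Hypothetical host-to-extractor table; the real routing rules are not shown here.
_EXTRACTORS_BY_HOST = {
    "docs.google.com": GoogleFormsExtractor,
    "forms.gle": GoogleFormsExtractor,
    "typeform.com": TypeformExtractor,
}


def extractor_for(url: str):
    """Return an extractor instance for the URL, defaulting to plain HTML."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, extractor_cls in _EXTRACTORS_BY_HOST.items():
        # Match the host itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return extractor_cls()
    return StandardHTMLExtractor()
```

Keeping the table separate from the lookup means a new form provider only needs one new entry and one new class.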
- Visual Layout Analysis: Detects the visual boundaries of each field so that text can be aligned inside them.
- Smart Text Fitting: Dynamic font scaling (8pt-14pt) to ensure content fits within physical box constraints.
- Form Overlay: Generates pristine, flattened PDFs ready for official submission.
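
As a rough sketch of the text fitting idea, the function below steps down from 14pt to 8pt until an estimated text width fits the box. The width estimate (half the font size per character) and the name `fit_font_size` are assumptions for illustration; a real implementation would measure glyph widths from the font.

```python
MIN_PT, MAX_PT = 8.0, 14.0
# Assumption: average glyph width is about half the font size.
AVG_CHAR_WIDTH_RATIO = 0.5


def fit_font_size(text: str, box_width_pt: float, step: float = 0.5) -> float:
    """Return the largest size in [MIN_PT, MAX_PT] whose estimated width fits.

    Falls back to MIN_PT when even the smallest size would overflow.
    """
    size = MAX_PT
    while size > MIN_PT:
        if len(text) * size * AVG_CHAR_WIDTH_RATIO <= box_width_pt:
            return size
        size -= step
    return MIN_PT
```

Text that still overflows at 8pt needs a separate policy, such as truncation or wrapping, which this sketch leaves out.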
Powered by Playwright with a persistent context strategy.
- Humanizer: Implements specialized typing delays (50-150ms) and cursor jitter to evade bot detection.
- Resilience: Smart waiting for SPAs (Angular/React) and dynamic content loading.
- RAG Architecture: Retrieves user context from structured profile history.
- State Management: Atomic `FormDataManager` prevents race conditions during multi-turn edits.
- Suggestion Engine: Predicts email formats, addresses, and phone codes based on partial inputs.
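
One common way to get atomic edits in an async backend is to serialize writes behind a lock. The class below reuses the `FormDataManager` name, but its methods and internals are assumptions for illustration, not the project's actual code.

```python
import asyncio


class FormDataManager:
    """Serializes edits to form state behind a single asyncio lock."""

    def __init__(self) -> None:
        self._fields: dict[str, str] = {}
        self._lock = asyncio.Lock()
        self.version = 0

    async def update(self, changes: dict[str, str]) -> int:
        """Apply all changes as one unit and return the new version number."""
        async with self._lock:
            merged = {**self._fields, **changes}
            await asyncio.sleep(0)  # yield point, stands in for real async work
            self._fields = merged
            self.version += 1
            return self.version

    async def snapshot(self) -> dict[str, str]:
        """Return a copy so callers cannot mutate shared state."""
        async with self._lock:
            return dict(self._fields)
```

Without the lock, two concurrent updates could both read the old state at the yield point and the later write would discard the earlier one.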
Modern React 18 application with Vite and TailwindCSS.
- Glassmorphism UI: Custom design system (`GlassCard`, `GlassInput`) for a premium aesthetic.
- Voice Context: Global state management for microphone handling and audio visualization.
- Real-time WebSocket: Bi-directional event stream for form updates and agent thought process display.
graph TD
subgraph Frontend["Frontend & Extension"]
SPA[React SPA]
Ext[Browser Extension]
end
subgraph Backend["Form Flow Backend (FastAPI)"]
API[API Gateway]
Orch[Orchestrator]
AI[AI Service]
Voice[Voice Service]
Form[Form Engine]
PDF[PDF Service]
DB[(Database)]
end
subgraph External["External Services"]
LLM[Gemini LLM]
STT[Deepgram/Vosk]
TTS[ElevenLabs]
Browser[Playwright Browser]
end
SPA <--> API
Ext <--> API
API --> Orch
Orch --> AI
Orch --> Voice
Orch --> Form
Orch --> PDF
AI <--> LLM
Voice <--> STT
Voice <--> TTS
Form <--> Browser
Browser --> Target[Target Website]
Orch <--> DB
Orchestrates conversation and state management using Gemini and RAG.
classDiagram
class ConversationAgent {
+process_message(input)
+update_state()
}
class GeminiService {
+generate_response()
+extract_entities()
}
class RAGService {
+retrieve_context()
+store_memory()
}
class SuggestionEngine {
+generate_suggestions()
}
ConversationAgent --> GeminiService
ConversationAgent --> RAGService
ConversationAgent --> SuggestionEngine
Handles form schema extraction and automated submission via Playwright.
classDiagram
class FormParser {
+parse_schema(url)
+detect_captchas()
}
class FormSubmitter {
+fill_form(data)
+submit()
}
class BrowserPool {
+get_page()
+release_page()
}
class BaseExtractor {
<<interface>>
+extract_fields()
}
FormParser --> BaseExtractor
BaseExtractor <|-- GoogleFormsExtractor
BaseExtractor <|-- StandardExtractor
FormSubmitter --> BrowserPool
Intelligent PDF form parsing and writing with layout analysis.
classDiagram
class PdfParser {
+extract_fields()
+analyze_layout()
}
class PdfWriter {
+overlay_data()
+flatten()
}
class TextFitter {
+calculate_font_size()
}
PdfParser --> PdfWriter
PdfWriter --> TextFitter
sequenceDiagram
participant User
participant Frontend as VoiceFormFiller
participant Backend as /magic-fill
participant LangChain as SmartFormFillerChain
participant Gemini as Gemini LLM
User->>Frontend: Opens Voice Interface
Frontend->>Backend: POST /magic-fill {form_schema, user_profile}
Backend->>LangChain: fill(user_profile, form_schema)
LangChain->>Gemini: "Map this profile to these fields..."
Gemini-->>LangChain: {filled_fields: [...], unfilled_fields: [...]}
LangChain-->>Backend: MagicFillResult
Backend-->>Frontend: {success: true, filled: {...}, summary: "..."}
Frontend->>Frontend: Pre-populate fields, skip to first unfilled
Frontend-->>User: "✨ 5 of 8 fields filled. Let's get the rest!"
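
The response shape in the sequence above can be illustrated with a small function that splits fields into filled and unfilled. Exact key matching here is a deliberate stand-in for the LLM mapping step, and the `unfilled` key and the function name are assumptions.

```python
def magic_fill(form_fields: list[str], user_profile: dict[str, str]) -> dict:
    """Split fields into filled/unfilled using exact key matches.

    Exact matching is a placeholder for the LLM mapping step.
    """
    filled = {
        name: user_profile[name] for name in form_fields if user_profile.get(name)
    }
    unfilled = [name for name in form_fields if name not in filled]
    return {
        "success": True,
        "filled": filled,
        "unfilled": unfilled,
        "summary": f"{len(filled)} of {len(form_fields)} fields filled.",
    }
```

The frontend can then pre-populate `filled` and start the voice conversation at the first entry of `unfilled`.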
| Module | Feature | Status | Technical Detail |
|---|---|---|---|
| Parsing | Generic HTML | ✅ Stable | input, textarea, select, radio, checkbox |
| | Google Forms | ✅ Stable | Custom parsing for non-standard class names |
| | Shadow DOM | ✅ Stable | Recursive traversal of shadow roots |
| | PDF Forms | ✅ Stable | NEW: Layout analysis & text overlay |
| Voice | Speech-to-Text | ✅ Stable | Web Speech API with silence detection |
| | Text-to-Speech | ✅ Stable | Browser-native synthesis |
| | Wake Word | ⏳ Planned | "Hey Wizard" activation |
| Automation | Auto-Fill | ✅ Stable | Human-mimicry typing; DOM injection fallback |
| | Checkbox Logic | ✅ Stable | Smart toggle + efficient iteration |
| | CAPTCHA Solving | ✅ Stable | Multi-strategy: Stealth, Auto-wait, 2Captcha API, Manual fallback |
| UI/UX | Glassmorphism | ✅ Stable | Full system-wide theme |
| | Visualization | ✅ Live | Recharts + Gemini Insights (Tabbed Dashboard) |
Form Flow AI uses a multi-strategy approach to handle CAPTCHAs:
flowchart TD
A[CAPTCHA Detected] --> B{Is it Turnstile/Invisible?}
B -->|Yes| C[Wait & Auto-Solve]
B -->|No| D{Stealth Mode Enabled?}
D -->|Yes| E[Apply Stealth & Retry]
D -->|No| F{API Key Available?}
F -->|Yes| G[2Captcha / AntiCaptcha]
F -->|No| H[Manual Fallback - Notify User]
G -->|Success| I[Continue Filling]
G -->|Fail| H
C -->|Solved| I
E -->|Solved| I
H -->|User Solved| I
Supported CAPTCHA Types:
- ✅ Google reCAPTCHA v2/v3
- ✅ hCaptcha
- ✅ Cloudflare Turnstile
- ✅ Generic image CAPTCHAs (via 2Captcha)
Focus: Intelligent document parsing and layout-aware overlay.
- PDF Intelligence: `PdfParser` with visual layout analysis to detect fields by coordinates.
- Smart Writer: `PdfWriter` with "Text Fitting" dynamic typography (8pt-14pt scaling).
- Production Ready: Robust overlay generating flattened, submission-ready PDFs.
Focus: Industry-grade conversation state architecture with contextual intelligence.
- State Management: Atomic `FormDataManager` preventing race conditions during edits.
- Contextual Signals: Email/Phone/Address inference from partial inputs.
- RAG Prompts: 5-step protocol (LOAD → ANALYZE → UNDERSTAND → REASON → UPDATE) for precise filling.
- Test Coverage: 111 tests passing ✅
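
Inference from partial inputs, as mentioned under Contextual Signals, could look like the email completion below. The domain list and the `suggest_emails` name are assumptions for illustration; a production engine would presumably draw on the user's profile history as well.

```python
# Assumed shortlist of domains; not taken from the project.
COMMON_DOMAINS = ("gmail.com", "outlook.com", "yahoo.com")


def suggest_emails(partial: str, domains=COMMON_DOMAINS) -> list[str]:
    """Complete a partially typed email address against known domains."""
    local, at, typed_domain = partial.strip().lower().partition("@")
    if not local:
        return []
    if not at:
        # No "@" typed yet: offer every known domain.
        return [f"{local}@{d}" for d in domains]
    return [f"{local}@{d}" for d in domains if d.startswith(typed_domain)]
```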
Focus: Production-ready agent with adaptive personality.
- Adaptive Responses: `STYLE_VARIATIONS` matrix (Concise/Formal/Helpful).
- Sentiment Gating: Weighted scoring to detect frustration and escalate help.
- Multi-Modal Fallback: Type/Skip/Retry logic for robust error handling.
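
Weighted scoring for sentiment gating might be sketched as follows. The cue words, weights, and threshold are all hypothetical, and the substring matching is deliberately crude (it would also match "ugh" inside "though").

```python
# Hypothetical cue weights; the real scoring table is not given here.
FRUSTRATION_WEIGHTS = {
    "wrong": 0.4,
    "again": 0.3,
    "not working": 0.6,
    "ugh": 0.5,
}
ESCALATION_THRESHOLD = 0.8


def frustration_score(utterances: list[str]) -> float:
    """Sum the weights of every cue found across recent utterances."""
    score = 0.0
    for text in utterances:
        lowered = text.lower()
        score += sum(w for cue, w in FRUSTRATION_WEIGHTS.items() if cue in lowered)
    return score


def should_escalate(utterances: list[str]) -> bool:
    """Gate extra help on the accumulated score crossing the threshold."""
    return frustration_score(utterances) >= ESCALATION_THRESHOLD
```

Scoring over a window of recent utterances, instead of a single one, keeps one sharp word from triggering escalation on its own.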
Goal: Deploy as Chrome/Edge extension for inline form assistance.
| Component | Status | Description |
|---|---|---|
| Manifest V3 | 🚧 | Background Service Worker setup |
| Content Script | 🚧 | DOM Injection & Overlay UI |
| Bridge | ⏳ | WebSocket communication with local backend |
Deliverables:
- Deepgram WebSocket integration for <500ms latency.
- Chrome Web Store submission.
| Metric | Target | Current Status |
|---|---|---|
| Latency | Voice input → Response < 1s | ~1.2s |
| Accuracy | Form completion success > 95% | 92% |
| Efficiency | Time reduction vs manual | 65% |
| Reliability | Test Coverage | 88% |
| Component | Beta Configuration | Production Target |
|---|---|---|
| STT | Web Speech API | Deepgram Nova-2 (WebSocket) |
| TTS | Browser SpeechSynthesis | ElevenLabs Turbo v2 (Streaming) |
| LLM | Gemini Pro (REST) | Gemini Pro Vision (Agentic) |
| Automation | Playwright (Server) | Playwright + Chrome Extension |
| Risk | Strategy |
|---|---|
| API Costs | Aggressive caching + Local LLM fallback (Phi-2/Mistral) |
| Bot Detection | Human-like jitter + Random delays (50-150ms) |
| Complex Forms | Recursive Shadow DOM traversal + Dynamic wait |
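
The API cost strategy in the table above (caching plus a local fallback) can be sketched as a wrapper around two completion functions. Both callables and the wrapper name are assumptions; neither is a real client for any hosted or local model.

```python
from typing import Callable

Completer = Callable[[str], str]


def with_cache_and_fallback(primary: Completer, fallback: Completer) -> Completer:
    """Wrap two completion functions: cache first, then primary, then fallback."""
    cache: dict[str, str] = {}

    def complete(prompt: str) -> str:
        if prompt in cache:
            return cache[prompt]
        try:
            answer = primary(prompt)
        except Exception:
            # Any hosted-API failure routes the prompt to the local model.
            answer = fallback(prompt)
        cache[prompt] = answer
        return answer

    return complete
```

This version also caches fallback answers, which saves cost but means a weaker local answer can persist after the hosted API recovers.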
📄 Full Setup Guide: See `SETUP_GUIDE.txt` for detailed instructions, including troubleshooting.
| Software | Version | Download |
|---|---|---|
| Python | 3.10+ | python.org |
| Node.js | 18+ | nodejs.org |
| Git | Latest | git-scm.com |
git clone https://github.com/your-username/Form-Flow-AI.git
cd Form-Flow-AI

cd form-flow-backend
# Create virtual environment
python -m venv .venv
# Activate (Windows)
.venv\Scripts\activate
# Activate (Linux/Mac)
# source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
playwright install chromium
# Configure environment
copy .env.example .env # Windows
# cp .env.example .env # Linux/Mac
# Start server
uvicorn main:app --reload

Edit `.env` with your API keys:
# Required - At least one LLM API key
GOOGLE_API_KEY=your_gemini_api_key_here
SECRET_KEY=your_super_secret_key_here
# Optional - Enhanced features
OPENAI_API_KEY=your_openai_api_key_here
DEEPGRAM_API_KEY=your_deepgram_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

💡 Get API Keys: Google AI Studio | OpenAI | Deepgram | ElevenLabs
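
A startup check for these variables could look like the sketch below. It assumes that either `GOOGLE_API_KEY` or `OPENAI_API_KEY` satisfies the "at least one LLM API key" requirement, which is one reading of the comments in the sample file; the `validate_env` helper is hypothetical.

```python
import os

# Assumption: any one of these keys satisfies the LLM requirement.
REQUIRED_ANY_LLM = ("GOOGLE_API_KEY", "OPENAI_API_KEY")


def validate_env(env=None) -> list[str]:
    """Return a list of configuration problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("SECRET_KEY"):
        problems.append("SECRET_KEY is missing")
    if not any(env.get(key) for key in REQUIRED_ANY_LLM):
        problems.append("set at least one LLM key: " + " or ".join(REQUIRED_ANY_LLM))
    return problems
```

Returning all problems at once, instead of raising on the first, lets a new user fix the `.env` file in one pass.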
cd form-flow-frontend
npm install
npm run dev

✅ Frontend: http://localhost:5173
✅ Backend: http://localhost:8000
✅ API Docs: http://localhost:8000/docs
Run Form Flow AI completely offline with local models for maximum privacy and zero API costs.
Download and extract to project root:
# Download from: https://alphacephei.com/vosk/models
# Recommended: vosk-model-small-en-in-0.4 (~40MB)
Form-Flow-AI/
├── vosk-model-small-en-in-0.4/ # Extract here
│ ├── am/
│ ├── conf/
│ └── ...

# From project root
python download_models.py

This downloads Microsoft's Phi-2 model (~5.6GB) to models/phi-2/.
Requirements:
- ~6GB disk space
- ~4GB RAM (CPU) or 4GB VRAM (GPU)
- GPU recommended for 10x faster inference
docker-compose up --build

- Backend: http://localhost:8000
- Frontend: http://localhost:3000
# Download models first
python download_models.py
# Run with local LLM
docker-compose -f docker-compose.local-llm.yml up --build

Form-Flow-AI/
├── form-flow-backend/
│ ├── core/ # Config, DB, Base Models
│ ├── routers/ # API Endpoints (FastAPI)
│ ├── services/
│ │ ├── form/ # HTML Parsing & Extraction
│ │ ├── pdf/ # Visual PDF Analysis & Overlay
│ │ ├── voice/ # STT/TTS Pipelines
│ │ ├── ai/ # LLM Agent & RAG
│ │ └── browser/ # Playwright Automation
│ └── utils/
├── form-flow-frontend/ # React + Vite + TailwindCSS
└── docker-compose.yml