
🧬 Assumption Miner

Every bug starts as an assumption that went unchecked.

Assumption Miner finds them before your users do.

GitLab AI Hackathon 2026 Powered by Groq Powered by OpenRouter Python Rust TypeScript

Backend on Google Cloud Backend on Google Cloud (Docs-API explorer) Frontend on Vercel Demo Video


⚡ The Problem

Most production incidents don't come from code that's obviously wrong. They come from code that looks right but relies on assumptions nobody wrote down.

Things like: "this API always returns in under 500ms," "the database connection will be there when we need it," "file uploads will never exceed 10MB," or "payments always succeed on the first try."

Linters won't catch these. Test suites won't flag them. Code reviews might, if the reviewer happens to think about it. But usually, nobody does. These assumptions sit quietly in the codebase, invisible, until something changes in production and they all break at once.

The worst part? Once something breaks, nobody can point to where the assumption was made. It's not documented. It's not tracked. It's just a belief that was baked into the code months ago by someone who's probably on a different team now.

This is assumption debt, and every codebase carries it. It grows silently sprint after sprint, and it's the number one source of "but it worked on my machine" incidents.


💡 The Solution

Assumption Miner is a GitLab Duo Agent that detects implicit assumptions in your code, tracks them over time, and helps you fix them before they cause outages.

It hooks into your GitLab workflow and runs automatically when you create a merge request. Under the hood, it combines AST-level code analysis with AI-powered reasoning (via Groq or OpenRouter) to identify patterns that static analysis tools miss entirely: unhandled failure modes, hardcoded thresholds, missing input validation, and undocumented environmental dependencies.

Every assumption gets a DNA fingerprint, so it can be tracked across renames, refactors, and branch merges. The agent doesn't just find problems; it predicts which assumptions will break next, maps them to security frameworks (CWE, OWASP, SOC2), and auto-generates fix MRs so your team can resolve issues in one click.
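The exact fingerprinting scheme isn't spelled out here, but the core idea of an identity that survives renames can be sketched by hashing a normalized AST: identifiers, literals, and source positions are excluded so only structure contributes. The `assumption_dna` helper below is a hypothetical illustration, not the project's actual implementation:

```python
import ast
import hashlib

def assumption_dna(source: str) -> str:
    """Hash a normalized AST so the fingerprint survives renames and moves.

    Hypothetical sketch: records node types only, deliberately excluding
    variable names, literal values, and line numbers, so a rename or a
    moved block keeps the same fingerprint.
    """
    tree = ast.parse(source)
    parts = [type(node).__name__ for node in ast.walk(tree)]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]
```

With this normalization, `assumption_dna("timeout = 500")` equals `assumption_dna("limit = 900")` (same structure, different names and values), while structurally different code hashes differently.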

The result: fewer surprises in production, clearer risk visibility for your team, and a codebase that documents its own blind spots.




🧪 In Action

Frontend on Vercel

📊 Dashboard: System Health Overview



🕸️ Dependency Graph: Hidden Assumption Relationships



⏱️ Timeline: Evolution of Risks Over Time



🔮 Predictions: Future Failure Signals



🛡️ Security: Risk & Vulnerability Insights



⚙️ Settings Page


🏗️ Architecture

```mermaid
graph TB
    subgraph GitLab["GitLab Platform"]
        MR["Merge Request Created"]
        DUO["GitLab Duo Chat"]
        CICD["CI/CD Pipeline"]
    end

    subgraph Agent["Assumption Miner Agent"]
        direction TB
        SCAN["AST Parser + Custom Rules"]
        AI["AI Classification<br/>Groq / OpenRouter"]
        DNA["DNA Fingerprinting"]
        SEC["Security Mapper<br/>CWE · OWASP · SOC2"]
    end

    subgraph Scoring["Scoring Engine"]
        RUST["Rust + WASM<br/>Monte Carlo Simulation"]
        ML["Trend Prediction<br/>Linear Regression + WMA"]
    end

    subgraph Cloud["Cloud Infrastructure"]
        GCR["Google Cloud Run<br/>FastAPI Backend"]
        VERCEL["Vercel<br/>React Frontend"]
    end

    subgraph Output["Output Layer"]
        COMMENT["MR Comment with Findings"]
        AUTOFIX["Auto-Fix MR"]
        GATE["Quality Gate Verdict"]
        DASH["React Dashboard<br/>3D Graph · Timeline · Health"]
    end

    MR -->|trigger| SCAN
    DUO -->|on-demand| SCAN
    SCAN --> AI
    AI --> DNA
    DNA --> SEC
    SEC --> RUST
    RUST --> ML
    ML --> COMMENT
    ML --> AUTOFIX
    ML --> GATE
    ML --> DASH
    CICD -->|quality gate| GATE
    GCR -->|REST API| VERCEL
    VERCEL -->|serves| DASH
```

🎯 Why It's Different

Most code quality tools look at what your code does. Assumption Miner looks at what your code believes.

| What makes it unique | Why it matters |
|---|---|
| Finds what linters can't | Detects implicit assumptions (not just syntax or style issues) using AI reasoning on top of AST parsing |
| DNA fingerprinting | Each assumption gets a stable identity that survives renames, moves, and refactors, so nothing gets lost |
| Predicts before it breaks | Forecasts your health score 2 to 4 sprints ahead, so you can prioritize before things go wrong |
| Security-aware by default | Auto-maps every assumption to CWE, OWASP Top 10, and compliance frameworks (SOC2, PCI-DSS, GDPR) |
| One-click auto-fix | Generates fix patches, creates a branch, and opens a merge request. No context switching required |
| Lives in your pipeline | Plugs directly into GitLab CI/CD as a quality gate. If the health score drops, the merge gets blocked |
| Gets smarter over time | Learns from developer feedback (slash commands, emoji reactions) to calibrate confidence and reduce noise |
| Sub-millisecond scoring | Health scoring runs in Rust compiled to WebAssembly with Monte Carlo simulation, directly in the browser |
| Production-ready deployment | Backend on Google Cloud Run (auto-scaling, scales to zero), frontend on Vercel (global CDN, instant deploys) |

✨ Features

Detection & Tracking

| Icon | Feature | What It Does |
|---|---|---|
| 🔍 | AST + AI Analysis | Parses code structure and uses LLMs to classify assumptions that static analysis misses |
| 🧬 | DNA Fingerprinting | Assigns a stable identity to each assumption so it survives renames, moves, and refactors |
| 🕸️ | Dependency Graph | Interactive 3D holographic visualization showing how assumptions cluster and connect |
| ⏱️ | Timeline & Time Travel | Tracks how assumptions evolve sprint over sprint with time-travel navigation |
| 🏥 | Health Scoring | A+ through F grade powered by a Rust WASM engine with Monte Carlo simulation |
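The actual scoring engine is Rust compiled to WASM, but the idea behind Monte Carlo health scoring can be illustrated in Python: each trial randomly samples which assumptions "fire" and the resulting penalties are averaged into a score. The probabilities, penalties, and grade cutoffs below are invented for the example, not the project's real formula:

```python
import random

# Illustrative grade cutoffs; the real engine's thresholds may differ.
GRADES = [(90, "A+"), (80, "A"), (70, "B"), (60, "C"), (50, "D"), (0, "F")]

def health_score(assumption_risks, trials=5000, seed=42):
    """Monte Carlo health score over (failure_probability, penalty) pairs.

    Each trial starts at 100 and subtracts the penalty of every assumption
    that fires; the mean over all trials becomes the score and grade.
    """
    rng = random.Random(seed)  # fixed seed keeps the demo reproducible
    total = 0.0
    for _ in range(trials):
        score = 100.0
        for prob, penalty in assumption_risks:
            if rng.random() < prob:
                score -= penalty
        total += max(score, 0.0)
    mean = total / trials
    grade = next(g for cutoff, g in GRADES if mean >= cutoff)
    return mean, grade
```

For example, two findings with failure probabilities 0.5 and 0.1 and penalties 10 and 30 give an expected score near 92, landing in the A range.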

Remediation & Prevention

| Icon | Feature | What It Does |
|---|---|---|
| 🛠️ | Auto-Fix MR Creation | Generates fix patches from templates (error handling, null checks, timeout, input validation, type safety, hardcoded values), creates a branch, and opens a merge request |
| 🚦 | CI/CD Quality Gate | Plugs into your pipeline and blocks merges when the health score dips below threshold |
| 📈 | Trend Prediction | Forecasts your health score 2 to 4 sprints ahead using regression and weighted moving average on historical snapshots |
| 🎯 | Breaking Change Predictor | Flags which assumptions are most likely to cause failures during upcoming refactors |
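The forecasting approach above (linear regression plus a weighted moving average) fits in a few lines. This is a sketch of the technique, not the repository's actual `ml/` code, which may weight snapshots or clamp results differently:

```python
def forecast_health(history, sprints_ahead=3):
    """Project a health score N sprints ahead from per-sprint snapshots.

    Combines a weighted moving average baseline (recent sprints count
    more) with an ordinary least-squares trend, clamped to [0, 100].
    """
    n = len(history)
    # Weighted moving average: weight sprint i by i + 1.
    weights = range(1, n + 1)
    wma = sum(w * h for w, h in zip(weights, history)) / sum(weights)
    # Least-squares slope over the sprint index.
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
             / sum((x - mean_x) ** 2 for x in xs))
    projected = wma + slope * sprints_ahead
    return max(0.0, min(100.0, projected))
```

A steadily declining history such as `[90, 85, 80, 75]` (slope of -5 per sprint, WMA of 80) projects to 65 three sprints out.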

Intelligence & Compliance

| Icon | Feature | What It Does |
|---|---|---|
| 🛡️ | Security Impact Mapping | Links every assumption to CWE entries, OWASP Top 10 categories, and compliance controls (SOC2, PCI-DSS, GDPR) |
| 💬 | Feedback Learning Loop | Developers respond with slash commands or emoji reactions; the agent adjusts confidence weights accordingly |
| ⚙️ | Custom Rules Engine | Define project-specific detection rules in YAML with regex, AST, literal, function call, and context matching |
| 🌐 | Cross-Repo Drift Detection | Spots assumption divergence across microservices sharing the same interfaces |
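To give a feel for what a regex-style custom rule does, the sketch below evaluates one hypothetical rule against source text. The actual YAML schema and matcher types in `backend/python/rules/` are richer than this:

```python
import re

# A hypothetical rule, shaped roughly like a YAML rules-file entry;
# the project's real schema and field names may differ.
RULE = {
    "id": "hardcoded-timeout",
    "type": "regex",
    "pattern": r"timeout\s*=\s*\d+",
    "message": "Hardcoded timeout: assumes latency never changes",
}

def apply_rule(rule, source):
    """Return one finding per source line matching a regex rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if re.search(rule["pattern"], line):
            findings.append({
                "rule": rule["id"],
                "line": lineno,
                "message": rule["message"],
            })
    return findings
```

Running it over a snippet containing `timeout = 500` yields a single finding pointing at that line.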

🚀 Quick Start

Clone

```bash
git clone https://gitlab.com/gitlab-ai-hackathon/participants/34658878.git assumption-miner
cd assumption-miner
```

Environment Variables

```bash
cp .env.example .env
# Edit .env and fill in your keys
```

| Variable | Provider | Notes |
|---|---|---|
| `GROQ_API_KEY` | Groq | Free tier available, fast inference |
| `OPENROUTER_API_KEY` | OpenRouter | Access to Claude, Gemini, Mixtral, and others |
| `GITLAB_TOKEN` | GitLab | Personal access token for MR creation and API access |

Install & Run

```bash
# Option 1: Using the setup script
chmod +x scripts/setup.sh scripts/run.sh
./scripts/setup.sh
./scripts/run.sh

# Option 2: Manual setup
# Backend
pip install -r backend/python/requirements.txt
uvicorn backend.python.api.main:app --port 8000

# Frontend (separate terminal)
cd frontend && npm install && npm run dev
```

The frontend runs at http://localhost:3000 and the backend at http://localhost:8000 (Swagger docs at /docs).

Docker

```bash
docker compose up -d
```

☁️ Deployment

Assumption Miner is deployed as two independent services: a Python backend on Google Cloud Run and a React frontend on Vercel. Both are production-ready and publicly accessible.

Backend - Google Cloud Run

Backend on Google Cloud Backend on Google Cloud (Docs-API explorer)



The FastAPI backend is containerized and deployed on Google Cloud Run, Google's fully managed serverless container platform.

| Property | Detail |
|---|---|
| Platform | Google Cloud Run (asia-southeast1) |
| Runtime | Python 3.11 + FastAPI in a Docker container |
| Scaling | Auto-scales from 0 to N instances based on traffic |
| Auth | Public endpoint; no authentication required for the demo |
| Swagger UI | Available at `/docs` on the backend URL |

Why Google Cloud Run?

  • Zero infrastructure management - no servers to provision or maintain
  • Scales to zero when idle (cost-efficient for a hackathon project)
  • Instant scale-up when the GitLab agent pushes scan results
  • Full container support - identical to local Docker environment

Deploy to Cloud Run:

```bash
# Build and push container
gcloud builds submit --tag gcr.io/YOUR_PROJECT_ID/assumption-miner-backend

# Deploy to Cloud Run
gcloud run deploy assumption-miner-backend \
  --image gcr.io/YOUR_PROJECT_ID/assumption-miner-backend \
  --platform managed \
  --region asia-southeast1 \
  --allow-unauthenticated \
  --set-env-vars GROQ_API_KEY=your_key,OPENROUTER_API_KEY=your_key,GITLAB_TOKEN=your_token
```

Or using the included script:

```bash
chmod +x scripts/deploy-gcloud.sh
./scripts/deploy-gcloud.sh
```

Frontend - Vercel

Frontend on Vercel

The React frontend is deployed on Vercel, connected directly to the GitLab repository. Every push to main triggers an automatic deployment.

| Property | Detail |
|---|---|
| Platform | Vercel (Global Edge Network) |
| Framework | React + Vite (TypeScript) |
| Live URL | assumption-miner.vercel.app |
| CDN | Served from Vercel's global edge network |
| Deploys | Automatic on every push to `main` |

Why Vercel?

  • Zero-config deployment for Vite + React
  • Global CDN ensures fast load times for judges and reviewers worldwide
  • Preview deployments for every branch - easy to review UI changes
  • Environment variables managed via Vercel dashboard (no secrets in repo)

Environment variable required in Vercel dashboard:

```
VITE_API_URL=https://your-cloud-run-backend-url
```

This tells the frontend where the backend lives. Set it in Vercel → Project → Settings → Environment Variables.

Manual deploy (if needed):

```bash
npm install -g vercel
cd frontend
vercel --prod
```

How Frontend and Backend Connect

```
GitLab Duo Agent
      │
      ▼
.assumption-miner-latest.json
      │
      ▼
python scripts/push_agent_results.py
      │  (HTTP POST)
      ▼
Google Cloud Run (FastAPI Backend)
      │  (REST API via VITE_API_URL)
      ▼
Vercel (React Frontend)
      │
      ▼
assumption-miner.vercel.app
```

The agent writes scan results to .assumption-miner-latest.json. The push script reads this file and POSTs it to the Cloud Run backend via /api/v1/analyze. The React frontend (served by Vercel) then fetches the latest results from the backend and renders the dashboard in real-time.


🛠️ Tech Stack

| Layer | Technology | Why |
|---|---|---|
| Orchestration | GitLab Duo Agent + Flow (YAML) | Native GitLab integration; triggers on MR events and schedules |
| Backend | Python, FastAPI | AST parsing, AI orchestration, REST API |
| Backend Hosting | Google Cloud Run | Serverless containers, auto-scaling, zero ops |
| Scoring Engine | Rust compiled to WebAssembly | Sub-millisecond health scoring and Monte Carlo simulation in the browser |
| AI | Groq (Llama 3.3), OpenRouter | Pattern classification, risk assessment, and fix generation |
| ML | Python (linear regression, WMA) | Trend prediction and feedback-driven confidence calibration |
| Frontend | React, TypeScript, Tailwind CSS, Three.js | Dashboard, 3D dependency graph, and real-time health visualization |
| Frontend Hosting | Vercel | Global CDN, automatic deploys on push, zero config |
| Storage | SQLite | Assumption registry, feedback history, and pattern adjustments |

⚙️ How It Works

```mermaid
flowchart TD
    A["Trigger: MR created or weekly schedule"] --> B["Parse code with AST + custom rules"]
    B --> C["AI classifies and scores risk\nDNA fingerprint assigned"]
    C --> D["Map to CWE / OWASP / SOC2"]
    D --> E["Output"]
    E --> E1["MR comment with findings"]
    E --> E2["Auto-fix MR for critical issues"]
    E --> E3["CI/CD quality gate verdict"]
    E --> E4["Health score + trend forecast"]
    E --> E5["Issue creation for unresolved items"]
    E4 --> F["Push to Google Cloud Run backend"]
    F --> G["React dashboard on Vercel updates in real-time"]
```

🤖 GitLab Duo Integration

Assumption Miner integrates with GitLab Duo as both an Agent and a Flow:

Agent (On-Demand)

The agent is published as Assumption Miner Agent V9 in the GitLab AI Hackathon group. Chat with it directly in the GitLab Duo sidebar to analyze files, scan for assumptions, or explain risk scores.

```
@assumption-miner scan src/api/payment.py for implicit assumptions
```

You can also trigger a full scan, auto-save results, and sync to the dashboard in one instruction:

```
scan assumptions in this repository, save the results to .assumption-miner-latest.json, then tell me to run python scripts/push_agent_results.py
```

Or use the built-in slash command provided by the Agent Skill:

```
scan-assumptions backend/python/services/scorer.py
```

After completing its analysis, the agent automatically saves all findings as structured JSON to .assumption-miner-latest.json in the repository root. Run the push script locally to sync results to the dashboard:

```bash
git pull hackathon main
python scripts/push_agent_results.py
```

Flow (Automated)

The flow triggers automatically on merge requests, running a three-step pipeline:

  1. Scan & Classify: reads changed files, identifies assumptions, maps to CWE/OWASP
  2. Save Results: writes findings as structured JSON to .assumption-miner-latest.json in the repo root
  3. Report to MR: posts a structured comment with health grade, findings table, and priority fixes

To sync flow results to the dashboard after a scan:

```bash
git pull hackathon main
python scripts/push_agent_results.py
```

See flows/assumption-miner-flow.yml for the flow definition.

CI/CD Quality Gate

The included CI template blocks merges when the health score drops below your configured threshold:

```yaml
include:
  - local: '.gitlab/ci-templates/quality-gate.yml'
```

See .gitlab-ci.yml for the full pipeline configuration.
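The gate's verdict ultimately reduces to a threshold comparison that maps to a CI exit code. As a conceptual sketch (the 70.0 default is illustrative, not the project's configured value):

```python
def gate_verdict(health_score: float, threshold: float = 70.0) -> int:
    """CI exit code for the quality gate: 0 lets the merge proceed,
    1 fails the job and blocks it.

    The default threshold here is hypothetical; configure per project.
    """
    return 0 if health_score >= threshold else 1
```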


📚 API Reference

Core, Auto-Fix, Quality Gate

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/analyze` | Run assumption analysis on submitted files |
| GET | `/api/v1/health/{repo}` | Retrieve the current health score and grade |
| GET | `/api/v1/timeline` | Fetch the assumption evolution timeline |
| POST | `/api/v1/auto-fix/generate` | Preview generated fixes without side effects |
| POST | `/api/v1/auto-fix/apply` | Apply fixes: create a branch, commit, and open an MR |
| POST | `/api/v1/health-gate` | Evaluate the health score against a threshold (CI/CD) |
| GET | `/api/v1/health-gate/badge/{id}` | Embeddable SVG health badge |

Predictions, Security, Feedback, Rules

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/predictions/{project_id}` | Forecast the health score N sprints ahead |
| GET | `/api/v1/security/{assumption_id}` | CWE, OWASP, and compliance mapping |
| GET | `/api/v1/security/report/{project_id}` | Full project security posture report |
| GET | `/api/v1/compliance/{project_id}` | Compliance status by framework |
| POST | `/api/v1/feedback` | Submit developer feedback |
| POST | `/api/v1/feedback/learn` | Trigger a learning cycle |
| POST | `/api/v1/webhooks/gitlab` | GitLab webhook receiver |
| GET | `/api/v1/rules/{project_id}` | List all active rules |
| POST | `/api/v1/rules/{project_id}` | Create a new detection rule |
| POST | `/api/v1/rules/{project_id}/test` | Test a rule against code |

Full API documentation is also available at docs/api/.


🗂️ Project Structure

```
assumption-miner/
├── AGENTS.md                          # Repository-level agent context and guidelines
├── agents/
│   ├── assumption-miner.yml           # GitLab Duo Agent definition
│   └── agent.yml.template             # Agent template
├── flows/
│   ├── assumption-miner-flow.yml      # GitLab Duo Flow definition
│   └── flow.yml.template              # Flow template
├── skills/
│   └── scan-assumptions/
│       └── SKILL.md                   # scan-assumptions slash command skill
├── .gitlab/
│   ├── agents/assumption-miner.yml    # Agent registration
│   └── ci-templates/quality-gate.yml  # CI quality gate template
├── backend/python/
│   ├── ai/                # Groq + OpenRouter clients, prompt templates
│   ├── analyzer/          # AST parser, DNA fingerprinting, multi-lang support, patterns
│   ├── api/               # FastAPI routes: core, auto-fix, feedback, predictions, quality gate, rules, security, webhooks
│   ├── data/              # CWE, OWASP, compliance databases
│   ├── db/                # SQLAlchemy models, migrations
│   ├── ml/                # Trend model, forecaster, feature extractor, feedback learner
│   ├── models/            # Data models: assumption, graph, health
│   ├── rules/             # Custom rules engine with matchers (literal, pattern, context, function call)
│   ├── services/          # Business logic: scorer, predictor, auto-fix, MR creator, security mapper, graph builder, cross-repo, time travel
│   │   └── templates/fixes/  # Fix templates: error handling, null checks, timeout, input validation, type safety, hardcoded values
│   └── utils/             # Git and GitLab utilities
├── backend/rust/
│   └── src/               # WASM scoring engine: scorer, Monte Carlo, DNA, graph, types
├── frontend/
│   ├── public/            # Static assets
│   └── src/
│       ├── 3d/            # Three.js holographic graph: Scene, ParticleField, AnimatedEdge, AssumptionSphere, CentralCore
│       ├── components/    # Dashboard, GraphView, SecurityPage, PredictionsPage, TimelinePage, SettingsPage, AboutPage, AutoFixModal
│       ├── data/          # Demo data
│       ├── hooks/         # Zustand store, API hooks, WASM hooks, animation hooks
│       ├── styles/        # Global styles, animations
│       ├── types/         # TypeScript type definitions
│       └── utils/         # API client, WASM loader, formatting, color utilities
├── docs/
│   ├── api/               # API endpoint docs and model reference
│   └── guides/            # Getting started, customization, GitLab integration guides
├── examples/
│   ├── ASSUMPTIONS.md     # Example assumption documentation
│   └── sample-repo/       # Sample Python files to test against
├── scripts/               # Setup, run, build, deploy, test, quality-gate scripts
│   ├── push_agent_results.py  # Reads .assumption-miner-latest.json and POSTs to backend dashboard
│   └── deploy-gcloud.sh       # Deploy backend to Google Cloud Run
├── tests/
│   ├── python/            # Backend tests: analyzer, API, graph, scorer
│   ├── rust/              # WASM engine tests: scorer, graph
│   └── frontend/          # React component tests
├── .gitlab-ci.yml         # CI/CD pipeline configuration
├── docker-compose.yml     # Docker setup
├── Makefile               # Build commands
└── package.json           # Root package.json
```

📖 Documentation

Detailed documentation is available in the docs/ directory:

| Guide | Description |
|---|---|
| Architecture | System design and component overview |
| Getting Started | Step-by-step setup guide |
| GitLab Integration | Configuring Duo Agent, Flow, and CI/CD |
| Customization | Custom rules, thresholds, and project-specific tuning |
| API Endpoints | Full API reference |
| API Models | Request/response schemas |
| Contributing | How to contribute |

⚖️ License

MIT License © 2026 Wiqi Lee. Built for the GitLab AI Hackathon 2026. See LICENSE for details.

🤝 Ethics & Attribution

This project was created by Wiqi Lee as a submission for the GitLab AI Hackathon 2026. The project template and license structure are provided by GitLab under the MIT License.

If you use, fork, or build upon this code, please:

  • Give proper attribution. Credit the original author (Wiqi Lee) and link back to this repository.
  • Keep the license intact. Do not remove or alter the MIT License file.
  • Don't misrepresent authorship. Do not claim this work as your own in any competition, portfolio, or submission.
  • Respect the spirit of open source. Contribute back improvements when possible, and use this code to learn and build, not to plagiarize.

> "Good code is shared freely. Good ethics means acknowledging where it came from."


Wiqi Lee · Built for GitLab AI Hackathon 2026 · Discord · X
