Open-source data infrastructure tools. Built in India. Used everywhere.
We build lightweight, self-hosted tooling that gives small data teams enterprise-grade
pipeline observability, auditing, and automation — without vendor lock-in or SaaS bills.
WillowVibe is a data engineering & AI tooling studio — solo-founded, contributor-driven, OSS-first.
- Pipeline Auditing — point-in-time health checks on Airflow + dbt + warehouses; one command, one report
- Data Observability — continuous monitoring for pipeline health, data freshness, volume anomalies, and schema drift
- FinOps for Data — tracking Snowflake credits and BigQuery bytes billed, turning cloud cost chaos into actionable visibility
- AI-Augmented Pipelines — embedding AI at the right layer of the data stack without replacing what already works
- Open-Source First — every internal tool we build, we ship as OSS so the community benefits
We operate a solo + contributor model — lean by design, moving fast, building things that solve real problems for data teams.
🔬 PipelineProbe — New
Instant Data Pipeline Audit Report for Airflow + dbt + modern warehouses
Run a single command, get a full HTML audit report. PipelineProbe is a read-only CLI audit tool for data engineers who want a fast, objective health check of their pipeline stack — before a migration, after an incident, or as a recurring CI gate.
```bash
pip install pipelineprobe
pipelineprobe init    # generates pipelineprobe.yml
pipelineprobe audit   # produces pipelineprobe-report.html
```

- ✅ Airflow checks — high failure-rate DAGs, missing retries, missing SLAs, stale pipelines
- ✅ dbt checks — models with zero tests, failing test runs, orphaned models
- ✅ Warehouse checks — oversized tables, missing audit timestamps (Postgres, BigQuery, Snowflake)
- ✅ HTML + JSON report — traffic-light severity, health score 0–100, per-issue recommendations
- ✅ CI-ready — `fail_on_critical` exit-code gates for GitHub Actions / GitLab CI
- ✅ Zero mutations — 100% read-only; safe to run against production
Stack: Python · Typer · Pydantic · httpx · Jinja2 · psycopg2 · dbt artifacts
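The failure-rate check can be pictured with a short sketch: given the recent DAG-run states for each DAG (the kind of data Airflow's stable REST API returns from `/api/v1/dags/{dag_id}/dagRuns`), flag any DAG whose failure rate crosses a threshold. The function names and the 20% threshold below are illustrative, not PipelineProbe's actual implementation.

```python
def failure_rate(run_states: list[str]) -> float:
    """Fraction of runs that ended in 'failed' (pure computation, read-only)."""
    if not run_states:
        return 0.0
    return sum(state == "failed" for state in run_states) / len(run_states)

def flag_high_failure_dags(dag_runs: dict[str, list[str]], threshold: float = 0.2) -> dict[str, float]:
    """dag_runs maps dag_id -> recent run states, as collected from the Airflow REST API."""
    return {
        dag_id: failure_rate(states)
        for dag_id, states in dag_runs.items()
        if failure_rate(states) > threshold
    }

runs = {
    "daily_sales": ["success", "failed", "failed", "success", "failed"],
    "hourly_sync": ["success"] * 10,
}
print(flag_high_failure_dags(runs))  # {'daily_sales': 0.6}
```

Because the check only reads run metadata that has already been fetched, it stays safe to point at production.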
🔭 ObservaKit
Self-hosted Data Observability & FinOps Starter Kit for small data teams
ObservaKit gives 1–5 person data teams the 5 core observability pillars — Freshness, Volume, Quality, Schema Drift, and Pipeline Health — in a single `docker-compose up`. No Monte Carlo. No Metaplane. No SaaS bill.
- ✅ Freshness Monitor — detects stale tables by tracking `max(updated_at)`
- ✅ Volume Anomaly — Z-score detection against 7-day rolling averages
- ✅ Quality Checks — Soda Core & Great Expectations templates, ready to use
- ✅ Schema Drift Detector — snapshots `information_schema`, diffs on every run
- ✅ Pipeline Health — Airflow/Prefect REST API + OpenTelemetry + Grafana
- ✅ FinOps Tracker — Snowflake credits & BigQuery bytes billed, natively
- ✅ Native dbt Integration — parses `run_results.json` directly, no extra packages
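The Z-score volume rule above boils down to a few lines: compare today's row count against the mean and standard deviation of a trailing window (7 days here). This is a minimal sketch with illustrative names, not ObservaKit's exact code.

```python
from statistics import mean, stdev

def volume_zscore(history: list[int], today: int) -> float:
    """Z-score of today's row count vs. a trailing window (e.g. the last 7 days)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0.0
    return (today - mu) / sigma

def is_volume_anomaly(history: list[int], today: int, threshold: float = 3.0) -> bool:
    return abs(volume_zscore(history, today)) > threshold

week = [10_000, 10_200, 9_900, 10_100, 10_050, 9_950, 10_000]
print(is_volume_anomaly(week, 10_100))  # False — within normal variation
print(is_volume_anomaly(week, 2_000))   # True — sudden drop flagged
```

A rolling window keeps the baseline adaptive: gradual growth shifts the mean, so only abrupt jumps or drops trip the threshold.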
Stack: Python · FastAPI · SQLAlchemy · Alembic · Prometheus · Grafana · Docker Compose · dbt · Airflow / Prefect
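Reading `run_results.json` really does need no extra packages: each entry in the artifact's `results` array carries a `unique_id` and a `status`, which is enough to surface failing models and tests. The sample payload below is illustrative; real files live in dbt's `target/` directory.

```python
def failing_nodes(run_results: dict) -> list[str]:
    """Return unique_ids of dbt nodes whose last run ended in 'error' or 'fail'."""
    return [
        result["unique_id"]
        for result in run_results.get("results", [])
        if result.get("status") in ("error", "fail")
    ]

# Illustrative artifact content, mimicking target/run_results.json
artifact = {
    "results": [
        {"unique_id": "model.shop.orders", "status": "success"},
        {"unique_id": "test.shop.not_null_orders_id", "status": "fail"},
    ]
}
print(failing_nodes(artifact))  # ['test.shop.not_null_orders_id']
```

In practice you would `json.load` the file after each `dbt build`; parsing the artifact directly avoids taking a dependency on dbt's internal Python APIs.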
| Repo | Description | Language | Status |
|---|---|---|---|
| 🔬 pipelineprobe | Instant pipeline audit CLI — Airflow + dbt + warehouse | Python | active |
| 🔭 ObservaKit | Self-hosted data observability & FinOps starter kit | Python | active |
| 🧰 toolscontainer | Multi-purpose Python utility scripts & automations | Python | maintained |
| 🕷️ scrapy-bot | Scrapy + Flask web scraping bot experiment | Python | archived |
| 💻 online-ide | Lightweight online Python execution environment | Python | experimental |
| Layer | Tools |
|---|---|
| Data Engineering | Python · dbt · Apache Airflow · Prefect · Apache Spark |
| Warehouses | PostgreSQL · Snowflake · BigQuery · DuckDB |
| Observability | Prometheus · Grafana · OpenTelemetry · Soda Core |
| Backend | FastAPI · SQLAlchemy · Alembic · Pydantic |
| Infra & DevOps | Docker · Docker Compose · Terraform · GitHub Actions |
| AI / ML | LangChain · OpenAI APIs · Vector DBs (Qdrant / ChromaDB) |
"Build what the ecosystem needs. Share what you build. Let the community make it better."
Every project we open-source follows three rules:
- Zero vendor lock-in — runs on infra you own and control
- Quickstart in under 10 minutes — if onboarding is painful, it won't get adopted
- Progressive complexity — adopt one layer at a time; no all-or-nothing commitment
We actively maintain what we ship. Issues get responses. PRs get reviewed. Roadmaps get published.
All public repos welcome contributions. Best places to start:
- 🔬 PipelineProbe → good first issues
  - Add a new warehouse connector (Redshift, DuckDB)
  - Add a new rule (task duration outliers, dbt source freshness)
  - Improve the HTML report template
- 🔭 ObservaKit → good first issues
  - Add a new warehouse connector (Redshift, Delta Lake)
  - Write a Grafana dashboard for a new observability use case
  - Improve documentation or add a real-world example
Read CONTRIBUTING.md before opening a PR.
We are open to:
- Collaborations on data tooling, AI pipelines, or observability infra
- Consulting engagements — data platform audits, pipeline migrations, cost optimization
- Freelance / contract data engineering for startups and scaleups
| Channel | Link |
|---|---|
| 🐙 GitHub | @willowvibe |
| 🔬 PipelineProbe Issues | Open an issue |
| 🔭 ObservaKit Issues | Open an issue |
| 🔐 Security Reports | See SECURITY.md |
🌿 WillowVibe — Bengaluru, India · Building in the open since 2024 · Try PipelineProbe 🔬 · Star ObservaKit ⭐