Instant Data Pipeline Audit Report for Airflow + dbt + modern warehouses
PipelineProbe is a read-only, one-command audit tool for data pipelines. It connects to your existing stack (Apache Airflow, dbt, and a modern warehouse) and produces a single actionable HTML or JSON report surfacing critical issues such as missing SLAs, missing tests, and high failure rates.
It is open-sourced by WillowVibe.
| Area | What it checks |
|---|---|
| Airflow | DAG failure rates, missing retries, missing SLAs, stale pipelines, alert configuration |
| dbt | Models with zero tests, test failure ratio, failing last runs |
| Warehouse | Largest tables, tables missing audit timestamps (created_at/updated_at) |
| Report | Health score (0–100), critical / warning / info counts, HTML + JSON output |
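The health score collapses the issue counts into a single 0–100 number. The exact formula is not documented here; as an illustration only, a weighted-penalty scheme with hypothetical weights might look like:

```python
def health_score(critical: int, warning: int, info: int) -> int:
    """Collapse issue counts into a 0-100 score.

    Hypothetical weights: each critical issue costs 20 points, each
    warning 5, each info 1. PipelineProbe's real formula may differ.
    """
    score = 100 - 20 * critical - 5 * warning - 1 * info
    return max(0, min(100, score))  # clamp to the 0-100 range

print(health_score(1, 2, 4))  # a run with 1 critical, 2 warnings, 4 infos
```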
```
pip install pipelineprobe
pipelineprobe init
```

This creates `pipelineprobe.yml` in the current directory.

```
pipelineprobe audit --config pipelineprobe.yml
```

Reports are written to `./reports/` by default.
Want to see it in action without a local stack? Try our Quickstart Example:
```
cd examples/quickstart
docker compose up --build
```

See `docs/configuration.md` for the full reference. A minimal example:
```yaml
orchestrator:
  base_url: "http://localhost:8080"
  username: "admin"
  # password via env: PIPELINEPROBE_AIRFLOW_PASSWORD

dbt:
  project_dir: "./analytics"
  manifest_path: "target/manifest.json"
  run_results_path: "target/run_results.json"

warehouse:
  type: postgres  # postgres | bigquery | snowflake
  dsn: "postgresql://user:pass@localhost:5432/analytics"
  # for BigQuery: project_id: "my-gcp-project"
  # for Snowflake: account: "xyz.us-east-1", username: "...", password: "..."

report:
  output_dir: "./reports"
  format: "html"  # html | json | both
  fail_on_critical: 5
```

| Flag | Description |
|---|---|
| `--config` | Path to config YAML (default: `pipelineprobe.yml`) |
| `--format` | Override output format: `html`, `json`, or `both` |
| `--fail-on-critical` | Override the critical issue threshold for CI exits |
| `--version` | Show version and exit |
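For scripted runs, these flags can be assembled programmatically before handing them to `subprocess.run`. A minimal sketch — the helper name `build_audit_cmd` is our own, not part of PipelineProbe:

```python
from typing import Optional

def build_audit_cmd(config: str = "pipelineprobe.yml",
                    fmt: Optional[str] = None,
                    fail_on_critical: Optional[int] = None) -> list:
    """Assemble a `pipelineprobe audit` argument list for subprocess.run."""
    cmd = ["pipelineprobe", "audit", "--config", config]
    if fmt is not None:
        cmd += ["--format", fmt]  # html | json | both
    if fail_on_critical is not None:
        cmd += ["--fail-on-critical", str(fail_on_critical)]
    return cmd

print(build_audit_cmd(fmt="json", fail_on_critical=0))
```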
| Command | Description |
|---|---|
| `init` | Initialize a default `pipelineprobe.yml` |
| `audit` | Run the full audit pipeline |
| `doctor` | Validate connectivity to source systems |
| Connector | Status |
|---|---|
| Apache Airflow (REST API ≥ 2.0) | ✅ Supported |
| dbt Core (manifest + run_results) | ✅ Supported |
| PostgreSQL | ✅ Supported |
| BigQuery | ✅ Supported |
| Snowflake | ✅ Supported |
Identify issues before they hit production. Run `pipelineprobe audit` locally on a dev machine to verify the current state of your infrastructure.
Fail your build when critical issues surface. Use `--fail-on-critical 0` to enforce strict standards. See the CI Guide.
Perfect for external auditors or consultants. Connect to a client's Airflow/Postgres once, run the audit, and provide the polished HTML report as a deliverable.
How is PipelineProbe different from full observability platforms?
| Feature | Monitoring Tools (Datadog, Monte Carlo) | Quality Libraries (Soda, GE) | PipelineProbe |
|---|---|---|---|
| Focus | Continuous monitoring & alerting | Row-level data validation | Infrastructure & config audit |
| Effort | High (setup agents/SDKs) | Medium (write YAML expectations) | Zero (read-only API/metastore) |
| Best For | On-call engineers | Data engineers | Consultants / Team Leads |
PipelineProbe can automatically fail your CI pipeline when critical issues exceed your threshold. See docs/ci-integration.md for GitHub Actions and GitLab CI examples.
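Beyond the built-in exit code, the JSON report can be inspected directly in a custom CI gate. A sketch, assuming a report schema with an `issues` list carrying a `severity` field (the real schema may differ):

```python
import json

def gate(report_json: str, fail_on_critical: int = 0) -> int:
    """Return a CI exit code: 1 when critical issues exceed the threshold."""
    report = json.loads(report_json)
    criticals = sum(1 for issue in report["issues"]
                    if issue["severity"] == "critical")
    return 1 if criticals > fail_on_critical else 0

# Inline sample standing in for a generated reports/audit.json
sample = json.dumps({"issues": [
    {"severity": "critical", "title": "DAG daily_load has no SLA"},
    {"severity": "warning", "title": "model stg_orders has zero tests"},
]})
print(gate(sample))  # 1: one critical exceeds the default threshold of 0
```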
| Document | Description |
|---|---|
| Configuration Reference | All YAML and environment variable options |
| CI Integration Guide | GitHub Actions, GitLab CI, fail-on-critical |
| Architecture | How connectors, rules, and the renderer fit together |
| Contributing | Development setup, testing, PRs |
| Changelog | Release history |
- v0.2.0: Prefect and Dagster connectors.
- v0.3.0: Basic cost insights (scanned bytes for BQ/Snowflake).
- v1.0.0: Comprehensive data lineage support.
See CONTRIBUTING.md for how to get started.
MIT License — see LICENSE for details.
