Database DevOps Safety Harness: Automating Data Quality Gates with Great Expectations and Soda

Three simple questions:

  • Who controls your source data?
  • Do you trust them not to change it?
  • When something changes, how do you usually find out?

What is trust? One working definition: “repeatedly demonstrating that you do as you say you will do.”

Trust in data is the heart of this demo because 'Trust' and 'Hope' lead to very different outcomes.

Demo goal:

  • Provide a database with a green release.
  • Introduce a change that breaks the release / normal operations, and finally
  • Catch and communicate the failure before a technical issue becomes a Trust issue.

Demo scenarios

The pipeline supports four scenarios via demoScenario:

  • baseline: No breakage (expected green run)
  • schema: Schema drift
  • duplicates: Duplicate key
  • reference: Business rule breakage (orphaned order, date sequence and pricing).

This supports a quick side-by-side of a clean deployment versus a blocked one.
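The scenario selection above can be sketched as a simple lookup from demoScenario to the break script that the IntroduceIssue stage would run. This is a minimal illustration; the script filenames below are hypothetical, not the repo's actual names under sql/.

```python
from typing import Optional

# Hypothetical mapping from demoScenario to a break script.
# The filenames are illustrative; the real scripts live in sql/.
BREAK_SCRIPTS = {
    "baseline": None,                      # no breakage; IntroduceIssue is skipped
    "schema": "sql/break_schema.sql",      # schema drift
    "duplicates": "sql/break_duplicates.sql",  # duplicate key
    "reference": "sql/break_reference.sql",    # orphaned order, date, pricing
}

def script_for(scenario: str) -> Optional[str]:
    """Return the break script for a scenario, or None for baseline."""
    if scenario not in BREAK_SCRIPTS:
        raise ValueError(f"Unknown demoScenario: {scenario}")
    return BREAK_SCRIPTS[scenario]
```

A baseline run returns None, which is why the IntroduceIssue stage is skipped for that scenario.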

Pipeline flow

The pipeline is organized into three stages:

  1. ResetAndSeed
  2. IntroduceIssue (skipped for baseline)
  3. DQ_Gates

In DQ_Gates, both tools run, publish evidence, and fail the pipeline if either gate fails.

Before you run it

You need:

  • An Azure SQL Database reachable from Azure DevOps hosted agents.
  • An Azure DevOps service connection.
  • A Key Vault containing SQL-HOST and DB-NAME.
  • A contained database user for the service principal.

The pipeline uses Entra ID token authentication; no SQL passwords are stored. It acquires a short-lived Azure SQL access token at runtime via the Azure CLI and uses that token for SQL script execution and scanning.
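As a sketch of how token authentication can work from Python (assuming the pyodbc and azure-identity packages; this is illustrative, not the pipeline's exact code), the Microsoft ODBC driver accepts an access token as a length-prefixed UTF-16-LE byte string passed via the SQL_COPT_SS_ACCESS_TOKEN connection attribute:

```python
import struct

# msodbcsql pre-connect attribute for Entra ID access tokens
SQL_COPT_SS_ACCESS_TOKEN = 1256

def to_odbc_token(access_token: str) -> bytes:
    """Pack an access token the way the ODBC driver expects it:
    a 4-byte little-endian length prefix, then UTF-16-LE bytes."""
    raw = access_token.encode("utf-16-le")
    return struct.pack("<I", len(raw)) + raw

# Hedged usage sketch (needs azure-identity, pyodbc, and a reachable
# Azure SQL database, so it is commented out here):
#
#   from azure.identity import AzureCliCredential
#   import pyodbc
#   token = AzureCliCredential().get_token(
#       "https://database.windows.net/.default").token
#   conn = pyodbc.connect(
#       "Driver={ODBC Driver 18 for SQL Server};"
#       "Server=<SQL-HOST>;Database=<DB-NAME>",
#       attrs_before={SQL_COPT_SS_ACCESS_TOKEN: to_odbc_token(token)},
#   )
```

Because the token is acquired at runtime and expires quickly, nothing secret needs to live in the pipeline definition or Key Vault beyond the host and database names.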

For the actual setup steps, use SETUP.md.

Evidence produced

The DQ_Gates stage installs dependencies from requirements.txt, runs both validation toolchains, and publishes artifacts as dq-evidence.

Expected evidence includes:

  • artifacts/soda_scan.log
  • artifacts/soda_summary.txt
  • artifacts/great_expectations_results.json
  • artifacts/great_expectations_summary.txt
  • artifacts/great_expectations_data_docs.html
  • artifacts/soda_exit_code.txt
  • artifacts/gx_exit_code.txt
  • artifacts/pipeline_diagnostics.txt
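Since both tools record their exit codes as artifacts, the "fail if either gate fails" rule in DQ_Gates can be expressed as a small check over those two files. A minimal sketch (the file names match the evidence list above; the function itself is illustrative, not the pipeline's actual code):

```python
from pathlib import Path

def gates_pass(artifact_dir: str) -> bool:
    """Both gates must pass: a non-zero recorded exit code from
    either Soda or Great Expectations fails the run."""
    for name in ("soda_exit_code.txt", "gx_exit_code.txt"):
        code = int(Path(artifact_dir, name).read_text().strip())
        if code != 0:
            return False
    return True
```

Keeping the exit codes as files alongside the logs means the pass/fail decision is itself part of the published evidence, not just a transient pipeline status.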

Running the demo

  1. Run with demoScenario=baseline to show a passing deployment and gate run.
  2. Run with demoScenario=schema to show schema drift detection.
  3. Run with demoScenario=duplicates to show uniqueness failure.
  4. Run with demoScenario=reference to show referential and business-rule failure.
  5. Review the dq-evidence artifact and pipeline logs.

Key files

  • azure-pipelines.yml: Demo pipeline with reset, defect injection, and quality gates.
  • sql/: Reset, seed, and break scripts.
  • quality/soda/: Soda configuration and checks.
  • quality/ge/ge_validate_orders.py: Great Expectations validation script.
  • docs/log_review.txt: Simple log review template.

Setup and next step

Use SETUP.md once to prepare Azure SQL, Key Vault, and the Azure DevOps service connection. After that, start with demoScenario=baseline and then run one failing scenario.
