Database DevOps Safety Harness: Automating Data Quality Gates with Great Expectations and Soda

Three simple questions:

  • Who controls your source data?
  • Do you trust them not to change it?
  • When something changes, how do you usually find out?

What is trust? One working definition: “repeatedly demonstrating that you do as you say you will do.”

Trust in data is the heart of this demo because 'Trust' and 'Hope' lead to very different outcomes.

Demo goal:

  • Provide a database with a green release.
  • Introduce a change that breaks the release / normal operations, and finally
  • Catch and communicate the failure before a technical issue becomes a Trust issue.

Demo scenarios

The pipeline supports four scenarios via demoScenario:

  • baseline: No breakage (expected green run)
  • schema: Schema drift
  • duplicates: Duplicate key
  • reference: Business rule breakage (orphaned order, date sequence and pricing).

This supports a quick side-by-side of a clean deployment versus a blocked one.
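The scenario selection above can be sketched as a simple lookup from demoScenario to the break script that the IntroduceIssue stage would run. This is a minimal illustration; the script filenames below are hypothetical, not the repo's actual names under sql/.

```python
from typing import Optional

# Hypothetical mapping from demoScenario to a break script.
# The filenames are illustrative; the real scripts live in sql/.
BREAK_SCRIPTS = {
    "baseline": None,                      # no breakage; IntroduceIssue is skipped
    "schema": "sql/break_schema.sql",      # schema drift
    "duplicates": "sql/break_duplicates.sql",  # duplicate key
    "reference": "sql/break_reference.sql",    # orphaned order, date, pricing
}

def script_for(scenario: str) -> Optional[str]:
    """Return the break script for a scenario, or None for baseline."""
    if scenario not in BREAK_SCRIPTS:
        raise ValueError(f"Unknown demoScenario: {scenario}")
    return BREAK_SCRIPTS[scenario]
```

A baseline run returns None, which is why the IntroduceIssue stage is skipped for that scenario.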

Pipeline flow

The pipeline is organized into three stages:

  1. ResetAndSeed
  2. IntroduceIssue (skipped for baseline)
  3. DQ_Gates

In DQ_Gates, both tools run, publish evidence, and fail the pipeline if either gate fails.

Before you run it

You need:

  • An Azure SQL Database reachable from Azure DevOps hosted agents.
  • An Azure DevOps service connection.
  • A Key Vault containing SQL-HOST and DB-NAME.
  • A contained database user for the service principal.

The pipeline uses Entra ID token authentication; no SQL passwords are stored. It acquires a short-lived Azure SQL access token at runtime via the Azure CLI and uses that token for SQL script execution and scanning.
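As a sketch of how token authentication can work from Python (assuming the pyodbc and azure-identity packages; this is illustrative, not the pipeline's exact code), the Microsoft ODBC driver accepts an access token as a length-prefixed UTF-16-LE byte string passed via the SQL_COPT_SS_ACCESS_TOKEN connection attribute:

```python
import struct

# msodbcsql pre-connect attribute for Entra ID access tokens
SQL_COPT_SS_ACCESS_TOKEN = 1256

def to_odbc_token(access_token: str) -> bytes:
    """Pack an access token the way the ODBC driver expects it:
    a 4-byte little-endian length prefix, then UTF-16-LE bytes."""
    raw = access_token.encode("utf-16-le")
    return struct.pack("<I", len(raw)) + raw

# Hedged usage sketch (needs azure-identity, pyodbc, and a reachable
# Azure SQL database, so it is commented out here):
#
#   from azure.identity import AzureCliCredential
#   import pyodbc
#   token = AzureCliCredential().get_token(
#       "https://database.windows.net/.default").token
#   conn = pyodbc.connect(
#       "Driver={ODBC Driver 18 for SQL Server};"
#       "Server=<SQL-HOST>;Database=<DB-NAME>",
#       attrs_before={SQL_COPT_SS_ACCESS_TOKEN: to_odbc_token(token)},
#   )
```

Because the token is acquired at runtime and expires quickly, nothing secret needs to live in the pipeline definition or Key Vault beyond the host and database names.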

For the actual setup steps, use SETUP.md.

Evidence produced

The DQ_Gates stage installs dependencies from requirements.txt, runs both validation toolchains, and publishes artifacts as dq-evidence.

Expected evidence includes:

  • artifacts/soda_scan.log
  • artifacts/soda_summary.txt
  • artifacts/great_expectations_results.json
  • artifacts/great_expectations_summary.txt
  • artifacts/great_expectations_data_docs.html
  • artifacts/soda_exit_code.txt
  • artifacts/gx_exit_code.txt
  • artifacts/pipeline_diagnostics.txt
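Since both tools record their exit codes as artifacts, the "fail if either gate fails" rule in DQ_Gates can be expressed as a small check over those two files. A minimal sketch (the file names match the evidence list above; the function itself is illustrative, not the pipeline's actual code):

```python
from pathlib import Path

def gates_pass(artifact_dir: str) -> bool:
    """Both gates must pass: a non-zero recorded exit code from
    either Soda or Great Expectations fails the run."""
    for name in ("soda_exit_code.txt", "gx_exit_code.txt"):
        code = int(Path(artifact_dir, name).read_text().strip())
        if code != 0:
            return False
    return True
```

Keeping the exit codes as files alongside the logs means the pass/fail decision is itself part of the published evidence, not just a transient pipeline status.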

Running the demo

  1. Run with demoScenario=baseline to show a passing deployment and gate run.
  2. Run with demoScenario=schema to show schema drift detection.
  3. Run with demoScenario=duplicates to show uniqueness failure.
  4. Run with demoScenario=reference to show referential and business-rule failure.
  5. Review the dq-evidence artifact and pipeline logs.

Key files

  • azure-pipelines.yml: Demo pipeline with reset, defect injection, and quality gates.
  • sql/: Reset, seed, and break scripts.
  • quality/soda/: Soda configuration and checks.
  • quality/ge/ge_validate_orders.py: Great Expectations validation script.
  • docs/log_review.txt: Simple log review template.

Setup and next step

Use SETUP.md once to prepare Azure SQL, Key Vault, and the Azure DevOps service connection. After that, start with demoScenario=baseline and then run one failing scenario.
