Garmin Wearable Analytics

Garmin Wearable Analytics is a privacy-first case study built on local Garmin exports. It turns messy nested JSON into curated parquet tables, applies sanitization and quality gating before analysis, and uses notebook-driven EDA to surface interpretable behavioral and recovery patterns. The project is packaged as a balanced DS/DA portfolio artifact that combines analytical depth with reproducible engineering practices.

If you open only one file after this page, start with the case study.

What This Project Demonstrates

Robust ingestion and normalization of heterogeneous wearable exports (UDS + sleep JSON) into stable day-level tables
Privacy-aware preprocessing, with sanitization treated as a hard boundary before sharing or analysis
Quality labeling and artifact review, including strict vs loose readiness logic and suspicious-day triage
SQL-first analytics layer (DuckDB primary + compact PostgreSQL showcase) with CTE/window/view patterns
Structured EDA across coverage, time series, distributions, segmentation, and directed relationship analysis
Time-aware Stage 3 extension with statistical validation plus classification/regression baselines
Reproducible Python project organization with CLI workflows, tests, and CI-backed iteration

Role Fit

Strongest fit: DS generalist, Data Analyst, Product/Analytics, and analytics-heavy data roles that value messy real-world data handling as much as final charts.
Signals: raw nested JSON ingestion, privacy-safe preprocessing, quality-aware analysis, explicit limitations, and reproducible Python packaging.
Framing: this repository emphasizes trustworthy analytics and interpretable findings over heavy production ML, which is intentional for the portfolio story.

If You Have 60 Seconds

Headline Findings

The dataset spans 580 daily rows from 2023-05-26 to 2026-02-05, with explicit quality-aware filtering before analysis.
About 90.5% of days are strict good, which makes the retained EDA slices analytically useful without hiding real-world coverage gaps.
Weekly segmentation reveals stable routines: Saturday is the most active day, Sunday the least active, and Tuesday shows the highest median awake stress.
Higher daytime stress is associated with worse next-night recovery, supporting a day-to-night carryover story rather than same-row coincidence only.
Sleep score follows an optimum-duration pattern: mid-range sleep durations score best, while both shorter and longer nights tend to underperform.

Featured Visuals

Coverage calendar: the project keeps visible the difference between real behavioral variation and plain no-wear / partial-coverage periods.

Sleep score behaves like an optimum-duration pattern rather than a monotonic one: mid-range nights score better than both shorter and longer ones.

The strongest directional relationship in the repo is a negative association between daytime stress and next-night recovery score.

Project Structure

Pipeline / ingestion: discover raw Garmin exports, flatten nested JSON, and build parquet checkpoints
Quality & privacy: sanitize sensitive fields, generate a data dictionary, label day readiness, and isolate suspicious artifacts
SQL layer (optional): build a DuckDB mart, run portfolio SQL packs, and mirror a compact schema in PostgreSQL
EDA notebooks: prepare coverage-aware slices, inspect time series, analyze distributions, and validate cross-metric relationships
Case study & docs: recruiter-facing summary first, technical stage docs and notebooks second

Results Snapshot

rows: 580
date range: 2023-05-26 to 2026-02-05
strict labels: good 90.52%, partial 3.79%, bad 5.69%
loose labels: good 93.45%, partial 0.86%, bad 5.69%
corrupted stress-only days: 21 (3.62%)

Stage 3 Snapshot

Primary task: predict whether next-night sleepRecoveryScore < 75 with contiguous time-ordered splits.
Best interpretable model family: sparse logistic variants using compact daytime stress/heart-rate/body-battery context.
Typical test performance range: balanced accuracy ~0.64, ROC-AUC ~0.64-0.68, PR-AUC ~0.35-0.40.
Statistical validation supports key directional findings (for example, daytime awake stress -> lower next-night recovery).

Technical Appendix / Deep Dive

Start here for the portfolio narrative, then use the links below for technical depth:

Case study - recruiter-friendly project narrative and key findings.
Relationships notebook - directional D -> D+1 relationships and artifact checks.
Distributions notebook - metric distributions and segmented behavior patterns.
Overview - map of stages, outputs, and how to navigate the repository.
Pipeline - end-to-end flow from raw exports to analysis artifacts.
EDA guide - notebook purpose, structure, and interpretation scope.
Stage 0 - discovery, ingestion, and parquet build details.
Stage 1 - sanitize, data dictionary, and quality labeling.
Stage 2 - EDA workflow and promoted observational findings.
Stage 3 - predictive modeling and lightweight statistical validation.
SQL layer - DuckDB mart, SQL query pack, and PostgreSQL showcase.
CLI - command reference, flags, outputs, and run order.
Privacy - guardrails for local-only data and safe publishing boundaries.

Quickstart

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
python -m pip install -e .

Primary CLI mode:

garmin-analytics discover
garmin-analytics ingest-uds
garmin-analytics ingest-sleep
garmin-analytics build-daily
garmin-analytics sanitize
garmin-analytics quality

Optional SQL layer:

garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio

Open notebooks:

jupyter lab

Public Demo

If you do not have private Garmin exports, you can still exercise the public Stage 1 workflow on a tiny committed sample:

PYTHONPATH=src .venv/bin/python scripts/setup_public_demo.py
garmin-analytics data-dictionary --markdown-mode both
garmin-analytics quality
garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio

Details: Public demo

SQL Showcase

DuckDB (primary local analytics mart):

garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio

PostgreSQL (compact production-like mirror):

setup + runbook: examples/postgres_showcase/README.md
schema/views/queries: examples/postgres_showcase/
SQL skills demonstrated: CTEs, window functions, day-to-next-day (D -> D+1) alignment, and view-based analytics contracts.

Privacy

Raw Garmin exports stay local and must never be committed. Sanitized outputs are the default analysis and sharing boundary. See docs/privacy.md.

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.github/workflows		.github/workflows
.vscode		.vscode
docs		docs
examples		examples
notebooks		notebooks
reports		reports
scripts		scripts
sql/duckdb		sql/duckdb
src/garmin_analytics		src/garmin_analytics
tests		tests
.env.example		.env.example
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Garmin Wearable Analytics

What This Project Demonstrates

Role Fit

If You Have 60 Seconds

Headline Findings

Featured Visuals

Project Structure

Results Snapshot

Stage 3 Snapshot

Technical Appendix / Deep Dive

Quickstart

Public Demo

SQL Showcase

Privacy

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Garmin Wearable Analytics

What This Project Demonstrates

Role Fit

If You Have 60 Seconds

Headline Findings

Featured Visuals

Project Structure

Results Snapshot

Stage 3 Snapshot

Technical Appendix / Deep Dive

Quickstart

Public Demo

SQL Showcase

Privacy

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages