Garmin Wearable Analytics is a privacy-first case study built on local Garmin exports. It turns messy nested JSON into curated parquet tables, applies sanitization and quality gating before analysis, and uses notebook-driven EDA to surface interpretable behavioral and recovery patterns. The project is packaged as a balanced DS/DA portfolio artifact that combines analytical depth with reproducible engineering practices.
If you open only one file after this page, start with the case study.
- Robust ingestion and normalization of heterogeneous wearable exports (UDS + sleep JSON) into stable day-level tables
- Privacy-aware preprocessing, with sanitization treated as a hard boundary before sharing or analysis
- Quality labeling and artifact review, including strict vs loose readiness logic and suspicious-day triage
- SQL-first analytics layer (DuckDB primary + compact PostgreSQL showcase) with CTE/window/view patterns
- Structured EDA across coverage, time series, distributions, segmentation, and directed relationship analysis
- Time-aware Stage 3 extension with statistical validation plus classification/regression baselines
- Reproducible Python project organization with CLI workflows, tests, and CI-backed iteration
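The nested-JSON-to-day-level step above can be sketched roughly like this. A minimal flattening helper in pure stdlib Python; the field names are hypothetical stand-ins, not the project's actual export schema:

```python
import json

def flatten(record: dict, prefix: str = "") -> dict:
    """Recursively flatten nested JSON into dot-separated column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Hypothetical nested export record (not the real Garmin schema)
raw = json.loads('{"calendarDate": "2024-01-01", "sleep": {"score": 78, "durationSec": 25200}}')
row = flatten(raw)
print(row)
# {'calendarDate': '2024-01-01', 'sleep.score': 78, 'sleep.durationSec': 25200}
```

Flattened rows like this can then be collected per day and written to parquet checkpoints.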
- Strongest fit: DS generalist, Data Analyst, Product/Analytics, and analytics-heavy data roles that value messy real-world data handling as much as final charts.
- Signals: raw nested JSON ingestion, privacy-safe preprocessing, quality-aware analysis, explicit limitations, and reproducible Python packaging.
- Framing: this repository emphasizes trustworthy analytics and interpretable findings over heavy production ML, which is intentional for the portfolio story.
- Case study
- Stage 3 (validation + modeling)
- SQL layer (DuckDB + PostgreSQL showcase)
- Relationships notebook
- The dataset spans 580 daily rows from 2023-05-26 to 2026-02-05, with explicit quality-aware filtering before analysis.
- About 90.5% of days are `strict good`, which makes the retained EDA slices analytically useful without hiding real-world coverage gaps.
- Weekly segmentation reveals stable routines: Saturday is the most active day, Sunday the least active, and Tuesday shows the highest median awake stress.
- Higher daytime stress is associated with worse next-night recovery, supporting a day-to-night carryover story rather than same-row coincidence only.
- Sleep score follows an optimum-duration pattern: mid-range sleep durations score best, while both shorter and longer nights tend to underperform.
Coverage calendar: the project keeps the distinction between real behavioral variation and plain no-wear / partial-coverage periods visible instead of averaging it away.
Sleep score behaves like an optimum-duration pattern rather than a monotonic one: mid-range nights score better than both shorter and longer ones.
The strongest directional relationship in the repo is a negative association between daytime stress and next-night recovery score.
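The day-to-next-night carryover above rests on a simple alignment trick: pair day D's stress with the following night's recovery rather than the same row. A minimal pandas sketch (column names and values are illustrative, not the repo's schema):

```python
import pandas as pd

# Toy daily frame; columns are illustrative stand-ins for the curated tables
daily = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=4, freq="D"),
        "avg_awake_stress": [30, 55, 25, 60],
        "sleep_recovery_score": [80, 62, 85, 58],
    }
).set_index("date")

# shift(-1) pulls the *next* row's recovery score onto day D,
# turning a same-row comparison into a D -> D+1 comparison
daily["next_night_recovery"] = daily["sleep_recovery_score"].shift(-1)

# Day-level association between today's stress and tonight's recovery
print(daily[["avg_awake_stress", "next_night_recovery"]].corr())
```

The last row has no following night, so its shifted value is NaN and drops out of the correlation.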
- Pipeline / ingestion: discover raw Garmin exports, flatten nested JSON, and build parquet checkpoints
- Quality & privacy: sanitize sensitive fields, generate a data dictionary, label day readiness, and isolate suspicious artifacts
- SQL layer (optional): build a DuckDB mart, run portfolio SQL packs, and mirror a compact schema in PostgreSQL
- EDA notebooks: prepare coverage-aware slices, inspect time series, analyze distributions, and validate cross-metric relationships
- Case study & docs: recruiter-facing summary first, technical stage docs and notebooks second
- rows: 580
- date range: 2023-05-26 to 2026-02-05
- strict labels: good 90.52%, partial 3.79%, bad 5.69%
- loose labels: good 93.45%, partial 0.86%, bad 5.69%
- corrupted stress-only days: 21 (3.62%)
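The strict vs loose split above could be produced by labeling logic along these lines. The thresholds and field names here are invented for illustration; the repo's actual rules live in its quality stage:

```python
def readiness_label(wear_coverage: float, stress_corrupted: bool, strict: bool = True) -> str:
    """Label a day 'good' / 'partial' / 'bad' from wear coverage and artifact flags.

    Hypothetical thresholds: strict mode demands higher wear coverage than loose mode.
    """
    if stress_corrupted:
        return "bad"  # corrupted stress-only days are excluded under both modes
    good_floor = 0.9 if strict else 0.8
    if wear_coverage >= good_floor:
        return "good"
    if wear_coverage >= 0.5:
        return "partial"
    return "bad"

print(readiness_label(0.85, False, strict=True))   # partial under strict
print(readiness_label(0.85, False, strict=False))  # good under loose
```

The same day can thus be `partial` under strict logic but `good` under loose logic, which matches the gap between the two label distributions.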
- Primary task: predict whether next-night `sleepRecoveryScore < 75` with contiguous time-ordered splits.
- Best interpretable model family: sparse logistic variants using compact daytime stress/heart-rate/body-battery context.
- Typical test performance range: balanced accuracy ~0.64, ROC-AUC ~0.64-0.68, PR-AUC ~0.35-0.40.
- Statistical validation supports key directional findings (for example, daytime awake stress -> lower next-night recovery).
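The two key ingredients above, a contiguous time-ordered split and a sparse logistic baseline, can be sketched on synthetic data. Everything here is a stand-in: the features, target, and hyperparameters are illustrative, not the repo's actual Stage 3 configuration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for day-level features and a "recovery < 75" target
n = 400
X = rng.normal(size=(n, 3))  # e.g. stress / heart-rate / body-battery proxies
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Contiguous time-ordered split: train on the earliest 70%, test on the rest.
# No shuffling, so every test day is strictly later than every training day.
cut = int(n * 0.7)
X_train, X_test = X[:cut], X[cut:]
y_train, y_test = y[:cut], y[cut:]

# Sparse (L1-penalized) logistic baseline keeps only the strongest features
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("test ROC-AUC:", round(auc, 3))
```

Shuffled cross-validation would leak adjacent-day information here; the contiguous split is what makes the reported test metrics honest for time-ordered data.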
Start here for the portfolio narrative, then use the links below for technical depth:
- Case study - recruiter-friendly project narrative and key findings.
- Relationships notebook - directional `D -> D+1` relationships and artifact checks.
- Distributions notebook - metric distributions and segmented behavior patterns.
- Overview - map of stages, outputs, and how to navigate the repository.
- Pipeline - end-to-end flow from raw exports to analysis artifacts.
- EDA guide - notebook purpose, structure, and interpretation scope.
- Stage 0 - discovery, ingestion, and parquet build details.
- Stage 1 - sanitize, data dictionary, and quality labeling.
- Stage 2 - EDA workflow and promoted observational findings.
- Stage 3 - predictive modeling and lightweight statistical validation.
- SQL layer - DuckDB mart, SQL query pack, and PostgreSQL showcase.
- CLI - command reference, flags, outputs, and run order.
- Privacy - guardrails for local-only data and safe publishing boundaries.
```shell
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
python -m pip install -e .
```

Primary CLI mode:
```shell
garmin-analytics discover
garmin-analytics ingest-uds
garmin-analytics ingest-sleep
garmin-analytics build-daily
garmin-analytics sanitize
garmin-analytics quality
```

Optional SQL layer:

```shell
garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio
```

Open notebooks:

```shell
jupyter lab
```

If you do not have private Garmin exports, you can still exercise the public Stage 1 workflow on a tiny committed sample:
```shell
PYTHONPATH=src .venv/bin/python scripts/setup_public_demo.py
garmin-analytics data-dictionary --markdown-mode both
garmin-analytics quality
garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio
```

Details: Public demo
DuckDB (primary local analytics mart):

```shell
garmin-analytics build-sql-mart
garmin-analytics run-sql-portfolio
```

PostgreSQL (compact production-like mirror):
- setup + runbook: examples/postgres_showcase/README.md
- schema/views/queries:
examples/postgres_showcase/ - SQL skills demonstrated: CTEs, window functions, day-to-next-day (
D -> D+1) alignment, and view-based analytics contracts.
Raw Garmin exports stay local and must never be committed. Sanitized outputs are the default analysis and sharing boundary. See docs/privacy.md.


