mackinlayj/discogs-sql-ingestion

A Python-based pipeline that extracts data from the Discogs API, normalizes nested metadata, and persists it in a SQL Server database for downstream projects.

Discogs Dashboarding Documentation

Overview

Discogs provides a built-in web interface for browsing personal collections and wantlists; however, its analytical and customization capabilities are limited. While basic metrics are available, users have little control over how data is transformed, combined, or reused across projects.

This repository focuses on building a reproducible Python-based ingestion pipeline that extracts data from the Discogs API, normalizes the nested responses, and persists the results in a relational SQL Server database. The primary goal is to centralize Discogs data in a structured, queryable format that can serve as a reliable foundation for downstream analysis, reporting, or experimentation.

Rather than tying the workflow to a single analytics tool, this project treats SQL Server as the system of record, enabling the data to be reused across multiple contexts (e.g., ad hoc SQL analysis, dashboards, notebooks).


Architecture Overview

The data pipeline follows a simple and intentional design:

Discogs API → Python → SQL Server

Each layer has a clearly defined responsibility:

  • The Discogs API provides raw, user-scoped metadata
  • Python handles extraction, transformation, and normalization
  • SQL Server stores curated, analytics-ready tables

While downstream tools such as Power BI can connect directly to the Discogs API, doing so is not considered best practice - especially when dashboards are shared or published.


Components

1. Discogs API

Discogs provides authenticated API access that allows users to programmatically retrieve data related to their collection, wantlist, artists, releases, and related metadata. This API serves as the raw data source for the pipeline and supports automated, repeatable extraction.
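The collection endpoints return data in pages. As a minimal sketch (not this repository's actual code), the loop below walks every page of a paginated response; `fetch_page` is an injected stand-in for the authenticated HTTP call, and the `pagination`/`releases` field names follow the general shape of Discogs collection responses:

```python
from typing import Callable, Dict, Iterator, List

def iter_collection_pages(fetch_page: Callable[[int], Dict]) -> Iterator[List[dict]]:
    """Yield the 'releases' list from each page of a paginated response.

    `fetch_page(page_number)` should return a parsed JSON dict containing
    a 'pagination' block (with a total page count) and a 'releases' list,
    as Discogs collection responses do.
    """
    page = 1
    while True:
        body = fetch_page(page)
        yield body.get("releases", [])
        if page >= body["pagination"]["pages"]:
            break
        page += 1

# Demo with a fake fetcher standing in for the authenticated HTTP call:
def fake_fetch(page: int) -> dict:
    return {"pagination": {"page": page, "pages": 2},
            "releases": [{"id": page * 100}]}

all_releases = [r for page in iter_collection_pages(fake_fetch) for r in page]
```

Injecting the fetch function keeps the pagination logic testable without network access; a real fetcher would also need to respect the API's rate limits.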

2. Python (Extraction & Transformation)

Python is used to:

  • Authenticate and interact with the Discogs API
  • Handle pagination and rate limits
  • Normalize nested JSON responses into tabular structures
  • Perform initial data cleaning and transformations prior to storage

Note: All transformation logic lives in Python.
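To illustrate the normalization step, here is a small sketch (not the project's actual code) that splits one nested collection item into a flat release row plus one row per credited artist. The input field names mirror a Discogs collection payload, but the two-table output shape is a hypothetical choice:

```python
from typing import Dict, List, Tuple

def normalize_release(raw: dict) -> Tuple[Dict, List[Dict]]:
    """Flatten one nested collection item into a release row and artist rows."""
    info = raw["basic_information"]
    release_row = {
        "release_id": info["id"],
        "title": info["title"],
        "year": info.get("year"),
        "added": raw.get("date_added"),
    }
    # One row per credited artist, keyed back to the release.
    artist_rows = [
        {"release_id": info["id"], "artist_id": a["id"], "name": a["name"]}
        for a in info.get("artists", [])
    ]
    return release_row, artist_rows

sample = {
    "date_added": "2023-01-05T10:00:00-08:00",
    "basic_information": {
        "id": 1,
        "title": "Blue Train",
        "year": 1957,
        "artists": [{"id": 7, "name": "John Coltrane"}],
    },
}
release, artists = normalize_release(sample)
```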

3. SQL Server (Data Storage)

SQL Server is used as the persistent storage layer for transformed Discogs data.

Persisting the data in a relational database provides several benefits:

  • Logic is centralized and transparent
  • Nested API responses are converted into stable, relational tables
  • Data can be queried and reused independently; data models remain stable for downstream reporting
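As a sketch of what the storage layer does, the snippet below creates a releases table and performs an idempotent upsert, so rerunning the ingestion does not duplicate rows. It uses sqlite3 purely as a stand-in for SQL Server so the example runs anywhere; the real pipeline would go through a SQL Server driver such as pyodbc, and the schema shown is illustrative:

```python
import sqlite3

# sqlite3 stands in for SQL Server here so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS releases (
           release_id INTEGER PRIMARY KEY,
           title      TEXT NOT NULL,
           year       INTEGER
       )"""
)

rows = [(1, "Blue Train", 1957), (2, "Kind of Blue", 1959)]

# Upsert on the primary key keeps repeated ingestion runs idempotent.
conn.executemany(
    """INSERT INTO releases (release_id, title, year) VALUES (?, ?, ?)
       ON CONFLICT(release_id) DO UPDATE SET
           title = excluded.title,
           year  = excluded.year""",
    rows,
)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM releases").fetchone()[0]
```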

Configuration of Environment Variables (.env)

This project uses environment variables to manage API credentials and database connection details. Secrets are never committed to the repo.

The repository includes a template file located in src/:

  • src/.env.template - documents all required environment variables and serves as a starting point for local configuration.

Note: The template itself is safe to commit; only your populated .env file (which contains real credentials) must stay out of version control. Ensure .env is listed in the .gitignore file so the values you store there remain locally contained.

Setup:

  1. Copy the template file
    • cp src/.env.template src/.env
  2. Populate the new .env file with:
    • Your Discogs personal access token
    • Your Discogs username
    • SQL Server connection details
  3. Do not commit .env:
    • Add .env to the .gitignore file (at the repo root)
    • This ensures credentials remain local and private

The ingestion script automatically loads configuration from the repository root and fails fast with a clear message if required variables are missing.
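A fail-fast loader along these lines might look like the following sketch; the variable names are illustrative (the authoritative list lives in src/.env.template), and the dict-based environment is just a stand-in for the real process environment:

```python
import os

# Illustrative names - use the names documented in src/.env.template.
REQUIRED_VARS = ("DISCOGS_TOKEN", "DISCOGS_USERNAME", "SQL_CONNECTION_STRING")

def load_config(environ=os.environ) -> dict:
    """Return the required settings, raising a clear error if any are missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {name: environ[name] for name in REQUIRED_VARS}

# Demo with an in-memory environment instead of a real .env file:
fake_env = {"DISCOGS_TOKEN": "t", "DISCOGS_USERNAME": "u",
            "SQL_CONNECTION_STRING": "s"}
config = load_config(fake_env)
```

Collecting every missing variable before raising gives the user one complete error message instead of a fix-one-rerun loop.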


Getting Started

Prerequisites

  • Python 3.9+
  • SQL Server Express
  • Discogs API personal access token
  • A Discogs account (a lovingly curated vinyl collection helps)

Setup (High-Level)

  1. Clone the repository
  2. Create and activate a Python virtual environment
  3. Install required Python dependencies
  4. Configure Discogs API credentials via environment variables
  5. Run ingestion scripts to populate the SQL database

Using the Data for Downstream Projects

Once ingested, the Discogs data stored in SQL can be reused for a wide range of individual projects, including:

  • Ad hoc SQL analysis
  • Exploratory notebooks
  • Custom applications
  • Dashboarding (e.g., Power BI)

Example - Connecting from Power BI

If you choose to use Power BI for dashboarding efforts:

  1. Open Power BI Desktop
  2. Select Get Data → SQL Server
  3. Enter your SQL Server instance and database
  4. Import tables directly or connect to SQL views
  5. Refresh data after rerunning the ingestion script

Because Power BI connects to SQL rather than the API, dashboards remain decoupled from extraction logic and benefit from a stable, well-defined schema. This setup also enables refreshes to be automated or scheduled as needed, allowing reports to stay up to date as the ingestion pipeline is rerun. For more information on automated refresh schedules, see Microsoft's article Data Refresh in Power BI.


Data Scope & Assumptions

  • This pipeline is designed for personal Discogs accounts
  • Discogs metadata is user-generated and may contain inconsistencies
  • Data reflects the state of the collection at the time of extraction
  • All data ingested belongs solely to the authenticated Discogs user

Additional Context Re: Data Storage

For many personal or small-scale use cases, storing Discogs data in a relational database may seem unnecessary, as the data is largely non-sensitive and could be accessed directly from the Discogs API. This is a fair observation.

However, this project intentionally models the data in SQL Server to reflect best practices in analytics and data engineering, even when working with relatively simple or low-risk datasets. Persisting transformed data provides clear advantages in terms of transparency, reproducibility, and governance, and mirrors how similar pipelines are implemented in production environments.

Users who prefer a lighter-weight workflow may choose to query the Discogs API directly from tools such as Power Query. While suitable for quick exploration, this approach shifts transformation logic into the visualization layer and reduces reuse.

This repository is designed to demonstrate a robust, extensible ingestion pattern that prioritizes clean data modeling while still allowing flexibility for alternative downstream workflows.
