
Flexible Catalog – JSON + Virtual Columns 🚀

Runner Up at MariaDB Python Hackathon 2025

A product catalog where JSON columns hold dynamic attributes, exposed by virtual/persistent columns for indexing and fast filters — giving the agility of semi-structured data with the performance of relational indexes.

Problem: e-commerce and marketplace catalogs carry many product-specific attributes (RAM, GPU, screen_size, battery_mAh, etc.) that vary by category. Storing every attribute as a dedicated column is inflexible; storing a JSON blob is flexible but performs poorly for filters unless indexed.

Solution: keep a JSON column (specs) for flexible attributes. Expose a selected set of frequently filtered paths as virtual (generated) columns, marked persistent/stored where the engine requires it for indexing, and add B-tree indexes on them to speed up range and equality searches. Keep JSON for everything else.
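The pattern can be sketched in miniature. The snippet below uses Python's sqlite3 as a stand-in (SQLite 3.31+ also supports stored generated columns and json_extract); the repo's real DDL is in schema.sql and targets MySQL/MariaDB, where the equivalent syntax is `col TYPE GENERATED ALWAYS AS (JSON_EXTRACT(specs, '$.path')) STORED`. Table and column names here mirror the README but are illustrative.

```python
import json
import sqlite3

# Minimal sketch of the JSON + generated-column pattern, using SQLite
# (3.31+ with JSON1) as a stand-in for MySQL/MariaDB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    specs   TEXT NOT NULL,  -- JSON blob of dynamic attributes
    -- hot paths extracted as stored generated columns, then indexed
    ram_gb  INT  GENERATED ALWAYS AS (json_extract(specs, '$.ram'))   STORED,
    brand   TEXT GENERATED ALWAYS AS (json_extract(specs, '$.brand')) STORED
);
CREATE INDEX idx_products_ram ON products(ram_gb);
""")
conn.execute(
    "INSERT INTO products (name, specs) VALUES (?, ?)",
    ("ZenBook", json.dumps({"ram": 32, "brand": "Asus", "gpu": "RTX 4070"})),
)
# The generated column is populated automatically from the JSON blob,
# so the filter below can use the B-tree index instead of parsing JSON.
rows = conn.execute(
    "SELECT name, ram_gb FROM products WHERE ram_gb >= 16").fetchall()
print(rows)
```

Everything not promoted to a generated column (gpu above) stays queryable through JSON paths, just without index support.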

This repository is a full example: database schema (schema.sql), a data generator to produce 300k rows (load_data.py), a small Express API (/api) that prefers generated columns for filters, and a Docker Compose setup to run everything locally.


This project solves real-world problems:

  • Handling dynamic product attributes without rigid schema changes.
  • Enabling fast filtering and sorting on frequently queried fields.
  • Scaling data ingestion for large datasets (300k+ rows).
  • Maintaining data integrity and validation even with semi-structured JSON.
  • Providing an API that safely exposes complex queries while preventing injection or malformed requests.
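The injection-prevention point deserves a concrete shape. The repo's actual middleware is api/middleware/validateFilter.js (Node); the Python sketch below shows the same whitelist idea with hypothetical field/operator tables: only known fields and operators reach the SQL text, and values travel as bind parameters.

```python
# Hedged sketch (Python for brevity; the real middleware is Node) of
# whitelist-based filter validation. Field and operator lists are assumptions.
ALLOWED_FIELDS = {"ram": "ram_gb", "category": "category", "brand": "brand"}
ALLOWED_OPS = {"=", ">=", "<=", ">", "<"}

def build_where(filters):
    """Turn API filter objects into a parameterized WHERE clause."""
    clauses, params = [], []
    for f in filters:
        column = ALLOWED_FIELDS.get(f["field"])
        if column is None or f["op"] not in ALLOWED_OPS:
            raise ValueError(f"rejected filter: {f}")
        # column and operator come from the whitelist only; the user-supplied
        # value never appears in the SQL string, only in the bind params.
        clauses.append(f"{column} {f['op']} ?")
        params.append(f["value"])
    return " AND ".join(clauses), params

where, params = build_where([
    {"field": "ram", "op": ">=", "value": 32},
    {"field": "category", "op": "=", "value": "laptop"},
])
print(where, params)  # ram_gb >= ? AND category = ? [32, 'laptop']
```

A filter naming an unknown field (or smuggling SQL in the field name) is rejected before any query is built.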

Built with:

MariaDB 10.6+ / MySQL 8.0+ (JSON support + generated columns)

Node.js + Express (API with validation + metrics)

Docker Compose (for reproducible setup)

Python (Faker) for synthetic data generation


🌟 Project Overview

Hybrid database design where:

  • Product attributes (RAM, GPU, screen size, battery, etc.) are stored as JSON.
  • Frequently queried attributes are extracted as generated columns for B-tree indexing.
  • Full API layer enables complex filters (e.g., ram >= 16, gpu = "RTX 4070").
  • Supports high-volume data ingestion efficiently via SQL inserts or CSV + LOAD DATA INFILE.
  • Provides built-in analytics views and constraints for robust data validation.


A production-ready, containerized flexible product catalog that stores arbitrary product attributes in a specs JSON column while providing fast, indexed SQL access via either MySQL 8 (original) or MariaDB (compatibility option). The repo contains both MySQL and MariaDB flavors — pick the one that suits your environment.

📁 Repo structure

flexible-catalog/
├── docker-compose.yml                    # MySQL compose
├── docker-compose-mariadb.yml            # MariaDB compose (optional)
├── .env
├── schema.sql                            # MySQL schema (JSON + generated columns)
├── schema-mariadb-compat.sql             # MariaDB-compatible schema (physical columns)
├── load_data.py                          # MySQL loader (for generated columns)
├── load_data_mariadb.py                  # MariaDB loader (populates physical columns)
├── products_inserts.sql                  # generated by load_data.py
├── products_inserts_mariadb.sql          # generated by load_data_mariadb.py
├── products.csv                          # generated CSVs
├── api/
│   ├── Dockerfile
│   ├── package.json
│   ├── server.js
│   └── middleware/
│       └── validateFilter.js
└── README.md

🎯 Quick summary

  • Two supported DB flows: MySQL 8 (uses JSON + generated stored columns) and MariaDB (uses physical indexed columns populated by loader).
  • Files provided: docker-compose.yml (MySQL), schema.sql (MySQL DDL), load_data.py (MySQL loader), and docker-compose-mariadb.yml, schema-mariadb-compat.sql, load_data_mariadb.py (MariaDB-compatible).
  • API: Node.js (Express) in api/ with endpoints for search, pagination, product detail, and analytics.

⚙️ Requirements & Versions

  • Docker 20.10+ & Docker Compose 1.29+ (container orchestration)
  • MariaDB 10.6+ or MySQL 8.0+ (JSON support, generated columns, CHECK constraints)
  • Node.js v18+ (API server)
  • Python 3.9+ (data generator with faker library)
  • NPM / Yarn (API dependencies)

Optional:

  • Postman or curl for testing API endpoints
  • Prometheus / Grafana for monitoring

🚀 Getting Started

1. Clone Repository

git clone https://github.com/your-username/flexible-catalog.git
cd flexible-catalog

2. ⚙️ Configure environment variables

Create a .env file in the repo root (example):

DB_HOST=db
DB_PORT=3306
DB_USER=catalog_user
DB_PASS=SecurePass123!
DB_NAME=catalog
PORT=3000

Both compose files accept these environment variables. The MariaDB compose also sets MARIADB_* envs internally for compatibility.

🧭 Option A — Run with MySQL 8 (default)

This is the original flow in the repo and uses MySQL 8 features (a JSON column plus generated stored columns). Use it if you want the simplest parity with the original schema and the performance of MySQL JSON + generated columns.

Files used

  • docker-compose.yml (uses mysql:8.0)
  • schema.sql (DDL with JSON + generated columns)
  • load_data.py (generates products_inserts.sql and products.csv)
  • api/ (Node API)

Steps

  1. Ensure Docker is running.
  2. Start the stack:
docker-compose up -d
  3. Generate demo data locally (creates 300k rows by default unless you override):
python load_data.py
# or a smaller quick run
NUM_PRODUCTS=50000 python load_data.py
  4. After the DB container finishes initialization (watch the logs), load the generated SQL into MySQL:
docker exec -i catalog_db mysql -uroot -p$DB_PASS $DB_NAME < products_inserts.sql
  5. Confirm API health:
curl http://localhost:3000/health

Verification & tests

Run these queries to confirm data & performance:

# Basic list
curl http://localhost:3000/products

# Search: laptops with >=32GB RAM
curl -X POST http://localhost:3000/products/search -H "Content-Type: application/json" -d '{"filters":[{"field":"ram","op":">=","value":32},{"field":"category","op":"=","value":"laptop"}], "page":1, "pageSize":10}'

# Analytics summary
curl http://localhost:3000/analytics/summary

To compare JSON-path vs virtual column performance (optional):

-- Run in MySQL client inside container
SELECT * FROM products WHERE JSON_EXTRACT(specs, '$.ram') >= 32;    -- JSON-path (no index)
SELECT * FROM products WHERE ram_gb >= 32;                          -- generated column (indexed)

Measure response times in your client; indexed queries should be much faster.


🧩 Option B — Run with MariaDB (compatibility mode)

This option is included for environments that prefer MariaDB. The repo provides schema-mariadb-compat.sql and load_data_mariadb.py. The loader writes both specs (JSON) and physical indexed columns (ram_gb, storage_gb, brand, etc.) so you avoid relying on DB-specific generated columns.
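The "writes both" behavior is the key difference from the MySQL flow. A hypothetical per-row sketch of what load_data_mariadb.py does (attribute names follow the README; the real script also uses Faker and covers more fields):

```python
import json
import random

# Illustrative sketch of the MariaDB loader's per-row logic: build the specs
# JSON, then duplicate the hot attributes into the physical indexed columns
# (ram_gb, storage_gb, brand) so filtering works without generated columns.
def make_row(rng):
    specs = {
        "ram": rng.choice([8, 16, 32, 64]),
        "storage": rng.choice([256, 512, 1024]),
        "brand": rng.choice(["Asus", "Dell", "Lenovo"]),
        "gpu": rng.choice(["RTX 4060", "RTX 4070", "Iris Xe"]),
    }
    return {
        "specs": json.dumps(specs),     # full flexible blob, kept verbatim
        "ram_gb": specs["ram"],         # copies for the indexed columns
        "storage_gb": specs["storage"],
        "brand": specs["brand"],
    }

row = make_row(random.Random(42))
print(row["ram_gb"], json.loads(row["specs"])["ram"])  # always equal
```

Because the physical columns are derived from the same dict as the JSON, the two representations can never disagree at load time.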

Files used

  • docker-compose-mariadb.yml (uses mariadb:10.11)
  • schema-mariadb-compat.sql (DDL creating physical indexed columns)
  • load_data_mariadb.py (generates products_inserts_mariadb.sql and products_mariadb.csv)
  • api/ (Node API — unchanged)

Steps

  1. Ensure Docker is running.
  2. Start MariaDB via the provided compose file:
# Option 1 — run directly with file name
docker-compose -f docker-compose-mariadb.yml up -d

# Option 2 — rename/move file to docker-compose.yml and run
mv docker-compose-mariadb.yml docker-compose.yml
docker-compose up -d
  3. Generate demo data (example):
# default smaller run for a quick test
NUM_PRODUCTS=50000 python load_data_mariadb.py
# full run
NUM_PRODUCTS=300000 python load_data_mariadb.py
  4. Load the SQL into MariaDB (after DB initialization):
docker exec -i catalog_db mysql -uroot -p$DB_PASS catalog < products_inserts_mariadb.sql
  5. Confirm API health:
curl http://localhost:3000/health

Verification & tests

Use the same API calls as MySQL to verify responses:

# Health
curl http://localhost:3000/health

# Search: laptops with >=32GB RAM
curl -X POST http://localhost:3000/products/search -H "Content-Type: application/json" -d '{"filters":[{"field":"ram","op":">=","value":32},{"field":"category","op":"=","value":"laptop"}], "page":1, "pageSize":10}'

# Analytics
curl http://localhost:3000/analytics/summary

Because the physical columns are indexed, hot-query performance should be similar to the MySQL generated-column approach. If you see mismatched column values, make sure you used the load_data_mariadb.py loader so those columns are populated.


🔁 Switching between MySQL & MariaDB

  • The API (api/server.js) reads columns like ram_gb, brand, etc. If you run MySQL with generated stored columns, those appear automatically. For MariaDB mode, ensure your loader populated these physical columns.
  • To switch DB engines: stop the current compose (docker-compose down), start the desired compose file, and reload the data with the appropriate loader's SQL file.

⚠️ Troubleshooting & tips

  • DB init errors: If /docker-entrypoint-initdb.d/01-schema.sql fails, check container logs: docker logs catalog_db and inspect the SQL error. Common causes: function not supported in older DB versions, syntax differences, or file encoding problems.
  • Slow inserts: Use LOAD DATA INFILE for the CSV files rather than many single inserts when possible. For MySQL, products.csv + LOAD DATA INFILE is fastest.
  • Connection refused: ensure .env DB_HOST is db (the compose service name) and containers are on the same network.
  • Data mismatch: If ram_gb shows NULL, you likely loaded data with the wrong loader — rerun the correct loader (load_data.py vs load_data_mariadb.py).
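On the bulk-load tip: the one subtlety when preparing a CSV that carries a JSON column is quoting, since the serialized JSON contains commas and double quotes. A small sketch (column names assumed from the README; the real generator is load_data.py, which writes products.csv):

```python
import csv
import io
import json

# Sketch of preparing CSV rows for bulk import. csv.writer quotes the JSON
# field and doubles its embedded quotes, so LOAD DATA INFILE with
# OPTIONALLY ENCLOSED BY '"' can read it back intact.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["name", "specs", "ram_gb"])
writer.writerow(["ZenBook", json.dumps({"ram": 32, "gpu": "RTX 4070"}), 32])
print(buf.getvalue())
```

The matching import statement would look roughly like `LOAD DATA INFILE '...' INTO TABLE products FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' IGNORE 1 LINES;` (exact path subject to secure_file_priv, as noted in Troubleshooting).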

🧪 Suggested verification script (optional)

Create verify.sh for quick smoke testing (adjust host/port as needed):

#!/usr/bin/env bash
set -e

echo "Health check:"
curl -s http://localhost:3000/health | jq

echo "Get products sample:"
curl -s http://localhost:3000/products | jq '.pagination, .data | .[0:2]'

echo "Search laptops with >=32GB RAM"
curl -s -X POST http://localhost:3000/products/search -H "Content-Type: application/json" -d '{"filters":[{"field":"ram","op":">=","value":32},{"field":"category","op":"=","value":"laptop"}], "page":1, "pageSize":5}' | jq

echo "Analytics summary"
curl -s http://localhost:3000/analytics/summary | jq

📦 Which files to use (summary)

  • Default (recommended for parity): docker-compose.yml, schema.sql, load_data.py
  • If you prefer MariaDB: docker-compose-mariadb.yml, schema-mariadb-compat.sql, load_data_mariadb.py

📊 Benchmarking

  • JSON-only query:
SELECT * FROM products WHERE JSON_EXTRACT(specs, '$.ram') >= 16;
  • Virtual-column indexed query (fast):
SELECT * FROM products WHERE ram_gb >= 16;

In this project's benchmarks, the indexed virtual-column query ran 3–10× faster than the JSON-path query on 300k+ rows.
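The reason for the gap can be reproduced in miniature. The hedged sketch below uses SQLite as a stand-in for MySQL/MariaDB, with one table holding only the JSON blob and one materializing the hot path into an indexed column; the query plans show the full scan versus the index search.

```python
import json
import sqlite3

# Miniature reproduction of the benchmark's shape: the JSON-path predicate
# forces a full scan (every row's JSON is parsed), while the materialized
# column is answered from the B-tree index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products_json (specs TEXT);              -- JSON only
CREATE TABLE products_col  (specs TEXT, ram_gb INT);  -- hot path materialized
CREATE INDEX idx_ram ON products_col(ram_gb);
""")
for r in (8, 16, 32, 64):
    specs = json.dumps({"ram": r})
    conn.execute("INSERT INTO products_json VALUES (?)", (specs,))
    conn.execute("INSERT INTO products_col VALUES (?, ?)", (specs, r))

plan_json = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM products_json "
    "WHERE json_extract(specs, '$.ram') >= 16").fetchall()
plan_col = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM products_col "
    "WHERE ram_gb >= 16").fetchall()
print(plan_json)  # full table scan
print(plan_col)   # search using idx_ram
```

The same contrast shows up in MySQL/MariaDB with EXPLAIN: `type: ALL` for the JSON-path query versus `type: range` with the generated-column index.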


🛠️ Advanced Features & Technical Details

  • Constraints: JSON validation, numeric fields, positive price.
  • Triggers: normalize JSON on insert, auto-update timestamps.
  • Views: product_summary for analytics.
  • Middleware: validates filters before hitting DB.
  • Logging: query execution times via morgan.
  • Healthchecks: Docker ensures service recovery.
  • Scalable Data Ingestion: CSV bulk load for large datasets.
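The "JSON validation" and "positive price" constraints above can be illustrated with CHECK constraints. The sketch uses SQLite's json_valid() as a stand-in; MySQL's JSON column type enforces validity implicitly, while MariaDB uses `CHECK (JSON_VALID(specs))` in the same spirit. Column names follow the README.

```python
import sqlite3

# Hedged illustration of constraint-based validation: malformed JSON and
# non-positive prices are rejected at the database layer, not just the API.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE products (
    specs TEXT NOT NULL CHECK (json_valid(specs)),
    price REAL NOT NULL CHECK (price > 0)
)""")
conn.execute("INSERT INTO products VALUES (?, ?)", ('{"ram": 16}', 999.0))

try:
    # invalid JSON blob: the CHECK constraint rejects the row
    conn.execute("INSERT INTO products VALUES (?, ?)", ("not json", 999.0))
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Pushing these rules into the schema means even a buggy loader or a bypassed API cannot persist inconsistent rows.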

Problems Solved:

  • Flexible schema for dynamic attributes.
  • Fast search/filtering on common fields.
  • Data consistency in semi-structured datasets.
  • Safe API interface preventing invalid queries.

🧪 Troubleshooting & Fallbacks

  • CSV Import Error → check MySQL secure_file_priv. Ensure file path is correct and accessible.
  • Docker container restarts → check .env DB settings; verify ports and credentials.
  • Slow inserts → prefer CSV import over SQL inserts.
  • API returns errors → verify request payload and filters; check server logs.
  • Database connection issues → ensure DB container is running and accessible; retry connection.
  • Schema load fails → check MySQL version compatibility (8.0+ or MariaDB 10.6+).

For persistent issues, consult logs in Docker containers (docker logs <container>) and ensure all dependencies meet version requirements.


📝 Next Steps / Stretch Goals

  • Full-text search for JSON specs.
  • GraphQL API support.
  • Compare PostgreSQL JSONB vs MySQL JSON for performance.
  • Prometheus/Grafana for monitoring.

🔥 Demonstrates real-world DB optimization, scalable ingestion, API best practices, solving semi-structured data challenges, and includes guidance for handling failures.
