A product catalog where JSON columns hold dynamic attributes, exposed by virtual/persistent columns for indexing and fast filters — giving the agility of semi-structured data with the performance of relational indexes.
Problem: e-commerce and marketplace catalogs have many product-specific attributes (RAM, GPU, screen_size, battery_mAh, etc.) which vary by category. Storing every attribute as a column is inflexible; storing a blob of JSON is flexible but usually performs poorly for filters unless indexed.
Solution: Keep a JSON column (`specs`) for flexible attributes. Expose a selected set of frequently filtered JSON paths as generated columns (virtual or stored/persistent) and add B-tree indexes on them to speed up equality and range searches. Keep JSON for everything else.
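For orientation, here is a minimal sketch of the pattern, assuming MySQL 8 syntax. The repository's authoritative DDL lives in `schema.sql`, and the exact column list there may differ.

```sql
-- Minimal sketch only: hot attributes are extracted from the JSON blob as
-- stored generated columns so they can carry ordinary B-tree indexes.
CREATE TABLE products (
  id       BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  name     VARCHAR(255)  NOT NULL,
  category VARCHAR(64)   NOT NULL,
  price    DECIMAL(10,2) NOT NULL,
  specs    JSON          NOT NULL,
  -- Frequently filtered paths exposed as indexed generated columns:
  ram_gb   INT GENERATED ALWAYS AS (CAST(JSON_EXTRACT(specs, '$.ram') AS UNSIGNED)) STORED,
  brand    VARCHAR(64) GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(specs, '$.brand'))) STORED,
  KEY idx_ram_gb (ram_gb),
  KEY idx_brand (brand),
  KEY idx_category (category)
);
```

Everything else (GPU, screen size, battery, and any category-specific attribute) stays inside `specs` and remains queryable via JSON functions.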
This repository is a full example: database schema (`schema.sql`), a data generator to produce 300k rows (`load_data.py`), a small Express API (`api/`) that prefers generated columns for filters, and a Docker Compose setup to run everything locally.
This project solves real-world problems:
- Handling dynamic product attributes without rigid schema changes.
- Enabling fast filtering and sorting on frequently queried fields.
- Scaling data ingestion for large datasets (300k+ rows).
- Maintaining data integrity and validation even with semi-structured JSON.
- Providing an API that safely exposes complex queries while preventing injection or malformed requests.
- MariaDB / MySQL 8.0+ (JSON support + generated columns)
- Node.js + Express (API with validation + metrics)
- Docker Compose (for reproducible setup)
- Python (Faker) for synthetic data generation
Hybrid database design where:
- Product attributes (RAM, GPU, screen size, battery, etc.) are stored as JSON.
- Frequently queried attributes are extracted as generated columns for B-tree indexing.
- Full API layer enables complex filters (e.g., `ram >= 16`, `gpu = "RTX 4070"`); a sketch of how such a filter maps to SQL follows this list.
- Supports high-volume data ingestion efficiently via SQL inserts or CSV + `LOAD DATA INFILE`.
- Provides built-in analytics views and constraints for robust data validation.
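To make the mapping concrete, a filter such as `ram >= 16, gpu = "RTX 4070"` might translate to SQL roughly as follows. This is a hedged sketch: the API assembles its own queries in `api/server.js`, and the JSON path for `gpu` is assumed here.

```sql
-- The hot attribute (ram) hits the indexed generated column; the rarer gpu
-- attribute falls back to a JSON path lookup inside specs.
SELECT id, name, price, specs
FROM products
WHERE ram_gb >= 16
  AND JSON_UNQUOTE(JSON_EXTRACT(specs, '$.gpu')) = 'RTX 4070'
ORDER BY price
LIMIT 10 OFFSET 0;
```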
A production-ready, containerized flexible product catalog that stores arbitrary product attributes in a `specs` JSON column while providing fast, indexed SQL access via either MySQL 8 (original) or MariaDB (compatibility option). The repo contains both MySQL and MariaDB flavors; pick the one that suits your environment.
```
flexible-catalog/
├── docker-compose.yml            # MySQL compose
├── docker-compose-mariadb.yml    # MariaDB compose (optional)
├── .env
├── schema.sql                    # MySQL schema (JSON + generated columns)
├── schema-mariadb-compat.sql     # MariaDB-compatible schema (physical columns)
├── load_data.py                  # MySQL loader (for generated columns)
├── load_data_mariadb.py          # MariaDB loader (populates physical columns)
├── products_inserts.sql          # generated by load_data.py
├── products_inserts_mariadb.sql  # generated by load_data_mariadb.py
├── products.csv                  # generated CSVs
├── api/
│   ├── Dockerfile
│   ├── package.json
│   ├── server.js
│   └── middleware/
│       └── validateFilter.js
└── README.md
```
- Two supported DB flows: MySQL 8 (uses JSON + generated stored columns) and MariaDB (uses physical indexed columns populated by loader).
- Files provided: `docker-compose.yml` (MySQL), `schema.sql` (MySQL DDL), `load_data.py` (MySQL loader), and `docker-compose-mariadb.yml`, `schema-mariadb-compat.sql`, `load_data_mariadb.py` (MariaDB-compatible).
- API: Node.js (Express) in `api/` with endpoints for search, pagination, product detail, and analytics.
- Docker 20.10+ & Docker Compose 1.29+ (container orchestration)
- MariaDB 10.6+ or MySQL 8.0+ (JSON support, generated columns, CHECK constraints)
- Node.js v18+ (API server)
- Python 3.9+ (data generator with the `faker` library)
- NPM / Yarn (API dependencies)
Optional:
- Postman or curl for testing API endpoints
- Prometheus / Grafana for monitoring
```bash
git clone https://github.com/your-username/flexible-catalog.git
cd flexible-catalog
```

Create a `.env` file in the repo root (example):

```env
DB_HOST=db
DB_PORT=3306
DB_USER=catalog_user
DB_PASS=SecurePass123!
DB_NAME=catalog
PORT=3000
```

Both compose files accept these environment variables. The MariaDB compose also sets the `MARIADB_*` variables internally for compatibility.
This is the original flow in the repo and uses MySQL 8 features (JSON column + generated stored columns). Use it if you want the simplest parity with the original schema and the performance of MySQL JSON + generated columns.
Files used:
- `docker-compose.yml` (uses `mysql:8.0`)
- `schema.sql` (DDL with JSON + generated columns)
- `load_data.py` (generates `products_inserts.sql` and `products.csv`)
- `api/` (Node API)
- Ensure Docker is running.
- Start the stack:
```bash
docker-compose up -d
```

- Generate demo data locally (the example creates 300k rows by default unless you override):

```bash
python load_data.py
# or a smaller quick run
NUM_PRODUCTS=50000 python load_data.py
```

- After the DB container finishes initialization (watch the logs), load the generated SQL into MySQL:

```bash
docker exec -i catalog_db mysql -uroot -p$DB_PASS $DB_NAME < products_inserts.sql
```

- Confirm API health:

```bash
curl http://localhost:3000/health
```

Run these queries to confirm data & performance:
```bash
# Basic list
curl http://localhost:3000/products

# Search: laptops with >=32GB RAM
curl -X POST http://localhost:3000/products/search -H "Content-Type: application/json" -d '{"filters":[{"field":"ram","op":">=","value":32},{"field":"category","op":"=","value":"laptop"}], "page":1, "pageSize":10}'

# Analytics summary
curl http://localhost:3000/analytics/summary
```

To compare JSON-path vs. virtual-column performance (optional):

```sql
-- Run in the MySQL client inside the container
SELECT * FROM products WHERE JSON_EXTRACT(specs, '$.ram') >= 32;  -- JSON path (no index)
SELECT * FROM products WHERE ram_gb >= 32;                        -- generated column (indexed)
```

Measure the response times in your client; the indexed query should be much faster.
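To see whether the index is actually used, you can also compare the execution plans (a quick ad-hoc check, not repo tooling):

```sql
-- The JSON-path query typically shows a full table scan (type: ALL), while the
-- generated-column query should show a range scan on its index.
EXPLAIN SELECT * FROM products WHERE JSON_EXTRACT(specs, '$.ram') >= 32;
EXPLAIN SELECT * FROM products WHERE ram_gb >= 32;
```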
This option is included for environments that prefer MariaDB. The repo provides `schema-mariadb-compat.sql` and `load_data_mariadb.py`. The loader writes both `specs` (JSON) and physical indexed columns (`ram_gb`, `storage_gb`, `brand`, etc.) so you avoid relying on DB-specific generated columns.
Files used:
- `docker-compose-mariadb.yml` (uses `mariadb:10.11`)
- `schema-mariadb-compat.sql` (DDL creating physical indexed columns)
- `load_data_mariadb.py` (generates `products_inserts_mariadb.sql` and `products_mariadb.csv`)
- `api/` (Node API, unchanged)
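For orientation, the MariaDB-compatible table might look roughly like the sketch below. This is an illustration only; the authoritative DDL is `schema-mariadb-compat.sql`, and the loader fills the physical columns alongside `specs`.

```sql
-- Illustrative sketch only: physical columns instead of generated columns.
-- MariaDB treats the JSON type as an alias for LONGTEXT.
CREATE TABLE products (
  id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  name       VARCHAR(255)  NOT NULL,
  category   VARCHAR(64)   NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  specs      JSON          NOT NULL,
  ram_gb     INT NULL,          -- populated by load_data_mariadb.py
  storage_gb INT NULL,          -- populated by load_data_mariadb.py
  brand      VARCHAR(64) NULL,  -- populated by load_data_mariadb.py
  KEY idx_ram_gb (ram_gb),
  KEY idx_storage_gb (storage_gb),
  KEY idx_brand (brand)
);
```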
- Ensure Docker is running.
- Start MariaDB via the provided compose file:
```bash
# Option 1 — run directly with the file name
docker-compose -f docker-compose-mariadb.yml up -d

# Option 2 — rename/move the file to docker-compose.yml and run
mv docker-compose-mariadb.yml docker-compose.yml
docker-compose up -d
```

- Generate demo data (example):

```bash
# default smaller run for a quick test
NUM_PRODUCTS=50000 python load_data_mariadb.py

# full run
NUM_PRODUCTS=300000 python load_data_mariadb.py
```

- Load the SQL into MariaDB (after DB initialization):

```bash
docker exec -i catalog_db mysql -uroot -p$DB_PASS catalog < products_inserts_mariadb.sql
```

- Confirm API health:

```bash
curl http://localhost:3000/health
```

Use the same API calls as with MySQL to verify the responses:
```bash
# Health
curl http://localhost:3000/health

# Search: laptops with >=32GB RAM
curl -X POST http://localhost:3000/products/search -H "Content-Type: application/json" -d '{"filters":[{"field":"ram","op":">=","value":32},{"field":"category","op":"=","value":"laptop"}], "page":1, "pageSize":10}'

# Analytics
curl http://localhost:3000/analytics/summary
```

Because the physical columns are indexed, the performance of hot queries should be similar to the MySQL generated-column approach. If you see mismatches in column values, ensure you used the `load_data_mariadb.py` loader so those columns are populated.
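A quick way to confirm the physical columns were populated (an ad-hoc query, not part of the repo):

```sql
-- If the wrong loader was used, ram_gb and brand will be NULL on every row.
SELECT COUNT(*)            AS total,
       SUM(ram_gb IS NULL) AS missing_ram,
       SUM(brand  IS NULL) AS missing_brand
FROM products;
```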
- The API (`api/server.js`) reads columns like `ram_gb`, `brand`, etc. If you run MySQL with generated stored columns, those appear automatically. For MariaDB mode, ensure your loader populated these physical columns.
- To switch DB engines: stop the current compose (`docker-compose down`), start the desired compose file, and reload the data with the appropriate loader's SQL file.
- DB init errors: If `/docker-entrypoint-initdb.d/01-schema.sql` fails, check the container logs (`docker logs catalog_db`) and inspect the SQL error. Common causes: functions not supported in older DB versions, syntax differences, or file encoding problems.
- Slow inserts: Use `LOAD DATA INFILE` for the CSV files rather than many single inserts when possible. For MySQL, `products.csv` + `LOAD DATA INFILE` is fastest; see the sketch after this list.
- Connection refused: ensure `.env` `DB_HOST` is `db` (the compose service name) and the containers are on the same network.
- Data mismatch: If `ram_gb` shows `NULL`, you likely loaded data with the wrong loader; rerun the correct loader (`load_data.py` vs. `load_data_mariadb.py`).
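A hedged example of the bulk-load path mentioned above. The target directory and column list are assumptions: `secure_file_priv` controls which directory `LOAD DATA INFILE` may read from, and the columns must match what `load_data.py` actually writes to `products.csv`.

```sql
-- Bulk-load the generated CSV instead of running row-by-row INSERTs.
-- Path and column order are illustrative; adjust to the real CSV layout.
LOAD DATA INFILE '/var/lib/mysql-files/products.csv'
INTO TABLE products
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(name, category, price, specs);
```

Generated columns such as `ram_gb` are computed automatically in the MySQL flow, so they are omitted from the column list.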
Create `verify.sh` for quick smoke testing (adjust host/port as needed):

```bash
#!/usr/bin/env bash
set -e

echo "Health check:"
curl -s http://localhost:3000/health | jq

echo "Get products sample:"
curl -s http://localhost:3000/products | jq '.pagination, .data | .[0:2]'

echo "Search laptops with >=32GB RAM"
curl -s -X POST http://localhost:3000/products/search -H "Content-Type: application/json" -d '{"filters":[{"field":"ram","op":">=","value":32},{"field":"category","op":"=","value":"laptop"}], "page":1, "pageSize":5}' | jq

echo "Analytics summary"
curl -s http://localhost:3000/analytics/summary | jq
```

- Default (recommended for parity): `docker-compose.yml`, `schema.sql`, `load_data.py`
- If you prefer MariaDB: `docker-compose-mariadb.yml`, `schema-mariadb-compat.sql`, `load_data_mariadb.py`
- JSON-only query:

```sql
SELECT * FROM products WHERE JSON_EXTRACT(specs, '$.ram') >= 16;
```

- Virtual-column indexed query (fast):

```sql
SELECT * FROM products WHERE ram_gb >= 16;
```

Virtual columns: 3–10× faster on 300k+ rows.
- Constraints: JSON validation, numeric fields, positive price.
- Triggers: normalize JSON on insert, auto-update timestamps.
- Views: `product_summary` for analytics (a sketch of such a view follows this list).
- Middleware: validates filters before hitting the DB.
- Logging: query execution times via `morgan`.
- Healthchecks: Docker ensures service recovery.
- Scalable Data Ingestion: CSV bulk load for large datasets.
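For illustration, a `product_summary` view might look roughly like this. It is a sketch assuming per-category aggregation; the actual definition lives in `schema.sql` and may differ.

```sql
-- Illustrative only: per-category counts and price/RAM statistics over the hot columns.
CREATE OR REPLACE VIEW product_summary AS
SELECT category,
       COUNT(*)    AS product_count,
       AVG(price)  AS avg_price,
       MIN(price)  AS min_price,
       MAX(price)  AS max_price,
       AVG(ram_gb) AS avg_ram_gb
FROM products
GROUP BY category;
```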
Problems Solved:
- Flexible schema for dynamic attributes.
- Fast search/filtering on common fields.
- Data consistency in semi-structured datasets.
- Safe API interface preventing invalid queries.
- CSV import error → check MySQL `secure_file_priv`. Ensure the file path is correct and accessible.
- Docker container restarts → check `.env` DB settings; verify ports and credentials.
- Slow inserts → prefer CSV import over SQL inserts.
- API returns errors → verify request payload and filters; check server logs.
- Database connection issues → ensure DB container is running and accessible; retry connection.
- Schema load fails → check MySQL version compatibility (8.0+ or MariaDB 10.6+).
For persistent issues, consult the logs in the Docker containers (`docker logs <container>`) and ensure all dependencies meet the version requirements.
- Full-text search for JSON specs.
- GraphQL API support.
- Compare PostgreSQL JSONB vs MySQL JSON for performance.
- Prometheus/Grafana for monitoring.
🔥 Demonstrates real-world DB optimization, scalable ingestion, and API best practices for semi-structured data, and includes guidance for handling failures.