Dataset generation and management service for the Juniper ecosystem.
Juniper Data provides a centralized service for generating, storing, and serving datasets used by the Juniper neural network projects. It supports various dataset types, including the classic two-spiral classification problem.
This service is part of the Juniper ecosystem. Verified compatible versions:
| juniper-data | juniper-cascor | juniper-canopy | data-client | cascor-client | cascor-worker |
|---|---|---|---|---|---|
| 0.4.x | 0.3.x | 0.2.x | >=0.3.1 | >=0.1.0 | >=0.1.0 |
For full-stack Docker deployment and integration tests, see juniper-deploy.
juniper-data is the foundational data layer of the Juniper Project ecosystem. juniper-cascor and juniper-canopy both call juniper-data to generate and retrieve datasets.

    ┌─────────────────────┐     REST+WS      ┌──────────────────────┐
    │   juniper-canopy    │ ───────────────► │    juniper-cascor    │
    │   Dashboard         │                  │    Training Svc      │
    │   Port 8050         │                  │    Port 8200         │
    └──────────┬──────────┘                  └──────────┬───────────┘
               │ REST                                   │ REST
               ▼                                        ▼
    ┌───────────────────────────────────────────────────────────────┐
    │                JuniperData  ◄── (this service)                │
    │                  Dataset Service · Port 8100                  │
    └───────────────────────────────────────────────────────────────┘

Data contract: datasets are served as NPZ archives with keys X_train, y_train, X_test, y_test, X_full, y_full (all float32).
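
As a rough sketch of what this contract means for a consumer, the snippet below builds a small in-memory NPZ archive with the six documented keys and checks it. The `validate_npz` and `make_example_payload` helpers and the array shapes are illustrative assumptions, not part of juniper-data.

```python
import io

import numpy as np

# Keys documented in the data contract above.
EXPECTED_KEYS = ("X_train", "y_train", "X_test", "y_test", "X_full", "y_full")


def validate_npz(raw: bytes) -> dict[str, np.ndarray]:
    """Load an NPZ payload and check it against the documented contract.

    Hypothetical helper for illustration; not part of juniper-data.
    """
    with np.load(io.BytesIO(raw)) as archive:
        arrays = {key: archive[key] for key in archive.files}

    missing = [key for key in EXPECTED_KEYS if key not in arrays]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in EXPECTED_KEYS:
        if arrays[key].dtype != np.float32:
            raise ValueError(f"{key} has dtype {arrays[key].dtype}, expected float32")
    return arrays


def make_example_payload() -> bytes:
    """Build a tiny stand-in archive (shapes and split are arbitrary assumptions)."""
    rng = np.random.default_rng(0)
    X_full = rng.random((10, 2), dtype=np.float32)
    y_full = rng.random((10, 2), dtype=np.float32)
    buffer = io.BytesIO()
    np.savez(
        buffer,
        X_train=X_full[:8], y_train=y_full[:8],
        X_test=X_full[8:], y_test=y_full[8:],
        X_full=X_full, y_full=y_full,
    )
    return buffer.getvalue()


arrays = validate_npz(make_example_payload())
print({key: value.shape for key, value in arrays.items()})
```

In practice the bytes would come from the artifact download endpoint rather than from `make_example_payload`.
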
| Service | Relationship | Environment Variable / Install Command |
|---|---|---|
| juniper-cascor | Consumes JuniperData for training datasets | JUNIPER_DATA_URL=http://localhost:8100 |
| juniper-canopy | Consumes JuniperData for visualization data | JUNIPER_DATA_URL=http://localhost:8100 |
| juniper-data-client | PyPI client library for this service | pip install juniper-data-client |
| Variable | Default | Description |
|---|---|---|
| `JUNIPER_DATA_HOST` | `0.0.0.0` | Listen address |
| `JUNIPER_DATA_PORT` | `8100` | Service port |
| `JUNIPER_DATA_LOG_LEVEL` | `INFO` | Log verbosity |
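
A minimal sketch of how these variables and their documented defaults could be resolved. `ServiceSettings` and `load_settings` are hypothetical names used only for illustration; the service's own configuration code may look different.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServiceSettings:
    """Illustrative settings holder; not the service's actual config class."""
    host: str = "0.0.0.0"
    port: int = 8100
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> ServiceSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return ServiceSettings(
        host=env.get("JUNIPER_DATA_HOST", "0.0.0.0"),
        port=int(env.get("JUNIPER_DATA_PORT", "8100")),
        log_level=env.get("JUNIPER_DATA_LOG_LEVEL", "INFO"),
    )


# An empty mapping yields the defaults from the table above.
defaults = load_settings({})
# Overriding two variables, as one might in a shell or compose file.
custom = load_settings({"JUNIPER_DATA_PORT": "9000", "JUNIPER_DATA_LOG_LEVEL": "DEBUG"})
print(defaults)
print(custom)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.
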

    # Full stack with all three services:
    git clone https://github.com/pcalnon/juniper-deploy.git  # (private repository)
    cd juniper-deploy && docker compose up --build

The `requirements.lock` file pins exact dependency versions for reproducible Docker builds. The `pyproject.toml` retains flexible `>=` ranges for local development.
Regenerate after changing dependencies in pyproject.toml:

    uv pip compile pyproject.toml --extra api --extra observability -o requirements.lock

Install from source, optionally with the `api`, `dev`, or `all` extras:

    pip install -e .
    pip install -e ".[api]"
    pip install -e ".[dev]"
    pip install -e ".[all]"

Generate a dataset from Python:

    from juniper_data.generators.spiral import SpiralGenerator

    generator = SpiralGenerator()
    dataset = generator.generate(n_points=100, n_spirals=2, noise=0.1)

Start the API server:

    uvicorn juniper_data.api.app:app --reload

| Endpoint | Method | Description |
|---|---|---|
| `/v1/health` | GET | Health check |
| `/v1/health/live` | GET | Liveness probe |
| `/v1/health/ready` | GET | Readiness probe (checks storage) |
| `/v1/generators` | GET | List all generators with schemas |
| `/v1/generators/{name}/schema` | GET | Get parameter schema for a generator |
| `/v1/datasets` | POST | Create dataset (or return cached dataset) |
| `/v1/datasets` | GET | List dataset IDs |
| `/v1/datasets/filter` | GET | Filter metadata by generator/tags/date/name/version |
| `/v1/datasets/stats` | GET | Aggregate dataset statistics |
| `/v1/datasets/versions` | GET | List all versions for a logical dataset name |
| `/v1/datasets/latest` | GET | Get latest version for a logical dataset name |
| `/v1/datasets/batch-create` | POST | Create multiple datasets |
| `/v1/datasets/batch-delete` | POST | Delete multiple datasets |
| `/v1/datasets/batch-tags` | PATCH | Update tags on multiple datasets |
| `/v1/datasets/batch-export` | POST | Export multiple datasets as ZIP |
| `/v1/datasets/cleanup-expired` | POST | Delete expired datasets |
| `/v1/datasets/{id}` | GET | Get dataset metadata |
| `/v1/datasets/{id}` | DELETE | Delete a dataset |
| `/v1/datasets/{id}/artifact` | GET | Download NPZ artifact |
| `/v1/datasets/{id}/preview` | GET | Preview first N samples as JSON |
| `/v1/datasets/{id}/tags` | PATCH | Add/remove tags on one dataset |

See docs/api/JUNIPER_DATA_API.md for full endpoint documentation, including filtering, batch operations, and tagging.
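
For a quick look at the API without the client library, the sketch below builds URLs for a few of the documented paths using only the standard library. The `endpoint`, `artifact_url`, and `get_json` helpers and the dataset ID `abc123` are hypothetical, and the assumption that responses are JSON is not confirmed here; see the API documentation for the actual request and response schemas.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8100"  # matches the documented JUNIPER_DATA_URL example


def endpoint(path: str, **query: str) -> str:
    """Join the base URL, a documented path, and optional query parameters."""
    url = f"{BASE_URL}{path}"
    return f"{url}?{urlencode(query)}" if query else url


def artifact_url(dataset_id: str) -> str:
    """URL for GET /v1/datasets/{id}/artifact (the ID is percent-encoded)."""
    return endpoint(f"/v1/datasets/{quote(dataset_id, safe='')}/artifact")


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode a JSON body; requires a running service."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(endpoint("/v1/health"))
print(artifact_url("abc123"))  # "abc123" is a made-up dataset ID
# With the service running locally (assuming a JSON response):
# print(get_json(endpoint("/v1/generators")))
```
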
POST /v1/datasets supports logical names for versioned datasets:
- Set `name` to group related datasets into a version series.
- Persisted creates with the same `name` auto-increment `meta.dataset_version` (`1`, `2`, `3`, ...).
- Repeating an identical request returns the cached dataset and keeps its existing version.
- Use `GET /v1/datasets/versions?name=<dataset_name>` to view history and `GET /v1/datasets/latest?name=<dataset_name>` to resolve the latest version.
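
The naming rules above can be illustrated with a toy in-memory model. `VersionRegistry` is a hypothetical class written for this example, and keying the cache on a hash of the request is an assumption; it is not how the service is known to implement caching.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class VersionRegistry:
    """Toy in-memory model of the naming rules; not the service's code."""
    _by_request: dict = field(default_factory=dict)  # request hash -> record
    _by_name: dict = field(default_factory=dict)     # name -> list of records

    def create(self, name: str, generator: str, params: dict) -> dict:
        # Identical requests hash to the same key and return the cached record.
        key = hashlib.sha256(
            json.dumps([name, generator, params], sort_keys=True).encode()
        ).hexdigest()
        if key in self._by_request:
            return self._by_request[key]
        # A new request under an existing name gets the next version number.
        series = self._by_name.setdefault(name, [])
        record = {"id": key[:12], "name": name, "dataset_version": len(series) + 1}
        series.append(record)
        self._by_request[key] = record
        return record

    def versions(self, name: str) -> list:
        """All records for a logical name, oldest first."""
        return list(self._by_name.get(name, []))

    def latest(self, name: str) -> dict:
        """Most recently created record for a logical name."""
        return self._by_name[name][-1]


registry = VersionRegistry()
first = registry.create("spirals", "spiral", {"n_points": 100, "noise": 0.1})
second = registry.create("spirals", "spiral", {"n_points": 200, "noise": 0.1})
repeat = registry.create("spirals", "spiral", {"n_points": 100, "noise": 0.1})
print(first["dataset_version"], second["dataset_version"], repeat["dataset_version"])
```

Here the repeated request returns the first record unchanged, while the request with different parameters starts version 2 of the same series.
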

    juniper-data/
    ├── juniper_data/
    │   ├── core/               # Core functionality and base classes
    │   ├── generators/         # Dataset generators (8 types)
    │   │   ├── spiral/         # Multi-spiral classification
    │   │   ├── xor/            # XOR classification
    │   │   ├── gaussian/       # Mixture of Gaussians
    │   │   ├── circles/        # Concentric circles
    │   │   ├── checkerboard/   # 2D checkerboard pattern
    │   │   ├── csv_import/     # CSV/JSON file import
    │   │   ├── mnist/          # MNIST / Fashion-MNIST
    │   │   └── arc_agi/        # ARC-AGI visual reasoning
    │   ├── storage/            # Dataset persistence layer
    │   ├── api/                # FastAPI application
    │   │   └── routes/         # API route handlers
    │   └── tests/              # Test suite
    │       ├── unit/           # Unit tests
    │       └── integration/    # Integration tests
    ├── pyproject.toml          # Project configuration
    └── README.md               # This file

Run the tests:

    pytest

Run the tests with a coverage report:

    pytest --cov=juniper_data --cov-report=html

Format and lint:

    ruff format juniper_data tests
    ruff check --fix juniper_data tests

Type-check:

    mypy juniper_data

| Repository | Description |
|---|---|
| juniper-data | Dataset generation service (this repo) |
| juniper-cascor | CasCor neural network training service |
| juniper-canopy | Real-time monitoring dashboard |
| juniper-data-client | PyPI: juniper-data-client |
| juniper-cascor-client | PyPI: juniper-cascor-client |
| juniper-cascor-worker | PyPI: juniper-cascor-worker |
MIT License - Copyright (c) 2024-2026 Paul Calnon