Juniper Data

Dataset generation and management service for the Juniper ecosystem.

Overview

Juniper Data provides a centralized service for generating, storing, and serving datasets used by the Juniper neural network projects. It includes eight dataset generators, ranging from the classic two-spiral classification problem to MNIST and ARC-AGI tasks.

Ecosystem Compatibility

This service is part of the Juniper ecosystem. Verified compatible versions:

juniper-data	juniper-cascor	juniper-canopy	data-client	cascor-client	cascor-worker
0.4.x	0.3.x	0.2.x	>=0.3.1	>=0.1.0	>=0.1.0
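The table mixes two kinds of constraint: the services are pinned to a minor series (for example 0.4.x), while the client libraries only need a minimum version (for example >=0.3.1). A minimal sketch of that check, assuming plain X.Y.Z version strings; the COMPAT mapping and the parse_version and is_compatible helpers are hypothetical and not part of any Juniper package.

```python
import re

# Hypothetical transcription of the compatibility table.
# "series" = same major.minor line (0.4.x); "min" = this version or newer.
COMPAT = {
    "juniper-data": ("series", (0, 4)),
    "juniper-cascor": ("series", (0, 3)),
    "juniper-canopy": ("series", (0, 2)),
    "juniper-data-client": ("min", (0, 3, 1)),
    "juniper-cascor-client": ("min", (0, 1, 0)),
    "juniper-cascor-worker": ("min", (0, 1, 0)),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string; pre-release tags are not handled."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(package: str, version: str) -> bool:
    """Check one installed version against the table entry for that package."""
    kind, expected = COMPAT[package]
    actual = parse_version(version)
    if kind == "series":
        # Only major.minor must match; any patch release is accepted.
        return actual[: len(expected)] == expected
    # Tuple comparison orders versions component by component.
    return actual >= expected
```

For example, is_compatible("juniper-data", "0.4.2") is true, while is_compatible("juniper-data-client", "0.3.0") is false because it falls below the 0.3.1 minimum.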

For full-stack Docker deployment and integration tests, see juniper-deploy.

Architecture

juniper-data is the foundational data layer of the Juniper Project ecosystem. juniper-cascor and juniper-canopy both call juniper-data to generate and retrieve datasets.

┌─────────────────────┐     REST+WS      ┌──────────────────────┐
│   juniper-canopy    │ ◄──────────────► │    juniper-cascor    │
│   Dashboard         │                  │    Training Svc      │
│   Port 8050         │                  │    Port 8200         │
└──────────┬──────────┘                  └──────────┬───────────┘
           │ REST                                   │ REST
           ▼                                        ▼
┌──────────────────────────────────────────────────────────────┐
│                      JuniperData  ◄── (this service)         │
│                   Dataset Service  ·  Port 8100              │
└──────────────────────────────────────────────────────────────┘

Data contract: datasets are served as NPZ archives with keys X_train, y_train, X_test, y_test, X_full, y_full (all float32).
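An NPZ file is a ZIP archive in which numpy.savez stores each array as a <key>.npy member, so a consumer can verify the data contract with the standard library alone. A minimal sketch, assuming the artifact bytes were already downloaded (for example from /v1/datasets/{id}/artifact); check_contract and read_npy_header are hypothetical helpers, not part of juniper-data-client.

```python
import ast
import io
import struct
import zipfile

# Keys required by the data contract; each is stored as "<key>.npy" in the archive.
REQUIRED_KEYS = ("X_train", "y_train", "X_test", "y_test", "X_full", "y_full")


def read_npy_header(fh) -> dict:
    """Read only the header dict of a .npy stream (format versions 1.0 to 3.0)."""
    prefix = fh.read(8)  # 6-byte magic string plus major/minor version bytes
    if prefix[:6] != b"\x93NUMPY":
        raise ValueError("not a .npy file")
    major = prefix[6]
    # Version 1.0 stores the header length in 2 bytes; later versions use 4.
    size_fmt = "<H" if major == 1 else "<I"
    (header_len,) = struct.unpack(size_fmt, fh.read(struct.calcsize(size_fmt)))
    encoding = "utf-8" if major >= 3 else "latin1"
    return ast.literal_eval(fh.read(header_len).decode(encoding))


def check_contract(npz_bytes: bytes) -> dict[str, tuple]:
    """Return {key: shape}; raise ValueError if a key is missing or not float32."""
    shapes = {}
    with zipfile.ZipFile(io.BytesIO(npz_bytes)) as archive:
        names = set(archive.namelist())
        for key in REQUIRED_KEYS:
            member = f"{key}.npy"
            if member not in names:
                raise ValueError(f"missing array: {key}")
            with archive.open(member) as fh:
                header = read_npy_header(fh)
            # '<f4', '>f4' and '=f4' all denote 4-byte floats (float32).
            if header["descr"].lstrip("<>=|") != "f4":
                raise ValueError(f"{key} has dtype {header['descr']}, expected float32")
            shapes[key] = header["shape"]
    return shapes
```

In application code numpy.load reads the same archive directly; the header-only version above just makes the contract explicit without loading the array data.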

Related Services

Service	Relationship	Environment Variable
juniper-cascor	Consumes JuniperData for training datasets	`JUNIPER_DATA_URL=http://localhost:8100`
juniper-canopy	Consumes JuniperData for visualization data	`JUNIPER_DATA_URL=http://localhost:8100`
juniper-data-client	PyPI client library for this service	`pip install juniper-data-client`
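Both consumers locate this service through JUNIPER_DATA_URL. A minimal sketch of how a consumer might resolve that variable and wait for the readiness probe before requesting datasets; the fallback to http://localhost:8100, treating HTTP 200 as "ready", and all function names are assumptions for illustration, not code taken from juniper-cascor or juniper-canopy.

```python
import os
import time
import urllib.error
import urllib.request

DEFAULT_DATA_URL = "http://localhost:8100"  # assumed fallback when the variable is unset


def data_service_url(env=os.environ) -> str:
    """Resolve the JuniperData base URL, dropping any trailing slash."""
    return env.get("JUNIPER_DATA_URL", DEFAULT_DATA_URL).rstrip("/")


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # non-2xx responses still carry a status code
        return exc.code
    except (urllib.error.URLError, OSError):  # connection refused, DNS failure, timeout
        return 0


def wait_until_ready(base_url: str, attempts: int = 10, delay: float = 1.0, fetch=http_status) -> bool:
    """Poll /v1/health/ready until it answers 200, giving up after `attempts` tries."""
    for attempt in range(attempts):
        if fetch(f"{base_url}/v1/health/ready") == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Polling the readiness probe rather than the liveness probe matters here, because readiness also checks storage, which is what a consumer needs before fetching datasets.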

Service Configuration

Variable	Default	Description
`JUNIPER_DATA_HOST`	`0.0.0.0`	Listen address
`JUNIPER_DATA_PORT`	`8100`	Service port
`JUNIPER_DATA_LOG_LEVEL`	`INFO`	Log verbosity
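The service's own settings code is not shown here, so the following is only a sketch of how the three variables and their documented defaults could be loaded and validated; DataServiceConfig, load_config, and the accepted log-level names are hypothetical.

```python
import os
from dataclasses import dataclass

# Assumed set of accepted level names (the standard logging levels).
_LOG_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"}


@dataclass(frozen=True)
class DataServiceConfig:
    host: str = "0.0.0.0"   # JUNIPER_DATA_HOST
    port: int = 8100        # JUNIPER_DATA_PORT
    log_level: str = "INFO" # JUNIPER_DATA_LOG_LEVEL


def load_config(env=os.environ) -> DataServiceConfig:
    """Build a config from JUNIPER_DATA_* variables, falling back to the defaults."""
    defaults = DataServiceConfig()
    port = int(env.get("JUNIPER_DATA_PORT", str(defaults.port)))
    if not 0 < port < 65536:
        raise ValueError(f"JUNIPER_DATA_PORT out of range: {port}")
    level = env.get("JUNIPER_DATA_LOG_LEVEL", defaults.log_level).upper()
    if level not in _LOG_LEVELS:
        raise ValueError(f"unknown JUNIPER_DATA_LOG_LEVEL: {level}")
    return DataServiceConfig(
        host=env.get("JUNIPER_DATA_HOST", defaults.host),
        port=port,
        log_level=level,
    )
```

These values map onto uvicorn.run(app, host=..., port=..., log_level=...), which expects a lowercase level name such as "info".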

Docker Deployment

# Full stack with all three services:
git clone https://github.com/pcalnon/juniper-deploy.git  # (private repository)
cd juniper-deploy && docker compose up --build

Dependency Lockfile

The requirements.lock file pins exact dependency versions for reproducible Docker builds. The pyproject.toml retains flexible >= ranges for local development.

Regenerate after changing dependencies in pyproject.toml:

uv pip compile pyproject.toml --extra api --extra observability -o requirements.lock
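Because the lockfile is meant to contain only exact pins, a small check can catch a hand-edited >= range slipping into requirements.lock. A minimal sketch of such a check, assuming the pip requirements format that uv pip compile writes; unpinned_lines is a hypothetical helper for a CI step, not an existing script in this repository.

```python
import re

# "name==version", optionally with [extras] and an environment marker after ";".
_PIN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s;]+(\s*;.*)?$")


def unpinned_lines(lock_text: str) -> list[str]:
    """Return requirement lines that are not exact '==' pins."""
    problems = []
    for raw in lock_text.splitlines():
        # Drop inline comments and a trailing line-continuation backslash.
        line = re.sub(r"\s+#.*$", "", raw).strip().rstrip("\\").strip()
        # Skip blanks, full-line comments, and option lines such as --hash=...
        if not line or line.startswith(("#", "-")):
            continue
        if not _PIN.match(line):
            problems.append(line)
    return problems
```

Running it over the file contents and failing when the returned list is non-empty keeps the lockfile and pyproject.toml roles separate: ranges belong in pyproject.toml, exact versions in requirements.lock.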

Installation

Basic Installation

pip install -e .

With API Support

pip install -e ".[api]"

Development Installation

pip install -e ".[dev]"

Full Installation

pip install -e ".[all]"

Quick Start

Generate a Spiral Dataset

from juniper_data.generators.spiral import SpiralGenerator

generator = SpiralGenerator()
dataset = generator.generate(n_points=100, n_spirals=2, noise=0.1)
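To show what the two-spiral problem looks like, here is an independent standard-library sketch of an n-arm spiral dataset. It is not SpiralGenerator's implementation: the parameter names mirror the call above, but their meaning here (n_points per arm, Gaussian coordinate noise) and the extra turns parameter are assumptions.

```python
import math
import random


def make_spirals(n_points=100, n_spirals=2, noise=0.1, turns=1.5, seed=None):
    """Return (points, labels) with n_points per arm; arm k is rotated by 2*pi*k/n_spirals."""
    rng = random.Random(seed)
    points, labels = [], []
    for arm in range(n_spirals):
        offset = 2 * math.pi * arm / n_spirals
        for i in range(n_points):
            t = i / max(n_points - 1, 1)  # position along the arm, from 0 to 1
            radius = t                    # radius grows linearly with angle (Archimedean spiral)
            angle = offset + 2 * math.pi * turns * t
            x = radius * math.cos(angle) + rng.gauss(0.0, noise)
            y = radius * math.sin(angle) + rng.gauss(0.0, noise)
            points.append((x, y))
            labels.append(arm)  # class label = arm index
    return points, labels


points, labels = make_spirals(n_points=100, n_spirals=2, noise=0.1, seed=0)
```

The arms interleave, so no straight line separates the classes; that is what makes the problem a classic benchmark for constructive networks such as CasCor.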

Start the API Server

uvicorn juniper_data.api.app:app --reload --port 8100

API Endpoints

Endpoint	Method	Description
`/v1/health`	GET	Health check
`/v1/health/live`	GET	Liveness probe
`/v1/health/ready`	GET	Readiness probe (checks storage)
`/v1/generators`	GET	List all generators with schemas
`/v1/generators/{name}/schema`	GET	Get parameter schema for a generator
`/v1/datasets`	POST	Create dataset (or return cached dataset)
`/v1/datasets`	GET	List dataset IDs
`/v1/datasets/filter`	GET	Filter metadata by generator/tags/date/name/version
`/v1/datasets/stats`	GET	Aggregate dataset statistics
`/v1/datasets/versions`	GET	List all versions for a logical dataset name
`/v1/datasets/latest`	GET	Get latest version for a logical dataset name
`/v1/datasets/batch-create`	POST	Create multiple datasets
`/v1/datasets/batch-delete`	POST	Delete multiple datasets
`/v1/datasets/batch-tags`	PATCH	Update tags on multiple datasets
`/v1/datasets/batch-export`	POST	Export multiple datasets as ZIP
`/v1/datasets/cleanup-expired`	POST	Delete expired datasets
`/v1/datasets/{id}`	GET	Get dataset metadata
`/v1/datasets/{id}`	DELETE	Delete a dataset
`/v1/datasets/{id}/artifact`	GET	Download NPZ artifact
`/v1/datasets/{id}/preview`	GET	Preview first N samples as JSON
`/v1/datasets/{id}/tags`	PATCH	Add/remove tags on one dataset

See docs/api/JUNIPER_DATA_API.md for full endpoint documentation, including filtering, batch operations, and tagging.
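A minimal client-side sketch using only urllib, covering three endpoints from the table: creating a dataset, downloading its artifact, and resolving the latest named version. The JSON body shape ({"generator": ..., "params": ..., "name": ...}) is an assumption; check docs/api/JUNIPER_DATA_API.md or GET /v1/generators/{name}/schema for the real request format. All function names are hypothetical; juniper-data-client is the supported client.

```python
import json
import urllib.parse
import urllib.request


def create_dataset_request(base_url, generator, params, name=None):
    """Build (but do not send) a POST /v1/datasets request with an assumed body shape."""
    body = {"generator": generator, "params": params}
    if name is not None:
        body["name"] = name  # groups creates into a version series
    return urllib.request.Request(
        f"{base_url}/v1/datasets",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def artifact_url(base_url, dataset_id):
    """URL of the NPZ artifact for one dataset, with the ID safely escaped."""
    return f"{base_url}/v1/datasets/{urllib.parse.quote(str(dataset_id), safe='')}/artifact"


def latest_url(base_url, name):
    """URL that resolves the latest version of a named dataset."""
    return f"{base_url}/v1/datasets/latest?{urllib.parse.urlencode({'name': name})}"
```

Sending the request is then urllib.request.urlopen(req), followed by json.load on the response; the field that carries the new dataset ID is not documented in this README.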

Named Dataset Versioning

POST /v1/datasets supports logical names for versioned datasets:

Set name to group related datasets into a version series.
Each persisted create with the same name auto-increments meta.dataset_version (1, 2, 3, ...).
Repeating an identical request returns the cached dataset and keeps its existing version.
Use GET /v1/datasets/versions?name=<dataset_name> to view the version history and GET /v1/datasets/latest?name=<dataset_name> to resolve the latest version.
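The interaction between caching and versioning can be modeled in a few lines. This in-memory sketch is illustrative only: it assumes the cache key is the full request with keys sorted, ignores the persisted/non-persisted distinction, and invents its own IDs; VersionedStore is hypothetical and not the service's storage layer.

```python
import json


class VersionedStore:
    """In-memory model of the POST /v1/datasets naming rules described above."""

    def __init__(self):
        self._by_request = {}  # canonical request JSON -> metadata dict
        self._versions = {}    # logical name -> list of metadata dicts, oldest first

    def create(self, request: dict) -> dict:
        key = json.dumps(request, sort_keys=True)  # key order does not affect identity
        if key in self._by_request:
            return self._by_request[key]  # identical request: cached dataset, same version
        name = request.get("name")
        meta = {"id": f"ds-{len(self._by_request) + 1}", "name": name}
        if name:
            history = self._versions.setdefault(name, [])
            meta["dataset_version"] = len(history) + 1  # 1, 2, 3, ...
            history.append(meta)
        self._by_request[key] = meta
        return meta

    def versions(self, name):
        """Analogue of GET /v1/datasets/versions?name=..."""
        return [m["dataset_version"] for m in self._versions.get(name, [])]

    def latest(self, name):
        """Analogue of GET /v1/datasets/latest?name=..."""
        history = self._versions.get(name)
        return history[-1] if history else None
```

Two creates named "demo" with different noise values get versions 1 and 2; re-sending the first request returns the original version-1 metadata instead of creating version 3.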

Project Structure

juniper-data/
├── juniper_data/
│   ├── core/           # Core functionality and base classes
│   ├── generators/     # Dataset generators (8 types)
│   │   ├── spiral/     # Multi-spiral classification
│   │   ├── xor/        # XOR classification
│   │   ├── gaussian/   # Mixture of Gaussians
│   │   ├── circles/    # Concentric circles
│   │   ├── checkerboard/ # 2D checkerboard pattern
│   │   ├── csv_import/ # CSV/JSON file import
│   │   ├── mnist/      # MNIST / Fashion-MNIST
│   │   └── arc_agi/    # ARC-AGI visual reasoning
│   ├── storage/        # Dataset persistence layer
│   ├── api/            # FastAPI application
│   │   └── routes/     # API route handlers
│   └── tests/          # Test suite
│       ├── unit/       # Unit tests
│       └── integration/ # Integration tests
├── pyproject.toml      # Project configuration
└── README.md           # This file

Development

Running Tests

pytest

Running Tests with Coverage

pytest --cov=juniper_data --cov-report=html

Code Formatting

ruff format juniper_data tests
ruff check --fix juniper_data tests

Type Checking

mypy juniper_data

Juniper Ecosystem

Repository	Description
juniper-data	Dataset generation service (this repo)
juniper-cascor	CasCor neural network training service
juniper-canopy	Real-time monitoring dashboard
juniper-data-client	PyPI: `juniper-data-client`
juniper-cascor-client	PyPI: `juniper-cascor-client`
juniper-cascor-worker	PyPI: `juniper-cascor-worker`
