ML Pipelines Template

Overview

This is a production-ready template for building Kubeflow Pipelines (KFP) workflows with Python. It provides a structured, scalable architecture for ML pipelines with containerized task execution, type-safe configuration, and comprehensive testing.

Key Features

  • Kubeflow Pipelines Integration: Build, compile, and deploy KFP workflows
  • Task-Based Architecture: Modular ML tasks (feature engineering, training, evaluation, inference, export)
  • Environment Management: Multi-environment support (dev, prod) with isolated configurations
  • Modern Python Tooling: Built with uv and Ruff
  • Type Safety: Full type hints with Pyright and Pydantic validation
  • CI/CD Ready: GitHub Actions workflows for testing, linting, and Docker builds

Prerequisites

  • ๐Ÿ Python 3.10+ - Programming language
  • ๐Ÿ“ฆ uv - Fast Python package installer and resolver
  • ๐Ÿณ Docker - Container platform (for builds)
  • โ˜ธ๏ธ Kubeflow Pipelines - ML workflow orchestration platform

Tip: quick install of uv:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Getting Started

1๏ธโƒฃ Install Dependencies

uv sync

2๏ธโƒฃ Run Tests

uv run nox -s test

3๏ธโƒฃ Compile a Pipeline

uv run nox -s compile_pipeline -- \
  --env dev \
  --pipeline_name sample-pipeline \
  --tag test \
  --model_type sample

๐Ÿ“ Project Structure

.
├── const/                          # Shared enumerations
│   ├── environment.py              # Environment enum (dev, prod)
│   ├── model_type.py               # Model type enum (sample, ...)
│   └── task.py                     # Task enum (feature_engineering, training, ...)
├── environments/                   # Environment-specific settings
│   ├── dev.py                      # Development environment config
│   ├── prod.py                     # Production environment config
│   └── settings.py                 # Settings loader
├── pipelines/                      # KFP pipeline definitions
│   ├── components.py               # KFP container components
│   ├── graphs/                     # Pipeline graph definitions
│   │   └── sample.py               # Sample pipeline graph
│   ├── main.py                     # Pipeline compiler & uploader
│   └── settings.py                 # Pipeline compilation settings
├── tasks/                          # ML task implementations
│   ├── base.py                     # BaseTask protocol
│   ├── feature_engineering/        # Feature engineering task
│   ├── training/                   # Model training task
│   ├── evaluation/                 # Model evaluation task
│   ├── inference/                  # Inference task
│   └── export/                     # Export task
├── tests/                          # Test suite (mirrors src structure)
├── main.py                         # Task executor (runs inside KFP containers)
├── noxfile.py                      # Task automation with Nox
├── pyproject.toml                  # Project dependencies & metadata
├── pytest.ini                      # Pytest configuration
└── ruff.toml                       # Ruff linter configuration

Key Files:

  • main.py - Entry point for task execution in containers
  • noxfile.py - Development task automation (test, lint, fmt, compile_pipeline)
  • pyproject.toml - Project configuration and dependencies
  • CLAUDE.md - Architecture guide for Claude Code
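The BaseTask protocol in tasks/base.py is not reproduced in this README. As a rough sketch of what such a structural protocol might look like (the exact members are an assumption; only a run() method is implied by the task examples below):

```python
from typing import Protocol, TypeVar, runtime_checkable

# Covariant type variable, mirroring the T_co exported by tasks.base
T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class BaseTask(Protocol):
    """Any class with a zero-argument run() method satisfies this protocol."""

    def run(self) -> None: ...


class EchoTask:
    """Satisfies BaseTask structurally, without inheriting from it."""

    def run(self) -> None:
        print("echo")


task: BaseTask = EchoTask()
task.run()  # prints "echo"
```

Because the protocol is structural, task classes never need to subclass a common base; they only need the right method shape.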

๐Ÿ› ๏ธ Development Commands

Testing

# Run all tests
uv run nox -s test

# Run specific test file
uv run pytest tests/path/to/test__file.py

# Run with JUnit XML output
uv run nox -s test -- --junitxml=results.xml

Code Quality

# Format code
uv run nox -s fmt

# Run all linters
uv run nox -s lint -- --pyright --ruff

# Run individual linters
uv run nox -s lint -- --pyright
uv run nox -s lint -- --ruff

Pipeline Development

# Compile and upload pipeline
uv run nox -s compile_pipeline -- \
  --env <dev|prod> \
  --pipeline_name <name> \
  --tag <tag> \
  --model_type <sample|...>

๐Ÿ—๏ธ Architecture Overview

This project uses a dual-mode architecture:

  1. Pipeline Compilation Mode (pipelines/main.py): Compiles KFP pipeline definitions to YAML and uploads to Kubeflow
  2. Task Execution Mode (main.py): Runs individual tasks inside KFP containers

How It Works

  1. ๐Ÿ“ Define tasks in tasks/<task_name>/ with settings and run logic
  2. ๐Ÿ”— Create pipeline graphs in pipelines/graphs/ that chain tasks together
  3. ๐Ÿ“‹ Register components: tasks in main.py task_maps and pipelines in pipelines/main.py pipeline_types
  4. ๐Ÿ“ฆ Compile pipeline with compile_pipeline - generates KFP YAML and uploads to registry
  5. โ–ถ๏ธ Execute: KFP runs pipeline - each component executes main.py with task-specific arguments in containers

Adding New Pipelines

Step-by-Step Guide

1๏ธโƒฃ Define Model Type

Add your model type to const/model_type.py:

class ModelType(StrEnum):
    """Enumeration for different Model Types."""

    SAMPLE = "sample"
    YOUR_MODEL = "your_model"  # ← Add this

2๏ธโƒฃ Create Pipeline Graph

Create a new file pipelines/graphs/your_model.py:

from __future__ import annotations

from typing import TYPE_CHECKING

from kfp import dsl
from pipelines.components import (
    evaluation,
    export,
    feature_engineering,
    inference,
    training,
)

if TYPE_CHECKING:
    from kfp.dsl.graph_component import GraphComponent
    from pipelines.settings import PipelineCompileArgs


def get_pipeline(args: PipelineCompileArgs) -> GraphComponent:
    """Get your model pipeline.

    Args:
        args (PipelineCompileArgs): Pipeline arguments for compilation.

    Returns:
        GraphComponent: Pipeline Graph Component.
    """

    @dsl.pipeline(name=args.pipeline_name)
    def pipeline_def(execution_date: str) -> None:
        fe_task = feature_engineering(
            image=args.image,
            execution_date=execution_date,
            model_type=args.model_type,
        ).set_display_name("Feature Engineering")

        training_task = (
            training(
                image=args.image,
                execution_date=execution_date,
                model_type=args.model_type,
            )
            .after(fe_task)
            .set_display_name("Train Model")
        )
        # Add more tasks...

    return pipeline_def

3๏ธโƒฃ Implement Tasks

Create task implementations in tasks/<task_name>/run.py:

from logging import getLogger

from tasks.base import T_co
from tasks.training.settings import TrainingSettings

logger = getLogger(__name__)


class TrainingTask:
    """Training Task."""

    def __init__(
        self,
        *args: T_co,
        **kwargs: T_co,
    ) -> None:
        """Initialize the Training Task."""
        self.settings = TrainingSettings()

    def run(self) -> None:
        """Run the Training Task."""
        logger.info("settings=%s", self.settings)
        # Your training logic here

4๏ธโƒฃ Register Components

Register tasks in main.py:

task_maps: dict[ModelType, dict[Task, type[BaseTask]]] = {
    ModelType.SAMPLE: {
        Task.FEATURE_ENGINEERING: FeatureEngineeringTask,
        Task.TRAINING: TrainingTask,
        # ...
    },
    ModelType.YOUR_MODEL: {  # ← Add this
        Task.TRAINING: YourTrainingTask,
        # ...
    },
}

Register pipeline in pipelines/main.py:

from pipelines.graphs import sample, your_model

pipeline_types = {
    ModelType.SAMPLE: sample.get_pipeline,
    ModelType.YOUR_MODEL: your_model.get_pipeline,  # ← Add this
}

5๏ธโƒฃ Compile & Deploy

uv run nox -s compile_pipeline -- \
  --env dev \
  --pipeline_name your-model-pipeline \
  --tag v1.0.0 \
  --model_type your_model

Tip: See CLAUDE.md for detailed architecture patterns and development guidelines.


Related Resources

Official Documentation

Kubeflow Pipelines

Python Libraries

  • Pydantic - Data validation using Python type annotations
  • Pydantic Settings - Settings management from environment variables

๐Ÿค Contributing

We welcome contributions! Please follow these steps:

Development Workflow

  1. ๐Ÿด Fork the repository
  2. ๐Ÿ“ฅ Clone your fork:
    git clone https://github.com/YOUR_USERNAME/ml-pipelines.git
    cd ml-pipelines
  3. ๐ŸŒฟ Create a feature branch:
    git checkout -b feature/amazing-feature
  4. ๐Ÿ“ฆ Install dependencies:
    uv sync
  5. โœ๏ธ Make your changes with tests
  6. ๐ŸŽจ Format code:
    uv run nox -s fmt
  7. ๐Ÿ” Lint code:
    uv run nox -s lint -- --pyright --ruff
  8. โœ… Test changes:
    uv run nox -s test
  9. ๐Ÿ’พ Commit your changes:
    git commit -m 'Add amazing feature'
  10. ๐Ÿ“ค Push to your branch:
    git push origin feature/amazing-feature
  11. ๐Ÿ“ฎ Submit a pull request

Code Standards

  • Maintain 75%+ test coverage (enforced by pytest)
  • Follow Ruff formatting and linting rules (ruff.toml)
  • Pass Pyright type checking (pyrightconfig.json)
  • Write clear commit messages
  • Add tests for new features
  • Update documentation as needed

Testing Naming Convention

Test files must follow the test__*.py format (note the double underscore):

  • ✅ test__base.py
  • ✅ test__training.py
  • ❌ test_base.py (single underscore - won't be discovered)

License

This project is licensed under the terms specified in the LICENSE file.
