- Overview
- Prerequisites
- Getting Started
- Project Structure
- Development Commands
- Architecture Overview
- Adding New Pipelines
- Related Resources
- Contributing
- License
## Overview

This is a production-ready template for building Kubeflow Pipelines (KFP) workflows with Python. It provides a structured, scalable architecture for ML pipelines with containerized task execution, type-safe configuration, and comprehensive testing.
- **Kubeflow Pipelines Integration**: build, compile, and deploy KFP workflows
- **Task-Based Architecture**: modular ML tasks (feature engineering, training, evaluation, inference, export)
- **Environment Management**: multi-environment support (dev, prod) with isolated configurations
- **Modern Python Tooling**: built with uv and Ruff
- **Type Safety**: full type hints with Pyright and Pydantic validation
- **CI/CD Ready**: GitHub Actions workflows for testing, linting, and Docker builds
## Prerequisites

- Python 3.10+ - programming language
- uv - fast Python package installer and resolver
- Docker - container platform (for builds)
- Kubeflow Pipelines - ML workflow orchestration platform
> **Tip:** Quick install for uv:

```shell
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```
## Getting Started

```shell
# Install dependencies
uv sync

# Run the test suite
uv run nox -s test

# Compile and upload the sample pipeline
uv run nox -s compile_pipeline -- \
  --env dev \
  --pipeline_name sample-pipeline \
  --tag test \
  --model_type sample
```
## Project Structure

```
├── const/                    # Shared enumerations
│   ├── environment.py        # Environment enum (dev, prod)
│   ├── model_type.py         # Model type enum (sample, ...)
│   └── task.py               # Task enum (feature_engineering, training, ...)
├── environments/             # Environment-specific settings
│   ├── dev.py                # Development environment config
│   ├── prod.py               # Production environment config
│   └── settings.py           # Settings loader
├── pipelines/                # KFP pipeline definitions
│   ├── components.py         # KFP container components
│   ├── graphs/               # Pipeline graph definitions
│   │   └── sample.py         # Sample pipeline graph
│   ├── main.py               # Pipeline compiler & uploader
│   └── settings.py           # Pipeline compilation settings
├── tasks/                    # ML task implementations
│   ├── base.py               # BaseTask protocol
│   ├── feature_engineering/  # Feature engineering task
│   ├── training/             # Model training task
│   ├── evaluation/           # Model evaluation task
│   ├── inference/            # Inference task
│   └── export/               # Export task
├── tests/                    # Test suite (mirrors src structure)
├── main.py                   # Task executor (runs inside KFP containers)
├── noxfile.py                # Task automation with Nox
├── pyproject.toml            # Project dependencies & metadata
├── pytest.ini                # Pytest configuration
└── ruff.toml                 # Ruff linter configuration
```
**Key files:**

- `main.py` - entry point for task execution in containers
- `noxfile.py` - development task automation (test, lint, fmt, compile_pipeline)
- `pyproject.toml` - project configuration and dependencies
- `CLAUDE.md` - architecture guide for Claude Code
## Development Commands

```shell
# Run all tests
uv run nox -s test

# Run a specific test file
uv run pytest tests/path/to/test__file.py

# Run with JUnit XML output
uv run nox -s test -- --junitxml=results.xml
```

```shell
# Format code
uv run nox -s fmt

# Run all linters
uv run nox -s lint -- --pyright --ruff

# Run individual linters
uv run nox -s lint -- --pyright
uv run nox -s lint -- --ruff
```

```shell
# Compile and upload a pipeline
uv run nox -s compile_pipeline -- \
  --env <dev|prod> \
  --pipeline_name <name> \
  --tag <tag> \
  --model_type <sample|...>
```

## Architecture Overview

This project uses a dual-mode architecture:
- **Pipeline Compilation Mode** (`pipelines/main.py`): compiles KFP pipeline definitions to YAML and uploads them to Kubeflow
- **Task Execution Mode** (`main.py`): runs individual tasks inside KFP containers
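Task execution mode boils down to looking up the right task class for a given model type and task name, then calling its `run()` method. A minimal, hypothetical sketch of that dispatch (the enum values and class names here are assumptions, not the template's actual code):

```python
from enum import Enum


class ModelType(str, Enum):
    SAMPLE = "sample"


class Task(str, Enum):
    TRAINING = "training"


class SampleTrainingTask:
    """Stand-in for a real task implementation."""

    def run(self) -> str:
        return "trained sample model"


# Mirrors the task_maps registry pattern: model type -> task -> class.
task_maps: dict[ModelType, dict[Task, type]] = {
    ModelType.SAMPLE: {Task.TRAINING: SampleTrainingTask},
}


def run_task(model_type: str, task: str) -> str:
    """Resolve and run the task class for the given CLI arguments."""
    task_cls = task_maps[ModelType(model_type)][Task(task)]
    return task_cls().run()


print(run_task("sample", "training"))
```

In the real `main.py` the `model_type` and `task` values arrive as container arguments from KFP rather than direct function calls.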
1. **Define** tasks in `tasks/<task_name>/` with settings and run logic
2. **Create** pipeline graphs in `pipelines/graphs/` that chain tasks together
3. **Register** components: tasks in `main.py` `task_maps`, pipelines in `pipelines/main.py` `pipeline_types`
4. **Compile** the pipeline with `compile_pipeline`, which generates the KFP YAML and uploads it to the registry
5. **Execute**: KFP runs the pipeline; each component executes `main.py` with task-specific arguments inside a container
## Adding New Pipelines

Add your model type to `const/model_type.py`:

```python
class ModelType(StrEnum):
    """Enumeration for different Model Types."""

    SAMPLE = "sample"
    YOUR_MODEL = "your_model"  # <- Add this
```

Create a new file `pipelines/graphs/your_model.py`:
```python
from __future__ import annotations

from typing import TYPE_CHECKING

from kfp import dsl

from pipelines.components import (
    evaluation,
    export,
    feature_engineering,
    inference,
    training,
)

if TYPE_CHECKING:
    from kfp.dsl.graph_component import GraphComponent

    from pipelines.settings import PipelineCompileArgs


def get_pipeline(args: PipelineCompileArgs) -> GraphComponent:
    """Get your model pipeline.

    Args:
        args (PipelineCompileArgs): Pipeline arguments for compilation.

    Returns:
        GraphComponent: Pipeline Graph Component.
    """

    @dsl.pipeline(name=args.pipeline_name)
    def pipeline_def(execution_date: str) -> None:
        fe_task = feature_engineering(
            image=args.image,
            execution_date=execution_date,
            model_type=args.model_type,
        ).set_display_name("Feature Engineering")

        training_task = (
            training(
                image=args.image,
                execution_date=execution_date,
                model_type=args.model_type,
            )
            .after(fe_task)
            .set_display_name("Train Model")
        )
        # Add more tasks...

    return pipeline_def
```

Create task implementations in `tasks/<task_name>/run.py`:
```python
from logging import getLogger

from tasks.base import T_co
from tasks.training.settings import TrainingSettings

logger = getLogger(__name__)


class TrainingTask:
    """Training Task."""

    def __init__(
        self,
        *args: tuple[T_co],
        **kwargs: dict[str, T_co],
    ) -> None:
        """Initialize the Training Task."""
        self.settings = TrainingSettings()

    def run(self) -> None:
        """Run the Training Task."""
        logger.info("settings=%s", self.settings)
        # Your training logic here
```

Register tasks in `main.py`:
```python
task_maps: dict[ModelType, dict[Task, type[BaseTask]]] = {
    ModelType.SAMPLE: {
        Task.FEATURE_ENGINEERING: FeatureEngineeringTask,
        Task.TRAINING: TrainingTask,
        # ...
    },
    ModelType.YOUR_MODEL: {  # <- Add this
        Task.TRAINING: YourTrainingTask,
        # ...
    },
}
```

Register the pipeline in `pipelines/main.py`:
```python
from pipelines.graphs import sample, your_model

pipeline_types = {
    ModelType.SAMPLE: sample.get_pipeline,
    ModelType.YOUR_MODEL: your_model.get_pipeline,  # <- Add this
}
```

Compile and upload the new pipeline:

```shell
uv run nox -s compile_pipeline -- \
  --env dev \
  --pipeline_name your-model-pipeline \
  --tag v1.0.0 \
  --model_type your_model
```

> **Tip:** See `CLAUDE.md` for detailed architecture patterns and development guidelines.
## Related Resources

- Kubeflow Pipelines v2 - KFP documentation
- uv Documentation - Python package manager
- Ruff Documentation - linter and formatter
- Pyright - static type checker
- Pytest - testing framework
- Nox - task automation tool
- KFP SDK Reference - Python SDK documentation
- Container Components Guide - building container-based components
- Pipeline Compilation - compiling pipelines to YAML
- Pydantic - data validation using Python type annotations
- Pydantic Settings - settings management from environment variables
## Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository
2. Clone your fork:
   ```shell
   git clone https://github.com/YOUR_USERNAME/ml-pipelines.git
   cd ml-pipelines
   ```
3. Create a feature branch: `git checkout -b feature/amazing-feature`
4. Install dependencies: `uv sync`
5. Make your changes with tests
6. Format code: `uv run nox -s fmt`
7. Lint code: `uv run nox -s lint -- --pyright --ruff`
8. Test changes: `uv run nox -s test`
9. Commit your changes: `git commit -m 'Add amazing feature'`
10. Push to your branch: `git push origin feature/amazing-feature`
11. Submit a pull request
**Contribution guidelines:**

- Maintain 75%+ test coverage (enforced by pytest)
- Follow Ruff formatting and linting rules (`ruff.toml`)
- Pass Pyright type checking (`pyrightconfig.json`)
- Write clear commit messages
- Add tests for new features
- Update documentation as needed
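The 75% coverage floor is typically enforced through pytest-cov options in `pytest.ini`. A hypothetical sketch of such a configuration (the repository's actual file may differ):

```ini
[pytest]
addopts = --cov=. --cov-fail-under=75
testpaths = tests
python_files = test__*.py
```

The `python_files` override is what makes the double-underscore naming convention below mandatory for test discovery.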
Test files must follow the `test__*.py` naming format (note the double underscore):

- ✅ `test__base.py`
- ✅ `test__training.py`
- ❌ `test_base.py` (single underscore - won't be discovered)
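A new test file following this convention might look like the sketch below. The `TrainingTask` import path is an assumption; a stand-in class keeps this example self-contained:

```python
# Hypothetical tests/tasks/training/test__run.py. In the real suite you would
# import TrainingTask from tasks.training.run instead of defining a stand-in.


class TrainingTask:
    """Stand-in for tasks.training.run.TrainingTask."""

    def run(self) -> None:
        """Run the (no-op) training logic."""
        return None


def test__training_task_runs() -> None:
    """TrainingTask.run() completes without raising."""
    assert TrainingTask().run() is None
```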
## License

This project is licensed under the terms specified in the LICENSE file.