Air-gapped deployment: Support local/private model sources #53

@Defilan

Problem

LLMKube currently requires internet access to download models from HuggingFace URLs. This blocks adoption in:

  • Air-gapped environments - Government, healthcare, finance, defense
  • Private networks - Corporate environments with restricted egress
  • Edge deployments - Remote locations with limited connectivity
  • Compliance scenarios - Data sovereignty requirements

Proposed Solution

Support multiple model source types beyond HTTP URLs:

1. Local File Path

apiVersion: inference.llmkube.dev/v1alpha1
kind: Model
metadata:
  name: my-model
spec:
  source:
    type: local
    path: /mnt/models/llama-3.1-8b.gguf

2. PVC Reference

spec:
  source:
    type: pvc
    claimName: model-storage
    path: models/llama-3.1-8b.gguf

3. S3-Compatible Storage (MinIO, etc.)

spec:
  source:
    type: s3
    bucket: llm-models
    key: llama-3.1-8b.gguf
    endpoint: http://minio.internal:9000
    secretRef:
      name: s3-credentials

4. Private HTTP Server

spec:
  source:
    type: http
    url: http://model-server.internal/models/llama-3.1-8b.gguf
    # Optional auth
    secretRef:
      name: http-credentials

5. OCI Registry (Harbor, etc.)

spec:
  source:
    type: oci
    image: harbor.internal/llm-models/llama-3.1-8b:v1

CLI Changes

# Deploy from local path
llmkube deploy my-model --source /mnt/models/llama.gguf --gpu

# Deploy from S3
llmkube deploy my-model --source s3://bucket/model.gguf --gpu

# Catalog support for local sources
llmkube deploy llama-3.1-8b --gpu --source-override /mnt/models/llama.gguf

# Pre-populate cache from local file
llmkube cache import /mnt/models/llama.gguf --as llama-3.1-8b
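One way the CLI could map a `--source` value onto the source types proposed above is sketched below. The function name `classify_source` and the mapping rules are assumptions for illustration, not existing LLMKube behavior. The returned dict mirrors the proposed `spec.source` fields. OCI images and PVC references are left out because the issue does not show a CLI form for them.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> dict:
    """Map a --source string to a hypothetical Model `spec.source` dict."""
    parsed = urlparse(source)
    scheme = parsed.scheme.lower()

    if scheme == "s3":
        # s3://bucket/key: the bucket lands in netloc, the key in path.
        return {"type": "s3", "bucket": parsed.netloc, "key": parsed.path.lstrip("/")}
    if scheme in ("http", "https"):
        return {"type": "http", "url": source}
    if scheme == "file":
        # file:///abs/path: keep only the filesystem path.
        return {"type": "local", "path": parsed.path}
    if scheme == "" and source.startswith("/"):
        # A bare absolute path is treated as a local file.
        return {"type": "local", "path": source}
    # Relative paths and unknown schemes are rejected rather than guessed at.
    raise ValueError(f"unsupported model source: {source!r}")
```

With this sketch, `classify_source("s3://bucket/model.gguf")` yields an `s3` source with `bucket` and `key` split out, while `/mnt/models/llama.gguf` and `file:///mnt/models/llama.gguf` both resolve to a `local` source.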

Implementation Phases

Phase 1: Local Path Support

  • Support file:// URLs and absolute paths in the source field
  • Mount hostPath or PVC in inference pods
  • Update controller to skip download for local sources
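The Phase 1 decision (mount instead of download) could look roughly like the sketch below. It is written in Python for brevity and is not the controller's actual code. The function name `plan_model_volume`, the volume name `model`, and the `/models` mount point are all hypothetical. The volume dicts use the standard Kubernetes `hostPath` and `persistentVolumeClaim` volume fields.

```python
import posixpath


def plan_model_volume(source: dict) -> dict:
    """Decide whether to download and which pod volume (if any) to mount."""
    mount_dir = "/models"  # hypothetical mount point inside the inference container
    kind = source.get("type")

    if kind == "local":
        # Mount the directory that holds the file from the node's filesystem.
        host_dir, filename = posixpath.split(source["path"])
        volume = {"name": "model", "hostPath": {"path": host_dir, "type": "Directory"}}
        return {"download": False, "volume": volume,
                "model_path": posixpath.join(mount_dir, filename)}

    if kind == "pvc":
        # Mount the claim read-only; `path` is relative to the volume root.
        volume = {"name": "model",
                  "persistentVolumeClaim": {"claimName": source["claimName"], "readOnly": True}}
        return {"download": False, "volume": volume,
                "model_path": posixpath.join(mount_dir, source["path"].lstrip("/"))}

    # Remote sources (http, s3, oci) still go through a download step.
    return {"download": True, "volume": None, "model_path": None}
```

Note that a `hostPath` volume only works if the file exists on the node where the pod is scheduled, so local sources would likely need node affinity or identical paths on every node.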

Phase 2: PVC and S3 Support

  • Add source type field to Model CRD
  • Implement S3 download with credentials
  • Support PVC references

Phase 3: OCI and Private Registry

  • Pull models from OCI registries
  • Support private registry authentication
  • Model versioning via tags
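For tag-based versioning, the controller would need to split an `image` value into registry, repository, and tag. A minimal sketch follows. The helper name `split_image_ref` is hypothetical, digest references (`@sha256:...`) are not handled, and falling back to `latest` when no tag is given follows the common container convention rather than anything decided in this issue.

```python
from typing import Tuple


def split_image_ref(image: str) -> Tuple[str, str, str]:
    """Split 'registry/repo/name:tag' into (registry, repository, tag)."""
    registry, _, remainder = image.partition("/")
    if not registry or not remainder:
        raise ValueError(f"expected registry/repository[:tag], got {image!r}")

    # Only a colon in the last path segment marks a tag; a colon in the
    # registry part (host:port) was already split off above.
    last_segment = remainder.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repository, _, tag = remainder.rpartition(":")
    else:
        repository, tag = remainder, "latest"  # assumed default
    return registry, repository, tag
```

For the example above, `harbor.internal/llm-models/llama-3.1-8b:v1` splits into registry `harbor.internal`, repository `llm-models/llama-3.1-8b`, and tag `v1`.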

Benefits

  • Air-gapped deployments - No internet required
  • Faster deployments - Local models skip the download step
  • Security - Models stay within network boundary
  • Compliance - Meet data residency requirements
  • Cost - No internet bandwidth spent downloading models from HuggingFace

Success Criteria

  • Deploy model from local file path
  • Deploy model from PVC
  • Deploy model from S3/MinIO
  • CLI supports local source paths
  • Documentation for air-gapped setup
  • Example manifests for each source type
