Support custom layer splits from GPUShardingSpec.LayerSplit #231

@Defilan

Context

internal/controller/inferenceservice_controller.go has a calculateTensorSplit() function that ignores the sharding.LayerSplit field from the CRD:

// TODO: Support custom layer splits from sharding.LayerSplit
func calculateTensorSplit(gpuCount int32, _ *GPUShardingSpec) string {

Currently all GPUs get equal split ratios ("1,1" for 2 GPUs, "1,1,1" for 3, etc.) regardless of what the user specifies in spec.hardware.gpu.sharding.layerSplit.

Problem

Users with asymmetric GPU configurations (e.g., mixed VRAM sizes) cannot control how layers are distributed. The layerSplit field exists in the CRD but is silently ignored.

Proposed Solution

  • Parse sharding.LayerSplit (e.g., ["0-15", "16-31"]) into llama.cpp --tensor-split ratios
  • Fall back to equal split when LayerSplit is empty (current behavior)
  • Validate that layer ranges do not overlap and that together they cover the full model

Location

internal/controller/inferenceservice_controller.go:660

Labels

  • area/gpu: GPU-related features and issues
  • component/controller: Related to the operator controller
  • enhancement: New feature or request
