-
Notifications
You must be signed in to change notification settings - Fork 4
Open
Labels
area/gpuGPU-related features and issuesGPU-related features and issuescomponent/controllerRelated to the operator controllerRelated to the operator controllerenhancementNew feature or requestNew feature or request
Description
Context
internal/controller/inferenceservice_controller.go has a calculateTensorSplit() function that ignores the sharding.LayerSplit field from the CRD:
// TODO: Support custom layer splits from sharding.LayerSplit
func calculateTensorSplit(gpuCount int32, _ *GPUShardingSpec) string {Currently all GPUs get equal split ratios ("1,1" for 2 GPUs, "1,1,1" for 3, etc.) regardless of what the user specifies in spec.hardware.gpu.sharding.layerSplit.
Problem
Users with asymmetric GPU configurations (e.g., mixed VRAM sizes) cannot control how layers are distributed. The layerSplit field exists in the CRD but is silently ignored.
Proposed Solution
- Parse
sharding.LayerSplit(e.g.,["0-15", "16-31"]) into llama.cpp--tensor-splitratios - Fall back to equal split when
LayerSplitis empty (current behavior) - Validate layer ranges don't overlap and cover the full model
Location
internal/controller/inferenceservice_controller.go:660
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
area/gpuGPU-related features and issuesGPU-related features and issuescomponent/controllerRelated to the operator controllerRelated to the operator controllerenhancementNew feature or requestNew feature or request