Support custom layer splits from GPUShardingSpec.LayerSplit

## Context

`internal/controller/inferenceservice_controller.go` has a `calculateTensorSplit()` function that ignores the `sharding.LayerSplit` field from the CRD:

```go
// TODO: Support custom layer splits from sharding.LayerSplit
func calculateTensorSplit(gpuCount int32, _ *GPUShardingSpec) string {
```

Currently all GPUs get equal split ratios (`"1,1"` for 2 GPUs, `"1,1,1"` for 3, etc.) regardless of what the user specifies in `spec.hardware.gpu.sharding.layerSplit`.
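For reference, the equal-split behavior described above can be sketched roughly as follows. This is an illustration only, not the controller's actual code: `equalTensorSplit` is a hypothetical name, and the handling of a zero or negative GPU count is an assumption.

```go
package main

import "strings"

// equalTensorSplit is a hypothetical stand-in for the current behavior:
// every GPU gets the same weight, so 2 GPUs yield "1,1" and 3 yield "1,1,1".
// Returning "" for a non-positive count is an assumption made for this sketch.
func equalTensorSplit(gpuCount int32) string {
	if gpuCount <= 0 {
		return ""
	}
	parts := make([]string, gpuCount)
	for i := range parts {
		parts[i] = "1"
	}
	return strings.Join(parts, ",")
}
```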

## Problem

Users with asymmetric GPU configurations (e.g., mixed VRAM sizes) cannot control how layers are distributed. The `layerSplit` field exists in the CRD but is silently ignored.

## Proposed Solution

- Parse `sharding.LayerSplit` (e.g., `["0-15", "16-31"]`) into llama.cpp `--tensor-split` ratios
- Fall back to equal split when `LayerSplit` is empty (current behavior)
- Validate that layer ranges do not overlap and together cover the full model
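The three steps above could look something like the sketch below. It makes several assumptions that the issue does not confirm: ranges are inclusive and zero-indexed, there is exactly one range per GPU, and each GPU's ratio is simply its layer count. The function name `tensorSplitFromLayers` and the `totalLayers` parameter are hypothetical, and passing `totalLayers` as 0 skips the full-coverage check, since the controller may not know the model's layer count.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// layerRange is an inclusive, zero-indexed range of model layers.
type layerRange struct{ start, end int }

// tensorSplitFromLayers (hypothetical) turns ranges such as "0-15" into a
// comma-separated list of per-GPU layer counts, in the order given.
func tensorSplitFromLayers(layerSplit []string, gpuCount, totalLayers int) (string, error) {
	if gpuCount < 1 {
		return "", fmt.Errorf("gpuCount must be positive, got %d", gpuCount)
	}
	// Empty LayerSplit: fall back to the equal split.
	if len(layerSplit) == 0 {
		return strings.TrimSuffix(strings.Repeat("1,", gpuCount), ","), nil
	}
	if len(layerSplit) != gpuCount {
		return "", fmt.Errorf("got %d ranges for %d GPUs", len(layerSplit), gpuCount)
	}

	ranges := make([]layerRange, 0, len(layerSplit))
	counts := make([]string, 0, len(layerSplit))
	for _, s := range layerSplit {
		parts := strings.SplitN(s, "-", 2)
		if len(parts) != 2 {
			return "", fmt.Errorf("invalid range %q", s)
		}
		start, errStart := strconv.Atoi(strings.TrimSpace(parts[0]))
		end, errEnd := strconv.Atoi(strings.TrimSpace(parts[1]))
		if errStart != nil || errEnd != nil || start < 0 || end < start {
			return "", fmt.Errorf("invalid range %q", s)
		}
		ranges = append(ranges, layerRange{start, end})
		counts = append(counts, strconv.Itoa(end-start+1))
	}

	// Sort by start and walk the ranges to detect overlaps and gaps.
	sort.Slice(ranges, func(i, j int) bool { return ranges[i].start < ranges[j].start })
	next := 0
	for _, r := range ranges {
		if r.start < next {
			return "", fmt.Errorf("range %d-%d overlaps a previous range", r.start, r.end)
		}
		if r.start > next {
			return "", fmt.Errorf("layers %d-%d are not assigned", next, r.start-1)
		}
		next = r.end + 1
	}
	if totalLayers > 0 && next != totalLayers {
		return "", fmt.Errorf("ranges cover %d layers, model has %d", next, totalLayers)
	}
	return strings.Join(counts, ","), nil
}
```

With these assumptions, `["0-23", "24-31"]` on a 32-layer model would produce `"24,8"`. Because `--tensor-split` takes proportions rather than explicit layer indices, the resulting layer placement may only approximate the requested ranges.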

## Location

`internal/controller/inferenceservice_controller.go:660`

Support custom layer splits from GPUShardingSpec.LayerSplit #231

Context

Problem

Proposed Solution

Location

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support custom layer splits from GPUShardingSpec.LayerSplit #231

Description

Context

Problem

Proposed Solution

Location

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions