This repository contains the Kubernetes custom controller for managing long-running inferences on KServe. It decouples request handling from execution by leveraging job-based orchestration.
The KServe Inference Controller is designed to handle complex, long-running inference tasks that would typically time out or block a standard controller loop. Instead of making direct HTTP calls to KServe, the operator orchestrates Kubernetes Jobs to act as "Runners." These runners handle the entire lifecycle of an inference: data retrieval from a source, the inference call itself, and the persistence of results to a storage backend.
This approach ensures:
- Resilience: If a node fails, the Job is automatically rescheduled.
- Scalability: Each inference run has its own dedicated resources.
- Extensibility: Support for new data sources or storage backends only requires updating the runner image, not the core controller.
The system revolves around two primary Custom Resources (CRs) and a specialized Runner image:
- InferenceConfig: A static definition of a model's deployment. It specifies the KServe endpoint, the runner image to use, and the storage backend configuration (Input/Output).
- InferenceRun: A single execution trigger. It references a config and provides specific execution parameters (e.g., table names, query filters).
- Runner Job: A containerized process spawned by the controller that executes the business logic defined in the "Contract."
- The user creates an InferenceConfig for a specific model (e.g., sklearn-iris).
- The user triggers an InferenceRun referencing that config.
- The controller generates a Contract (JSON) and spawns a Kubernetes Job (see the sketch after this list).
- The Runner reads the contract, fetches data, calls KServe, and saves the output.
- The controller monitors the Job and updates the InferenceRun status.
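For orientation, the contract-and-Job step might look roughly like the client-go sketch below. This is a minimal illustration, not the controller's actual code (which lives in `internal`); the ConfigMap name, Job name, and the `/contract` mount path are assumptions for the example:

```go
package main

import (
	"context"
	"log"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()
	ns := "kserve-controller-system"

	// 1. Store the generated contract in a ConfigMap (names are illustrative).
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "inf-example-contract", Namespace: ns},
		Data:       map[string]string{"contract.json": `{"jobId":"..."}`},
	}
	if _, err := client.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}

	// 2. Spawn a Job that mounts the contract and runs the runner image.
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "inf-example-run", Namespace: ns},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "runner",
						Image: "ghcr.io/krateoplatformops/kserve-krateo-runner:0.1.0",
						VolumeMounts: []corev1.VolumeMount{{
							Name: "contract", MountPath: "/contract", // assumed path
						}},
					}},
					Volumes: []corev1.Volume{{
						Name: "contract",
						VolumeSource: corev1.VolumeSource{
							ConfigMap: &corev1.ConfigMapVolumeSource{
								LocalObjectReference: corev1.LocalObjectReference{Name: cm.Name},
							},
						},
					}},
				},
			},
		},
	}
	if _, err := client.BatchV1().Jobs(ns).Create(ctx, job, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
}
```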
The communication between the controller and the execution Job is governed by a Contract. This contract is passed to the runner as a JSON file, providing a standardized way to handle diverse storage backends. The runner is intended to be as lightweight as possible: it only retrieves and stores data. No computation should be performed by the runner; data transformations should be handled by a notebook in the finops-database-handler.
Example contract passed by the controller to the Runner through a ConfigMap:

```json
{
"jobId":"5be07ada-5fe0-4c9e-a8bc-aaae67d8d344",
"jobName":"inf-example-inference-run-triton-5be07ada",
"kserve":{
"modelName":"sklearn-iris",
"modelUrl":"kserve-krateo-ttm-predictor.kserve-test.svc.cluster.local/v2/models/granite-timeseries-ttm-r2/infer",
"modelVersion":"v2",
"modelInputName":"past_values"
},
"input":{
"krateo":{
"api":{
"endpointRef":{
"name":"finops-database-handler-endpoint",
"namespace":"kserve-controller-system"
},
"headers":[
"Accept: application/json",
"Content-Type: application/json"
],
"path":"/compute/tritoninput",
"verb":"POST"
}
}
},
"output":{
"krateo":{
"api":{
"endpointRef":{
"name":"finops-database-handler-endpoint",
"namespace":"kserve-controller-system"
},
"headers":[
"Accept: application/json",
"Content-Type: application/json"
],
"path":"/compute/tritonoutput",
"verb":"POST"
}
}
},
"parameters":{
"input_data_length":"512",
"input_table_column_name":"average",
"input_table_name":"azuretoolkit",
"key_name":"resourceid",
"key_value":"sample_vm",
"output_table_name":"kserve_controller_output_triton"
}
}
```

To see how this specific contract is used, check runners/krateo/main.go and the counterpart notebooks in charts/chart/templates/notebook-triton.yaml. Note that the runner is injected with the environment variable `pod_uid`, which can be useful when storing data for scheduled inference runs.
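For illustration, the skeleton of a runner that consumes this contract could look like the sketch below. It assumes the contract is mounted at `/contract/contract.json` (a hypothetical path; see runners/krateo/main.go for the real entrypoint) and omits the actual HTTP calls:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// Contract mirrors the JSON document shown above. The input/output sections
// are kept as raw JSON so any provider-specific layout passes through unmodified.
type Contract struct {
	JobID      string            `json:"jobId"`
	JobName    string            `json:"jobName"`
	KServe     KServeSpec        `json:"kserve"`
	Input      json.RawMessage   `json:"input"`
	Output     json.RawMessage   `json:"output"`
	Parameters map[string]string `json:"parameters"`
}

type KServeSpec struct {
	ModelName      string `json:"modelName"`
	ModelURL       string `json:"modelUrl"`
	ModelVersion   string `json:"modelVersion"`
	ModelInputName string `json:"modelInputName"`
}

func main() {
	// Hypothetical mount path; the controller defines the real one.
	data, err := os.ReadFile("/contract/contract.json")
	if err != nil {
		log.Fatalf("reading contract: %v", err)
	}

	var c Contract
	if err := json.Unmarshal(data, &c); err != nil {
		log.Fatalf("parsing contract: %v", err)
	}

	// pod_uid is injected by the controller; useful for tagging stored rows.
	podUID := os.Getenv("pod_uid")
	fmt.Printf("job %s (%s) -> model %s at %s, pod %s\n",
		c.JobID, c.JobName, c.KServe.ModelName, c.KServe.ModelURL, podUID)

	// A real runner would now: fetch input data via c.Input, call the KServe
	// endpoint, and persist results via c.Output. No transformation happens here.
}
```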
The `storage.input` and `storage.output` keys in the CRD have no schema. This allows the InferenceConfig to support any storage provider (e.g., Krateo FinOps, S3, GCS, etc.) without changing the controller. The runner receives the contract with the data unmodified. Therefore, by providing a specialized runner image, you can implement custom logic to parse these raw configurations and interact with any proprietary or cloud-native data store.
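A hedged sketch of how a custom runner might dispatch on the schemaless storage section: the `krateo` layout below mirrors the example contract, while the `s3` branch is purely hypothetical, standing in for whatever provider a custom runner adds:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// KrateoAPI matches the input/output layout in the example contract.
type KrateoAPI struct {
	API struct {
		EndpointRef struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
		} `json:"endpointRef"`
		Headers []string `json:"headers"`
		Path    string   `json:"path"`
		Verb    string   `json:"verb"`
	} `json:"api"`
}

// resolveStorage probes the raw storage section and dispatches on whichever
// provider key is present; unknown providers are an error.
func resolveStorage(raw json.RawMessage) (string, error) {
	var probe map[string]json.RawMessage
	if err := json.Unmarshal(raw, &probe); err != nil {
		return "", err
	}
	switch {
	case probe["krateo"] != nil:
		var k KrateoAPI
		if err := json.Unmarshal(probe["krateo"], &k); err != nil {
			return "", err
		}
		return fmt.Sprintf("krateo endpoint %s, path %s", k.API.EndpointRef.Name, k.API.Path), nil
	case probe["s3"] != nil: // hypothetical provider a custom runner might support
		return "s3 bucket (custom runner logic goes here)", nil
	default:
		return "", fmt.Errorf("no supported storage provider in contract")
	}
}

func main() {
	raw := json.RawMessage(`{"krateo":{"api":{"endpointRef":{"name":"finops-database-handler-endpoint","namespace":"kserve-controller-system"},"path":"/compute/tritoninput","verb":"POST"}}}`)
	target, err := resolveStorage(raw)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(target)
}
```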
Defines the "How" and "Where" of the model and storage.
apiVersion: ai.krateo.io/v1
kind: InferenceConfig
metadata:
name: iris-inference-config
namespace: kserve-controller-system
spec:
autoDeletePolicy: None # None | DeleteOnCompletion | DeleteOnSuccess
image: ghcr.io/krateoplatformops/kserve-krateo-runner:0.1.0
modelName: sklearn-iris
modelUrl: sklearn-iris-predictor.kserve-test.svc.cluster.local/v2/models/sklearn-iris/infer
modelVersion: v2
storage:
input:
krateo:
api:
endpointRef:
name: finops-database-handler-endpoint
namespace: kserve-controller-system
path: /compute/kserveinput
verb: POST
headers:
- 'Accept: application/json'
- 'Content-Type: application/json'
output:
krateo:
api:
endpointRef:
name: finops-database-handler-endpoint
namespace: kserve-controller-system
path: /compute/kserveoutput
verb: POST
headers:
- 'Accept: application/json'
- 'Content-Type: application/json'Defines the "When" and "What" of a specific execution.
```yaml
apiVersion: ai.krateo.io/v1
kind: InferenceRun
metadata:
  name: iris-run-january
  namespace: kserve-controller-system
spec:
  configRef:
    name: iris-inference-config
    namespace: kserve-controller-system
  timeoutSeconds: 3600
  schedule: "* * * * *"
  parameters:
    input_table_name: kserve_controller_input
    output_table_name: kserve_controller_output
```
If the `schedule` field is populated, the controller creates a CronJob instead of a Job. The schedule string is passed as-is to the CronJob, and the `autoDeletePolicy` does not apply to CronJobs. `parameters` are passed as-is into the JSON contract.
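A sketch of how that branch could look inside a reconciler. The types and names here are illustrative stand-ins for the real ones in `api` and `internal`:

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Minimal stand-ins for the real CRD types in the api package.
type InferenceRunSpec struct {
	Schedule       string
	TimeoutSeconds int64
}

type InferenceRun struct {
	metav1.ObjectMeta
	Spec InferenceRunSpec
}

// reconcileWorkload sketches the Job-vs-CronJob branch: if the run carries a
// schedule, a CronJob wraps the same pod template (schedule passed through
// verbatim, autoDeletePolicy not applied); otherwise a one-shot Job is built.
func reconcileWorkload(run *InferenceRun, podTemplate corev1.PodTemplateSpec) client.Object {
	if run.Spec.Schedule != "" {
		return &batchv1.CronJob{
			ObjectMeta: metav1.ObjectMeta{Name: run.Name, Namespace: run.Namespace},
			Spec: batchv1.CronJobSpec{
				Schedule: run.Spec.Schedule, // passed as-is to the CronJob
				JobTemplate: batchv1.JobTemplateSpec{
					Spec: batchv1.JobSpec{Template: podTemplate},
				},
			},
		}
	}
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: run.Name, Namespace: run.Namespace},
		Spec:       batchv1.JobSpec{Template: podTemplate},
	}
}

func main() {
	run := &InferenceRun{Spec: InferenceRunSpec{Schedule: "* * * * *"}}
	obj := reconcileWorkload(run, corev1.PodTemplateSpec{})
	fmt.Printf("%T\n", obj) // *v1.CronJob, because a schedule is set
}
```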
The controller can be configured via environment variables to tune its reconciliation behavior:
- `POLLING_INTERVAL`: duration (e.g., `1m`) between status checks.
- `MAX_RECONCILE_RATE`: number of concurrent workers.
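A minimal sketch of how such variables are typically parsed, assuming Go-style duration strings; the fallback defaults shown here are illustrative, not the controller's documented defaults:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// envDuration reads a Go-style duration (e.g. "1m") with a fallback default.
func envDuration(key string, def time.Duration) time.Duration {
	if v := os.Getenv(key); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return def
}

// envInt reads an integer with a fallback default.
func envInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

func main() {
	pollingInterval := envDuration("POLLING_INTERVAL", time.Minute) // illustrative default
	maxReconcileRate := envInt("MAX_RECONCILE_RATE", 1)             // illustrative default
	fmt.Println(pollingInterval, maxReconcileRate)
}
```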
The operator can be installed through its Helm chart:
```sh
helm repo add krateo https://charts.krateo.io
helm repo update krateo
helm template kserve-controller krateo/kserve-controller | kubectl apply -f -
```

The repository contains several objects needed by the kserve-controller:
- the controller code in `internal` and `api`
- the Helm chart for the controller, including the CRDs, in `chart`
- the model for TTM adapted for the Triton KServe engine in `models`
- the Krateo runners for sklearn-iris and triton-ttm, targeting the finops-database-handler storage, in `runners/krateo-iris` and `runners/krateo-ttm`
- example CRs in `testdata`
- e2e testing code in `test`