Proposal: Add InferenceClass to Inference Pool #1036

Open
@Xunzhuo

What would you like to be added:

InferenceClass is meant to support other EPP implementations, such as the semantic router or the inference plugin we've built in AIBrix.

Every implementation list-watches the inference pool, but the inference class determines which EPP implementation takes ownership of a given inference pool and its inference model.

This lets the inference API work like the Gateway API (gwapi): the EPP provided by GIE becomes just one of its implementations.

Gateway controllers care only about the EndpointPickerConfig: they create the ext-proc filter and cluster, and modify the route and cluster in Envoy to support routing based on the header/metadata set by the ext-proc server.
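To make the gateway side concrete, here is a minimal sketch of what a controller might derive from an EndpointPickerConfig. The `service_name` and `port` fields, the `ext_proc_resources` helper, and the dict layout are all hypothetical simplifications for illustration, not the actual GIE or Envoy API. The point is that the gateway never needs to know which EPP implementation sits behind the address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointPickerConfig:
    # Hypothetical, simplified fields: where the ext-proc (EPP) server listens.
    service_name: str
    port: int


def ext_proc_resources(pool_name: str, cfg: EndpointPickerConfig) -> dict:
    """Describe the cluster and filter a gateway could program for one pool.

    The result is a plain dict used only for illustration; a real controller
    would emit Envoy xDS resources instead.
    """
    cluster_name = f"ext-proc/{pool_name}"
    return {
        "cluster": {
            "name": cluster_name,
            "address": f"{cfg.service_name}:{cfg.port}",
        },
        "filter": {"type": "ext_proc", "cluster": cluster_name},
    }


resources = ext_proc_resources(
    "llama-pool", EndpointPickerConfig(service_name="my-epp", port=9002)
)
print(resources["cluster"]["address"])
```

The same function would serve a pool handled by GIE, AIBrix, or the semantic router, since only the ext-proc address differs.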

EPP implementations implement the inference pool and inference model. Different implementations may have different focuses; end users choose the EPP implementation they need and specify it through the inferenceClass.

As a result, the gateway controller gets a pluggable EPP solution. For example, I could have Envoy AI Gateway integrate with GIE, the AIBrix inference plugin, or the semantic router.

GIE can be:

InferenceClass: gie

AIBrix can be:

InferenceClass: aibrix

Semantic Router can be:

InferenceClass: semantic

When creating the inference pool, we need to specify the inference class to define which EPP schedules the traffic.
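As a rough illustration of the selection rule, the sketch below models how an EPP implementation might claim only the pools that name its class while still list-watching all of them. The `InferencePool` dataclass, its `inference_class` field, and the `pools_for_class` helper are hypothetical names made up for this example; the real API shape is what this issue is asking to define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferencePool:
    name: str
    inference_class: str  # hypothetical field proposed in this issue


def pools_for_class(pools, inference_class):
    """Return the pools that an EPP of the given class should handle."""
    return [p for p in pools if p.inference_class == inference_class]


# Every implementation sees the same list of pools...
pools = [
    InferencePool("llama-pool", "gie"),
    InferencePool("qwen-pool", "aibrix"),
    InferencePool("router-pool", "semantic"),
]

# ...but each one only acts on the pools that reference its own class.
gie_pools = pools_for_class(pools, "gie")
print([p.name for p in gie_pools])
```

This mirrors how a Gateway API controller only reconciles Gateways whose GatewayClass it owns.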

The standard communication between Envoy and the ext-proc server is defined by EndpointPickerConfig, which can be implemented by the gateway control plane.

Why is this needed:

Support multiple EPP implementations.
