Description
What would you like to be added:
InferenceClass is meant to support other EPP implementations, such as the semantic router or the inference plugin we've built in AIBrix.
The InferencePool is list-watched by different implementations, and the InferenceClass determines which EPP implementation picks up the InferencePool and InferenceModel.
This lets the Inference API work like the Gateway API: the EPP provided by GIE is just one of its implementations.
The Gateway controller only cares about the EndpointPickerConfig: it creates the ext-proc filter and cluster, and modifies routes and clusters in Envoy to support routing based on the headers/metadata set by the ext-proc server.
EPP implementations reconcile the InferencePool and InferenceModel. Different implementations may have different focuses, so end users can choose the EPP implementation they need and specify it via the InferenceClass.
Following this approach, the gateway controller gains a pluggable EPP solution; for example, Envoy AI Gateway could integrate with GIE, the AIBrix inference plugin, or the semantic router.
GIE can be:
InferenceClass: gie
AIBrix can be:
InferenceClass: aibrix
Semantic Router can be:
InferenceClass: semantic
When creating the InferencePool, we need to specify the inference class to define which EPP should schedule the traffic, as in the sketch below.
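A rough sketch of what this could look like, modeled on GatewayClass / gatewayClassName from the Gateway API. The InferenceClass kind, its apiVersion, the controllerName field, and the inferenceClassName field on InferencePool are all hypothetical names for this proposal; the remaining InferencePool fields are intended to follow the existing v1alpha2 API.

```yaml
# Hypothetical InferenceClass, analogous to GatewayClass: it names the EPP
# implementation (controller) that should reconcile pools referencing it.
apiVersion: inference.networking.x-k8s.io/v1alpha1   # hypothetical group/version for this proposal
kind: InferenceClass
metadata:
  name: aibrix
spec:
  # Hypothetical field, modeled on GatewayClass.spec.controllerName.
  controllerName: aibrix.ai/inference-plugin
---
# InferencePool extended with a hypothetical inferenceClassName field so that
# only the chosen EPP implementation list-watches and reconciles it.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama3-pool
spec:
  inferenceClassName: aibrix      # hypothetical: selects the EPP implementation
  targetPortNumber: 8000
  selector:
    app: llama3-vllm
  extensionRef:
    name: aibrix-epp              # ext-proc (EPP) service for this pool
```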
The standard communication between the Envoy proxy and ext-proc is defined by the EndpointPickerConfig and can be implemented by the gateway control plane.
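For illustration, this is roughly the Envoy wiring a gateway control plane would generate for whichever EPP backs the pool: an ext_proc HTTP filter in the HttpConnectionManager filter chain plus a gRPC cluster for the EPP service. It is a config fragment, not a full bootstrap, and the cluster name, service address, port, and processing modes are placeholders that depend on the EPP implementation.

```yaml
# HTTP filter: send request headers/body to the EPP (ext-proc) so it can set
# the routing headers/metadata; which EPP implementation answers is opaque here.
http_filters:
- name: envoy.filters.http.ext_proc
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
    grpc_service:
      envoy_grpc:
        cluster_name: epp-ext-proc          # placeholder cluster name
      timeout: 10s
    processing_mode:
      request_header_mode: SEND
      request_body_mode: BUFFERED           # an EPP typically reads the model name from the body
      response_header_mode: SKIP
      response_body_mode: NONE

# Cluster pointing at the EPP service (GIE, AIBrix inference plugin, semantic router, ...).
clusters:
- name: epp-ext-proc
  type: STRICT_DNS
  connect_timeout: 1s
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options: {}          # ext-proc is gRPC, so HTTP/2
  load_assignment:
    cluster_name: epp-ext-proc
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: aibrix-epp.default.svc.cluster.local   # placeholder EPP service
              port_value: 9002                                # placeholder ext-proc port
```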
Why is this needed:
Support multiple EPP implementations.