feat: Add OpenVINO binding for semantic routing #733
Pull request overview
This PR adds OpenVINO backend support for semantic routing, enabling high-performance BERT classification and embedding on Intel CPUs using the OpenVINO toolkit. The implementation provides Go bindings over a C++ library that integrates with OpenVINO for inference.
Key Changes:
- Complete OpenVINO binding implementation with C++ core and Go CGO bindings
- Support for text classification, token classification, and embedding generation using ModernBERT models
- Build system integration with Makefile targets and CMake configuration
- Model conversion utilities and test infrastructure
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| Makefile | Adds openvino.mk to the build system |
| tools/make/openvino.mk | Defines build, test, and benchmark targets for OpenVINO binding |
| tools/make/models.mk | Adds model conversion targets for OpenVINO IR format |
| openvino-binding/semantic-router.go | Go bindings exposing OpenVINO functionality via CGO |
| openvino-binding/semantic-router_test.go | Comprehensive test suite covering all binding features |
| openvino-binding/cpp/src/ffi/openvino_semantic_router_ffi.cpp | C FFI layer bridging Go and C++ implementation |
| openvino-binding/cpp/src/embeddings/embedding_generator.cpp | Embedding generation implementation with mean pooling |
| openvino-binding/cpp/src/classifiers/text_classifier.cpp | Text classification with concurrent inference support |
| openvino-binding/cpp/src/classifiers/token_classifier.cpp | Token classification with BIO tagging for NER/PII detection |
| openvino-binding/cpp/src/core/* | Core utilities for model management and tokenization |
| openvino-binding/scripts/*.py | Python scripts for tokenizer and model conversion |
| openvino-binding/cmd/benchmark/main.go | Benchmark comparing OpenVINO vs Candle implementations |
| openvino-binding/CMakeLists.txt | CMake build configuration with OpenVINO integration |
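The file summary above mentions that embedding generation uses mean pooling. As a rough illustration of that technique only, the sketch below averages the token vectors whose attention-mask entry is non-zero. The function name `meanPool`, the row-major layout, and the `int64_t` mask type are assumptions made for this example and are not taken from the PR's `embedding_generator.cpp`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): masked mean pooling.
// `hidden` is assumed to be row-major with shape [seq_len x hidden_size],
// where seq_len == attention_mask.size().
std::vector<float> meanPool(const std::vector<float>& hidden,
                            const std::vector<int64_t>& attention_mask,
                            std::size_t hidden_size) {
    std::vector<float> pooled(hidden_size, 0.0f);

    // Guard against a buffer that is too small for the stated shape.
    if (hidden.size() < attention_mask.size() * hidden_size) {
        return pooled;
    }

    std::size_t counted = 0;
    for (std::size_t t = 0; t < attention_mask.size(); ++t) {
        if (attention_mask[t] == 0) {
            continue;  // skip padding tokens so they do not dilute the mean
        }
        for (std::size_t d = 0; d < hidden_size; ++d) {
            pooled[d] += hidden[t * hidden_size + d];
        }
        ++counted;
    }

    // Divide by the number of real tokens; leave zeros if there were none.
    if (counted > 0) {
        for (float& v : pooled) {
            v /= static_cast<float>(counted);
        }
    }
    return pooled;
}
```

Dividing by the count of unmasked tokens rather than by the padded sequence length keeps embeddings comparable across inputs of different lengths.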
// ModernBERT token classifiers often have lower per-token confidence
result.entities.clear();
for (const auto& span : entity_spans) {
    if (span.confidence > 0.3f) {
Copilot
AI
Nov 25, 2025
The confidence threshold 0.3f is hardcoded here. Consider making this configurable via a parameter or constant to allow tuning for different use cases and model behaviors.
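One possible shape for this suggestion, sketched under assumptions: the `EntitySpan` struct, the `filterByConfidence` function, and the constant name below are hypothetical and do not come from the PR. The default value simply mirrors the `0.3f` literal in the reviewed snippet, so existing behavior would be unchanged unless a caller passes a different threshold.

```cpp
#include <string>
#include <vector>

// Hypothetical span type for illustration; the PR's real struct may differ.
struct EntitySpan {
    std::string entity_type;
    int start;
    int end;
    float confidence;
};

// Named default that mirrors the hardcoded 0.3f in the reviewed code.
constexpr float kDefaultEntityConfidenceThreshold = 0.3f;

// Keeps only spans whose confidence is strictly above `threshold`.
std::vector<EntitySpan> filterByConfidence(
    const std::vector<EntitySpan>& spans,
    float threshold = kDefaultEntityConfidenceThreshold) {
    std::vector<EntitySpan> kept;
    for (const auto& span : spans) {
        if (span.confidence > threshold) {
            kept.push_back(span);
        }
    }
    return kept;
}
```

A caller that wants stricter PII detection could then write `filterByConfidence(entity_spans, 0.5f)`, while callers that omit the argument keep the current cutoff.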
@@ -0,0 +1,3 @@
module github.com/vllm-project/semantic-router/openvino-binding

go 1.21
Copilot
AI
Nov 25, 2025
The go.mod file specifies go 1.21 but the benchmark's go.mod uses go 1.24.1 and toolchain go1.24.7. These version specifications are inconsistent. Consider updating the main go.mod to match the minimum Go version requirement across the project.
Suggested change:
- go 1.21
+ go 1.24.1
result.entities[i].entity_type = utils::strDup(entity.entity_type.c_str());
result.entities[i].start = entity.start;
result.entities[i].end = entity.end;
result.entities[i].text = utils::strDup(entity.entity_type.c_str()); // Simplified
Copilot
AI
Nov 25, 2025
The text field is incorrectly assigned entity_type instead of the actual entity text. This should be entity.text to properly capture the extracted text of the entity.
Suggested change:
- result.entities[i].text = utils::strDup(entity.entity_type.c_str()); // Simplified
+ result.entities[i].text = utils::strDup(entity.text.c_str());
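To show the intent of this fix in isolation, here is a minimal sketch of copying an entity into a C-compatible struct so that `text` and `entity_type` each get their own heap copy. All names (`Entity`, `CEntity`, `dupString`, `toCEntity`, `freeCEntity`) are hypothetical; `dupString` is only a stand-in for the PR's `utils::strDup`, whose actual implementation is not shown here and is assumed to return a `malloc`-style copy.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical C++ side entity; assumes the real one also carries `text`.
struct Entity {
    std::string entity_type;
    std::string text;
    int start;
    int end;
};

// Hypothetical C-compatible struct handed across the FFI boundary.
struct CEntity {
    char* entity_type;
    char* text;
    int start;
    int end;
};

// Stand-in for utils::strDup: returns a heap copy the caller must free.
char* dupString(const std::string& s) {
    char* out = static_cast<char*>(std::malloc(s.size() + 1));
    if (out != nullptr) {
        std::memcpy(out, s.c_str(), s.size() + 1);  // includes the trailing '\0'
    }
    return out;
}

// Copies each field from its own source, so `text` holds the matched text
// rather than a second copy of the label.
CEntity toCEntity(const Entity& e) {
    CEntity c;
    c.entity_type = dupString(e.entity_type);
    c.text = dupString(e.text);
    c.start = e.start;
    c.end = e.end;
    return c;
}

// Releases both strings and clears the pointers to avoid double frees.
void freeCEntity(CEntity& c) {
    std::free(c.entity_type);
    std::free(c.text);
    c.entity_type = nullptr;
    c.text = nullptr;
}
```

Because every duplicated string is owned by the C struct, whichever side receives it needs a matching free call, which is why the sketch pairs the conversion with `freeCEntity`.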
Convert ModernBERT classification and PII models from HuggingFace to OpenVINO IR format
"""

import os
Copilot
AI
Nov 25, 2025
Import of 'os' is not used.
Suggested change:
- import os
Signed-off-by: Huamin Chen <hchen@redhat.com>
Force-pushed from e65b738 to 8f30267.
Add OpenVINO backend support for BERT classification and embedding on CPU