@rootfs rootfs commented Nov 25, 2025

FILL IN THE PR DESCRIPTION HERE

Add OpenVINO backend support for BERT classification and embedding on CPU

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commit by passing -s to git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].
Detailed Checklist

Thank you for your contribution to semantic-router! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.

What to Expect for the Reviews

@rootfs rootfs requested a review from Xunzhuo as a code owner November 25, 2025 18:02

netlify bot commented Nov 25, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit b617fcc
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/692884e2cf26ef00084a8a2e
😎 Deploy Preview https://deploy-preview-733--vllm-semantic-router.netlify.app

@rootfs rootfs requested review from Copilot and removed request for Xunzhuo November 25, 2025 18:03
Copilot finished reviewing on behalf of rootfs November 25, 2025 18:04

Copilot AI left a comment

Pull request overview

This PR adds OpenVINO backend support for semantic routing, enabling high-performance BERT classification and embedding on Intel CPUs using the OpenVINO toolkit. The implementation provides Go bindings over a C++ library that integrates with OpenVINO for inference.

Key Changes:

  • Complete OpenVINO binding implementation with C++ core and Go CGO bindings
  • Support for text classification, token classification, and embedding generation using ModernBERT models
  • Build system integration with Makefile targets and CMake configuration
  • Model conversion utilities and test infrastructure
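The PR describes its embedding generator as using mean pooling. As a rough illustration of that technique (not the PR's actual code), the sketch below averages per-token vectors while skipping padding. The function name `meanPool`, the row-major `[seq_len, hidden_size]` layout, and the use of an attention mask are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not taken from the PR: averages the token vectors whose
// attention-mask entry is non-zero and ignores padding positions.
// token_embeddings is assumed row-major with shape [seq_len, hidden_size].
std::vector<float> meanPool(const std::vector<float>& token_embeddings,
                            const std::vector<int64_t>& attention_mask,
                            std::size_t hidden_size) {
    const std::size_t seq_len = attention_mask.size();
    if (hidden_size == 0 || token_embeddings.size() != seq_len * hidden_size) {
        throw std::invalid_argument("embedding buffer does not match mask length");
    }

    std::vector<float> pooled(hidden_size, 0.0f);
    std::size_t counted = 0;
    for (std::size_t t = 0; t < seq_len; ++t) {
        if (attention_mask[t] == 0) continue;  // skip padding tokens
        for (std::size_t h = 0; h < hidden_size; ++h) {
            pooled[h] += token_embeddings[t * hidden_size + h];
        }
        ++counted;
    }
    if (counted == 0) return pooled;  // every position was padding: return zeros
    for (float& v : pooled) v /= static_cast<float>(counted);
    return pooled;
}
```

Dividing by the number of unmasked tokens rather than by `seq_len` keeps padded and unpadded inputs of the same text from producing different embeddings.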

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
Makefile | Adds openvino.mk to the build system
tools/make/openvino.mk | Defines build, test, and benchmark targets for OpenVINO binding
tools/make/models.mk | Adds model conversion targets for OpenVINO IR format
openvino-binding/semantic-router.go | Go bindings exposing OpenVINO functionality via CGO
openvino-binding/semantic-router_test.go | Comprehensive test suite covering all binding features
openvino-binding/cpp/src/ffi/openvino_semantic_router_ffi.cpp | C FFI layer bridging Go and C++ implementation
openvino-binding/cpp/src/embeddings/embedding_generator.cpp | Embedding generation implementation with mean pooling
openvino-binding/cpp/src/classifiers/text_classifier.cpp | Text classification with concurrent inference support
openvino-binding/cpp/src/classifiers/token_classifier.cpp | Token classification with BIO tagging for NER/PII detection
openvino-binding/cpp/src/core/* | Core utilities for model management and tokenization
openvino-binding/scripts/*.py | Python scripts for tokenizer and model conversion
openvino-binding/cmd/benchmark/main.go | Benchmark comparing OpenVINO vs Candle implementations
openvino-binding/CMakeLists.txt | CMake build configuration with OpenVINO integration

// ModernBERT token classifiers often have lower per-token confidence
result.entities.clear();
for (const auto& span : entity_spans) {
if (span.confidence > 0.3f) {

Copilot AI Nov 25, 2025

The confidence threshold 0.3f is hardcoded here. Consider making this configurable via a parameter or constant to allow tuning for different use cases and model behaviors.
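One way to act on this suggestion is to lift the literal into a named default and accept an override from the caller. The sketch below is only an illustration: `EntitySpan`, `kDefaultEntityConfidenceThreshold`, and `filterByConfidence` are hypothetical names, and the default of 0.3 simply mirrors the literal in the reviewed snippet.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical span type for illustration; the PR's real struct may differ.
struct EntitySpan {
    std::string entity_type;
    std::size_t start;
    std::size_t end;
    float confidence;
};

// Assumed default, matching the literal 0.3f in the reviewed snippet.
constexpr float kDefaultEntityConfidenceThreshold = 0.3f;

// Keeps only spans whose confidence is strictly greater than the threshold,
// which callers can override per model or per use case.
std::vector<EntitySpan> filterByConfidence(
    const std::vector<EntitySpan>& spans,
    float threshold = kDefaultEntityConfidenceThreshold) {
    std::vector<EntitySpan> kept;
    kept.reserve(spans.size());
    for (const auto& span : spans) {
        if (span.confidence > threshold) {
            kept.push_back(span);
        }
    }
    return kept;
}
```

Callers that are happy with the current behavior can omit the second argument, while a stricter deployment could pass, for example, `filterByConfidence(spans, 0.8f)`.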

@@ -0,0 +1,3 @@
module github.com/vllm-project/semantic-router/openvino-binding

go 1.21

Copilot AI Nov 25, 2025

The go.mod file specifies go 1.21 but the benchmark's go.mod uses go 1.24.1 and toolchain go1.24.7. These version specifications are inconsistent. Consider updating the main go.mod to match the minimum Go version requirement across the project.

Suggested change
- go 1.21
+ go 1.24.1

result.entities[i].entity_type = utils::strDup(entity.entity_type.c_str());
result.entities[i].start = entity.start;
result.entities[i].end = entity.end;
result.entities[i].text = utils::strDup(entity.entity_type.c_str()); // Simplified

Copilot AI Nov 25, 2025

The text field is incorrectly assigned entity_type instead of the actual entity text. This should be entity.text to properly capture the extracted text of the entity.

result.entities[i].text = utils::strDup(entity.text.c_str());
Suggested change
- result.entities[i].text = utils::strDup(entity.entity_type.c_str()); // Simplified
+ result.entities[i].text = utils::strDup(entity.text.c_str());
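The suggested fix assumes the internal entity already carries a `text` field. If a span only stores offsets, the text could instead be sliced from the original input. The helper below is a hypothetical sketch of that approach: `extractEntityText` is not part of the PR, and it assumes `start` and `end` are byte offsets into the input forming a half-open range.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the PR: returns input[start, end) with both
// offsets clamped to the input length. Assumes byte offsets, half-open range.
std::string extractEntityText(const std::string& input,
                              std::size_t start,
                              std::size_t end) {
    start = std::min(start, input.size());
    end = std::min(end, input.size());
    if (end <= start) {
        return std::string();  // empty or inverted range yields no text
    }
    return input.substr(start, end - start);
}
```

Clamping rather than throwing keeps a slightly-off offset from the tokenizer from aborting the whole request, at the cost of silently truncating the entity text.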

Convert ModernBERT classification and PII models from HuggingFace to OpenVINO IR format
"""

import os

Copilot AI Nov 25, 2025

Import of 'os' is not used.

Suggested change
- import os

Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>

github-actions bot commented Nov 25, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • openvino-binding/.gitignore
  • openvino-binding/CMakeLists.txt
  • openvino-binding/README.md
  • openvino-binding/cmake/openvino_semantic_routerConfig.cmake.in
  • openvino-binding/cmd/benchmark/go.mod
  • openvino-binding/cmd/benchmark/main.go
  • openvino-binding/convert_modernbert_models.py
  • openvino-binding/cpp/include/classifiers/lora_adapter.h
  • openvino-binding/cpp/include/classifiers/lora_classifier.h
  • openvino-binding/cpp/include/classifiers/text_classifier.h
  • openvino-binding/cpp/include/classifiers/token_classifier.h
  • openvino-binding/cpp/include/core/model_manager.h
  • openvino-binding/cpp/include/core/tokenizer.h
  • openvino-binding/cpp/include/core/types.h
  • openvino-binding/cpp/include/embeddings/embedding_generator.h
  • openvino-binding/cpp/include/openvino_semantic_router.h
  • openvino-binding/cpp/include/utils/math_utils.h
  • openvino-binding/cpp/include/utils/preprocessing.h
  • openvino-binding/cpp/src/classifiers/lora_adapter.cpp
  • openvino-binding/cpp/src/classifiers/lora_classifier.cpp
  • openvino-binding/cpp/src/classifiers/text_classifier.cpp
  • openvino-binding/cpp/src/classifiers/token_classifier.cpp
  • openvino-binding/cpp/src/core/model_manager.cpp
  • openvino-binding/cpp/src/core/tokenizer.cpp
  • openvino-binding/cpp/src/embeddings/embedding_generator.cpp
  • openvino-binding/cpp/src/ffi/openvino_semantic_router_ffi.cpp
  • openvino-binding/cpp/src/utils/math_utils.cpp
  • openvino-binding/cpp/src/utils/preprocessing.cpp
  • openvino-binding/examples/embedding_example.go
  • openvino-binding/examples/lora_example.go
  • openvino-binding/examples/similarity_example.go
  • openvino-binding/go.mod
  • openvino-binding/scripts/convert_all_lora_models.sh
  • openvino-binding/scripts/convert_lora_models.py
  • openvino-binding/scripts/convert_test_tokenizers.py
  • openvino-binding/scripts/convert_tokenizers.py
  • openvino-binding/semantic-router.go
  • openvino-binding/semantic-router_lora_test.go
  • openvino-binding/semantic-router_test.go
  • Makefile

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/openvino.mk
  • tools/make/models.mk

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.
