> **Note:** This repository was archived by the owner on Feb 14, 2026. It is now read-only.

# OpenVINO Model Server for Continue VSCode Extension

Run local AI models for code completion and chat in VS Code via the Continue extension. The project can serve various models; experimentally, Qwen2.5-Coder has shown good results. The server is precompiled for Windows.

## Tested hardware and software

## Current models

| Model | Size | Speed | Goal | Pushed to GitHub |
|---|---|---|---|---|
| Qwen2.5-Coder-14B | ~8GB | ⚡ | Best quality | No, too large |
| Qwen2.5-Coder-7B | ~4GB | ⚡⚡ | Balance | No, too large |
| Qwen2.5-Coder-3B | ~2GB | ⚡⚡⚡ | Quick | Yes |
| Qwen2.5-Coder-1.5B | ~1GB | ⚡⚡⚡⚡ | Autocomplete | Yes |
| Qwen2.5-Coder-0.5B | ~300MB | ⚡⚡⚡⚡⚡ | Minimal resources | Yes |

## Quick start

### 1. Configure models

Edit `config_all.json` in the `models` folder to select which models should be loaded.

For example:

```json
{
    "model_config_list": [
        {
            "config": {
                "name": "Qwen2.5-Coder-0.5B-Instruct-int4-ov",
                "base_path": "Qwen2.5-Coder-0.5B-Instruct-int4-ov"
            }
        },
        {
            "config": {
                "name": "Qwen2.5-Coder-1.5B-Instruct-int4-ov",
                "base_path": "Qwen2.5-Coder-1.5B-Instruct-int4-ov"
            }
        },
        {
            "config": {
                "name": "Qwen2.5-Coder-3B-Instruct-int4-ov",
                "base_path": "Qwen2.5-Coder-3B-Instruct-int4-ov"
            }
        },
        {
            "config": {
                "name": "Qwen2.5-Coder-7B-Instruct-int4-ov",
                "base_path": "Qwen2.5-Coder-7B-Instruct-int4-ov"
            }
        },
        {
            "config": {
                "name": "Qwen2.5-Coder-14B-Instruct-int4-ov",
                "base_path": "Qwen2.5-Coder-14B-Instruct-int4-ov"
            }
        }
    ]
}
```

Loading many models at once is not recommended: each loaded model consumes additional memory.
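A syntax slip such as a missing comma between entries is easy to make in `config_all.json` and will stop OVMS from starting. A quick sanity check with Python's stdlib `json` module catches it early (the path below is an assumption; adjust to where your `config_all.json` lives):

```python
import json

# Hypothetical path; adjust to your local checkout.
CONFIG_PATH = "models/config_all.json"

def validate_config(path):
    """Parse the OVMS model config and return the configured model names.

    Raises json.JSONDecodeError on syntax errors (e.g. a missing comma).
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [entry["config"]["name"] for entry in data["model_config_list"]]

if __name__ == "__main__":
    try:
        for name in validate_config(CONFIG_PATH):
            print("configured model:", name)
    except FileNotFoundError:
        print("config not found at", CONFIG_PATH)
```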

### 2. Start OVMS

```bat
cd C:\ai-projects\ovms-continue
start-ovms.bat
```

### 3. Check

Open in browser: http://localhost:8000/v3/models

### 4. Configure VS Code

Continue will automatically connect to OVMS (port 8000).

Edit the configuration file `C:\Users\<Username>\.continue\config.yaml` like so:

```yaml
models:
  - name: Qwen2.5-Coder-3B (GPU)
    provider: openai
    model: Qwen2.5-Coder-3B-Instruct-int4-ov
    apiKey: unused
    apiBase: http://localhost:8000/v3
    roles:
      - chat
      - edit
      - apply

  - name: Qwen2.5-Coder-1.5B (GPU)
    provider: openai
    model: Qwen2.5-Coder-1.5B-Instruct-int4-ov
    apiKey: unused
    apiBase: http://localhost:8000/v3
    roles:
      - autocomplete
```
Select your local models in the Continue extension, then check that chat and autocomplete are working.



## Adding New Models from HuggingFace

### Download Model

```bat
cd .\models
git clone https://huggingface.co/OpenVINO/Qwen2.5-Coder-7B-Instruct-int4-ov
```

### Prepare Model Structure

OVMS requires a specific folder structure. After downloading:

1. Create the version subfolder `1/`:

```bat
cd Qwen2.5-Coder-7B-Instruct-int4-ov
mkdir 1
```

2. Move the model files into `1/`:

```bat
move *.json 1\
move *.bin 1\
move *.xml 1\
move *.txt 1\
move *.model 1\
```
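The two manual steps above (creating `1/` and moving the model files into it) can also be scripted. A sketch using only the Python stdlib, assuming the extensions in the `move` commands above are exactly the files OVMS needs in the version folder:

```python
import shutil
from pathlib import Path

# Extensions that belong in the version folder, mirroring the `move`
# commands above. `graph.pbtxt` and `chat_template.jinja` stay in the root.
MODEL_EXTENSIONS = {".json", ".bin", ".xml", ".txt", ".model"}

def prepare_model_dir(model_dir):
    """Create the '1/' version subfolder and move matching files into it."""
    model_dir = Path(model_dir)
    version_dir = model_dir / "1"
    version_dir.mkdir(exist_ok=True)
    for f in list(model_dir.iterdir()):
        if f.is_file() and f.suffix in MODEL_EXTENSIONS:
            shutil.move(str(f), str(version_dir / f.name))
    return version_dir
```

Run it once per downloaded model folder, e.g. `prepare_model_dir("models/Qwen2.5-Coder-7B-Instruct-int4-ov")`.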

3. Create `graph.pbtxt` in the model root folder (not in `1/`):

```protobuf
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: 'LOOPBACK:0',
    back_edge: true
  }
  node_options: {
      [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
          models_path: "./1",
          cache_size: 4,
          max_num_seqs: 256,
          dynamic_split_fuse: true,
          device: "GPU"
      }
  }
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler",
    options {
      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
        sync_set {
          tag_index: "LOOPBACK:0"
        }
      }
    }
  }
}
```

Change `device: "GPU"` to `"CPU"` or `"NPU"` if needed.

4. Create `chat_template.jinja` for chat models (example for Qwen):

```jinja
{%- for message in messages -%}
    {%- if message['role'] == 'system' -%}
        {{- '<|im_start|>system\n' + message['content'] + '<|im_end|>\n' -}}
    {%- elif message['role'] == 'user' -%}
        {{- '<|im_start|>user\n' + message['content'] + '<|im_end|>\n' -}}
    {%- elif message['role'] == 'assistant' -%}
        {{- '<|im_start|>assistant\n' + message['content'] + '<|im_end|>\n' -}}
    {%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{- '<|im_start|>assistant\n' -}}
{%- endif -%}
```
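For illustration, the template above is equivalent to the following plain-Python function (a sketch using the ChatML markers Qwen expects; the real rendering is done by OVMS from `chat_template.jinja`):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Mirror of the Jinja chat template: wrap each message in ChatML markers."""
    out = []
    for msg in messages:
        if msg["role"] in ("system", "user", "assistant"):
            out.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        out.append("<|im_start|>assistant\n")
    return "".join(out)
```

For example, a single user message `"hi"` renders as `<|im_start|>user\nhi<|im_end|>\n<|im_start|>assistant\n`.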

### Final Model Structure

```text
models/
└── Qwen2.5-Coder-7B-Instruct-int4-ov/
    ├── graph.pbtxt              # OVMS graph config
    ├── chat_template.jinja      # Chat format template
    └── 1/                       # Version folder (required!)
        ├── config.json
        ├── generation_config.json
        ├── openvino_model.xml
        ├── openvino_model.bin
        ├── openvino_tokenizer.xml
        ├── openvino_tokenizer.bin
        ├── openvino_detokenizer.xml
        ├── openvino_detokenizer.bin
        ├── tokenizer.json
        ├── tokenizer_config.json
        └── ...
```
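Before adding the model to `config_all.json`, the layout above can be sanity-checked with a short script (a sketch; it only verifies the folder structure described here, not the model contents):

```python
from pathlib import Path

def check_model_layout(model_dir):
    """Return a list of problems with an OVMS model folder layout."""
    model_dir = Path(model_dir)
    problems = []
    if not (model_dir / "graph.pbtxt").is_file():
        problems.append("missing graph.pbtxt in model root")
    version_dir = model_dir / "1"
    if not version_dir.is_dir():
        problems.append("missing version folder 1/")
    elif not any(version_dir.glob("*.xml")):
        problems.append("no OpenVINO .xml files in 1/")
    return problems
```

An empty result means the folder matches the expected structure.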

### Add to config_all.json

```json
{
    "model_config_list": [
        {
            "config": {
                "name": "Qwen2.5-Coder-7B-Instruct-int4-ov",
                "base_path": "Qwen2.5-Coder-7B-Instruct-int4-ov"
            }
        }
    ]
}
```

### Important Notes

- **File encoding:** `graph.pbtxt` must be UTF-8 without BOM. PowerShell's `Out-File` adds a BOM — use `[System.IO.File]::WriteAllText($path, $content, [System.Text.UTF8Encoding]::new($false))` instead.
- **Context length:** Models with a small context window (2K tokens, like GPT-J) don't work well with Continue. Use models with 4K+ context.
- **Vision models:** May require a newer OpenVINO version.
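The BOM issue in the first note is easy to demonstrate from Python: the `utf-8-sig` codec prepends the three-byte BOM that breaks `graph.pbtxt` parsing, while plain `utf-8` does not (and `open(path, "w", encoding="utf-8")` likewise writes without a BOM):

```python
# Compare the two UTF-8 variants on a line from graph.pbtxt.
text = 'input_stream: "HTTP_REQUEST_PAYLOAD:input"\n'

with_bom = text.encode("utf-8-sig")  # what PowerShell's Out-File produces
without_bom = text.encode("utf-8")   # what OVMS expects

assert with_bom.startswith(b"\xef\xbb\xbf")       # BOM present
assert not without_bom.startswith(b"\xef\xbb\xbf")  # no BOM
assert with_bom[3:] == without_bom                # otherwise identical bytes
```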


## Authors

This project was created for fun by Oleksandr Lopatnov.
