[Bug]: Input sequence length exceeds the max input length of embedding model. #731

@lichangqing2611

Description


Bug Description

We are running OV 0.2.6 with bce-embedding-base_v1-f16.gguf as the embedding model. This embedding model's maximum input length is 512 tokens, and the physical batch size is 1024. After an openclaw conversation ends, several more embedding calls are still scheduled, but the inputs to these calls can run to several thousand tokens (e.g. 8129), which triggers the error below.
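For reference, one client-side workaround (not existing OpenViking code; all names below are illustrative) is to split an over-long token sequence into chunks that fit the 512-token limit, embed each chunk, and mean-pool the per-chunk vectors:

```python
from typing import Callable, List

MAX_TOKENS = 512  # context limit of bce-embedding-base_v1, per this report


def chunk_tokens(tokens: List[int], max_len: int = MAX_TOKENS) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def embed_long_input(tokens: List[int],
                     embed_fn: Callable[[List[int]], List[float]],
                     max_len: int = MAX_TOKENS) -> List[float]:
    """Embed each chunk separately, then mean-pool the chunk vectors.

    embed_fn is a placeholder for whatever actually calls the embedding
    endpoint (e.g. client.embeddings.create on an OpenAI-compatible server).
    """
    chunks = chunk_tokens(tokens, max_len)
    vectors = [embed_fn(chunk) for chunk in chunks]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# The failing request in the log was 8129 tokens -> 16 chunks of <= 512 each.
print(len(chunk_tokens(list(range(8129)))))  # 16
```

Mean-pooling is only one possible aggregation; truncating to the first 512 tokens would also avoid the 500 error, at the cost of dropping content.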

Steps to Reproduce

Bug Info:

2026-03-18 11:02:50,993 - openviking.storage.queuefs.semantic_processor - INFO - Completed semantic generation for: viking://user/man/memories/preferences
2026-03-18 11:02:52,458 - openviking.storage.collection_schemas - ERROR - Error processing embedding message: OpenAI API error: Error code: 500 - {'error': {'code': 500, 'message': 'input (8129 tokens) is too large to process. increase the physical batch size (current batch size: 1024)', 'type': 'server_error'}}
Traceback (most recent call last):
  File "/usr/local/lib64/python3.11/site-packages/openviking/models/embedder/openai_embedders.py", line 99, in embed
    response = self.client.embeddings.create(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/embeddings.py", line 132, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1294, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1067, in request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'error': {'code': 500, 'message': 'input (8129 tokens) is too large to process. increase the physical batch size (current batch size: 1024)', 'type': 'server_error'}}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib64/python3.11/site-packages/openviking/storage/collection_schemas.py", line 197, in on_dequeue
    result: EmbedResult = await asyncio.to_thread(
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib64/python3.11/site-packages/openviking/models/embedder/openai_embedders.py", line 104, in embed
    raise RuntimeError(f"OpenAI API error: {e.message}") from e
RuntimeError: OpenAI API error: Error code: 500 - {'error': {'code': 500, 'message': 'input (8129 tokens) is too large to process. increase the physical batch size (current batch size: 1024)', 'type': 'server_error'}}
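The server-side message ("increase the physical batch size") matches the error text emitted by llama.cpp's llama-server, which is presumably the backend serving the .gguf model here. A sketch of the suggested server-side mitigation follows; note that since the model's maximum input length is 512, raising the batch size only changes which requests the server rejects, and the real fix is client-side truncation or chunking. Flag names are assumptions based on recent llama.cpp builds and should be checked against `llama-server --help`:

```shell
# Sketch only, assuming a llama.cpp backend. --embeddings serves
# /v1/embeddings; -ub is the physical batch size named in the error,
# -b the logical batch size, -c the context length.
llama-server -m bce-embedding-base_v1-f16.gguf --embeddings \
    -c 512 -b 2048 -ub 2048
```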

Expected Behavior

No error: inputs longer than the embedding model's 512-token limit should not be sent to the backend as-is.

Actual Behavior

The embedding call fails with the 500 InternalServerError shown in the traceback above.

Minimal Reproducible Example

Error Logs

OpenViking Version

0.2.6

Python Version

3.11.0

Operating System

Linux

Model Backend

Other

Additional Context

No response

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Status: In progress
Milestone: none