Handle Context length exceeded error with smaller top-k value #62

@selvik

Description

This model's maximum context length is 4096 tokens. However, you requested 6379 tokens (6029 in the messages, 350 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}

Currently, this error message makes it hard for the user or admin to fix the problem and retry their question. We want to handle this error in the backend by reducing the context sent to the LLM. So even if the user has configured the Serve RAG LLM container image with top-k=2, when we receive this error we want to be able to reduce k's value and resend the query to the LLM.
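
A rough sketch of the retry behavior described above, to make the proposal concrete. Everything here is an assumption for illustration: `query_with_shrinking_k`, `retrieve`, `generate`, and `min_k` are hypothetical names, not existing functions in the backend, and the error is detected only by matching the "maximum context length" wording quoted in this issue.

```python
from typing import Callable, List


def is_context_length_error(exc: Exception) -> bool:
    # Assumption: the backend surfaces the LLM's 400 error with the same
    # wording as the message quoted in this issue.
    return "maximum context length" in str(exc)


def query_with_shrinking_k(
    question: str,
    retrieve: Callable[[str, int], List[str]],   # hypothetical: returns top-k chunks
    generate: Callable[[str, List[str]], str],   # hypothetical: calls the LLM
    top_k: int,
    min_k: int = 1,
) -> str:
    """Resend the query with a smaller k each time the context is too long."""
    k = top_k
    while True:
        docs = retrieve(question, k)
        try:
            return generate(question, docs)
        except Exception as exc:
            # Unrelated errors, or a failure at the smallest allowed k,
            # are passed through to the caller unchanged.
            if not is_context_length_error(exc) or k <= min_k:
                raise
            k -= 1  # drop the lowest-ranked chunk and try again
```

With top-k=2 this would first try two chunks and, on the error above, retry with one. If the request still does not fit at `min_k`, the original error is raised, so the user-facing message would still need separate handling in that case.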
