Handle Context length exceeded error with smaller top-k value #62

```
This model's maximum context length is 4096 tokens. However, you requested 6379 tokens (6029 in the messages, 350 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
```

Currently, this error message gives the user/admin little to go on when trying to fix and retry their question. We want to handle the error in the backend by reducing the context sent to the LLM. So even if the user has configured the Serve RAG LLM container image with top-k=2, when we receive this error we want to be able to reduce the value of k and resend the query to the LLM.
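
A minimal sketch of the retry behavior described above, not the project's actual implementation. The names `ContextLengthExceededError`, `retrieve`, `generate`, and `answer_with_shrinking_k` are all hypothetical; in a real backend the exception would be whatever the LLM client raises for the 400 response shown above, and detecting it may require inspecting the error message.

```python
from typing import Callable, List


class ContextLengthExceededError(Exception):
    """Hypothetical stand-in for the LLM's 400 'maximum context length' error."""


def answer_with_shrinking_k(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str, List[str]], str],
    k: int,
    min_k: int = 1,
) -> str:
    """Query the LLM, lowering k by one each time the context is too long.

    `retrieve(query, k)` is assumed to return the top-k context chunks and
    `generate(query, chunks)` is assumed to call the LLM and raise
    ContextLengthExceededError when the prompt does not fit.
    """
    while True:
        chunks = retrieve(query, k)
        try:
            return generate(query, chunks)
        except ContextLengthExceededError:
            if k <= min_k:
                # Nothing left to trim, so surface the original error.
                raise
            k -= 1
```

Decrementing k by one keeps as much retrieved context as possible. If the error still occurs at `min_k`, it is re-raised so the caller can report it instead of retrying indefinitely.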

