This model's maximum context length is 4096 tokens. However, you requested 6379 tokens (6029 in the messages, 350 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}
Currently, this error message makes it hard for the user or admin to fix the problem and retry their question. We want to handle this error in the backend by reducing the context sent to the LLM. So even if the user has configured the Serve RAG LLM container image with top-k=2, when we receive this error we want to be able to reduce the value of k and resend the query to the LLM.
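A minimal sketch of the retry behavior described above, in Python. Every name here is hypothetical and not taken from the actual backend: `ContextLengthError` stands in for whatever exception wraps the 400 `BadRequestError`, and `retrieve` and `call_llm` are placeholder callables for the retriever and the LLM request. The sketch only assumes that the error text has the shape of the message quoted above.

```python
import re
from typing import Callable, List

# Matches the token counts in an error message shaped like the one quoted above.
_LIMIT_RE = re.compile(
    r"maximum context length is (\d+) tokens.*?requested (\d+) tokens",
    re.DOTALL,
)


class ContextLengthError(Exception):
    """Hypothetical stand-in for the backend's 400 BadRequestError."""


def is_context_length_error(message: str) -> bool:
    """Return True if the message reports an exceeded context window."""
    return _LIMIT_RE.search(message) is not None


def query_with_shrinking_k(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # hypothetical retriever
    call_llm: Callable[[str, List[str]], str],  # hypothetical LLM call
    top_k: int = 2,
    min_k: int = 1,
) -> str:
    """Retry the query with fewer retrieved documents on context overflow."""
    k = top_k
    while True:
        docs = retrieve(question, k)
        try:
            return call_llm(question, docs)
        except ContextLengthError as err:
            # Re-raise unrelated errors, or give up once k cannot shrink further.
            if not is_context_length_error(str(err)) or k <= min_k:
                raise
            k -= 1
```

With this shape, the configured top-k acts as an upper bound rather than a fixed value. If the request still overflows at `min_k`, the original error is re-raised so that the caller can report it.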