LlamaIndex Triton Integration Streaming Support #18671
Replies: 2 comments 3 replies
-
Hello @shsaronian! I'm Dosu, your friendly bot here to assist you with bugs, answer questions, and help you become a contributor. I'm here to help while you wait for a human maintainer. How can I assist you today? Yes, the integration between LlamaIndex and Triton Inference Server supports streaming responses similar to those used in the chat engine.
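A minimal sketch of the streaming-consumption pattern shared by LlamaIndex LLM integrations. The `NvidiaTriton` constructor arguments shown in the comments are assumptions (placeholder `server_url` and `model_name`), and the stand-in generator below mimics the `stream_complete` interface so the snippet can run without a live Triton server:

```python
# LlamaIndex LLMs expose stream_complete, which yields partial responses
# whose `delta` field holds only the newly generated text. With a real
# Triton server the setup would look roughly like this (constructor
# arguments are an assumption, not verified here):
#
#   from llama_index.llms.nvidia_triton import NvidiaTriton
#   llm = NvidiaTriton(server_url="localhost:8001", model_name="my-model")
#   for chunk in llm.stream_complete("Tell me a joke"):
#       print(chunk.delta, end="", flush=True)
#
# The stand-in below reproduces that interface so the pattern is runnable.
from dataclasses import dataclass
from typing import Iterator


@dataclass
class CompletionChunk:
    delta: str  # newly generated text in this chunk
    text: str   # full text accumulated so far


def fake_stream_complete(prompt: str) -> Iterator[CompletionChunk]:
    """Stand-in for llm.stream_complete: yields tokens one at a time."""
    accumulated = ""
    for token in ["Streaming", " works", " token", " by", " token."]:
        accumulated += token
        yield CompletionChunk(delta=token, text=accumulated)


# Consume the stream incrementally instead of waiting for the full response.
final = ""
for chunk in fake_stream_complete("Tell me a joke"):
    final += chunk.delta
print(final)  # → Streaming works token by token.
```

The point of the pattern is that each `delta` can be rendered to the user the moment it arrives, rather than blocking on the complete response.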
-
Does the integration between LlamaIndex and Triton Inference Server allow streaming responses, as used in the chat engine? It would not be ideal to wait for the whole response to return before showing it to the user.
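For reference, this is the chat-engine streaming pattern the question alludes to. The `as_chat_engine` / `stream_chat` / `response_gen` names follow LlamaIndex's chat-engine interface, but the index and model setup are elided, so a small stand-in object (an assumption, not the real class) makes the pattern runnable here:

```python
# With LlamaIndex the real calls would be roughly:
#
#   chat_engine = index.as_chat_engine()
#   streaming_response = chat_engine.stream_chat("What did the author do?")
#   for token in streaming_response.response_gen:
#       print(token, end="", flush=True)
#
# Stand-in so the pattern runs without an index or model:
from typing import Iterator, List


class FakeStreamingResponse:
    """Mimics a streaming chat response: exposes a token generator."""

    def __init__(self, tokens: List[str]):
        self._tokens = tokens

    @property
    def response_gen(self) -> Iterator[str]:
        yield from self._tokens


streaming_response = FakeStreamingResponse(
    ["Partial ", "output ", "arrives ", "early."]
)

shown_to_user = []
for token in streaming_response.response_gen:
    shown_to_user.append(token)  # render each token as soon as it arrives

print("".join(shown_to_user))  # → Partial output arrives early.
```

Consuming `response_gen` this way is what avoids the wait-for-the-whole-response behavior the question describes.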