Instead of ordering by relevance. This would make requests like "summarization" possible, despite the claims that RAG is not suited for overarching queries.
Consider the following: if you set the chunk size × chunk count high enough that the retrieved chunks cover a significant portion of the original document (say, half of it), then the relevance by which the vector search selects those chunks becomes less important: the model will receive a large enough portion of the text anyway to produce its summary or answer similar questions. The only thing that currently makes this unfeasible is that the order of these chunks gets jumbled by that "relevance", which makes it impossible to ask questions that involve the chronology of the document's events, for example.
You could say that at this point I might as well just feed the model the entire text as a file, but no: even halving the text saves a significant number of tokens, not to mention that LocalDocs documents are tokenized once, while uploading a document directly to the chat requires doing that every time.
Skipping the relevance ordering can't be too difficult to implement. If this is low priority, can you at least point me to the place in the code where I could try to tweak it locally?
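To be clear about the change I have in mind, here is a minimal sketch. It is not GPT4All's actual code or API; the struct and function names are hypothetical. The idea is that the vector search still selects the top chunks by relevance, but the selected set is then re-sorted by document and chunk position before being concatenated into the prompt:

```cpp
// Hypothetical sketch, not the project's real data structures:
// re-order already-selected chunks by their position in the source
// document instead of by relevance score.
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

struct RetrievedChunk {
    int documentId;   // which source document the chunk came from
    int chunkIndex;   // position of the chunk within that document
    float score;      // relevance score from the vector search (unused here)
    std::string text;
};

// Selection by relevance happens before this call; only the final
// ordering of the selected chunks changes.
std::string buildContext(std::vector<RetrievedChunk> chunks) {
    std::sort(chunks.begin(), chunks.end(),
              [](const RetrievedChunk &a, const RetrievedChunk &b) {
                  return std::tie(a.documentId, a.chunkIndex) <
                         std::tie(b.documentId, b.chunkIndex);
              });
    std::string context;
    for (const auto &c : chunks) {
        context += c.text;
        context += "\n\n";
    }
    return context;
}
```

Even just an option like this, applied right before the retrieved chunks are inserted into the prompt, would cover the summarization and chronology use cases described above.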