Throughput of Long Sequences #12608 #1985
              
Unanswered

simmonssong asked this question in Q&A
            Replies: 1 comment
-
Interesting catch. Yes, some infrastructure optimizations in llama.cpp (such as tiled KV-cache handling) do boost throughput on long sequences. We ran into this when stress-testing long-form prompts and nested reasoning: throughput went up, but logic fidelity went sideways. Anyway, great question. If you're testing semantic drift under load too, I'm happy to swap notes; some of those failure patterns are spooky.
                  
                    0 replies
                  
                
            
  
-
Hi, I am testing the throughput of input sequences of different lengths. I found that throughput increases with sequence length across several models and quantization levels. Is this caused by a built-in infrastructure optimization in llama.cpp?
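One mundane contributor to this pattern, independent of any specific llama.cpp optimization, is that each evaluation carries a roughly fixed per-call setup cost, so measured tokens-per-second rises as the prompt gets longer and that cost is amortized. Here is a minimal sketch of that effect; the constants `overhead_s` and `per_token_s` are invented for illustration, not llama.cpp measurements.

```python
# Toy model of measured throughput when each evaluation has a fixed
# per-call overhead plus a per-token cost.
# NOTE: overhead_s and per_token_s are made-up constants, not measurements.

def throughput(n_tokens: int, overhead_s: float = 0.05, per_token_s: float = 0.001) -> float:
    """Tokens per second for a prompt of n_tokens with a fixed setup cost."""
    return n_tokens / (overhead_s + per_token_s * n_tokens)

for n in (128, 512, 2048, 8192):
    print(f"{n:5d} tokens -> {throughput(n):7.1f} tok/s")
```

Under this toy model throughput climbs toward the `1 / per_token_s` asymptote as length grows; real curves eventually bend the other way once attention's quadratic cost starts to dominate. For actual numbers, llama.cpp's bundled `llama-bench` tool can sweep a range of prompt lengths directly.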