I don't understand. Qwen3-8B runs very slowly on my NVIDIA RTX 3090. Am I doing something wrong? I tried the Transformers solution.
DFlash results:

| Metric | Value |
|---|---|
| Speed | 44.4 tok/s |
| Time | 54.48 s |
| Generated | 2419 tokens |
| Input | 23 tokens |
| Block size | 16 |
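As a quick sanity check, the reported speed appears to be just generated tokens divided by wall-clock time. Counting the 23 input tokens as well would give about 44.8 tok/s, so the prompt seems to be excluded. This is only a sketch that reuses the numbers from the table:

```python
# Figures copied from the DFlash run above
generated_tokens = 2419
elapsed_s = 54.48

# Throughput over generated tokens only (prompt tokens excluded)
throughput = generated_tokens / elapsed_s
print(f"{throughput:.1f} tok/s")
```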
For comparison, LM Studio's speculative decoding with Qwen3 8B as the main model and Qwen3 1.7B as the draft model reaches 83 tok/s.
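To put the gap in numbers, here is a small sketch that compares the two reported speeds and estimates how long the same 2419-token output would take at the LM Studio rate. It assumes both figures were measured the same way (generated tokens per second), which the post does not confirm:

```python
# Reported speeds from the post (assumed comparable measurements)
dflash_tps = 44.4
lm_studio_tps = 83.0
generated_tokens = 2419

# How many times faster LM Studio's setup is
ratio = lm_studio_tps / dflash_tps
# Estimated time for the same output length at LM Studio's speed
lm_studio_time_s = generated_tokens / lm_studio_tps

print(f"LM Studio is {ratio:.2f}x faster")
print(f"Same output at 83 tok/s: {lm_studio_time_s:.1f} s")
```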