
Very slow on my NVIDIA RTX 3090: 44 tok/s #20

@mp3pintyo

I don't understand why Qwen3-8B runs so slowly on my NVIDIA RTX 3090. Am I doing something wrong? I tried the Transformers backend.

DFlash results:

| Metric | Value |
|---|---|
| Speed | 44.4 tok/s |
| Time | 54.48 s |
| Generated | 2419 tokens |
| Input | 23 tokens |
| Block size | 16 |
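As a quick sanity check, the reported speed is simply generated tokens divided by wall-clock time. This sketch assumes the 54.48 s covers the whole run (prefill of the 23 input tokens plus decoding); the output does not say how that time is split.

```python
# Recompute the DFlash throughput from the reported numbers.
generated_tokens = 2419
elapsed_s = 54.48  # assumption: total wall-clock time for prefill + decoding

tok_per_s = generated_tokens / elapsed_s
ms_per_token = 1000 * elapsed_s / generated_tokens

print(f"{tok_per_s:.1f} tok/s, {ms_per_token:.1f} ms/token")
# prints "44.4 tok/s, 22.5 ms/token"
```

So the reported figure is consistent with the raw numbers: roughly 22.5 ms per generated token.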

For comparison, speculative decoding in LM Studio reaches 83 tok/s with Qwen3 8B as the main model and Qwen3 1.7B as the draft model.
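To see why the draft model matters, here is a toy sketch of greedy speculative decoding: a small draft model proposes `k` tokens, the large target model checks them, and the longest agreeing prefix is kept plus one token from the target. This is a generic illustration, not LM Studio's or DFlash's implementation; the "models" are hypothetical Python functions mapping a token prefix to the greedy next token, and `k` is a hypothetical draft length.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: given a token prefix, return the greedy next token.
Model = Callable[[List[int]], int]


def speculative_generate(target: Model, draft: Model, prompt: List[int],
                         max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding (toy version).

    Returns (generated tokens, number of target verification rounds).
    The output is identical to plain greedy decoding with `target`.
    """
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))

        # 2. The target verifies the proposal. A real engine scores all
        #    positions in one batched forward pass; this toy just loops.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # target's correction ends the round
                break
            accepted.append(tok)
        else:
            # All k drafts accepted: the target contributes one bonus token.
            accepted.append(target(seq + accepted))

        seq.extend(accepted)
    return seq[len(prompt):][:max_new], rounds


# Usage: a toy "target" that counts modulo 10.
target = lambda s: (s[-1] + 1) % 10
good_out, good_rounds = speculative_generate(target, target, [0], 10, k=4)
bad_out, bad_rounds = speculative_generate(target, lambda s: 0, [0], 10, k=4)
print(good_out, good_rounds)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] 2
print(bad_out, bad_rounds)    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] 10
```

Both runs produce the same tokens, but a draft that always agrees with the target needs only 2 target rounds for 10 tokens, while a useless draft needs one round per token. Real-world speedup therefore depends on how many draft tokens the target accepts per round, as well as on the draft's own cost.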
