Would like some help to understand this #2

@RayBytes

So I've installed the project and run it, and on my M1 Pro (16 GB) I'm currently getting 32.5 tok/s with an 88% acceptance rate for the prompt I ran (Qwen 3.5 4B). However, running the MLX Qwen 3.5 4B in LM Studio gave about the same speed.

So my question is: if I'm understanding correctly, the models I'm running are not the MLX-optimized versions. Is it possible to run an MLX-optimized version to see the speed gains mentioned in the benchmarks?

Thank you, any help is appreciated.
