Would like some help to understand this

I've installed the project and ran it. On my M1 Pro (16 GB) I'm currently getting 32.5 tok/s with an 88% acceptance rate for the prompt I ran (Qwen 3.5 4B). However, the MLX build of Qwen 3.5 4B in LM Studio ran at about the same speed.

So my question is: if I'm understanding it correctly, the models I'm running are not the MLX-optimized versions. Is it possible to run an MLX-optimized version to see the speed gains mentioned in the benchmarks?

Thank you, any help is appreciated. 
