So I've installed the project and run it, and on my M1 Pro (16 GB) I'm currently getting 32.5 tokens/s with an 88% acceptance rate for the prompt I ran (Qwen 3.5 4B). However, running the MLX version of Qwen 3.5 4B in LM Studio gave about the same speed.
So my question is: if I understand correctly, the models I'm running are not the MLX-optimized versions. Is it possible to run an MLX-optimized version to see the speed gains mentioned in the benchmarks?
Thank you, any help is appreciated.