MPS vs MSL vs CoreML #23

@kellen-sun

Apple has a neural engine (ANE) that can run inference much faster than the GPU, but its API (Core ML) sits at a much higher abstraction level than MSL: it takes whole models and handles the GPU/CPU/ANE assignment, quantization, etc. itself.
It could be cool to allow some access to it from Python with something like @forge("ane").
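A minimal sketch of what such a decorator could look like. The name `forge`, the `"ane"` backend string, and everything inside are assumptions from this issue, not an existing API; a real implementation would trace the function and lower it through Core ML, while this sketch only tags the function with its target backend and runs it in plain Python.

```python
# Hypothetical @forge("ane") decorator: records the requested backend and,
# absent a real Core ML / ANE lowering path, simply runs the Python function.
_BACKENDS = {"ane", "gpu", "cpu"}

def forge(backend: str):
    if backend not in _BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")

    def decorator(fn):
        def wrapper(*args, **kwargs):
            # A real implementation would hand a traced graph to Core ML here.
            return fn(*args, **kwargs)
        wrapper.backend = backend  # tag so a compiler pass could dispatch on it
        return wrapper
    return decorator

@forge("ane")
def dense(x, w, b):
    # Plain-Python dense layer: y_i = dot(w_i, x) + b_i
    return [sum(xi * wi for xi, wi in zip(x, row)) + bi
            for row, bi in zip(w, b)]
```

Calling `dense([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])` returns `[1.5, 2.5]`, and `dense.backend` is `"ane"`, which is the hook a compiler could use for placement.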

Note that PyTorch has an export path to Core ML for the final inference step, but obviously not for training.

We should also probably not rely too much on MPS (we can use it for eager execution), but for compiled code we should write our own MSL kernels so that they can be fused well and make fewer round trips to global memory. This is roughly what MLX does.
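The memory-traffic argument for fusion can be made concrete with a toy model. This pure-Python sketch stands in for two MSL kernels versus one fused kernel, counting global-memory touches per element (reads and writes of full-size buffers); the function names and the traffic-counting scheme are illustrative assumptions, not part of any real kernel API.

```python
# Toy model of kernel fusion for an elementwise chain y = a*x + b.
# Each list read/write of a full-size buffer stands in for one
# global-memory round trip per element on the GPU.

def unfused(x, a, b):
    """Two 'kernel launches' with an intermediate buffer in global memory."""
    tmp = [a * xi for xi in x]   # kernel 1: read x, write tmp
    y = [ti + b for ti in tmp]   # kernel 2: read tmp, write y
    return y, 4 * len(x)         # 4 global-memory touches per element

def fused(x, a, b):
    """One fused 'kernel': the intermediate lives in registers, not memory."""
    y = [a * xi + b for xi in x]  # read x, write y
    return y, 2 * len(x)          # 2 global-memory touches per element
```

Both versions compute the same result, but the fused one does half the global-memory traffic; for longer elementwise chains the ratio grows with the number of unfused stages, which is the payoff of emitting our own fused MSL kernels instead of chaining MPS ops.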

Note that MLX has been shifting to the new neural accelerators (NA) built into the GPU cores starting with the M5 chips (with Metal 4); the ANE is no longer a separate silicon unit there. This lets them use GPU memory for faster matmuls, reportedly around 4x faster LLM inference.

Labels: question (further information is requested)