MPS vs MSL vs CoreML #23

@kellen-sun

Apple has a neural engine (ANE) that can run inference much faster than the GPU, but its API (Core ML) sits at a much higher abstraction level than MSL: it takes whole models and handles the GPU/CPU/ANE assignment, quantization, etc. itself.
It could be cool to allow some access to it from Python with something like @forge("ane").
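A minimal sketch of what such a decorator could look like. The name `forge`, the `"ane"` backend string, and everything inside are assumptions from this issue, not an existing API; a real implementation would trace the function and lower it through Core ML, while this sketch only tags the function with its target backend and runs it in plain Python.

```python
# Hypothetical @forge("ane") decorator: records the requested backend and,
# absent a real Core ML / ANE lowering path, simply runs the Python function.
_BACKENDS = {"ane", "gpu", "cpu"}

def forge(backend: str):
    if backend not in _BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")

    def decorator(fn):
        def wrapper(*args, **kwargs):
            # A real implementation would hand a traced graph to Core ML here.
            return fn(*args, **kwargs)
        wrapper.backend = backend  # tag so a compiler pass could dispatch on it
        return wrapper
    return decorator

@forge("ane")
def dense(x, w, b):
    # Plain-Python dense layer: y_i = dot(w_i, x) + b_i
    return [sum(xi * wi for xi, wi in zip(x, row)) + bi
            for row, bi in zip(w, b)]
```

Calling `dense([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])` returns `[1.5, 2.5]`, and `dense.backend` is `"ane"`, which is the hook a compiler could use for placement.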

Note that PyTorch has an export path to Core ML for the final inference step, but obviously not for training.

We should also probably not rely too much on MPS (we can use it for eager execution), but for compiled code we should write our own MSL kernels so that they can be fused well and make fewer round trips to global memory. This is roughly what MLX does.
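The memory-traffic argument for fusion can be made concrete with a toy model. This pure-Python sketch stands in for two MSL kernels versus one fused kernel, counting global-memory touches per element (reads and writes of full-size buffers); the function names and the traffic-counting scheme are illustrative assumptions, not part of any real kernel API.

```python
# Toy model of kernel fusion for an elementwise chain y = a*x + b.
# Each list read/write of a full-size buffer stands in for one
# global-memory round trip per element on the GPU.

def unfused(x, a, b):
    """Two 'kernel launches' with an intermediate buffer in global memory."""
    tmp = [a * xi for xi in x]   # kernel 1: read x, write tmp
    y = [ti + b for ti in tmp]   # kernel 2: read tmp, write y
    return y, 4 * len(x)         # 4 global-memory touches per element

def fused(x, a, b):
    """One fused 'kernel': the intermediate lives in registers, not memory."""
    y = [a * xi + b for xi in x]  # read x, write y
    return y, 2 * len(x)          # 2 global-memory touches per element
```

Both versions compute the same result, but the fused one does half the global-memory traffic; for longer elementwise chains the ratio grows with the number of unfused stages, which is the payoff of emitting our own fused MSL kernels instead of chaining MPS ops.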

Note that MLX has been shifting to the new neural accelerators (NA) built into the GPU cores starting with the M5 chips (with Metal 4); the ANE is no longer a separate silicon unit there. This lets them use GPU memory for faster matmuls, reportedly around 4x faster LLM inference.

Labels: question (further information is requested)