
[programming][machine_learning] How to use llama.cpp #1213

@LukeShortCloud

Description


How to use llama.cpp instead of Ollama, using AMD Strix Halo as the reference platform.

Download the latest binary release from: https://github.com/ggml-org/llama.cpp/releases

⚠️ IMPORTANT: On Strix Halo, always enable flash attention (-fa 1) and disable memory mapping (--no-mmap) to avoid crashes and slowdowns.
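Putting the flags above together, a minimal invocation might look like the sketch below. The model path and layer count are placeholders, not part of the original note; adjust them to your own setup.

```shell
# Run from the directory of the extracted llama.cpp release.
# -fa 1      enable flash attention (required on Strix Halo per the note above)
# --no-mmap  disable memory-mapped model loading (required on Strix Halo)
# -m         path to a local GGUF model file (hypothetical path shown)
# -ngl 99    offload all layers to the GPU
./llama-cli \
  -m ~/models/your-model.gguf \
  -fa 1 \
  --no-mmap \
  -ngl 99 \
  -p "Hello"
```

The same -fa 1 and --no-mmap flags apply to llama-server when serving a model over HTTP.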

See also: https://github.com/kyuz0/amd-strix-halo-toolboxes
