How to use llama.cpp instead of Ollama, with AMD Strix Halo as the reference platform.

Download the latest binary release: https://github.com/ggml-org/llama.cpp/releases

```
⚠️ IMPORTANT: Always use flash attention (-fa 1) and no-mmap (--no-mmap) on Strix Halo to avoid crashes/slowdowns.
```

Toolbox images for Strix Halo: https://github.com/kyuz0/amd-strix-halo-toolboxes
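As a sketch, a typical `llama-server` invocation combining those two flags might look like the following. The model filename, layer count, and port here are placeholders, not values from this note:

```shell
# Hypothetical example invocation; adjust the model path for your setup.
# -fa 1      enables flash attention (required on Strix Halo per the note above)
# --no-mmap  loads the model fully into memory instead of memory-mapping it
# -ngl 99    offloads all layers to the GPU (placeholder value)
./llama-server \
  -m ./models/model-Q4_K_M.gguf \
  -fa 1 \
  --no-mmap \
  -ngl 99 \
  --port 8080
```

Once running, the server exposes an OpenAI-compatible HTTP API on the chosen port, which lets it act as a drop-in local backend much like Ollama does.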