
[programming][machine_learning] How to use llama.cpp #1213

@LukeShortCloud

Description


How to use llama.cpp instead of Ollama, using AMD Strix Halo as the reference platform.

Download the latest binary release from: https://github.com/ggml-org/llama.cpp/releases

⚠️ IMPORTANT: On Strix Halo, always enable flash attention (-fa 1) and disable memory mapping (--no-mmap) to avoid crashes and slowdowns.
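Putting the flags above together, a minimal invocation might look like the sketch below. The model path and layer count are placeholders, not part of the original note; adjust them to your own setup.

```shell
# Run from the directory of the extracted llama.cpp release.
# -fa 1      enable flash attention (required on Strix Halo per the note above)
# --no-mmap  disable memory-mapped model loading (required on Strix Halo)
# -m         path to a local GGUF model file (hypothetical path shown)
# -ngl 99    offload all layers to the GPU
./llama-cli \
  -m ~/models/your-model.gguf \
  -fa 1 \
  --no-mmap \
  -ngl 99 \
  -p "Hello"
```

The same -fa 1 and --no-mmap flags apply to llama-server when serving a model over HTTP.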

See also: https://github.com/kyuz0/amd-strix-halo-toolboxes
