forked from ollama/ollama
Open
Labels
bug (Something isn't working)
Description
What is the issue?
Running the pre-compiled Windows version found here: #7 (comment)
My AMD 7735HS with its 680M iGPU runs my models 30-50% faster than on CPU alone. With 16 GB allocated to graphics in the BIOS, all of the model layers are loaded into the iGPU's shared portion of memory, but I noticed that the same amount of normal system RAM also gets filled.
Importantly, this does not break functionality, but on a 32 GB system it makes the remaining 16 GB of RAM unusable whenever I load a 14B model with a 4k context window. The same does not happen in LM Studio with the Vulkan backend, even though it also uses the llama.cpp inference engine.
OS
Windows
GPU
AMD
CPU
AMD
Ollama version
No response