Model is loaded twice with iGPU Shared Memory #18

@jhemmond

Description

What is the issue?

Running the pre-compiled Windows version found here: #7 (comment)

My AMD 7735HS with a 680M iGPU runs my models 30-50% faster than on CPU alone. With 16 GB allocated to graphics in the BIOS, all of the model layers are loaded into the iGPU's shared portion of memory, but I noticed that normal system RAM also fills up with exactly the same amount.

Important to note: this does not break functionality, but on a 32 GB system it makes the remaining 16 GB of RAM unusable whenever I load a 14B model with a 4k context window. The same does not occur in LM Studio when using the Vulkan backend, even though it also uses the llama.cpp inference engine.
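A rough back-of-envelope check of why this matches a duplicated model buffer (a sketch, assuming a Q4_K_M-style quantization at roughly 4.85 bits per weight and ignoring KV-cache overhead for the 4k context):

```python
def model_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of quantized model weights, in bytes."""
    return params_billion * 1e9 * bits_per_weight / 8

# A 14B model at ~4.85 bits/weight (assumed quantization level)
gib = model_bytes(14, 4.85) / 2**30
print(f"~{gib:.1f} GiB loaded once, ~{2 * gib:.1f} GiB if duplicated in RAM")
```

A single copy lands near 8 GiB, so a duplicate host-side copy on top of the shared-memory allocation would plausibly consume most of the 16 GB of remaining RAM once context buffers are added.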

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

No response

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
