Model is loaded twice with iGPU Shared Memory #18

@jhemmond

Description

What is the issue?

Running the pre-compiled Windows version found here: #7 (comment)

My AMD 7735HS with a 680M iGPU runs my models 30-50% faster than on CPU alone. With 16 GB allocated to graphics in the BIOS, all of the model layers are loaded into the iGPU's shared portion of memory, but I noticed that normal system RAM also fills up with exactly the same amount.

Important to note: this does not break functionality, but on a 32 GB system it makes the remaining 16 GB of RAM unusable whenever I load a 14B model with a 4k context window. The same does not occur in LM Studio when using the Vulkan backend, even though it also uses the llama.cpp inference engine.
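A rough back-of-envelope check of why this matches a duplicated model buffer (a sketch, assuming a Q4_K_M-style quantization at roughly 4.85 bits per weight and ignoring KV-cache overhead for the 4k context):

```python
def model_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of quantized model weights, in bytes."""
    return params_billion * 1e9 * bits_per_weight / 8

# A 14B model at ~4.85 bits/weight (assumed quantization level)
gib = model_bytes(14, 4.85) / 2**30
print(f"~{gib:.1f} GiB loaded once, ~{2 * gib:.1f} GiB if duplicated in RAM")
```

A single copy lands near 8 GiB, so a duplicate host-side copy on top of the shared-memory allocation would plausibly consume most of the 16 GB of remaining RAM once context buffers are added.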

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

No response

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
