GGUF files larger than ~1.6 GB cannot yet be loaded because of an IC limit on the number of stable-memory pages that may be accessed in a single update call.
During `load_model` of a gguf file larger than ~1.6 GB, the IC throws this error:
```
% dfx canister call llama_cpp load_model '(record {
    args = vec {
      "--model"; "models/model.gguf";
      "--cache-type-k"; "q8_0";
    }
  })'
Error: Failed update call.
Caused by: The replica returned a rejection error: reject code CanisterError, reject message Error from Canister bkyz2-fmaaa-aaaaa-qaaaq-cai: Canister exceeded memory access limits: Exceeded the limit for the number of accessed pages in the stable memory in a single message execution: limit 2097152 KB for regular messages and 1048576 KB for queries..
Try optimizing the use of stable memory so that individual messages don't need to access as much stable memory. See documentation: http://internetcomputer.org/docs/current/references/execution-errors#memory-access-limit-exceeded, error code None
```
The likely solution is to read the gguf file across multiple update calls.
llama.cpp already has a mechanism for reading large models from several split gguf files; that logic needs to be adapted to work across multiple calls to the canister.
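The client-side portion of such a multi-call upload can be sketched as follows. This is a minimal illustration, not the canister's actual API: the chunk size is an assumption chosen to stay under the observed ~1.6 GB per-message limit, and the actual per-call transport (e.g. which canister endpoint receives each chunk) is left out.

```python
from pathlib import Path

# Assumption: a chunk size well below the ~1.6 GB per-update-call limit
# observed in the error above. The real value would be tuned against the
# IC's accessed-pages limit.
CHUNK_SIZE = 512 * 1024 * 1024  # 512 MiB

def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield (offset, bytes) pairs covering the whole gguf file.

    Each pair would be sent in its own update call, so no single
    message touches more stable memory than the IC allows.
    """
    with open(Path(path), "rb") as f:
        offset = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield offset, chunk
            offset += len(chunk)
```

On the canister side, the split-gguf loading logic in llama.cpp would then consume these chunks incrementally instead of expecting the whole file in one message.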