GGUF files larger than ~1.6 GB cannot yet be loaded because of an IC limit on the number of stable-memory pages that may be accessed in a single update call.
During `load_model` of a gguf file larger than ~1.6 GB, the IC throws this error:
```
% dfx canister call llama_cpp load_model '(record {
    args = vec {
      "--model"; "models/model.gguf";
      "--cache-type-k"; "q8_0";
    }
  })'
Error: Failed update call.
Caused by: The replica returned a rejection error: reject code CanisterError, reject message Error from Canister bkyz2-fmaaa-aaaaa-qaaaq-cai: Canister exceeded memory access limits: Exceeded the limit for the number of accessed pages in the stable memory in a single message execution: limit 2097152 KB for regular messages and 1048576 KB for queries..
Try optimizing the use of stable memory so that individual messages don't need to access as much stable memory. See documentation: http://internetcomputer.org/docs/current/references/execution-errors#memory-access-limit-exceeded, error code None
```
The likely solution is to read the gguf file across multiple update calls.
llama.cpp already has a mechanism for reading large models from several split gguf files; that logic needs to be adapted to work across multiple calls to the canister.
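The client-side portion of such a multi-call upload can be sketched as follows. This is a minimal illustration, not the canister's actual API: the chunk size is an assumption chosen to stay under the observed ~1.6 GB per-message limit, and the actual per-call transport (e.g. which canister endpoint receives each chunk) is left out.

```python
from pathlib import Path

# Assumption: a chunk size well below the ~1.6 GB per-update-call limit
# observed in the error above. The real value would be tuned against the
# IC's accessed-pages limit.
CHUNK_SIZE = 512 * 1024 * 1024  # 512 MiB

def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield (offset, bytes) pairs covering the whole gguf file.

    Each pair would be sent in its own update call, so no single
    message touches more stable memory than the IC allows.
    """
    with open(Path(path), "rb") as f:
        offset = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield offset, chunk
            offset += len(chunk)
```

On the canister side, the split-gguf loading logic in llama.cpp would then consume these chunks incrementally instead of expecting the whole file in one message.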