Title: Unable to Control CPU Arena Allocator Behavior in ONNX GenAI Android Java/Kotlin (0.8.2)
Description
We're working with onnxruntime-genai-android-0.8.2.aar in a Java/Kotlin Android app that loads a ~140 MB model. While trying to reduce memory consumption during inference, we ran into high RAM usage. Our research showed that disabling enable_cpu_mem_arena drastically reduces memory usage in regular ONNX Runtime scenarios. However, we have not been able to apply the same setting through the ONNX GenAI wrapper for Android (Java API).
What We're Doing Now
We have a Kotlin wrapper that loads the model and performs inference using SimpleGenAI, like this:
```kotlin
class GenAIHelper(modelFolderPath: String) {
    var genAI: SimpleGenAI

    init {
        genAI = SimpleGenAI(modelFolderPath)
    }

    fun generateSummary(documentText: String): String {
        val prompt = "...prompt with document content..."

        // Prepare generation parameters
        val params: GeneratorParams = genAI.createGeneratorParams()
        params.setSearchOption("temperature", 0.7)
        params.setSearchOption("do_sample", true)
        params.setSearchOption("top_k", 10.0)
        params.setSearchOption("top_p", 0.95)
        params.setSearchOption("max_length", 1900.0)

        // Generate text
        return genAI.generate(params, prompt, null)
    }
}
```
Our Challenge
We tried to apply memory optimizations like:
```kotlin
class GenAIHelper(modelFolderPath: String) {
    var genAI: SimpleGenAI

    init {
        val options = SessionOptions()
        options.setCPUArenaAllocator(false)
        // The options object cannot be passed to SimpleGenAI,
        // which only accepts the model folder path
        genAI = SimpleGenAI(modelFolderPath)
    }
    // ... rest of the code
}
```
However, setting the CPU arena allocator via SessionOptions.setCPUArenaAllocator(false) did not appear to have any measurable effect, since the resulting options object is never consumed by SimpleGenAI. The key limitation we are encountering is that we have not found any way to pass such configuration settings during model loading with ONNX GenAI. In standard ONNX Runtime (e.g., the Java API), we can configure session behavior at load time using SessionOptions, like this:
```java
OrtSession.SessionOptions sessionOptions = new OrtSession.SessionOptions();
sessionOptions.setCPUArenaAllocator(false);
OrtSession session = env.createSession("model.onnx", sessionOptions);
```
Unlike standard ONNX Runtime, where it is possible to pass SessionOptions or execution provider settings while loading a model, the SimpleGenAI class in the ONNX GenAI Android API currently accepts only a single parameter: the modelFolderPath. As a result, we are unable to apply important runtime configurations such as memory allocator flags or provider-specific optimizations.
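One avenue we are exploring is editing the genai_config.json that ships alongside the model, which appears to expose a session_options block. Below is a sketch of what we imagine that would look like; the enable_cpu_mem_arena key and its placement are our assumptions based on reading the GenAI config schema, and we have not confirmed they are honored in 0.8.2 (other required decoder fields are omitted for brevity):

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "intra_op_num_threads": 4,
        "enable_cpu_mem_arena": false
      }
    }
  }
}
```

Even if this config-file route works, a programmatic way to set these options from Java/Kotlin would still be preferable, since the config file is bundled with the model rather than controlled by the app.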
Is there a recommended way to pass these options when using SimpleGenAI, or is this functionality planned for a future release?