
Unable to Control CPU Arena Allocator Behavior in ONNX GenAI Android Java/Kotlin (0.8.2) #1584

Open
@shanky2003g

Description


@sayanshaw24

We're working with onnxruntime-genai-android-0.8.2.aar in a Java/Kotlin Android app that loads a ~140 MB model. While investigating high RAM usage during inference, we looked for ways to reduce memory consumption.

In our research we found that disabling enable_cpu_mem_arena drastically reduces memory usage in standard ONNX Runtime scenarios. However, we have not been able to apply the same setting through the ONNX GenAI wrapper for Android (Java API).


What We're Doing Now

We have a Kotlin wrapper that loads the model and performs inference using SimpleGenAI, like this:

class GenAIHelper(modelFolderPath: String) {
    val genAI: SimpleGenAI

    init {
        genAI = SimpleGenAI(modelFolderPath)
    }

    fun generateSummary(documentText: String): String {
        val prompt = "...prompt with document content..."

        // Prepare generation parameters
        val params: GeneratorParams = genAI.createGeneratorParams()
        params.setSearchOption("temperature", 0.7)
        params.setSearchOption("do_sample", true)
        params.setSearchOption("top_k", 10.0)
        params.setSearchOption("top_p", 0.95)
        params.setSearchOption("max_length", 1900.0)

        // Generate text
        return genAI.generate(params, prompt, null)
    }
}

Our Challenge

We tried to apply memory optimizations like:

class GenAIHelper(modelFolderPath: String) {
    val genAI: SimpleGenAI

    init {
        val options = SessionOptions()
        options.setCPUArenaAllocator(false)
        // `options` is never consumed: SimpleGenAI only accepts the model folder path
        genAI = SimpleGenAI(modelFolderPath)
    }

    // ... rest of the code
}

However, setting the CPU arena allocator via SessionOptions.setCPUArenaAllocator(false) did not have any measurable effect, which is unsurprising: the options object is never actually passed to SimpleGenAI.

The key limitation is that we have found no way to pass such configuration settings during model loading with ONNX GenAI. In standard ONNX Runtime (e.g., the Java API), we can configure model behavior at load time using SessionOptions, like this:

OrtSession.SessionOptions sessionOptions = new OrtSession.SessionOptions();
sessionOptions.setCPUArenaAllocator(false);
OrtSession session = env.createSession("model.onnx", sessionOptions);

Unlike standard ONNX Runtime, where SessionOptions and execution provider settings can be supplied while loading a model, the SimpleGenAI class in the ONNX GenAI Android API currently accepts only a single parameter: the modelFolderPath.
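To make the request concrete, the kind of load-time configuration we are looking for would resemble the following hypothetical constructor overload. To be clear, no such overload exists in 0.8.2; this is only a sketch of the missing piece:

// Hypothetical API sketch — SimpleGenAI(path, options) does NOT exist in
// onnxruntime-genai 0.8.2; it illustrates the load-time configuration we need.
val options = SessionOptions()
options.setCPUArenaAllocator(false)
val genAI = SimpleGenAI(modelFolderPath, options)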

As a result, we are unable to apply important runtime configurations such as memory allocator flags or provider-specific optimizations.
Is there a recommended way to pass these options when using SimpleGenAI, or is this functionality planned for future releases?
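For completeness: the model folder does include a genai_config.json, which contains a session_options section under the decoder model settings. We have not been able to verify whether an enable_cpu_mem_arena entry is honored there, so the fragment below is an unverified assumption on our part, not a documented workaround:

{
  "model": {
    "decoder": {
      "session_options": {
        "enable_cpu_mem_arena": false
      }
    }
  }
}

If this is the intended mechanism, a pointer to the supported session_options keys would fully answer our question.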
