Title: Unable to Control CPU Arena Allocator Behavior in ONNX GenAI Android Java/Kotlin (0.8.2)
Description
We're working with onnxruntime-genai-android-0.8.2.aar in a Java/Kotlin Android app that loads a ~140 MB model. While trying to reduce memory consumption during inference, we ran into high RAM usage. Our research showed that disabling enable_cpu_mem_arena drastically reduces memory usage in regular ONNX Runtime scenarios. However, we have not been able to apply the same setting through the ONNX GenAI wrapper for Android (Java API).
What We're Doing Now
We have a Kotlin wrapper that loads the model and performs inference using SimpleGenAI, like this:
```kotlin
class GenAIHelper(modelFolderPath: String) {
    var genAI: SimpleGenAI

    init {
        genAI = SimpleGenAI(modelFolderPath)
    }

    fun generateSummary(documentText: String): String {
        val prompt = "...prompt with document content..."

        // Prepare generation parameters
        val params: GeneratorParams = genAI.createGeneratorParams()
        params.setSearchOption("temperature", 0.7)
        params.setSearchOption("do_sample", true)
        params.setSearchOption("top_k", 10.0)
        params.setSearchOption("top_p", 0.95)
        params.setSearchOption("max_length", 1900.0)

        // Generate text
        return genAI.generate(params, prompt, null)
    }
}
```
Our Challenge
We tried to apply memory optimizations like:
```kotlin
class GenAIHelper(modelFolderPath: String) {
    var genAI: SimpleGenAI

    init {
        val options = SessionOptions()
        options.setCPUArenaAllocator(false)
        // The options object cannot be passed to SimpleGenAI,
        // which only accepts the model folder path
        genAI = SimpleGenAI(modelFolderPath)
    }
    // ... rest of the code
}
```
However, setting the CPU arena allocator via SessionOptions.setCPUArenaAllocator(false) did not appear to have any measurable effect, since the resulting options object is never consumed by SimpleGenAI. The key limitation we are encountering is that we have not found any way to pass such configuration settings during model loading with ONNX GenAI. In standard ONNX Runtime (e.g., the Java API), we can configure session behavior at load time using SessionOptions, like this:
```java
OrtSession.SessionOptions sessionOptions = new OrtSession.SessionOptions();
sessionOptions.setCPUArenaAllocator(false);
OrtSession session = env.createSession("model.onnx", sessionOptions);
```
Unlike standard ONNX Runtime, where it is possible to pass SessionOptions or execution provider settings while loading a model, the SimpleGenAI class in the ONNX GenAI Android API currently accepts only a single parameter: the modelFolderPath. As a result, we are unable to apply important runtime configurations such as memory allocator flags or provider-specific optimizations.
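One avenue we are exploring is editing the genai_config.json that ships alongside the model, which appears to expose a session_options block. Below is a sketch of what we imagine that would look like; the enable_cpu_mem_arena key and its placement are our assumptions based on reading the GenAI config schema, and we have not confirmed they are honored in 0.8.2 (other required decoder fields are omitted for brevity):

```json
{
  "model": {
    "decoder": {
      "session_options": {
        "intra_op_num_threads": 4,
        "enable_cpu_mem_arena": false
      }
    }
  }
}
```

Even if this config-file route works, a programmatic way to set these options from Java/Kotlin would still be preferable, since the config file is bundled with the model rather than controlled by the app.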
Is there a recommended way to pass these options when using SimpleGenAI, or is this functionality planned for a future release?