RTX 5080 (Blackwell) + 16GB VRAM: Kimodo fails with CUDA OOM due to LLM fallback loading on GPU even when CPU mode is requested #27

@kavikode

Description

Environment:

  • OS: Ubuntu 24.04
  • GPU: NVIDIA GeForce RTX 5080 (16 GB VRAM)
  • Driver: 590.48.01
  • CUDA (driver): 13.1
  • PyTorch: 2.11.0+cu126
  • Kimodo: latest (pip install kimodo[all])

Summary:

Kimodo consistently fails with CUDA out-of-memory errors on a 16GB GPU due to the LLM (Meta-Llama-3-8B-Instruct) being loaded onto GPU memory during fallback, even when CPU execution is explicitly requested.


Steps to Reproduce:

  1. Activate the environment:

     conda activate kimodo

  2. Attempt to force CPU usage:

     export KIMODO_TEXT_ENCODER_DEVICE=cpu
     export TRANSFORMERS_DEVICE=cpu
     export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

  3. Run:

     kimodo_demo

Observed Behavior:

  • The text encoder service is not running, so Kimodo falls back to local LLM2Vec:

      Text encoder service is unreachable → falling back to local LLM2Vec encoder

  • Despite the CPU settings, the fallback loads Llama onto the GPU:

      ~14.7 GiB allocated by PyTorch

  • This leaves insufficient VRAM for the motion model:

      CUDA out of memory. Tried to allocate 20.00 MiB

  • Final result: the motion model fails to load.

Expected Behavior:

  • When KIMODO_TEXT_ENCODER_DEVICE=cpu is set, the fallback LLM should remain entirely on CPU,
    OR
  • Kimodo should fail early with a clear message requiring the kimodo_textencoder service, instead of silently falling back.
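As a sketch of the first option: the fallback loader could consult the same environment variable before placing the model. (The variable name comes from this report; `resolve_encoder_device` is a hypothetical helper, not Kimodo's actual API.)

```python
import os

def resolve_encoder_device(default: str = "cuda") -> str:
    """Hypothetical helper: pick the device for the fallback LLM2Vec
    encoder, honoring KIMODO_TEXT_ENCODER_DEVICE if it is set."""
    device = os.environ.get("KIMODO_TEXT_ENCODER_DEVICE", default).strip().lower()
    if device not in ("cpu", "cuda"):
        raise ValueError(f"Unsupported KIMODO_TEXT_ENCODER_DEVICE: {device!r}")
    return device

# With the env var from the repro steps set, the fallback stays on CPU:
os.environ["KIMODO_TEXT_ENCODER_DEVICE"] = "cpu"
print(resolve_encoder_device())  # cpu
```

If the resolved device is "cpu", the loader would then pass that through to wherever the Llama weights are materialized, rather than defaulting to CUDA.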

Additional Notes:

  • Running the text encoder as a separate service resolves the issue:

      kimodo_textencoder   # Terminal 1
      kimodo_demo          # Terminal 2

  • However, this requirement is neither enforced nor clearly documented, leading to confusing OOM failures.
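A possible single-GPU variant of that workaround (untested on my side): hide the GPU from the text-encoder process only, via the standard CUDA_VISIBLE_DEVICES mechanism, so the Llama fallback loads on CPU while kimodo_demo still sees the full 16 GB.

```shell
# Terminal 1: an empty CUDA_VISIBLE_DEVICES makes CUDA report no devices
# to this process, forcing the encoder onto CPU (encoder speed on CPU
# is untested):
#   CUDA_VISIBLE_DEVICES="" kimodo_textencoder
# Terminal 2: the demo still sees the GPU normally:
#   kimodo_demo
#
# The mechanism itself is easy to verify: the child process sees an
# empty device list.
CUDA_VISIBLE_DEVICES="" sh -c 'echo "visible devices: [${CUDA_VISIBLE_DEVICES}]"'
# prints: visible devices: []
```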

  • This issue is especially problematic on GPUs with 16GB VRAM, where:

    • Llama 8B consumes ~14–15GB
    • Motion model requires additional memory
    • Combined load exceeds capacity
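That breakdown can be sanity-checked from the parameter count alone (8B parameters assumed from the model name; this counts weights only, so activations and the KV cache come on top):

```python
# Rough VRAM footprint of the fallback LLM, from parameter count alone.
PARAMS = 8e9                      # Meta-Llama-3-8B-Instruct

fp16_gib = PARAMS * 2 / 2**30     # 2 bytes/param in fp16/bf16
int4_gib = PARAMS * 0.5 / 2**30   # 0.5 bytes/param if 4-bit quantized

print(f"fp16 weights: ~{fp16_gib:.1f} GiB")  # ~14.9 GiB, close to the ~14.7 GiB observed
print(f"int4 weights: ~{int4_gib:.1f} GiB")  # ~3.7 GiB, leaving room for the motion model
```

This is why a quantized fallback (improvement 3 below) would make 16 GB cards viable even without the separate service.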

Suggested Improvements:

  1. Respect KIMODO_TEXT_ENCODER_DEVICE=cpu for fallback path
  2. Add explicit warning or error if text encoder service is not running
  3. Provide a lightweight / quantized LLM fallback option
  4. Document GPU memory requirements clearly (≥24GB recommended)

Impact:

This prevents Kimodo from running on otherwise capable GPUs (e.g., RTX 5080 16GB), even though the motion model itself would fit if the LLM were isolated.


Question:

Is there a recommended way to:

  • force the LLM fallback to CPU reliably,
  • use a smaller / quantized text encoder, or
  • apply any other workaround?

Thanks for the excellent work on Kimodo — this is a very promising framework for human motion generation.
