Environment:
- OS: Ubuntu 24.04
- GPU: NVIDIA GeForce RTX 5080 (16 GB VRAM)
- Driver: 590.48.01
- CUDA (driver): 13.1
- PyTorch: 2.11.0+cu126
- Kimodo: latest (pip install kimodo[all])
Summary:
Kimodo consistently fails with CUDA out-of-memory errors on a 16GB GPU due to the LLM (Meta-Llama-3-8B-Instruct) being loaded onto GPU memory during fallback, even when CPU execution is explicitly requested.
Steps to Reproduce:
- Activate environment:
- Attempt to force CPU usage:
  export KIMODO_TEXT_ENCODER_DEVICE=cpu
  export TRANSFORMERS_DEVICE=cpu
  export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
- Run:
Observed Behavior:
- The text encoder service is not running, so Kimodo falls back to local LLM2Vec:
  Text encoder service is unreachable → falling back to local LLM2Vec encoder
- Despite CPU settings, the fallback loads Llama onto GPU:
  ~14.7 GiB allocated by PyTorch
- This leaves insufficient VRAM for the motion model:
  CUDA out of memory. Tried to allocate 20.00 MiB
- Final result: the model fails to load
Expected Behavior:
- When KIMODO_TEXT_ENCODER_DEVICE=cpu is set, the fallback LLM should remain entirely on CPU, OR
- Kimodo should fail early with a clear message requiring the kimodo_textencoder service instead of silently falling back
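The second option could be a simple fail-fast guard at startup. A minimal sketch follows; the function name, host, and port are hypothetical, since Kimodo's actual service-discovery mechanism isn't documented:

```python
import socket


def require_text_encoder_service(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Hypothetical startup guard: fail fast if the kimodo_textencoder
    service is unreachable, instead of silently falling back to a local
    LLM that may not fit in VRAM. Host/port defaults are assumptions."""
    try:
        # A plain TCP connect is enough to detect an absent service.
        with socket.create_connection((host, port), timeout=2):
            return
    except OSError as exc:
        raise RuntimeError(
            f"kimodo_textencoder service is unreachable at {host}:{port}; "
            "start it in a separate terminal (kimodo_textencoder) or set "
            "KIMODO_TEXT_ENCODER_DEVICE=cpu to allow a CPU-only fallback."
        ) from exc
```

An error like this surfaces the real problem immediately, rather than letting the user discover it via an unrelated-looking CUDA OOM twenty seconds later.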
Additional Notes:
- Running the text encoder as a separate service resolves the issue:
  kimodo_textencoder   # Terminal 1
  kimodo_demo          # Terminal 2
- However, this requirement is not clearly enforced or documented, leading to confusing OOM failures.
- This issue is especially problematic on GPUs with 16GB VRAM, where:
- Llama 8B consumes ~14–15GB
- Motion model requires additional memory
- Combined load exceeds capacity
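The arithmetic above is easy to verify with a back-of-envelope calculation (parameter count is approximate; real usage adds activations, KV cache, and allocator overhead on top of the weights):

```python
# Rough VRAM footprint of the text-encoder weights alone.
# Meta-Llama-3-8B has roughly 8.03B parameters.
params = 8.03e9

fp16_gib = params * 2 / 2**30    # 2 bytes per weight (fp16/bf16)
int4_gib = params * 0.5 / 2**30  # 0.5 bytes per weight (4-bit quantized)

print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB")
```

At fp16 the weights alone come to roughly 15 GiB, consistent with the ~14.7 GiB allocation observed, leaving essentially nothing for the motion model on a 16 GB card; a 4-bit quantized fallback (~4 GiB) would leave ample headroom.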
Suggested Improvements:
- Respect KIMODO_TEXT_ENCODER_DEVICE=cpu for the fallback path
- Add explicit warning or error if text encoder service is not running
- Provide a lightweight / quantized LLM fallback option
- Document GPU memory requirements clearly (≥24GB recommended)
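For the first suggestion, the fallback's device selection could consult the environment variable before defaulting to CUDA. A minimal sketch, assuming a helper of this shape (the function name is hypothetical, not Kimodo's actual API):

```python
import os


def resolve_text_encoder_device(cuda_available: bool) -> str:
    """Hypothetical device-resolution helper for the fallback path:
    an explicit KIMODO_TEXT_ENCODER_DEVICE setting always wins; only
    when it is unset does the fallback default to CUDA."""
    requested = os.environ.get("KIMODO_TEXT_ENCODER_DEVICE", "").strip().lower()
    if requested:
        return requested  # e.g. "cpu" -> keep the LLM off the GPU
    return "cuda" if cuda_available else "cpu"
```

With precedence like this, KIMODO_TEXT_ENCODER_DEVICE=cpu would keep the 8B fallback model in system RAM and leave the full 16 GB of VRAM to the motion model.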
Impact:
This prevents Kimodo from running on otherwise capable GPUs (e.g., RTX 5080 16GB), even though the motion model itself would fit if the LLM were isolated.
Question:
Is there a recommended way to:
- reliably force the LLM fallback to CPU,
- use a smaller / quantized text encoder, or
- work around this some other way?
Thanks for the excellent work on Kimodo — this is a very promising framework for human motion generation.