
fix: chat_template loading and spatial_shapes unbound variable in Eagle VL#582

Open
cagataycali wants to merge 4 commits into NVIDIA:main from cagataycali:fix/chat-template-loading

Conversation

@cagataycali

Summary

Two bug fixes for GR00T N1.5/N1.6 inference with transformers>=4.51:

Bug 1: chat_template is None → ValueError: Cannot render chat template

File: gr00t/model/gr00t_n1d6/processing_gr00t_n1d6.py

Root cause: AutoProcessor.from_pretrained() loads tokenizer_config.json which has a simpler template without <image> token support. The proper multimodal template lives in chat_template.json but is not loaded by the processor.

Fix: After AutoProcessor.from_pretrained(), check if chat_template is None. If so, load it from chat_template.json (or fall back to tokenizer_config.json) in the Eagle model directory.
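A minimal sketch of the fallback logic described above. The helper name `ensure_chat_template` is hypothetical (the actual patch lives inside `build_processor()` in `processing_gr00t_n1d6.py`); it assumes the standard Hugging Face layout where `chat_template.json` holds a `"chat_template"` key:

```python
import json
from pathlib import Path


def ensure_chat_template(processor, model_dir):
    """Fallback-load a chat template when AutoProcessor leaves it unset.

    Hypothetical helper illustrating the fix; `model_dir` is the Eagle
    model directory bundled with gr00t.
    """
    if getattr(processor, "chat_template", None) is not None:
        return processor  # already loaded, nothing to do

    model_dir = Path(model_dir)

    # Prefer the multimodal-aware template in chat_template.json ...
    template_file = model_dir / "chat_template.json"
    if template_file.exists():
        processor.chat_template = json.loads(template_file.read_text())["chat_template"]
        return processor

    # ... and only fall back to the simpler tokenizer_config.json one.
    config_file = model_dir / "tokenizer_config.json"
    if config_file.exists():
        processor.chat_template = json.loads(config_file.read_text()).get("chat_template")
    return processor
```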

Bug 2: spatial_shapes unbound → UnboundLocalError

File: gr00t/model/modules/nvidia/Eagle-Block2A-2B-v2/modeling_eagle3_vl.py

Root cause: In extract_feature(), when select_layer != -1 (GR00T uses -4), the else branch only grabs hidden_states[select_layer] but never captures spatial_shapes from vision_model_output. Line 366 then uses spatial_shapes which was never assigned.

Fix: Initialize spatial_shapes = None before the branch, and extract it from vision_model_output in both if and else branches.
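The branch logic of the fix can be sketched without tensors. This is a simplified stand-in for `extract_feature()` (the real code in `modeling_eagle3_vl.py` operates on torch tensors and calls `pixel_shuffle_back()` afterwards):

```python
from types import SimpleNamespace


def extract_feature(vision_model_output, select_layer=-1):
    """Simplified sketch of the fixed extract_feature() branch logic."""
    spatial_shapes = None  # fix: initialize before branching

    if select_layer == -1:
        features = vision_model_output.last_hidden_state
        spatial_shapes = getattr(vision_model_output, "spatial_shapes", None)
    else:
        # fix: the else branch must also capture spatial_shapes,
        # otherwise the downstream pixel_shuffle_back() call hits an
        # UnboundLocalError (GR00T uses select_layer == -4)
        features = vision_model_output.hidden_states[select_layer]
        spatial_shapes = getattr(vision_model_output, "spatial_shapes", None)

    return features, spatial_shapes
```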

Reproduction

```python
from gr00t.model.gr00t_n1d6 import Gr00tN1D6Policy

policy = Gr00tN1D6Policy(
    model_path="nvidia/GR00T-N1.5-3B",
    embodiment_tag="new_embodiment",
)
# Bug 1: ValueError on policy.get_action() due to None chat_template
# Bug 2: UnboundLocalError on spatial_shapes in Eagle VL forward pass
```

Testing

Verified with 12 integration tests on an NVIDIA L40S GPU (Python 3.10, transformers==4.51.3, gr00t==0.1.0). All pass after these fixes.

Changes

| File | Lines | Description |
| --- | --- | --- |
| processing_gr00t_n1d6.py | +18/−1 | Load chat_template.json fallback when AutoProcessor misses it |
| modeling_eagle3_vl.py | +6/−2 | Initialize spatial_shapes = None; extract in both branches |

Additional note

chat_template.json exists in the source tree at gr00t/model/modules/nvidia/Eagle-Block2A-2B-v2/chat_template.json but is not included in the pip-installed wheel (gr00t==0.1.0). Users installing via pip need to manually copy this file. Consider adding it to the wheel packaging.
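One way to address the packaging gap, assuming the project builds its wheel with setuptools (the exact build configuration of gr00t is an assumption here):

```python
# setup.py — hypothetical sketch, assuming a setuptools-based build
from setuptools import setup, find_packages

setup(
    name="gr00t",
    packages=find_packages(),
    include_package_data=True,
    # ship the Eagle module's JSON configs (including chat_template.json)
    # inside the wheel so pip installs don't need a manual copy
    package_data={
        "gr00t": ["model/modules/nvidia/Eagle-Block2A-2B-v2/*.json"],
    },
)
```

With this, `pip install gr00t` would place chat_template.json next to the Eagle module, and the fallback loader from Bug 1 can find it on disk.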

transformers >=4.51 requires chat_template to be set on the processor
for apply_chat_template() to work, but AutoProcessor.from_pretrained()
does not always load it from the bundled Eagle module's
chat_template.json or tokenizer_config.json.

This adds a fallback in build_processor() that reads the template
from disk if the processor's chat_template is None after loading.

When select_layer != -1, the else branch in extract_feature() never
captured spatial_shapes from vision_model_output before using it in
pixel_shuffle_back(). Initialize spatial_shapes = None and extract it
from the vision model output in both branches.

Also ensures chat_template.json is loaded as the multimodal-aware
template (not the simpler tokenizer_config.json one).
