
Conversation

@MohammedTaherMcW

Ticket

Link to JIRA Issue

Problem description

Migration of Gemma-3-27b-it from experimental setup to TT-Transformers Library

What's changed

  • Moved the experimental Gemma-3-27b-it to TT-Transformers.
  • Added sliding_window support in Attention (see the sketch after this list).
  • Added pre_feedforward_norm and post_feedforward_norm support in the decoder.
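
For reference, a minimal sketch (plain PyTorch, not the ttnn implementation) of what sliding-window attention masking means: the usual causal mask, further restricted so each query only attends to the most recent `sliding_window` keys. All names and shapes here are illustrative assumptions.

```python
import torch

def sliding_window_causal_mask(seq_len: int, sliding_window: int) -> torch.Tensor:
    """Additive attention mask: causal, and keys older than `sliding_window` positions are hidden."""
    q_pos = torch.arange(seq_len).unsqueeze(1)    # (seq_len, 1) query positions
    k_pos = torch.arange(seq_len).unsqueeze(0)    # (1, seq_len) key positions
    causal = k_pos <= q_pos                       # block attention to future keys
    in_window = (q_pos - k_pos) < sliding_window  # keep only the most recent `sliding_window` keys
    return torch.where(causal & in_window, 0.0, float("-inf"))

# Example: with sliding_window=4, query position 10 attends only to keys 7..10.
mask = sliding_window_causal_mask(seq_len=16, sliding_window=4)
```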

Checklist

@MohammedTaherMcW force-pushed the mcw/gemma_3_27b/pr_1_experimental branch from 265f912 to 6ba3246 on August 27, 2025 19:15
@MohammedTaherMcW force-pushed the mcw/gemma_3_27b/pr_2_tt_transformers branch from 471e07f to f1a0685 on September 1, 2025 13:30
@jennychristopher changed the title from "Migrate Gemma-3-27b-it to TT-Transformers Library" to "Migrate Gemma variants [1B, 4B, 27B] to TT-Transformers Library" on Sep 12, 2025
@MohammedTaherMcW force-pushed the mcw/gemma_3_27b/pr_2_tt_transformers branch from 359c925 to a6255ce on September 19, 2025 14:26
@nikileshx force-pushed the mcw/gemma_3_27b/pr_2_tt_transformers branch from a6255ce to 863496d on September 23, 2025 08:12
Copilot AI left a comment

Pull Request Overview

This PR migrates Gemma-3 multimodal models (1B, 4B, 27B) from the experimental codebase to the unified TT-Transformers library, consolidating functionality and enabling better code reuse.

  • Migrates all Gemma-3 multimodal components to tt_transformers structure
  • Adds sliding window attention support and additional normalization layers
  • Consolidates experimental code into the main transformer library

Reviewed Changes

Copilot reviewed 49 out of 52 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| models/tt_transformers/tt/multimodal/gemma3/*.py | Migrated vision components from experimental to tt_transformers |
| models/tt_transformers/tt/model_config.py | Added Gemma-3-specific configurations and vision parameters |
| models/tt_transformers/tt/model.py | Added attention mask support for sliding window attention |
| models/tt_transformers/tt/decoder.py | Added pre/post feedforward normalization support |
| models/tt_transformers/tt/attention.py | Enhanced with sliding window attention and mask parameters |
| models/tt_transformers/tt/generator.py | Extended to support multimodal inputs and Gemma-3 models |
| models/experimental/gemma3/tt/*.py | Removed experimental files (migrated to tt_transformers) |
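
As a rough illustration of the decoder.py and attention.py rows above, here is a hedged sketch of where the extra feedforward norms sit in a Gemma-3-style decoder block. The class and attribute names are assumptions for illustration, not the tt_transformers API, and plain-PyTorch RMSNorm (PyTorch >= 2.4) stands in for the library's norm implementation.

```python
import torch
import torch.nn as nn

class Gemma3DecoderBlockSketch(nn.Module):
    """Illustrative only: shows the ordering of the pre/post feedforward norms around the MLP."""

    def __init__(self, dim: int, attn: nn.Module, mlp: nn.Module, has_ff_norms: bool = True):
        super().__init__()
        self.attn = attn
        self.mlp = mlp
        self.input_norm = nn.RMSNorm(dim)
        self.post_attn_norm = nn.RMSNorm(dim)
        # The norms this PR adds; left as None when they are absent from the checkpoint
        self.pre_ff_norm = nn.RMSNorm(dim) if has_ff_norms else None
        self.post_ff_norm = nn.RMSNorm(dim) if has_ff_norms else None

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Attention sub-block: norm -> attention (with mask) -> norm -> residual add
        h = x + self.post_attn_norm(self.attn(self.input_norm(x), mask))
        # Feedforward sub-block: optional pre-norm -> MLP -> optional post-norm -> residual add
        ff_in = self.pre_ff_norm(h) if self.pre_ff_norm is not None else h
        ff_out = self.mlp(ff_in)
        if self.post_ff_norm is not None:
            ff_out = self.post_ff_norm(ff_out)
        return h + ff_out
```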


Comment on lines +1701 to +1804
text_prefix = ""
else:
text_prefix = self.state_dict_text_prefix
Copilot AI Sep 24, 2025

[nitpick] The hardcoded empty string for Gemma-3 models could be replaced with a more explicit constant or method to improve maintainability and make the special case clearer.
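
One hypothetical way to make the special case explicit, along the lines of this suggestion; the constant name, the `is_gemma3` flag, and the free-function form are all illustrative, not the actual model_config code:

```python
# Hypothetical: an empty prefix marks checkpoints whose text weights need no prefix.
GEMMA3_TEXT_PREFIX = ""

def text_prefix_for(model_args) -> str:
    """Make the Gemma-3 special case explicit instead of assigning "" inline."""
    return GEMMA3_TEXT_PREFIX if model_args.is_gemma3 else model_args.state_dict_text_prefix
```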

    layer = model.model.embed_tokens
else:
    layer = reference_model.model.embed_tokens
    layer = reference_model.model.model.embed_tokens
Copilot AI Sep 24, 2025

Double .model access looks suspicious and could lead to an AttributeError if the structure is not as expected. This should be verified against the actual model structure.

Suggested change
layer = reference_model.model.model.embed_tokens
layer = reference_model.model.embed_tokens
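
A more defensive alternative (a hypothetical helper, not part of this PR) would resolve embed_tokens by walking whatever `.model` wrapper nesting the reference checkpoint actually has, instead of hardcoding the depth:

```python
def resolve_embed_tokens(reference_model, max_depth: int = 3):
    """Find the embedding layer whether the HF model is wrapped in one or two `.model` levels."""
    module = reference_model
    for _ in range(max_depth):
        if hasattr(module, "embed_tokens"):
            return module.embed_tokens
        module = getattr(module, "model", None)
        if module is None:
            break
    raise AttributeError("embed_tokens not found on the reference model")
```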

core_grid=None, # FIXME: validate on TG ttnn.CoreGrid(y=8, x=8) if not pc_2 else None,
)
ttnn.deallocate(w2_in)
w2_out = ttnn.multiply(w2_out, self.num_devices)
Copilot AI Sep 24, 2025

[nitpick] The multiplication followed by division pattern (lines 254 and 275) suggests a scaling operation that could be abstracted into a helper function to reduce code duplication and improve clarity.
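
A hypothetical helper of the kind being suggested here; the function name, signature, and the choice to wrap ttnn.multiply are assumptions meant only to indicate the shape of the refactor:

```python
import ttnn  # assumes the tt-metal ttnn package is installed

def scale_by_num_devices(tensor, num_devices: int, inverse: bool = False):
    """Apply (or undo) the per-device scaling in one place instead of repeating it inline."""
    factor = 1.0 / num_devices if inverse else float(num_devices)
    return ttnn.multiply(tensor, factor)

# Hypothetical call sites, mirroring the inline multiply above and the matching divide later on:
# w2_out = scale_by_num_devices(w2_out, self.num_devices)
# w2_out = scale_by_num_devices(w2_out, self.num_devices, inverse=True)
```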

TG=args.is_galaxy,
)
else:
# If pre_feedforward_layernorm is not in state_dict, we do not use it
Copilot AI Sep 24, 2025

[nitpick] The comment should explain why pre_feedforward_layernorm might not be present and the implications of setting it to None for model behavior.

Suggested change
# If pre_feedforward_layernorm is not in state_dict, we do not use it
# pre_feedforward_layernorm may be absent from state_dict for certain model variants or checkpoint formats.
# When it is missing, self.pre_ff_norm is set to None and the model skips this normalization step,
# which may affect output consistency or compatibility with architectures that expect this layernorm.
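
For reference, a hedged sketch of the conditional-creation pattern this comment documents; the key name follows the HF Gemma-3 checkpoint layout, and plain-PyTorch RMSNorm stands in for whatever norm class the library actually uses:

```python
import torch.nn as nn

def maybe_build_pre_ff_norm(state_dict: dict, prefix: str, dim: int, eps: float = 1e-6):
    """Build the pre-feedforward norm only if the checkpoint actually contains its weight.

    Checkpoints without pre/post feedforward layernorms simply return None here, and the
    decoder then skips that normalization step in forward().
    """
    key = f"{prefix}pre_feedforward_layernorm.weight"
    if key not in state_dict:
        return None
    norm = nn.RMSNorm(dim, eps=eps)
    norm.weight.data.copy_(state_dict[key])
    return norm
```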

# Deallocate input tensor to free memory
ttnn.deallocate(x_in)
# Reshape output back to original shape

Copilot AI Sep 24, 2025

The comment mentions "Reshape output back to original shape" but no reshaping operation follows. This comment should be removed or the intended operation should be implemented.

Suggested change
if seq_len >= MAX_MM_SEQ_LEN:
    output = ttnn.reshape(output, [1, seq_len, -1])

@nikileshx force-pushed the mcw/gemma_3_27b/pr_2_tt_transformers branch from 863496d to 41a117d on September 24, 2025 12:10
@MohammedTaherMcW force-pushed the mcw/gemma_3_27b/pr_1_experimental branch from fafc552 to 7ee5ddc on October 13, 2025 10:46
@MohammedTaherMcW force-pushed the mcw/gemma_3_27b/pr_2_tt_transformers branch from 41a117d to d40f3f7 on October 13, 2025 18:52