Migrate Gemma variants [1B, 4B, 27B] to TT-Transformers Library #28
base: mcw/gemma_3_27b/pr_1_experimental
Conversation
Force-pushed: 265f912 → 6ba3246
Force-pushed: 471e07f → f1a0685
Force-pushed: 359c925 → a6255ce
Force-pushed: a6255ce → 863496d
Pull Request Overview
This PR migrates Gemma-3 multimodal models (1B, 4B, 27B) from the experimental codebase to the unified TT-Transformers library, consolidating functionality and enabling better code reuse.
- Migrates all Gemma-3 multimodal components to tt_transformers structure
- Adds sliding window attention support and additional normalization layers (a mask sketch follows this list)
- Consolidates experimental code into the main transformer library
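
For reviewers unfamiliar with the technique, here is a minimal, framework-agnostic sketch of what a sliding-window causal mask computes. It is illustrative only and does not reproduce the PR's ttnn implementation; the function name and the use of torch are assumptions.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window_size: int) -> torch.Tensor:
    """Boolean mask where entry [q, k] is True when query position q may attend
    to key position k: no future tokens, and at most `window_size` tokens back
    (the Gemma-style local-attention constraint)."""
    pos = torch.arange(seq_len)
    causal = pos[None, :] <= pos[:, None]               # k <= q
    local = pos[:, None] - pos[None, :] < window_size   # q - k < window
    return causal & local

# With window_size=3, query position 5 may attend only to keys 3, 4 and 5.
mask = sliding_window_causal_mask(seq_len=8, window_size=3)
```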
Reviewed Changes
Copilot reviewed 49 out of 52 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| models/tt_transformers/tt/multimodal/gemma3/*.py | Migrated vision components from experimental to tt_transformers |
| models/tt_transformers/tt/model_config.py | Added Gemma-3 specific configurations and vision parameters |
| models/tt_transformers/tt/model.py | Added attention mask support for sliding window attention |
| models/tt_transformers/tt/decoder.py | Added pre/post feedforward normalization support |
| models/tt_transformers/tt/attention.py | Enhanced with sliding window attention and mask parameters |
| models/tt_transformers/tt/generator.py | Extended to support multimodal inputs and Gemma-3 models |
| models/experimental/gemma3/tt/*.py | Removed experimental files (migrated to tt_transformers) |
```python
    text_prefix = ""
else:
    text_prefix = self.state_dict_text_prefix
```
Copilot AI (Sep 24, 2025):
[nitpick] The hardcoded empty string for Gemma-3 models could be replaced with a more explicit constant or method to improve maintainability and make the special case clearer.
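
One possible shape of that suggestion, sketched here for illustration only; the constant name and helper are assumptions, not identifiers from this PR.

```python
# Hypothetical module-level constant making the Gemma-3 special case explicit.
# Gemma-3 checkpoints appear to store text weights at the state-dict root,
# hence the empty prefix.
GEMMA3_TEXT_PREFIX = ""

def resolve_text_prefix(is_gemma3: bool, default_prefix: str) -> str:
    """Return the state-dict prefix for text weights (illustrative helper)."""
    return GEMMA3_TEXT_PREFIX if is_gemma3 else default_prefix
```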
```diff
     layer = model.model.embed_tokens
 else:
-    layer = reference_model.model.embed_tokens
+    layer = reference_model.model.model.embed_tokens
```
Copilot AI (Sep 24, 2025):
Double `.model` access looks suspicious and could lead to an AttributeError if the structure is not as expected. This should be verified against the actual model structure.
Suggested change:
```diff
-layer = reference_model.model.model.embed_tokens
+layer = reference_model.model.embed_tokens
```
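
One way to make that verification explicit, sketched with a hypothetical helper rather than the PR's code:

```python
def get_reference_embed_tokens(reference_model):
    """Return the embedding layer whether the reference model nests the text
    model one level deep (model.embed_tokens) or two (model.model.embed_tokens).
    Hypothetical helper for illustration only."""
    inner = reference_model.model
    if hasattr(inner, "embed_tokens"):
        return inner.embed_tokens
    return inner.model.embed_tokens
```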
```python
        core_grid=None,  # FIXME: validate on TG ttnn.CoreGrid(y=8, x=8) if not pc_2 else None,
    )
    ttnn.deallocate(w2_in)
    w2_out = ttnn.multiply(w2_out, self.num_devices)
```
Copilot AI (Sep 24, 2025):
[nitpick] The multiplication followed by division pattern (lines 254 and 275) suggests a scaling operation that could be abstracted into a helper function to reduce code duplication and improve clarity.
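
A sketch of the kind of helper the comment alludes to; it assumes only `ttnn.multiply`, which the diff already uses, and expresses the later division as multiplication by the reciprocal. The helper name is hypothetical.

```python
import ttnn

def scale_by_num_devices(tensor, num_devices: int, inverse: bool = False):
    """Scale a tensor up by the device count, or back down when `inverse`
    is True, so both call sites share one clearly named operation."""
    factor = 1.0 / num_devices if inverse else float(num_devices)
    return ttnn.multiply(tensor, factor)
```

With such a helper, both the multiply and the later divide would read as calls to the same clearly named operation, which is the clarity gain the nitpick is after.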
```python
        TG=args.is_galaxy,
    )
else:
    # If pre_feedforward_layernorm is not in state_dict, we do not use it
```
Copilot AI (Sep 24, 2025):
[nitpick] The comment should explain why pre_feedforward_layernorm might not be present and the implications of setting it to None for model behavior.
Suggested change:
```diff
-# If pre_feedforward_layernorm is not in state_dict, we do not use it
+# pre_feedforward_layernorm may be absent from state_dict for certain model variants or checkpoint formats.
+# When it is missing, self.pre_ff_norm is set to None and the model skips this normalization step,
+# which may affect output consistency or compatibility with architectures that expect this layernorm.
```
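
To make the pattern concrete, here is a minimal torch sketch of "build the norm only if the checkpoint ships its weight, otherwise pass through"; the class and the simplified RMSNorm formula are illustrative assumptions, not the tt_transformers implementation.

```python
import torch

class OptionalRMSNorm(torch.nn.Module):
    """Wraps a norm whose weight may be missing from the checkpoint."""

    def __init__(self, state_dict: dict, key: str, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        # None => this variant/checkpoint has no such norm; forward() is a pass-through.
        self.weight = torch.nn.Parameter(state_dict[key]) if key in state_dict else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.weight is None:
            return x
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

# Example (hypothetical key name):
# pre_ff_norm = OptionalRMSNorm(state_dict, "layers.0.pre_feedforward_layernorm.weight")
```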
```python
    # Deallocate input tensor to free memory
    ttnn.deallocate(x_in)
    # Reshape output back to original shape
```
Copilot AI (Sep 24, 2025):
The comment mentions "Reshape output back to original shape" but no reshaping operation follows. This comment should be removed or the intended operation should be implemented.
Suggested change:
```python
if seq_len >= MAX_MM_SEQ_LEN:
    output = ttnn.reshape(output, [1, seq_len, -1])
```
Force-pushed: 863496d → 41a117d
Force-pushed: fafc552 → 7ee5ddc
Force-pushed: 41a117d → d40f3f7
Ticket
Link to JIRA Issue
Problem description
Migration of Gemma-3-27b-it from the experimental setup to the TT-Transformers library.
What's changed
Moved the experimental Gemma-3-27b-it to TT-Transformers.
Added sliding_window support in Attention.
Added pre_feedforward_norm and post_feedforward_norm support in the decoder (see the ordering sketch below).
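
For reference, the Gemma-3 decoder sandwiches both the attention and feed-forward sub-blocks between norms. Below is a minimal sketch of the intended ordering, with norm names following the Hugging Face checkpoint keys; the function itself is illustrative pseudocode, not the tt_transformers decoder.

```python
def gemma3_decoder_block(x, attn, mlp, norms):
    """norms: mapping from the four per-layer norm names to callables."""
    h = norms["input_layernorm"](x)
    x = x + norms["post_attention_layernorm"](attn(h))

    h = norms["pre_feedforward_layernorm"](x)             # support added by this PR
    x = x + norms["post_feedforward_layernorm"](mlp(h))   # support added by this PR
    return x
```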
Checklist