Hello world 👋
I often spend my weekends exploring what transformer-based architectures are doing under the hood, and in this repo I'm documenting my findings:
- First embedding layer (the embedding table)
- Embeddings from summing the last 4 layers (a minimal sketch follows this list)
- Comparing the embeddings from the two different strategies
- Sentence embeddings
- Animated view of how embedding values change as a token passes through each of the 12 layers
- Weight distributions for the query, key, and value matrices (see the histogram sketch after this list)
- Weight distributions for the other layers
- Attention score vs. token for each attention head (see the attention sketch after this list)
- Attention score vs. token for each layer
- Observations
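
To make the first two embedding strategies concrete, here's a minimal sketch of pulling out the hidden states, assuming `bert-base-uncased` and the Hugging Face `transformers` library (the model choice and the mean pooling at the end are my assumptions, not the only options):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed model: any 12-layer BERT-style encoder behaves the same way here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of 13 tensors: the embedding-layer output
# followed by the output of each of the 12 encoder layers.
hidden_states = outputs.hidden_states

# Strategy 1: the first embedding layer (embedding table + position/type embeddings).
first_layer = hidden_states[0]                      # (batch, seq_len, 768)

# Strategy 2: sum the hidden states of the last 4 layers.
last4_sum = torch.stack(hidden_states[-4:]).sum(0)  # (batch, seq_len, 768)

# One common way to get a sentence embedding: mean-pool over tokens.
sentence_embedding = last4_sum.mean(dim=1)          # (batch, 768)
```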
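
For the weight distributions, a sketch of one way to pull the query/key/value matrices out for plotting; the attribute path below is specific to Hugging Face's `BertModel`, which I'm assuming here:

```python
import matplotlib.pyplot as plt
from transformers import AutoModel

# Assumed model: the path encoder.layer[i].attention.self.{query,key,value}
# is specific to Hugging Face's BertModel.
model = AutoModel.from_pretrained("bert-base-uncased")

layer = model.encoder.layer[0]
for name, linear in [("query", layer.attention.self.query),
                     ("key", layer.attention.self.key),
                     ("value", layer.attention.self.value)]:
    # Flatten each weight matrix into a 1-D array and histogram it.
    weights = linear.weight.detach().flatten().numpy()
    plt.hist(weights, bins=100, alpha=0.5, label=name)
plt.legend()
plt.xlabel("weight value")
plt.title("Layer 0 Q/K/V weight distributions")
plt.show()
```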
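
And for the attention scores, a sketch of reading the per-layer, per-head attention matrices, again assuming the Hugging Face API (`output_attentions=True`):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# attentions is a tuple of 12 tensors, one per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
attentions = outputs.attentions

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
layer_idx, head_idx = 0, 0  # which layer/head to inspect
scores = attentions[layer_idx][0, head_idx]  # (seq_len, seq_len)

# Attention paid by the first token ([CLS]) to every token in the sequence.
for token, score in zip(tokens, scores[0]):
    print(f"{token:>12s}  {score.item():.3f}")
```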