
Transformers Deep Dive

Hello world 👋
I often spend my weekends exploring what transformer-based architectures are doing under the hood, and in this repo I'm documenting my findings.

Summary of content

- BERT embeddings
  - First embedding layer (the embedding lookup table)
  - Embeddings from summing the last 4 layers
  - Comparing the embeddings from the two strategies
  - Sentence embeddings
  - Animation of how embedding values change as a token passes through each of the 12 layers
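The two embedding strategies above can be sketched in a few lines. This is a minimal illustration, not the repo's actual notebook code: it assumes you already have BERT's per-layer hidden states (the embedding-layer output plus the outputs of the 12 encoder layers, as HuggingFace's `BertModel` returns with `output_hidden_states=True`), and uses random arrays as a stand-in so the snippet runs without downloading a model.

```python
import numpy as np

def compare_embedding_strategies(hidden_states):
    """Compare two token-embedding strategies given per-layer hidden states.

    hidden_states: list of 13 arrays, each (seq_len, hidden_dim) -- the
    embedding-layer output followed by the 12 encoder-layer outputs.
    """
    first_layer = hidden_states[0]                  # strategy 1: embedding table output
    last4_sum = np.sum(hidden_states[-4:], axis=0)  # strategy 2: sum of the last 4 layers

    # Per-token cosine similarity between the two strategies
    num = np.sum(first_layer * last4_sum, axis=1)
    den = (np.linalg.norm(first_layer, axis=1) *
           np.linalg.norm(last4_sum, axis=1))
    cosine = num / den

    # One simple sentence embedding: mean-pool the final layer over tokens
    sentence_embedding = hidden_states[-1].mean(axis=0)
    return cosine, sentence_embedding

# Toy stand-in for BERT's hidden states: 13 layers, 5 tokens, 768 dims
rng = np.random.default_rng(0)
states = [rng.standard_normal((5, 768)) for _ in range(13)]
cos, sent = compare_embedding_strategies(states)
```

With real BERT outputs, `states` would come from `model(**inputs).hidden_states`; everything else stays the same.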

- BERT weights
  - Weight distributions for the query, key, and value matrices
  - Weight distributions for the other layers
  - Attention score vs. token, per attention head
  - Attention score vs. token, per layer
  - Observations
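For the attention-score plots above, the quantity being visualized per head is `softmax(QK^T / sqrt(d_head))`. Here is a hedged sketch of that computation with NumPy; the weight matrices and hidden states are random placeholders (in practice they would come from a loaded BERT checkpoint, e.g. `bert-base-uncased` with 12 heads and hidden size 768).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(hidden, W_q, W_k, num_heads):
    """Per-head attention scores: softmax(Q K^T / sqrt(d_head)).

    hidden: (seq_len, hidden_dim); W_q, W_k: (hidden_dim, hidden_dim).
    Returns an array of shape (num_heads, seq_len, seq_len).
    """
    seq_len, hidden_dim = hidden.shape
    d_head = hidden_dim // num_heads
    # Project, then split the hidden dimension into heads
    Q = (hidden @ W_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (hidden @ W_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    logits = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    return softmax(logits, axis=-1)

# Placeholder inputs: 6 tokens, BERT-base sizes (768 hidden, 12 heads)
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 768))
W_q = rng.standard_normal((768, 768)) * 0.02
W_k = rng.standard_normal((768, 768)) * 0.02
scores = attention_scores(H, W_q, W_k, num_heads=12)
```

Each row of `scores[head]` sums to 1, so plotting a row gives the "attention score vs. token" view for that head.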
