PyTorch Transformer
This repository contains a simple implementation of the Transformer model in PyTorch. The code is written from scratch and can be used for educational purposes or as a starting point for building more complex transformer-based models.
Table of Contents
- Dependencies
- Usage
- Example
- License
Dependencies
To use this code, you will need to have the following dependencies installed:
PyTorch
NumPy
You can install these dependencies using pip:
pip install torch numpy
Usage
The main component of the transformer code is the MultiHeadAttention module, which can be used as a building block for more complex models. To use the MultiHeadAttention module in your code, you can simply import it from the transformer.py file:
from transformer import MultiHeadAttention
The MultiHeadAttention module can then be instantiated and used in your code:
attn = MultiHeadAttention(d_model=512, num_heads=8)
output = attn(input_tensor)
You can adjust the d_model and num_heads parameters to match the size of your input tensor and the number of attention heads you want to use.
Example
To run a simple example of the transformer code, you can use the example.py script provided in this repository. This script generates a random input tensor and applies the MultiHeadAttention module to it:
python
import torch from transformer import MultiHeadAttention
input_tensor = torch.randn(32, 10, 512)
attn = MultiHeadAttention(d_model=512, num_heads=8)
output_tensor = attn(input_tensor)
print(output_tensor.shape)
This will output the shape of the output tensor, which should be (32, 10, 512). License