Conversation

@apoorva5ingh

I created a mini GPT model from scratch using PyTorch, inspired by Karpathy’s educational examples. This project implements all core components of a transformer: multi-head self-attention, feedforward layers, embeddings, and layer normalization. The model is trained on character-level text data and can generate new sequences after training. It includes logic for evaluation, loss tracking, and saving/loading the model. The code is clean and modular, making it perfect for learning how GPT models work internally. This setup is great for experimenting with custom datasets or building lightweight LLMs for small-scale tasks and educational purposes.
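For readers who want a concrete picture of the components listed above, here is a minimal sketch of a Karpathy-style character-level GPT in PyTorch. The class and parameter names (`SelfAttention`, `Block`, `MiniGPT`, `n_embd`, `block_size`, etc.) are illustrative assumptions, not necessarily the names used in this PR's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Multi-head causal self-attention (illustrative names, not this PR's exact code)."""
    def __init__(self, n_embd, n_head, block_size, dropout=0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.proj = nn.Linear(n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)
        # causal mask: each position may only attend to itself and earlier positions
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.dropout(self.proj(out))

class Block(nn.Module):
    """Transformer block: pre-norm attention + feedforward, each with a residual connection."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = SelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ff = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.ff(self.ln2(x))
        return x

class MiniGPT(nn.Module):
    """Character-level GPT: token + position embeddings, a stack of blocks, and an LM head."""
    def __init__(self, vocab_size, n_embd=128, n_head=4, n_layer=4, block_size=64):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        logits = self.head(self.ln_f(self.blocks(x)))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(B * T, -1), targets.view(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        # sample one character at a time, feeding the growing sequence back in
        for _ in range(max_new_tokens):
            logits, _ = self(idx[:, -self.block_size:])
            probs = F.softmax(logits[:, -1, :], dim=-1)
            idx = torch.cat([idx, torch.multinomial(probs, 1)], dim=1)
        return idx
```

Training, evaluation, and checkpointing would sit around this model in the usual way: an optimizer such as `torch.optim.AdamW`, a loop that samples `(idx, targets)` batches of character indices and backpropagates the returned loss, and `torch.save(model.state_dict(), path)` / `model.load_state_dict(torch.load(path))` for saving and loading. The exact training loop in this PR may differ.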

@B4xAbhishek

needs improvement
