RoMaLM

RoMaLM (Rotary + RMSNorm Language Model) is a modern decoder-only transformer built from scratch in Python.
It uses Rotary Position Embeddings (RoPE) and RMSNorm for stability and efficiency, in a minimal yet capable architecture that follows the design choices of current LLMs.


Features

  • Rotary Position Embeddings (RoPE)
  • RMSNorm (no LayerNorm)
  • PreNorm residual connections
  • SwiGLU-style feedforward: SiLU(x1) * x2 (sketched, together with RoPE and RMSNorm, after this list)
  • Tied token embeddings and output head
  • Optional LayerDrop for regularization
  • Causal attention via scaled_dot_product_attention
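
The three components called out above are small enough to show inline. Below is a minimal, self-contained sketch of RoPE, RMSNorm, and the SwiGLU-style feedforward; the names and exact shapes are illustrative and not taken from this repo's code.

import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rope(x, base=10000.0):
    # x: (..., seq_len, head_dim), head_dim even.
    # Split the features into two halves and rotate each (x1_i, x2_i) pair by an
    # angle that grows with position, so attention scores depend on relative offsets.
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    half = head_dim // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype, device=x.device) / half)
    angles = torch.arange(seq_len, dtype=x.dtype, device=x.device)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class RMSNorm(nn.Module):
    # Rescales by the root mean square of the features with a learned gain;
    # unlike LayerNorm there is no mean subtraction and no bias.
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class SwiGLUFeedForward(nn.Module):
    # Two up-projections: one branch is gated with SiLU and multiplied into
    # the other (SiLU(x1) * x2), then projected back down to the model width.
    def __init__(self, dim, ff_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, ff_dim, bias=False)  # gated branch (x1)
        self.w2 = nn.Linear(dim, ff_dim, bias=False)  # linear branch (x2)
        self.w3 = nn.Linear(ff_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w3(F.silu(self.w1(x)) * self.w2(x))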

Model Overview

# vocab_size, max_seq_len, and pad_token_id come from your tokenizer/dataset.
model = ModernTransformerLM(
    vocab_size=vocab_size,
    max_seq=max_seq_len,       # maximum sequence length the model will see
    pad_token_id=pad_token_id,
    dim=768,                   # model (embedding) width
    n_heads=12,                # attention heads; 768 / 12 = 64 dims per head
    num_layers=8,              # transformer blocks
    ff_dim=3072,               # feedforward hidden width (4 * dim)
    dropout=0.1,               # residual/feedforward dropout
    attn_dropout=0.1,          # attention dropout
    layer_drop=0.0             # LayerDrop probability (0.0 disables it)
)

This is just an example. Adjust hyperparameters as needed for your setup.
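
As a quick smoke test, you can push a batch of random token ids through the model. This sketch assumes the conventional forward(input_ids) -> logits interface; the actual signature of ModernTransformerLM.forward may differ, so treat it as a sketch rather than the repo's API.

import torch

# Random token ids: a batch of 2 sequences, 128 tokens each (values are arbitrary).
input_ids = torch.randint(0, vocab_size, (2, 128))
logits = model(input_ids)  # assumed shape: (batch, seq_len, vocab_size)
print(logits.shape)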


License

MIT License: do whatever you like, just don't claim you wrote it if you didn't.
