An interactive tokenizer playground for exploring how text breaks into tokens and how IDs are assigned using an external vocabulary file.
- Type or paste text to see tokens in real time
- Token → ID mapping with color-coded token types:
  - 🟩 Green → existing vocab word
  - 🟥 Red → newly learned word
  - 🟦 Blue → UTF-8 raw byte (symbols)
  - 🟧 Orange → punctuation/symbol
  - ⬜ Grey → unknown token ID
- Decode by entering space- or comma-separated token IDs
- Legend for quick type reference
- Custom tokenizer logic with no external libraries
- External vocab support: load `vocab.json` for consistent tokenization
- HTML + CSS
- JavaScript
- External `vocab.json` for token mapping
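
As a rough illustration of the token → ID mapping and the ID decoding described in the feature list, here is a minimal JavaScript sketch. It is not the project's actual code: the `{ token: id }` vocab shape, the `BYTE_BASE` ID range, the `<unk>` placeholder, and the `encode`/`decode` function names are all assumptions made for the example, and the real `vocab.json` layout may differ.

```javascript
// Hypothetical vocab shape { token: id }; the project's vocab.json may differ.
const vocab = new Map(Object.entries({ hello: 0, world: 1, ",": 2, "!": 3 }));
const BYTE_BASE = 100000; // assumed reserved ID range for raw UTF-8 bytes
const WORD = /^[\p{L}\p{N}]+$/u; // a run of letters and digits
let nextId = vocab.size; // assumes existing IDs are dense, starting at 0

// Split into word runs and single non-space symbols, then map each to an ID.
function encode(text) {
  const pieces = text.match(/[\p{L}\p{N}]+|[^\s\p{L}\p{N}]/gu) ?? [];
  return pieces.flatMap((token) => {
    if (vocab.has(token)) {
      const type = WORD.test(token) ? "vocab" : "punct"; // green / orange
      return [{ token, id: vocab.get(token), type }];
    }
    if (WORD.test(token)) {
      vocab.set(token, nextId); // learn the unseen word
      return [{ token, id: nextId++, type: "new" }]; // red
    }
    // Anything else falls back to one token per UTF-8 byte (blue).
    return Array.from(new TextEncoder().encode(token), (byte) => ({
      token: `0x${byte.toString(16).padStart(2, "0")}`,
      id: BYTE_BASE + byte,
      type: "byte",
    }));
  });
}

// Parse space- or comma-separated IDs and map them back to text pieces.
function decode(input) {
  const byId = new Map([...vocab].map(([token, id]) => [id, token]));
  const out = [];
  let bytes = [];
  const flush = () => {
    if (bytes.length) out.push(new TextDecoder().decode(new Uint8Array(bytes)));
    bytes = [];
  };
  for (const part of input.split(/[\s,]+/).filter(Boolean)) {
    const id = Number(part);
    if (id >= BYTE_BASE && id < BYTE_BASE + 256) {
      bytes.push(id - BYTE_BASE); // buffer consecutive raw bytes
      continue;
    }
    flush();
    out.push(byId.get(id) ?? "<unk>"); // grey: ID not in the vocab
  }
  flush();
  return out;
}

console.log(encode("hello, brave world €").map((t) => t.id));
// → [0, 2, 4, 1, 100226, 100130, 100172]
console.log(decode("0, 2 4,1 999"));
// → ["hello", ",", "brave", "world", "<unk>"]
```

In this sketch, `brave` is not in the starting vocab, so it is learned and given the next free ID, while `€` has no vocab entry and is emitted as its three UTF-8 bytes. Keeping byte IDs in a separate range is one simple way to avoid collisions with learned words.
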
git clone https://github.com/dev-d-25/Tokenizer.git
cd Tokenizer