
🛠️ Setup & Training

1. Install Dependencies

First, install the required packages:

pip install torchao liger_kernel pyarrow tensorboard

💡 Note: torchao and liger_kernel may require a recent version of PyTorch (≥2.3) and a CUDA-enabled environment for optimal performance.
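Before training, it can save time to confirm that all four packages actually import. This is an optional sketch; the helper name `missing_packages` is ours, not part of the repository:

```python
import importlib.util

# Packages required by step 1 (module names, not pip names, where they differ).
REQUIRED = ["torch", "torchao", "liger_kernel", "pyarrow", "tensorboard"]

def missing_packages(names=REQUIRED):
    """Return the subset of package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```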

2. Prepare Data

  1. Download all files from this repository.
  2. Place them in a single working directory.
  3. Inside this directory, create a subfolder named 128.
  4. Download the training data (Parquet files) into the 128/ folder:
    🔗 TinyCorpus-v2
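After downloading, a quick sanity check can confirm that all 128 shards landed in the 128/ folder. This is an optional sketch; the shard-name pattern is inferred from the file layout shown in step 3:

```python
import re
from pathlib import Path

# Shard naming assumed from the directory layout in step 3.
SHARD_RE = re.compile(r"tinycorpus-(\d{3})-of-128\.parquet$")

def verify_shards(data_dir="128", expected=128):
    """Return sorted shard indices found in data_dir; report any missing ones."""
    found = sorted(
        int(m.group(1))
        for p in Path(data_dir).glob("*.parquet")
        if (m := SHARD_RE.match(p.name))
    )
    missing = sorted(set(range(expected)) - set(found))
    if missing:
        print(f"Missing {len(missing)} shard(s), e.g. {missing[:5]}")
    return found
```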

3. File Structure

Your directory should look like this:

your-training-folder/
├── trainGPT-token.py
├── fast_self_attn_model.py
├── data_utils.py
├── dev_optim.py
└── 128/
    ├── tinycorpus-000-of-128.parquet
    ├── tinycorpus-001-of-128.parquet
    └── ...                            # all shard files

4. Start Training

Run the training script from inside your-training-folder:

python trainGPT-token.py

This replicates MiniModel with 12 layers; the original model used 24. To replicate the original model, change 'layers': 24 in trainGPT-token.py.
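For reference, the setting might look like this inside trainGPT-token.py. This is a hypothetical excerpt; the exact variable name in the script may differ:

```python
# Hypothetical excerpt: the config dict name in trainGPT-token.py may differ.
model_config = {
    'layers': 24,   # default is 12; set to 24 to replicate the original model
}
```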

By default, the script logs training loss and other metrics to a directory called runs/ using PyTorch’s SummaryWriter.

5. Monitor Training with TensorBoard

While training is running (or after it finishes), launch TensorBoard to visualize the loss curve:

tensorboard --logdir=runs

Then open your browser and go to:
👉 http://localhost:6006

You’ll see real-time plots of the training loss (refreshed every 30 s).

6. Troubleshooting Out-of-Memory (OOM) Errors

If you encounter memory issues, open trainGPT-token.py and adjust one or both of the following:

  • Reduce model size:
    'input_dims': 512   # default 768
  • Reduce batch size:
    batch_size = 32     # default 64

Smaller values will lower VRAM usage at the cost of training speed or stability.
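As a rough guide to how the two knobs interact, activation memory scales roughly linearly with each. A back-of-the-envelope helper (illustrative only; the names mirror the settings above, and this ignores parameter and optimizer memory):

```python
def rough_activation_scale(input_dims, batch_size,
                           base_dims=768, base_batch=64):
    """Relative activation-memory footprint versus the default settings.

    A value of 1.0 means the same footprint as the defaults; 0.5 means
    roughly half. This is a crude linear model, not a measurement.
    """
    return (input_dims / base_dims) * (batch_size / base_batch)
```

For example, dropping to 'input_dims': 512 and batch_size = 32 together cuts the estimated activation footprint to about a third of the default.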