From bca1c2167b5dfebd95e3530ec9bcd50d9ff693a9 Mon Sep 17 00:00:00 2001
From: chandanms
Date: Fri, 15 Aug 2025 22:27:30 +0200
Subject: [PATCH] Accidentally added the README to the wrong folder. Fixed now

---
 README.md                      | 22 ++++++-------
 simple_stories_train/README.md | 58 ----------------------------------
 2 files changed, 11 insertions(+), 69 deletions(-)
 delete mode 100644 simple_stories_train/README.md

diff --git a/README.md b/README.md
index cda30a6..24a4919 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,9 @@
 # simple_stories_train
 
-Project for training small LMs. Designed for training on SimpleStories, an extension of
-[TinyStories](https://arxiv.org/abs/2305.07759).
+Training framework for small language models using SimpleStories, a large-scale synthetic dataset of over 2 million short stories in simple language.
 
-
-- Training script is based on the efficeint [train_gpt2.py](https://github.com/karpathy/llm.c/blob/master/train_gpt2.py) in [llm.c](https://github.com/karpathy/llm.c) (licensed
-  under MIT ((c) 2024 Andrei Karpathy))
-- Some model architecture implementations are based on
-  [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) (licensed under
-  MIT ((c) 2022 TransformerLensOrg)).
+**Paper:** [Parameterized Synthetic Text Generation with SimpleStories](https://arxiv.org/abs/2504.09184)
+**Models & Dataset:** [🤗 SimpleStories on Hugging Face](https://huggingface.co/SimpleStories)
 
 ## Installation
 
@@ -37,8 +32,8 @@ make test-all # Run all tests
 ## Usage
 
 ### Training a model
-```
-python train_llama.py [PATH/TO/CONFIG.yaml] [--key1 value1 --key2 value2 ...]
+```bash
+python -m simple_stories_train.train [PATH/TO/CONFIG.yaml] [--key1 value1 --key2 value2 ...]
 ```
 where
 - `PATH/TO/CONFIG.yaml` contains the training config. If no path is provided, a default config will be used.
@@ -49,10 +44,15 @@ If running on CPU, you may need to set `--compile=False`.
 
 To run on multiple GPUs, use
 ```
-torchrun --standalone --nproc_per_node=N train_llama.py ...
+torchrun --standalone --nproc_per_node=N -m simple_stories_train.train ...
 ```
 where `N` is the number of GPUs to use.
 
 ### Logging with Weights & Biases
 To track training with Weights & Biases, you can set the WANDB_PROJECT and WANDB_API_KEY variables in
 `.env`. API keys can be obtained from your [Weights & Biases account settings](https://wandb.ai/settings).
+
+## Acknowledgments
+
+- Training script is based on the efficient [train_gpt2.py](https://github.com/karpathy/llm.c/blob/master/train_gpt2.py) in [llm.c](https://github.com/karpathy/llm.c) (licensed under MIT ((c) 2024 Andrej Karpathy))
+- Some model architecture implementations are based on [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) (licensed under MIT ((c) 2022 TransformerLensOrg))
diff --git a/simple_stories_train/README.md b/simple_stories_train/README.md
deleted file mode 100644
index 24a4919..0000000
--- a/simple_stories_train/README.md
+++ /dev/null
@@ -1,58 +0,0 @@
-# simple_stories_train
-
-Training framework for small language models using SimpleStories, a large-scale synthetic dataset of over 2 million short stories in simple language.
-
-**Paper:** [Parameterized Synthetic Text Generation with SimpleStories](https://arxiv.org/abs/2504.09184)
-**Models & Dataset:** [🤗 SimpleStories on Hugging Face](https://huggingface.co/SimpleStories)
-
-## Installation
-
-From the root of the repository, run one of
-
-```bash
-make install-dev # To install the package, dev requirements and pre-commit hooks
-make install # To just install the package (runs `pip install -e .`)
-```
-
-## Development
-
-Suggested extensions and settings for VSCode are provided in `.vscode/`. To use the suggested
-settings, copy `.vscode/settings-example.json` to `.vscode/settings.json`.
-
-There are various `make` commands that may be helpful
-
-```bash
-make check # Run pre-commit on all files (i.e. pyright, ruff linter, and ruff formatter)
-make type # Run pyright on all files
-make format # Run ruff linter and formatter on all files
-make test # Run tests that aren't marked `slow`
-make test-all # Run all tests
-```
-
-## Usage
-
-### Training a model
-```bash
-python -m simple_stories_train.train [PATH/TO/CONFIG.yaml] [--key1 value1 --key2 value2 ...]
-```
-where
-- `PATH/TO/CONFIG.yaml` contains the training config. If no path is provided, a default config will be used.
-- `--key1 value1 --key2 value2 ...` override values in the config. Note that if you wish to update a
-  nested value, you must use dotted notation (e.g. `--train_dataset_config.name my_dataset`).
-
-If running on CPU, you may need to set `--compile=False`.
-
-To run on multiple GPUs, use
-```
-torchrun --standalone --nproc_per_node=N -m simple_stories_train.train ...
-```
-where `N` is the number of GPUs to use.
-
-### Logging with Weights & Biases
-To track training with Weights & Biases, you can set the WANDB_PROJECT and WANDB_API_KEY variables in
-`.env`. API keys can be obtained from your [Weights & Biases account settings](https://wandb.ai/settings).
-
-## Acknowledgments
-
-- Training script is based on the efficient [train_gpt2.py](https://github.com/karpathy/llm.c/blob/master/train_gpt2.py) in [llm.c](https://github.com/karpathy/llm.c) (licensed under MIT ((c) 2024 Andrej Karpathy))
-- Some model architecture implementations are based on [TransformerLens](https://github.com/TransformerLensOrg/TransformerLens) (licensed under MIT ((c) 2022 TransformerLensOrg))