Parameter to optionally override default tokenizer path #5

Open
coleramos425 wants to merge 1 commit into anthonix:master from coleramos425:param-tokenizer

@coleramos425

I've found that when I run llm.c out of directory, following the instructions provided in the README:

pip install -r requirements.txt
python dev/data/tinyshakespeare.py
python train_gpt2.py
make train_gpt2amd
./train_gpt2amd

I get the following error:

---
WARNING: Failed to open the tokenizer file gpt2_tokenizer.bin
The Tokenizer is a new feature added April 14 2024.
Re-run `python train_gpt2.py` to write it
---

I needed to add a parameter to the train_gpt2.cu file that lets the user specify the path to the tokenizer file, overriding the default gpt2_tokenizer.bin. Running with this change fixed the issue for me.
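
For readers who want a picture of the idea, here is a minimal sketch in C of an optional path override with a fallback to the default file name. It is not the code from this PR: the flag name `--tokenizer` and the helpers `resolve_tokenizer_path` and `tokenizer_file_exists` are hypothetical names chosen for illustration. Only the default file name `gpt2_tokenizer.bin` comes from the warning above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Default file name, taken from the warning message quoted above. */
#define DEFAULT_TOKENIZER_PATH "gpt2_tokenizer.bin"

/* Hypothetical flag name, for illustration only; the PR may use another. */
#define TOKENIZER_FLAG "--tokenizer"

/* Returns the argument that follows TOKENIZER_FLAG in argv,
 * or the default path when the flag is absent or has no value. */
const char *resolve_tokenizer_path(int argc, char **argv) {
    assert(argc >= 0 && (argc == 0 || argv != NULL));
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], TOKENIZER_FLAG) == 0) {
            return argv[i + 1];
        }
    }
    return DEFAULT_TOKENIZER_PATH;
}

/* Returns 1 if the file can be opened for reading, 0 otherwise. */
int tokenizer_file_exists(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return 0;
    }
    fclose(f);
    return 1;
}
```

With a helper like this, the program would call `resolve_tokenizer_path(argc, argv)` once at startup and pass the result to whatever loads the tokenizer, so running without the flag behaves exactly as before.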

Signed-off-by: coleramos425 <colramos@amd.com>