Fine-tune a LLaMA model using Reinforcement Learning from Human Feedback (RLHF) for aligned text generation.
- Actor-critic architecture with a reward model (a PPO-style sketch follows this list)
- Generates high-quality, human-aligned responses
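For orientation, below is a minimal sketch of the kind of PPO-style actor-critic update this setup implies. It is an assumption about the training objective, not code taken from the notebooks; the function and argument names are illustrative.

```python
# Minimal sketch of a PPO-style actor-critic update with reward-model scores.
# Assumption: the notebooks follow a clipped-surrogate PPO objective; names here
# are illustrative, not the repository's actual API.
import torch
import torch.nn.functional as F

def rlhf_step(actor_logprobs, old_logprobs, values, rewards, clip_eps=0.2, vf_coef=0.5):
    """One PPO-style policy/value loss on a batch of generated responses.

    actor_logprobs: log-probs of sampled tokens under the current actor
    old_logprobs:   log-probs under the actor that generated the samples
    values:         critic value estimates for the same tokens
    rewards:        reward-model scores, broadcast to token positions
    """
    advantages = rewards - values.detach()                 # simple advantage estimate
    ratio = torch.exp(actor_logprobs - old_logprobs)       # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()    # clipped surrogate (actor)
    value_loss = F.mse_loss(values, rewards)               # value regression (critic)
    return policy_loss + vf_coef * value_loss
```

In practice the rewards come from the trained reward model, and RLHF pipelines typically also add a KL penalty against the original LLaMA policy to keep generations close to the base model.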
- Install dependencies: `pip install -r requirements.txt`
- Prepare the dataset: place your JSON data in the `dataset/` folder (an example record is sketched below the steps)
- Run the training notebooks in order: `jupyter notebook actor.ipynb`, then `jupyter notebook critic.ipynb`, then `jupyter notebook rlhf.ipynb`
- Test generation: `jupyter notebook test.ipynb` (a minimal generation example is sketched below the steps)
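The exact JSON schema is not spelled out here, so the snippet below writes one hypothetical preference-style record into `dataset/`; the field names (`prompt`, `chosen`, `rejected`) are assumptions and should be adjusted to whatever the notebooks expect.

```python
# Hypothetical dataset record for dataset/ -- the schema is an assumption,
# not the repository's documented format.
import json
import pathlib

example = {
    "prompt": "Explain RLHF in one sentence.",
    "chosen": "RLHF fine-tunes a model with a reward model trained on human preferences.",
    "rejected": "RLHF is when you train a model.",
}

pathlib.Path("dataset").mkdir(exist_ok=True)
pathlib.Path("dataset/example.json").write_text(json.dumps([example], indent=2))
```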
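As a rough picture of what the test step does, here is a hedged generation sketch using Hugging Face `transformers`. The checkpoint path `checkpoints/actor` is hypothetical; replace it with wherever the notebooks save the fine-tuned actor.

```python
# Quick generation check with a fine-tuned actor checkpoint.
# Assumption: the actor is saved in Hugging Face format at "checkpoints/actor".
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("checkpoints/actor")
model = AutoModelForCausalLM.from_pretrained("checkpoints/actor")

prompt = "Explain RLHF in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```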
- Support larger LLaMA models
- Improve the reward model and RLHF strategy
- Optimize training and generation speed