Fine-tune a LLaMA model using Reinforcement Learning from Human Feedback (RLHF) for aligned text generation.
- Actor-critic architecture with a reward model (a PPO-style sketch follows this list)
- Generates high-quality, human-aligned responses
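For orientation, below is a minimal sketch of the kind of PPO-style actor-critic update this setup implies. It is an assumption about the training objective, not code taken from the notebooks; the function and argument names are illustrative.

```python
# Minimal sketch of a PPO-style actor-critic update with reward-model scores.
# Assumption: the notebooks follow a clipped-surrogate PPO objective; names here
# are illustrative, not the repository's actual API.
import torch
import torch.nn.functional as F

def rlhf_step(actor_logprobs, old_logprobs, values, rewards, clip_eps=0.2, vf_coef=0.5):
    """One PPO-style policy/value loss on a batch of generated responses.

    actor_logprobs: log-probs of sampled tokens under the current actor
    old_logprobs:   log-probs under the actor that generated the samples
    values:         critic value estimates for the same tokens
    rewards:        reward-model scores, broadcast to token positions
    """
    advantages = rewards - values.detach()                 # simple advantage estimate
    ratio = torch.exp(actor_logprobs - old_logprobs)       # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()    # clipped surrogate (actor)
    value_loss = F.mse_loss(values, rewards)               # value regression (critic)
    return policy_loss + vf_coef * value_loss
```

In practice the rewards come from the trained reward model, and RLHF pipelines typically also add a KL penalty against the original LLaMA policy to keep generations close to the base model.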
- Install dependencies: `pip install -r requirements.txt`
- Prepare the dataset: place your JSON data in the `dataset/` folder (an example record is sketched below the steps)
- Run the training notebooks in order: `jupyter notebook actor.ipynb`, then `jupyter notebook critic.ipynb`, then `jupyter notebook rlhf.ipynb`
- Test generation: `jupyter notebook test.ipynb` (a minimal generation example is sketched below the steps)
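The exact JSON schema is not spelled out here, so the snippet below writes one hypothetical preference-style record into `dataset/`; the field names (`prompt`, `chosen`, `rejected`) are assumptions and should be adjusted to whatever the notebooks expect.

```python
# Hypothetical dataset record for dataset/ -- the schema is an assumption,
# not the repository's documented format.
import json
import pathlib

example = {
    "prompt": "Explain RLHF in one sentence.",
    "chosen": "RLHF fine-tunes a model with a reward model trained on human preferences.",
    "rejected": "RLHF is when you train a model.",
}

pathlib.Path("dataset").mkdir(exist_ok=True)
pathlib.Path("dataset/example.json").write_text(json.dumps([example], indent=2))
```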
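As a rough picture of what the test step does, here is a hedged generation sketch using Hugging Face `transformers`. The checkpoint path `checkpoints/actor` is hypothetical; replace it with wherever the notebooks save the fine-tuned actor.

```python
# Quick generation check with a fine-tuned actor checkpoint.
# Assumption: the actor is saved in Hugging Face format at "checkpoints/actor".
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("checkpoints/actor")
model = AutoModelForCausalLM.from_pretrained("checkpoints/actor")

prompt = "Explain RLHF in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```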
- Support larger LLaMA models
- Improve the reward model and RLHF strategy
- Optimize training and generation speed