JXRepo/LLaMA-Text-Generation

LLaMA-Text-Generation-with-RLHF

Fine-tune a LLaMA model using Reinforcement Learning from Human Feedback (RLHF) for aligned text generation.

Features

  • Actor-critic architecture with a reward model

  • Generates high-quality, human-aligned responses
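The actor-critic-plus-reward-model setup above can be sketched as three small networks sharing a common backbone design. This is an illustrative toy sketch, not the repository's code: the backbone, sizes, and GRU choice are assumptions standing in for the actual LLaMA weights, but the head shapes (per-token logits for the actor, per-prefix values for the critic, one scalar score per sequence for the reward model) reflect the standard RLHF layout.

```python
import torch
import torch.nn as nn

# Toy sizes standing in for a real LLaMA vocabulary and hidden dimension.
VOCAB, HIDDEN = 100, 32

class Backbone(nn.Module):
    """Tiny stand-in for the LLaMA trunk: embeds tokens, runs a GRU."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)

    def forward(self, tokens):          # (B, T) -> (B, T, HIDDEN)
        h, _ = self.rnn(self.embed(tokens))
        return h

class Actor(nn.Module):
    """Policy: next-token logits for every position."""
    def __init__(self):
        super().__init__()
        self.backbone, self.head = Backbone(), nn.Linear(HIDDEN, VOCAB)

    def forward(self, tokens):          # (B, T) -> (B, T, VOCAB)
        return self.head(self.backbone(tokens))

class Critic(nn.Module):
    """Value estimate for each token prefix, used for advantages."""
    def __init__(self):
        super().__init__()
        self.backbone, self.head = Backbone(), nn.Linear(HIDDEN, 1)

    def forward(self, tokens):          # (B, T) -> (B, T)
        return self.head(self.backbone(tokens)).squeeze(-1)

class RewardModel(nn.Module):
    """One scalar score per full sequence, from the last hidden state."""
    def __init__(self):
        super().__init__()
        self.backbone, self.head = Backbone(), nn.Linear(HIDDEN, 1)

    def forward(self, tokens):          # (B, T) -> (B,)
        h = self.backbone(tokens)[:, -1]
        return self.head(h).squeeze(-1)

actor, critic, reward_model = Actor(), Critic(), RewardModel()
tokens = torch.randint(0, VOCAB, (2, 8))  # batch of 2 sequences, length 8
```

During RLHF training, the actor generates responses, the reward model scores them, and the critic's value estimates turn those scores into per-token advantages for the policy update.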

Usage

  1. Install dependencies: pip install -r requirements.txt

  2. Prepare dataset: Place your JSON data in the dataset/ folder

  3. Run the training notebooks in order: jupyter notebook actor.ipynb, then jupyter notebook critic.ipynb, then jupyter notebook rlhf.ipynb

  4. Test generation: jupyter notebook test.ipynb
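Put together, the steps above amount to the following command sequence (notebook and folder names taken from the steps; the dataset JSON format is whatever your data uses and is not shown here):

```shell
pip install -r requirements.txt

mkdir -p dataset            # place your JSON data in this folder

jupyter notebook actor.ipynb   # 1) train the actor
jupyter notebook critic.ipynb  # 2) train the critic
jupyter notebook rlhf.ipynb    # 3) RLHF fine-tuning
jupyter notebook test.ipynb    # 4) test generation
```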

Future Improvements

• Support larger LLaMA models

• Improve reward model and RLHF strategy

• Optimize training and generation speed
