vrizz/rlhf


RLHF

An implementation of Reinforcement Learning from Human Feedback (RLHF) using Hugging Face's trl library: a reward model is first fine-tuned on human preference data, and a policy is then fine-tuned against that reward model with PPO.

🏆 Reward Model Fine-Tuning (reward_train.ipynb)
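Reward model fine-tuning optimizes a pairwise preference objective: for a prompt with a human-preferred (chosen) and a dispreferred (rejected) response, the model's scalar scores are pushed apart. A minimal pure-Python sketch of this Bradley-Terry loss, the objective behind trl's RewardTrainer (the function name is illustrative, not taken from the notebook):

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).

    Minimized by scoring the human-preferred response higher than the
    rejected one; when both scores are equal the loss is log(2).
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In practice the trainer computes this loss over batches of tokenized (chosen, rejected) pairs; the sketch only shows the per-pair scalar form.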

🦾 PPO Fine-Tuning (ppo_train.ipynb)
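PPO fine-tuning maximizes a clipped surrogate objective, and in RLHF the per-step reward is typically shaped with a KL penalty toward the frozen reference model so the policy does not drift too far. A hedged pure-Python sketch of those two pieces (illustrative helper names; trl's PPOTrainer applies them internally over token sequences):

```python
import math

def kl_shaped_reward(reward: float, logp_policy: float, logp_ref: float,
                     beta: float = 0.1) -> float:
    """Penalize drift from the reference model: r - beta * (log pi - log pi_ref)."""
    return reward - beta * (logp_policy - logp_ref)

def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate objective (to be maximized) for one action."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Pessimistic bound: take the smaller of the raw and clipped terms,
    # so large policy updates stop earning extra objective.
    return min(ratio * advantage, clipped * advantage)
```

With a positive advantage, raising the probability ratio past 1 + clip_eps yields no further gain, which is what keeps PPO updates conservative.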

About

Reinforcement learning from human feedback (RLHF): reward model & PPO fine-tuning.
