An implementation of Reinforcement Learning from Human Feedback (RLHF) using Hugging Face's trl library.
Reward Model Training
- Model: distilroberta-base
- Dataset: trl-lib/lm-human-preferences-descriptiveness (human-ranked text pairs)
- Objective: Learns to score text based on alignment with human preferences for descriptiveness
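The objective above is typically trained with a pairwise (Bradley-Terry) loss: the model's score for the human-preferred text should exceed its score for the rejected text. A minimal plain-Python sketch of that loss (the function name and example scores are illustrative, not taken from trl):

```python
import math

def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Pairwise preference loss: -log(sigmoid(chosen - rejected)).

    Small when the chosen (human-preferred) text scores higher than the
    rejected one; large when the ranking is inverted.
    """
    margin = chosen_score - rejected_score
    # log1p(exp(-x)) is an equivalent form of -log(sigmoid(x))
    return math.log1p(math.exp(-margin))

# A correctly ranked pair yields a small loss...
good = pairwise_reward_loss(chosen_score=2.0, rejected_score=-1.0)
# ...while an inverted ranking is penalized heavily.
bad = pairwise_reward_loss(chosen_score=-1.0, rejected_score=2.0)
assert good < bad
```

In trl, the RewardTrainer applies this kind of loss over chosen/rejected pairs from the dataset, backpropagating through the scoring model.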
RLHF Fine-Tuning
- Base Model: GPT-2
- Reward Model: argilla/roberta-base-reward-model-falcon-dolly
- Dataset: argilla/databricks-dolly-15k-curated-en (15K instruction-response pairs)
- Objective: Improves response quality (e.g., coherence, descriptiveness) on instruction-following tasks
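In this stage, the reward model's scores drive a policy-gradient update of the base model; trl's PPO trainer uses a clipped surrogate objective for this. A plain-Python sketch of that clipping rule (variable names and the 0.2 clip range are illustrative defaults, not read from this project's configuration):

```python
import math

def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, clip_range: float = 0.2) -> float:
    """PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A).

    r is the probability ratio between the updated and the old policy for
    a sampled token; clipping keeps each update close to the old policy,
    which stabilizes RLHF fine-tuning.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, the objective saturates once the ratio
# exceeds 1 + clip_range, so one sample cannot pull the policy too far.
assert ppo_clipped_objective(logp_new=1.0, logp_old=0.0, advantage=1.0) == 1.2
```

The reward-model score (often combined with a KL penalty against the original GPT-2 policy) supplies the advantage signal that this objective maximizes.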
