An implementation of Reinforcement Learning from Human Feedback (RLHF) using Hugging Face's trl library.
Reward Model Training
- Model: distilroberta-base
- Dataset: trl-lib/lm-human-preferences-descriptiveness (human-ranked text pairs)
- Objective: Learns to score text based on alignment with human preferences for descriptiveness
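The objective above is typically trained with a pairwise (Bradley-Terry) loss: the model's score for the human-preferred text should exceed its score for the rejected text. A minimal plain-Python sketch of that loss (the function name and example scores are illustrative, not taken from trl):

```python
import math

def pairwise_reward_loss(chosen_score: float, rejected_score: float) -> float:
    """Pairwise preference loss: -log(sigmoid(chosen - rejected)).

    Small when the chosen (human-preferred) text scores higher than the
    rejected one; large when the ranking is inverted.
    """
    margin = chosen_score - rejected_score
    # log1p(exp(-x)) is an equivalent form of -log(sigmoid(x))
    return math.log1p(math.exp(-margin))

# A correctly ranked pair yields a small loss...
good = pairwise_reward_loss(chosen_score=2.0, rejected_score=-1.0)
# ...while an inverted ranking is penalized heavily.
bad = pairwise_reward_loss(chosen_score=-1.0, rejected_score=2.0)
assert good < bad
```

In trl, the RewardTrainer applies this kind of loss over chosen/rejected pairs from the dataset, backpropagating through the scoring model.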
RLHF Fine-Tuning
- Base Model: GPT-2
- Reward Model: argilla/roberta-base-reward-model-falcon-dolly
- Dataset: argilla/databricks-dolly-15k-curated-en (15K instruction-response pairs)
- Objective: Improves response quality (e.g., coherence, descriptiveness) on instruction-following tasks
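In this stage, the reward model's scores drive a policy-gradient update of the base model; trl's PPO trainer uses a clipped surrogate objective for this. A plain-Python sketch of that clipping rule (variable names and the 0.2 clip range are illustrative defaults, not read from this project's configuration):

```python
import math

def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, clip_range: float = 0.2) -> float:
    """PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A).

    r is the probability ratio between the updated and the old policy for
    a sampled token; clipping keeps each update close to the old policy,
    which stabilizes RLHF fine-tuning.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, the objective saturates once the ratio
# exceeds 1 + clip_range, so one sample cannot pull the policy too far.
assert ppo_clipped_objective(logp_new=1.0, logp_old=0.0, advantage=1.0) == 1.2
```

The reward-model score (often combined with a KL penalty against the original GPT-2 policy) supplies the advantage signal that this objective maximizes.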
