Commit 1842b4f
Add DPO support for DeepSpeed-Chat (#828)
* Add label_smoothing while calculating the step2 DPO loss in DeepSpeed-Chat (see the loss sketch below the commit metadata).
* Add training scripts for step2 DPO in DeepSpeed-Chat.
* Remove unused packages and format the code of step2 DPO in DeepSpeed-Chat.
* Update training scripts of step2 DPO in DeepSpeed-Chat.
* Follow upstream fixes.
* Update README.md for Step2 DPO finetuning.
* Add opt 350M training log demo for step 2 dpo finetuning in DeepSpeed-Chat.
* Address the formatting issue in step2 dpo finetuning in DeepSpeed-Chat.
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

1 parent 476f600 · commit 1842b4f
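For context, the label-smoothed DPO loss mentioned above is usually computed from the policy and reference log-probabilities of the chosen and rejected responses. The snippet below is a minimal sketch of that loss in PyTorch; the tensor names and the `beta`/`label_smoothing` defaults are illustrative assumptions, not code taken from this commit.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             reference_chosen_logps: torch.Tensor,
             reference_rejected_logps: torch.Tensor,
             beta: float = 0.1,
             label_smoothing: float = 0.0) -> torch.Tensor:
    """Sketch of the DPO loss with label smoothing (conservative DPO).

    Inputs are per-example sequence log-probabilities; names and default
    hyperparameters are assumptions for illustration only.
    """
    # Log-ratio of policy vs. reference for chosen and rejected responses.
    chosen_logratios = policy_chosen_logps - reference_chosen_logps
    rejected_logratios = policy_rejected_logps - reference_rejected_logps
    logits = chosen_logratios - rejected_logratios

    # label_smoothing interpolates between the standard DPO objective and
    # its mirror image, modeling a small probability that preference
    # labels are flipped.
    losses = (-F.logsigmoid(beta * logits) * (1 - label_smoothing)
              - F.logsigmoid(-beta * logits) * label_smoothing)
    return losses.mean()
```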
File tree
12 files changed: +7216 −0 lines changed

- applications/DeepSpeed-Chat/training/step2_dpo_finetuning
- training_log_output
- training_scripts
- llama2
- opt
- multi_node
- single_gpu
- single_node
- sweep