Repository review. #106


Draft · wants to merge 4 commits into base: v0.0
2 changes: 2 additions & 0 deletions rllm/data/dataset_types.py
@@ -67,6 +67,7 @@ class DatasetConfig:
dataloader_batch_size: int = 8

def __post_init__(self):
# @note: if self.datasets is a string, it goes through both if branches (not that problematic).
# Handle single string input
if isinstance(self.datasets, str):
self.datasets = [self.datasets]
@@ -75,6 +76,7 @@ def __post_init__(self):
if isinstance(self.datasets[0], str):
converted_datasets = []
for dataset_name in self.datasets:
# !critical: the comment below doesn't match the code (the fallback it describes is missing).
# Try to match with TrainDataset first, then TestDataset
try:
dataset = TrainDataset(dataset_name)
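The behavior the `@note` describes, and the fallback the `!critical` comment says is missing, can be sketched with toy stand-ins (the enum members and the `DatasetConfig` fields here are invented for illustration; they are not the repo's real definitions). A bare string first gets wrapped in a list, then also flows through the string-to-enum branch:

```python
from dataclasses import dataclass
from enum import Enum

# Toy stand-ins for the real TrainDataset/TestDataset enums (members assumed).
class TrainDataset(Enum):
    AIME = "aime"

class TestDataset(Enum):
    MATH = "math"

@dataclass
class DatasetConfig:
    datasets: object

    def __post_init__(self):
        # A bare string is first wrapped in a list...
        if isinstance(self.datasets, str):
            self.datasets = [self.datasets]
        # ...and then also enters the string-to-enum branch, so string
        # input passes through both if cases (harmless, as the note says).
        if isinstance(self.datasets[0], str):
            converted = []
            for name in self.datasets:
                # Try TrainDataset first, then fall back to TestDataset —
                # the fallback the review comment says the real code lacks.
                try:
                    converted.append(TrainDataset(name))
                except ValueError:
                    converted.append(TestDataset(name))
            self.datasets = converted

cfg = DatasetConfig(datasets="aime")
# cfg.datasets is now [TrainDataset.AIME]
```

With the `except ValueError` fallback present, a name only found in `TestDataset` (e.g. `"math"` above) still resolves; without it, the lookup raises.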
1 change: 1 addition & 0 deletions rllm/data/preprocess/difficulty_judge.py
@@ -67,6 +67,7 @@ def difficulty_fn(idx, entry):

def batch_difficulty(dataset: str, split: str):

# !critical: neither if branch works, because the TrainDataset and TestDataset classes are not subscriptable.
# Figure out if we need a TrainDataset or TestDataset
if split == "train":
dataset_enum = TrainDataset[dataset.upper()]
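The failure mode this comment flags can be reproduced with a minimal sketch, assuming `TrainDataset` is a plain namespace class holding nested enums rather than an `Enum` itself (an assumed shape, not the repo's exact definition). `SomeClass[name]` lookup only works on `Enum` subclasses:

```python
from enum import Enum

# Assumed shape: a plain class grouping nested enums, not an Enum subclass.
class TrainDataset:
    class Math(Enum):
        AIME = "aime"

# Subscripting the outer class, as difficulty_judge.py does, raises TypeError.
try:
    TrainDataset["aime".upper()]
    lookup_raised = False
except TypeError:
    lookup_raised = True

# Name-based subscription works only on an actual Enum subclass.
member = TrainDataset.Math["AIME"]
```

So `TrainDataset[dataset.upper()]` would need to target one of the nested enums (or `TrainDataset` would need to be an `Enum` itself) for the lookup to succeed.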
1 change: 1 addition & 0 deletions rllm/rewards/reward_types.py
@@ -78,6 +78,7 @@ class RewardInput:
}
"""

# !critical: this class is never used.
@dataclass(slots=True, kw_only=True)
class LiveCodebenchInput:
"""Data structure for input required to calculate rewards.
6 changes: 3 additions & 3 deletions scripts/deepscaler/README.md
@@ -11,7 +11,7 @@ Our 8k context script runs on a single node with 8 A100-80GB GPUs:
export VLLM_ATTENTION_BACKEND=XFORMERS
# Run 8K context length training
export MODEL_PATH="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
./scripts/[deepscaler|deepcoder]/train/run_deepscaler_1.5b_8k.sh --model $MODEL_PATH
./scripts/[deepscaler|deepcoder]/train/deepscaler_1.5b_8k.sh --model $MODEL_PATH
```

## Multi-Node Training (32 GPUs)
@@ -37,13 +37,13 @@ ray start --address=[RAY_ADDRESS]
3. Finally, on the head node, run the training script:
```bash
# Run 16K or 24K context length training
./scripts/train/run_deepscaler_1.5b_[16k|24k].sh --model [CHECKPOINT_PATH]
./scripts/train/deepscaler_1.5b_[16k|24k].sh --model [CHECKPOINT_PATH]
```
We welcome the community to try out different models, context lengths, and RL parameters in the training scripts!

### Ablations

Finally, we provide ablations for the 2k/4k context runs in `scripts/ablation/`. To run:
```bash
./scripts/ablation/run_deepscaler_1.5b_[2k|4k].sh --model [CHECKPOINT_PATH]
./scripts/ablation/deepscaler_1.5b_[2k|4k].sh --model [CHECKPOINT_PATH]
```