Releases · nvidia-cosmos/cosmos-rl
v0.3.8
What's Changed
- fix: refactor weight mapper so unsplit map specification is no longer needed by @lfengad in #436
- Remove unsupported fields from Rollout parallelism config. by @foreverlms in #437
- fix: add support for reasoning vla / avla usage by @lfengad in #439
- Disable DeepEP for architectures older than Hopper by @bastefaniak in #441
- Fix n_local_experts computation in DeepseekV3 and Qwen3 MoE by @bastefaniak in #440 (see the sketch after this list)
- feat: off policy sequence masking by @xlu451 in #431
- fix: slurm launch dp replica support by @lfengad in #443
- fix: DeepEP usage broken by a synchronization issue by @lfengad in #445
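PR #440 concerns the local-expert count under expert parallelism (EP). As a rough illustration of the arithmetic involved, here is a minimal sketch; the function and variable names are hypothetical, not cosmos-rl's actual code:

```python
# Hypothetical sketch: under expert parallelism each EP rank owns an
# equal slice of the model's routed experts.
def n_local_experts(n_routed_experts: int, ep_size: int) -> int:
    assert n_routed_experts % ep_size == 0, "experts must divide evenly across EP ranks"
    return n_routed_experts // ep_size

# DeepSeek-V3 routes 256 experts; with ep_size=8, each rank holds 32.
print(n_local_experts(256, 8))  # -> 32
```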
New Contributors
- @bastefaniak made their first contribution in #441
Full Changelog: v0.3.7...v0.3.8
v0.3.7
What's Changed
- Fix DeepSeek HF conversion by @yy-code-nv in #415
- Fix: pass norm_topk_prob for qwen3_vl_moe and intern_vl by @kane-vln in #414
- Fix named buffer init by @yy-code-nv in #411
- Fix moe implementation by @yy-code-nv in #416
- Customize build_model with extra hf_config_args by @kane-vln in #418
- RFC: refactor trainer for better customization by @foreverlms in #412
- rfc: colocated mode by @lfengad in #413
- feat: full custom case example with readme by @lfengad in #421
- Add hooks for SFT validation by @foreverlms in #419
- feat: Refine for custom example by @lfengad in #422
- Update deepseek weight mapping for GRPO(vllm >= 0.10.0) by @kane-vln in #410
- feat: update interface handling for non-text rollout cases by @lfengad in #425
- feat: unbiased KL estimate by @xlu451 in #423 (see the sketch after this list)
- feat: control of weight version in DAPO case. by @lfengad in #426
- fix: data type specification fixed and refined by @lfengad in #429
- fix qwen3 moe weight exporting by @foreverlms in #428
- fix(controller): guard zero division when no policy replicas registered by @xlu451 in #435
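The unbiased KL estimate in #423 is plausibly the k3 estimator from Schulman's "Approximating KL Divergence" note, the usual choice in RL fine-tuning; that cosmos-rl uses exactly this form is an assumption. A minimal sketch:

```python
import torch

def kl_unbiased(logp: torch.Tensor, logp_ref: torch.Tensor) -> torch.Tensor:
    """Per-token estimate of KL(pi || pi_ref) on samples drawn from pi.

    k3 = r - 1 - log r with r = pi_ref / pi; E[r] = 1 under pi, so the
    estimator is unbiased and, unlike -log r alone, always non-negative.
    """
    log_ratio = logp_ref - logp                 # log r
    return torch.expm1(log_ratio) - log_ratio   # r - 1 - log r

print(kl_unbiased(torch.tensor([-1.2, -0.7]), torch.tensor([-1.0, -0.9])).mean())
```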
Full Changelog: v0.3.6...v0.3.7
v0.3.6
What's Changed
- fix: More fixes for training strategy consistency and metrics reporting by @lfengad in #356
- Fix: qwen3_vl_moe encoder use FlashAttnMeta by @kane-vln in #358
- Support TP for HFModel by @kane-vln in #355
- fix: Fix regression for more metrics in validation case by @lfengad in #360
- feat: Add post process for rollout generation in data packer by @lfengad in #361
- feat: SFT training with DDP loads the model only on the master rank by @lfengad in #359
- Support video input for qwen3-vl/hf vlm datapacker by @kane-vln in #365
- Update tests for datapacker by @kane-vln in #367
- Enable local dataset loading and fetching for Policy and Rollout. by @foreverlms in #354
- feat: Decoupled loss for async RL by @lfengad in #368
- Remove prompt_idxs, which is no longer needed, by @foreverlms in #371
- Fix: add tp_slice_dim initialization in state dict conversion by @kane-vln in #372
- [RFC] Couple tokenizer with data packer by @heslami in #311
- Support Nemotron-Nano SFT by @kane-vln in #373
- Support sequence packing for HFModel by @kane-vln in #369
- feat: DAPO case: move rollout filter into the rollout worker by @lfengad in #387
- Add expandable segments for the PyTorch allocator by @yy-code-nv in #388 (see the sketch after this list)
- Add the deepep support for Qwen3-MoE models by @yufanhuangNV in #389
- Fix: resolve version incompatibility between FA3 and TE by @kane-vln in #391
- Add sanity check for parallelism by @foreverlms in #390
- fix: Fix hf gradient checking by @lfengad in #394
- Enable FP4 dynamic quantization of linear layers for policy training by @yufanhuangNV in #374
- rfc: restructure some commonly used logic in the parallel map by @lfengad in #395
- Fix: fp4 compatible with python env by @lfengad in #398
- fix: qwen2.5 vl case execution fix by @lfengad in #399
- RFC: Refactor rollout worker part by @lfengad in #396
- fix: HF buffer handling when resuming from checkpoint by @lfengad in #401
- Fix qwen2-5 modeling by @yy-code-nv in #404
- fix: stop issue due to validation by @lfengad in #405
- Fix qwen3-moe and qwen3-vl-moe safetensors export by @foreverlms in #406
- Support allgather moe dispatcher by @kane-vln in #402
- Force FSDP wrap to ensure consistent mixed-precision training behavior; fix qwen3-moe DeepEP bug by @yy-code-nv in #409
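The allocator change in #388 corresponds to PyTorch's documented `expandable_segments` option, which reduces fragmentation when tensor shapes vary across steps. A minimal sketch of enabling it (the environment variable is PyTorch's; how cosmos-rl wires it in is an assumption):

```python
import os

# Must be set before torch initializes CUDA: expandable segments let the
# caching allocator grow existing segments instead of reserving new
# fixed-size blocks, reducing fragmentation for variable-shape workloads.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # imported after the env var so the allocator picks it up
```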
New Contributors
- @yy-code-nv made their first contribution in #388
- @yufanhuangNV made their first contribution in #389
Full Changelog: v0.3.5...v0.3.6
v0.3.5
What's Changed
- Pass video_kwargs for policy input by @Dinghow in #346
- Support the case where position_ids is None for qwen-vl series by @kane-vln in #325
- Use FA3 in the policy on the Hopper platform by @foreverlms in #313 (see the sketch after this list)
- Support qwen3-vl-moe grpo by @kane-vln in #345
- Fix docker build that lacks `psutil` by @foreverlms in #350
- [Fix] remove error raise when importing te by @foreverlms in #352
- fix: Refine on-policy and metrics by @lfengad in #351
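FlashAttention-3 (#313) targets Hopper GPUs (SM 9.0), so a capability gate of roughly the following shape is implied; the helper name is hypothetical and may not match the repo's actual check:

```python
import torch

def is_hopper_or_newer(device: int = 0) -> bool:
    # Hopper reports compute capability (9, 0); older parts report < 9.
    major, _ = torch.cuda.get_device_capability(device)
    return major >= 9

use_fa3 = torch.cuda.is_available() and is_hopper_or_newer()
print(f"FA3 path enabled: {use_fa3}")
```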
Full Changelog: v0.3.4...v0.3.5
v0.3.4
What's Changed
- Support OAI-GPT-OSS by @foreverlms in #202
- feat: support multi-turn rl and tool call by @jingxu9x in #197
- fix: prompts payload from dataset fixed for general cases. by @lfengad in #285
- refactor: use api_client replace request url by @jingxu9x in #265
- Add cosine similarity check into context parallel test by @foreverlms in #289
- fix: Process all pending weight sync commands in rollout at once by @lfengad in #291
- fix: more cases compatible for the RLpayload format update by @lfengad in #293
- fix: Refine command fetch filter in validation case. by @lfengad in #295
- fix: Relax packing sequence test check by @lfengad in #299
- Support context parallel (CP) for VLMs by @kane-vln in #296
- Activation offload in policy by @foreverlms in #294
- Support gpt-oss based internvl of SFT by @kane-vln in #290
- Set p2r nccl group size default to 1. by @foreverlms in #308
- Move reward calculation to rollout worker by @lfengad in #292
- feat: custom sampler support by @lfengad in #309
- feat: custom batch sampler support and batch data loader in RL. by @lfengad in #314
- fix: add warning log for dataloader batch setting by @lfengad in #316
- [Feat] Implement GSPO by @Bin-NV in #306 (see the sketch after this list)
- Optimize hfmodel loading by @kane-vln in #323
- Support qwen3_vl_moe sft by @kane-vln in #315
- Support pure-text prompts for qwen2.5-vl by @zekunhao1995 in #326
- Fix HF_HOME model path by @Dinghow in #328
- Adjust samples generation for on-policy. by @foreverlms in #324
- Cache model downloads in CI by @bddppq in #332
- feat: update reference and optimizers periodically by @lfengad in #327
- feat: More metrics collection during RL by @lfengad in #330
- feat: lora alpha pattern by @xlu451 in #331
- Fix: qwen3_vl_moe data packer renaming by @kane-vln in #333
- HF way support of Qwen3-VL. by @foreverlms in #329
- Fix weight version calculation in on-policy mode by @foreverlms in #339
- feat: Refine step logic with optimization batch control by @lfengad in #338
- Disable dual streams in act-offloading by @foreverlms in #341
- feat: Reduce peak memory in P2R check process by @lfengad in #340
- Fix CI failure by @foreverlms in #344
- fix: fix unmatched token numbers in calculating token-mean loss by @lfengad in #342
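GSPO (#306) replaces the per-token importance ratio of PPO/GRPO with a length-normalized, sequence-level one (Zheng et al., 2025). A minimal sketch of that ratio, assuming per-token log-probs and a 0/1 padding mask; the exact integration in cosmos-rl may differ:

```python
import torch

def gspo_sequence_ratio(logp_new: torch.Tensor,
                        logp_old: torch.Tensor,
                        mask: torch.Tensor) -> torch.Tensor:
    """s_i = (pi_new(y_i|x) / pi_old(y_i|x)) ** (1/|y_i|), in log space.

    logp_new/logp_old: [batch, seq] per-token log-probs; mask: [batch, seq].
    """
    token_log_ratio = (logp_new - logp_old) * mask
    seq_len = mask.sum(dim=-1).clamp(min=1)
    return torch.exp(token_log_ratio.sum(dim=-1) / seq_len)
```

As in PPO, this ratio would then be clipped and multiplied by the (group-normalized) advantage.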
New Contributors
- @Bin-NV made their first contribution in #306
- @zekunhao1995 made their first contribution in #326
Full Changelog: v0.3.3...v0.3.4
v0.3.3
What's Changed
- Fix DDP for SFTTrainer by @kane-vln in #239
- Unify validation config by @Dinghow in #255
- Support of passing custom arguments to custom dataset script. by @foreverlms in #253
- fix: heartbeat opt for heavy cpu case. by @lfengad in #257
- Refactor: reset named_buffers in load_hf_weights by @kane-vln in #252
- fix: Remove transformer-engine dependency from requirements by @lfengad in #258
- [cleanup] Remove the old Deepseek-V3 implementation by @heslami in #260
- fix: reward filter in dynamic sampling; pause generation when rollouts are outdated by @lfengad in #263
- [Fix] Deepseek V3 GRPO bug fix by @heslami in #259
- feat: Support epoch-level save frequency via `save_freq_in_epoch` by @lfengad in #264 (see the sketch after this list)
- InternVL sft support by @kane-vln in #254
- Revert "Support of passing custom arguments to custom dataset script.… by @gekurian in #267
- Fix the bug introduced in PR #252 by @gekurian in #268
- Support of passing custom arguments to custom dataset script by @foreverlms in #275
- Fallback to hfmodel pass if build model fails by @kane-vln in #276
- Fix: sync named_buffer for hfmodel in grpo mode by @kane-vln in #277
- fix: make controller and reward aware of CPU-intensive situations by @lfengad in #279
- fix: Lepton cross-node job sync for host preparation before start. by @lfengad in #278
- feat: SFT validation dataset and packer specification support by @lfengad in #281
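An epoch-level save frequency (#264) reduces to simple arithmetic against the dataloader length. A hypothetical sketch; the names are illustrative, and only `save_freq_in_epoch` itself comes from the release notes:

```python
# Hypothetical helper: convert an epoch-level save frequency into a step
# check. steps_per_epoch would come from len(dataloader) in practice.
def should_save(step: int, save_freq_in_epoch: int, steps_per_epoch: int) -> bool:
    interval = save_freq_in_epoch * steps_per_epoch
    return step > 0 and step % interval == 0

# With 500 optimizer steps per epoch and save_freq_in_epoch=2,
# checkpoints land every 1000 steps.
print([s for s in range(0, 3001, 500) if should_save(s, 2, 500)])  # [1000, 2000, 3000]
```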
Full Changelog: v0.3.2...v0.3.3
v0.3.2
What's Changed
- Use AutoModel in case architecture doesn't exist by @kane-vln in #236
- [1/n] Support Deepseek V3 SFT by @heslami in #190
- [3/n] Support Deepseek V3 GRPO / Deepseek R1 by @heslami in #240
- fix: outdated import from leptonai by @xlu451 in #243
- trtllm-pytorch as the rollout backend. by @foreverlms in #161
- feat: lora for grpo by @xlu451 in #222
- Fix: sync_model_vocab corner case by @kane-vln in #242
- feat: Only include trainable params in weight sync by @lfengad in #238
- feat: custom logger support by @lfengad in #245
- fix: Refine logger so it is specified only in the data packer script by @lfengad in #246
- feat: data type control in weight transfer. by @lfengad in #247
- fix: Remove cosmos-rl dependency in launch_all.py in normal mode. by @lfengad in #248
- feat: sequence packing in training for optimization by @lfengad in #211 (see the sketch after this list)
- fix: min_filter_prefix_tokens corner case by @jcao-ai in #251
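Sequence packing (#211) amortizes padding by concatenating short samples into shared rows up to the maximum length. A greedy first-fit sketch of the idea, not necessarily the packer cosmos-rl implements:

```python
def pack_sequences(lengths: list[int], max_len: int) -> list[list[int]]:
    """Greedy first-fit: returns groups of sample indices per packed row."""
    bins: list[tuple[int, list[int]]] = []  # (tokens used, member indices)
    for idx, n in sorted(enumerate(lengths), key=lambda x: -x[1]):
        for i, (used, members) in enumerate(bins):
            if used + n <= max_len:
                bins[i] = (used + n, members + [idx])
                break
        else:
            bins.append((n, [idx]))
    return [members for _, members in bins]

# Four samples fit in three rows instead of four padded ones.
print(pack_sequences([900, 300, 512, 256], max_len=1024))  # [[0], [2, 1], [3]]
```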
Full Changelog: v0.3.1...v0.3.2
v0.3.1
v0.3.0
What's Changed
- fix: Refine slurm script by @lfengad in #223
- fix: hostname setting problem in the Lepton case with cross-node replicas by @lfengad in #224
- refactor hfmodel gradient_checkpoint by @kane-vln in #229
- Liger kernel integration by @jcao-ai in #226 (see the sketch after this list)
- LoRA For HFModel by @kane-vln in #230
- fix: `step_**digit` sorting during ckpt selection by @jcao-ai in #232
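Liger kernel integration (#226) refers to the liger-kernel package, which monkey-patches Hugging Face model classes with fused Triton kernels (RMSNorm, RoPE, SwiGLU, fused linear cross-entropy). A minimal sketch of the usual pattern; how cosmos-rl invokes it is an assumption, while the patch call below is liger-kernel's public API:

```python
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoModelForCausalLM

# Patch the Qwen2 model class before instantiation so the fused Triton
# kernels replace the stock implementations.
apply_liger_kernel_to_qwen2()
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
```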
Full Changelog: v0.2.9...v0.3.0
v0.2.9
What's Changed
- [LoRA] Support `modules_to_save` by @jcao-ai in #213 (see the sketch after this list)
- fix: exit issue when a replica unregisters (make total steps larger) by @lfengad in #214
- minor fix: only report val_score if validation is enabled by @jcao-ai in #215
- fix: use kaiming init for lora by @jcao-ai in #216
- chore: pass use_rslora config by @xlu451 in #217
- Refactor activation checkpointing by @jcao-ai in #218
- HF VLMs Support by @kane-vln in #165
- feat: add `deterministic` feature by @jcao-ai in #220
- fix: slurm launch code root discovery issue by @lfengad in #221
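`modules_to_save` (#213) is the LoRA convention for layers that are trained and checkpointed in full rather than adapted. Hugging Face peft's `LoraConfig` is used below purely for illustration; cosmos-rl's own LoRA config surface may spell this differently:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],          # layers that get LoRA adapters
    modules_to_save=["lm_head", "embed_tokens"],  # trained and saved in full
)
```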
Full Changelog: v0.2.8...v0.2.9