Releases · nvidia-cosmos/cosmos-rl
v0.3.8
What's Changed
- fix: refactor weight mapper so unsplit map specification is no longer needed by @lfengad in #436
- Remove unsupported fields from Rollout parallelism config. by @foreverlms in #437
- fix: add support for reasoning vla / avla usage by @lfengad in #439
- Disable DeepEP for architectures older than Hopper by @bastefaniak in #441
- Fix n_local_experts computation in DeepseekV3 and Qwen3 MoE by @bastefaniak in #440 (see the sketch after this list)
- feat: off policy sequence masking by @xlu451 in #431
- fix: slurm launch dp replica support by @lfengad in #443
- fix: DeepEP usage broken by a synchronization issue by @lfengad in #445
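PR #440 concerns the local-expert count under expert parallelism (EP). As a rough illustration of the arithmetic involved, here is a minimal sketch; the function and variable names are hypothetical, not cosmos-rl's actual code:

```python
# Hypothetical sketch: under expert parallelism each EP rank owns an
# equal slice of the model's routed experts.
def n_local_experts(n_routed_experts: int, ep_size: int) -> int:
    assert n_routed_experts % ep_size == 0, "experts must divide evenly across EP ranks"
    return n_routed_experts // ep_size

# DeepSeek-V3 routes 256 experts; with ep_size=8, each rank holds 32.
print(n_local_experts(256, 8))  # -> 32
```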
New Contributors
- @bastefaniak made their first contribution in #441
Full Changelog: v0.3.7...v0.3.8
v0.3.7
What's Changed
- Fix DeepSeek HF conversion by @yy-code-nv in #415
- Fix: pass norm_topk_prob for qwen3_vl_moe and intern_vl by @kane-vln in #414
- Fix named buffer init by @yy-code-nv in #411
- Fix moe implementation by @yy-code-nv in #416
- Customize build_model with extra hf_config_args by @kane-vln in #418
- RFC: refactor trainer for better customization by @foreverlms in #412
- rfc: colocated mode by @lfengad in #413
- feat: full custom case example with readme by @lfengad in #421
- Add hooks for SFT validation by @foreverlms in #419
- feat: Refine for custom example by @lfengad in #422
- Update deepseek weight mapping for GRPO(vllm >= 0.10.0) by @kane-vln in #410
- feat: update interface handling for non-text rollout cases by @lfengad in #425
- feat: unbiased KL estimate by @xlu451 in #423 (see the sketch after this list)
- feat: control of weight version in DAPO case. by @lfengad in #426
- fix: data type specification fixed and refined by @lfengad in #429
- fix qwen3 moe weight exporting by @foreverlms in #428
- fix(controller): guard zero division when no policy replicas registered by @xlu451 in #435
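The unbiased KL estimate in #423 is plausibly the k3 estimator from Schulman's "Approximating KL Divergence" note, the usual choice in RL fine-tuning; that cosmos-rl uses exactly this form is an assumption. A minimal sketch:

```python
import torch

def kl_unbiased(logp: torch.Tensor, logp_ref: torch.Tensor) -> torch.Tensor:
    """Per-token estimate of KL(pi || pi_ref) on samples drawn from pi.

    k3 = r - 1 - log r with r = pi_ref / pi; E[r] = 1 under pi, so the
    estimator is unbiased and, unlike -log r alone, always non-negative.
    """
    log_ratio = logp_ref - logp                 # log r
    return torch.expm1(log_ratio) - log_ratio   # r - 1 - log r

print(kl_unbiased(torch.tensor([-1.2, -0.7]), torch.tensor([-1.0, -0.9])).mean())
```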
Full Changelog: v0.3.6...v0.3.7
v0.3.6
What's Changed
- fix: More fixes for training strategy consistency and metrics reporting by @lfengad in #356
- Fix: qwen3_vl_moe encoder use FlashAttnMeta by @kane-vln in #358
- Support TP for HFModel by @kane-vln in #355
- fix: Fix regression for more metrics in validation case by @lfengad in #360
- feat: Add post process for rollout generation in data packer by @lfengad in #361
- feat: SFT training with DDP loads the model only on the master rank by @lfengad in #359
- Support video input for qwen3-vl/hf vlm datapacker by @kane-vln in #365
- Update tests for datapacker by @kane-vln in #367
- Enable local dataset loading and fetching for Policy and Rollout. by @foreverlms in #354
- feat: Decoupled loss for async RL by @lfengad in #368
- Remove prompt_idxs, which is no longer needed, by @foreverlms in #371
- Fix: add tp_slice_dim initialization in state dict conversion by @kane-vln in #372
- [RFC] Couple tokenizer with data packer by @heslami in #311
- Support Nemotron-Nano SFT by @kane-vln in #373
- Support sequence packing for HFModel by @kane-vln in #369
- feat: DAPO case: move rollout filter into the rollout worker by @lfengad in #387
- Add expandable segments for the PyTorch allocator by @yy-code-nv in #388 (see the sketch after this list)
- Add the deepep support for Qwen3-MoE models by @yufanhuangNV in #389
- Fix: resolve version incompatibility between FA3 and TE by @kane-vln in #391
- Add sanity check for parallelism by @foreverlms in #390
- fix: Fix hf gradient checking by @lfengad in #394
- Enable FP4 dynamic quantization of linear layers for policy training by @yufanhuangNV in #374
- rfc: restructure some commonly used logic in the parallel map by @lfengad in #395
- Fix: fp4 compatible with python env by @lfengad in #398
- fix: qwen2.5 vl case execution fix by @lfengad in #399
- RFC: Refactor rollout worker part by @lfengad in #396
- fix: HF buffer handling when resuming from checkpoint by @lfengad in #401
- Fix qwen2-5 modeling by @yy-code-nv in #404
- fix: stop issue due to validation by @lfengad in #405
- Fix qwen3-moe and qwen3-vl-moe safetensors export by @foreverlms in #406
- Support allgather moe dispatcher by @kane-vln in #402
- Force FSDP wrap to ensure consistent mixed-precision training behavior; fix qwen3-moe DeepEP bug by @yy-code-nv in #409
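The allocator change in #388 corresponds to PyTorch's documented `expandable_segments` option, which reduces fragmentation when tensor shapes vary across steps. A minimal sketch of enabling it (the environment variable is PyTorch's; how cosmos-rl wires it in is an assumption):

```python
import os

# Must be set before torch initializes CUDA: expandable segments let the
# caching allocator grow existing segments instead of reserving new
# fixed-size blocks, reducing fragmentation for variable-shape workloads.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

import torch  # imported after the env var so the allocator picks it up
```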
New Contributors
- @yy-code-nv made their first contribution in #388
- @yufanhuangNV made their first contribution in #389
Full Changelog: v0.3.5...v0.3.6
v0.3.5
What's Changed
- Pass video_kwargs for policy input by @Dinghow in #346
- Support the case where position_ids is None for qwen-vl series by @kane-vln in #325
- Use FA3 in the policy on the Hopper platform by @foreverlms in #313 (see the sketch after this list)
- Support qwen3-vl-moe grpo by @kane-vln in #345
- Fix docker build that lacks `psutil` by @foreverlms in #350
- [Fix] remove error raise when importing te by @foreverlms in #352
- fix: Refine on-policy and metrics by @lfengad in #351
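FlashAttention-3 (#313) targets Hopper GPUs (SM 9.0), so a capability gate of roughly the following shape is implied; the helper name is hypothetical and may not match the repo's actual check:

```python
import torch

def is_hopper_or_newer(device: int = 0) -> bool:
    # Hopper reports compute capability (9, 0); older parts report < 9.
    major, _ = torch.cuda.get_device_capability(device)
    return major >= 9

use_fa3 = torch.cuda.is_available() and is_hopper_or_newer()
print(f"FA3 path enabled: {use_fa3}")
```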
Full Changelog: v0.3.4...v0.3.5
v0.3.4
What's Changed
- Support OAI-GPT-OSS by @foreverlms in #202
- feat: support multi-turn rl and tool call by @jingxu9x in #197
- fix: prompts payload from dataset fixed for general cases. by @lfengad in #285
- refactor: use api_client replace request url by @jingxu9x in #265
- Add cosine similarity check into context parallel test by @foreverlms in #289
- fix: Process all pending weight sync commands in rollout at once by @lfengad in #291
- fix: more cases compatible for the RLpayload format update by @lfengad in #293
- fix: Refine command fetch filter in validation case. by @lfengad in #295
- fix: Relax packing sequence test check by @lfengad in #299
- Support context parallel (CP) for VLMs by @kane-vln in #296
- Activation offload in policy by @foreverlms in #294
- Support gpt-oss based internvl of SFT by @kane-vln in #290
- Set p2r nccl group size default to 1. by @foreverlms in #308
- Move reward calculation to rollout worker by @lfengad in #292
- feat: custom sampler support by @lfengad in #309
- feat: custom batch sampler support and batch data loader in RL. by @lfengad in #314
- fix: add warning log for dataloader batch setting by @lfengad in #316
- [Feat] Implement GSPO by @Bin-NV in #306 (see the sketch after this list)
- Optimize hfmodel loading by @kane-vln in #323
- Support qwen3_vl_moe sft by @kane-vln in #315
- Support pure-text prompts for qwen2.5-vl by @zekunhao1995 in #326
- Fix HF_HOME model path by @Dinghow in #328
- Adjust samples generation for on-policy. by @foreverlms in #324
- Cache model downloads in CI by @bddppq in #332
- feat: update reference and optimizers periodically by @lfengad in #327
- feat: More metrics collection during RL by @lfengad in #330
- feat: lora alpha pattern by @xlu451 in #331
- Fix: qwen3_vl_moe data packer renaming by @kane-vln in #333
- HF way support of Qwen3-VL. by @foreverlms in #329
- Fix weight version calculation in on-policy mode by @foreverlms in #339
- feat: Refine step logic with optimization batch control by @lfengad in #338
- Disable dual streams in act-offloading by @foreverlms in #341
- feat: Reduce peak memory in P2R check process by @lfengad in #340
- Fix CI failure by @foreverlms in #344
- fix: fix unmatched token numbers in calculating token-mean loss by @lfengad in #342
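GSPO (#306) replaces the per-token importance ratio of PPO/GRPO with a length-normalized, sequence-level one (Zheng et al., 2025). A minimal sketch of that ratio, assuming per-token log-probs and a 0/1 padding mask; the exact integration in cosmos-rl may differ:

```python
import torch

def gspo_sequence_ratio(logp_new: torch.Tensor,
                        logp_old: torch.Tensor,
                        mask: torch.Tensor) -> torch.Tensor:
    """s_i = (pi_new(y_i|x) / pi_old(y_i|x)) ** (1/|y_i|), in log space.

    logp_new/logp_old: [batch, seq] per-token log-probs; mask: [batch, seq].
    """
    token_log_ratio = (logp_new - logp_old) * mask
    seq_len = mask.sum(dim=-1).clamp(min=1)
    return torch.exp(token_log_ratio.sum(dim=-1) / seq_len)
```

As in PPO, this ratio would then be clipped and multiplied by the (group-normalized) advantage.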
New Contributors
- @Bin-NV made their first contribution in #306
- @zekunhao1995 made their first contribution in #326
Full Changelog: v0.3.3...v0.3.4
v0.3.3
What's Changed
- Fix DDP for SFTTrainer by @kane-vln in #239
- Unify validation config by @Dinghow in #255
- Support of passing custom arguments to custom dataset script. by @foreverlms in #253
- fix: heartbeat opt for heavy cpu case. by @lfengad in #257
- Refactor: reset named_buffers in load_hf_weights by @kane-vln in #252
- fix: Remove transformer-engine dependency from requirements by @lfengad in #258
- [cleanup] Remove the old Deepseek-V3 implementation by @heslami in #260
- fix: reward filter in dynamic sampling; pause generation when rollouts are outdated by @lfengad in #263
- [Fix] Deepseek V3 GRPO bug fix by @heslami in #259
- feat: Support epoch-level save frequency via `save_freq_in_epoch` by @lfengad in #264 (see the sketch after this list)
- InternVL sft support by @kane-vln in #254
- Revert "Support of passing custom arguments to custom dataset script.… by @gekurian in #267
- Fix the bug introduced in PR #252 by @gekurian in #268
- Support of passing custom arguments to custom dataset script by @foreverlms in #275
- Fallback to hfmodel pass if build model fails by @kane-vln in #276
- Fix: sync named_buffer for hfmodel in grpo mode by @kane-vln in #277
- fix: make controller and reward aware of CPU-intensive situations by @lfengad in #279
- fix: Lepton cross-node job sync for host preparation before start. by @lfengad in #278
- feat: SFT validation dataset and packer specification support by @lfengad in #281
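An epoch-level save frequency (#264) reduces to simple arithmetic against the dataloader length. A hypothetical sketch; the names are illustrative, and only `save_freq_in_epoch` itself comes from the release notes:

```python
# Hypothetical helper: convert an epoch-level save frequency into a step
# check. steps_per_epoch would come from len(dataloader) in practice.
def should_save(step: int, save_freq_in_epoch: int, steps_per_epoch: int) -> bool:
    interval = save_freq_in_epoch * steps_per_epoch
    return step > 0 and step % interval == 0

# With 500 optimizer steps per epoch and save_freq_in_epoch=2,
# checkpoints land every 1000 steps.
print([s for s in range(0, 3001, 500) if should_save(s, 2, 500)])  # [1000, 2000, 3000]
```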
Full Changelog: v0.3.2...v0.3.3
v0.3.2
What's Changed
- Use AutoModel in case architecture doesn't exist by @kane-vln in #236
- [1/n] Support Deepseek V3 SFT by @heslami in #190
- [3/n] Support Deepseek V3 GRPO / Deepseek R1 by @heslami in #240
- fix: outdated import from leptonai by @xlu451 in #243
- trtllm-pytorch as the rollout backend. by @foreverlms in #161
- feat: lora for grpo by @xlu451 in #222
- Fix: sync_model_vocab corner case by @kane-vln in #242
- feat: Only include trainable params in weight sync by @lfengad in #238
- feat: custom logger support by @lfengad in #245
- fix: Refine logger so it is specified only in the data packer script by @lfengad in #246
- feat: data type control in weight transfer. by @lfengad in #247
- fix: Remove cosmos-rl dependency in launch_all.py in normal mode. by @lfengad in #248
- feat: sequence packing in training for optimization by @lfengad in #211 (see the sketch after this list)
- fix: min_filter_prefix_tokens corner case by @jcao-ai in #251
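Sequence packing (#211) amortizes padding by concatenating short samples into shared rows up to the maximum length. A greedy first-fit sketch of the idea, not necessarily the packer cosmos-rl implements:

```python
def pack_sequences(lengths: list[int], max_len: int) -> list[list[int]]:
    """Greedy first-fit: returns groups of sample indices per packed row."""
    bins: list[tuple[int, list[int]]] = []  # (tokens used, member indices)
    for idx, n in sorted(enumerate(lengths), key=lambda x: -x[1]):
        for i, (used, members) in enumerate(bins):
            if used + n <= max_len:
                bins[i] = (used + n, members + [idx])
                break
        else:
            bins.append((n, [idx]))
    return [members for _, members in bins]

# Four samples fit in three rows instead of four padded ones.
print(pack_sequences([900, 300, 512, 256], max_len=1024))  # [[0], [2, 1], [3]]
```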
Full Changelog: v0.3.1...v0.3.2
v0.3.1
v0.3.0
What's Changed
- fix: Refine slurm script by @lfengad in #223
- fix: hostname setting problem in the Lepton case with cross-node replicas by @lfengad in #224
- refactor hfmodel gradient_checkpoint by @kane-vln in #229
- Liger kernel integration by @jcao-ai in #226 (see the sketch after this list)
- LoRA For HFModel by @kane-vln in #230
- fix: `step_**digit` sorting during ckpt selection by @jcao-ai in #232
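Liger kernel integration (#226) refers to the liger-kernel package, which monkey-patches Hugging Face model classes with fused Triton kernels (RMSNorm, RoPE, SwiGLU, fused linear cross-entropy). A minimal sketch of the usual pattern; how cosmos-rl invokes it is an assumption, while the patch call below is liger-kernel's public API:

```python
from liger_kernel.transformers import apply_liger_kernel_to_qwen2
from transformers import AutoModelForCausalLM

# Patch the Qwen2 model class before instantiation so the fused Triton
# kernels replace the stock implementations.
apply_liger_kernel_to_qwen2()
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
```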
Full Changelog: v0.2.9...v0.3.0
v0.2.9
What's Changed
- [LoRA] Support `modules_to_save` by @jcao-ai in #213 (see the sketch after this list)
- fix: exit issue when a replica unregisters (make total steps larger) by @lfengad in #214
- minor fix: only report val_score if validation is enabled by @jcao-ai in #215
- fix: use kaiming init for lora by @jcao-ai in #216
- chore: pass use_rslora config by @xlu451 in #217
- Refactor activation checkpointing by @jcao-ai in #218
- HF VLMs Support by @kane-vln in #165
- feat: add `deterministic` feature by @jcao-ai in #220
- fix: slurm launch code root discovery issue by @lfengad in #221
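`modules_to_save` (#213) is the LoRA convention for layers that are trained and checkpointed in full rather than adapted. Hugging Face peft's `LoraConfig` is used below purely for illustration; cosmos-rl's own LoRA config surface may spell this differently:

```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],          # layers that get LoRA adapters
    modules_to_save=["lm_head", "embed_tokens"],  # trained and saved in full
)
```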
Full Changelog: v0.2.8...v0.2.9