Releases: nvidia-cosmos/cosmos-rl

v0.3.8

12 Dec 07:39
ad97196

What's Changed

  • fix: refactor weight mapper so the unsplit map specification is no longer needed by @lfengad in #436
  • Remove unsupported fields from Rollout parallelism config. by @foreverlms in #437
  • fix: add support for reasoning vla / avla usage by @lfengad in #439
  • Disable DeepEP for architectures older than Hopper by @bastefaniak in #441
  • Fix n_local_experts computation in DeepseekV3 and Qwen3 MoE by @bastefaniak in #440
  • feat: off policy sequence masking by @xlu451 in #431
  • fix: slurm launch dp replica support by @lfengad in #443
  • fix: DeepEP usage broken by a synchronization issue by @lfengad in #445
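
The n_local_experts fix (#440) concerns the invariant that, under expert parallelism, each rank hosts an equal slice of the expert pool. A minimal sketch of that computation, illustrative only and not the cosmos-rl source:

```python
def n_local_experts(num_experts: int, ep_size: int) -> int:
    """Number of experts hosted on each expert-parallel rank.

    Illustrative sketch: the expert pool must divide evenly across
    expert-parallel (EP) ranks, and each rank gets the same share.
    """
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must divide evenly across EP ranks")
    return num_experts // ep_size
```

For example, 64 experts sharded across 8 EP ranks leaves 8 experts per rank.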

Full Changelog: v0.3.7...v0.3.8

v0.3.7

09 Dec 07:19
fe3d615

Full Changelog: v0.3.6...v0.3.7

v0.3.6

27 Nov 03:25
490bf93

What's Changed

  • fix: More fixes for training strategy consistency and metrics display by @lfengad in #356
  • Fix: qwen3_vl_moe encoder use FlashAttnMeta by @kane-vln in #358
  • Support TP for HFModel by @kane-vln in #355
  • fix: Fix regression for more metrics in validation case by @lfengad in #360
  • feat: Add post process for rollout generation in data packer by @lfengad in #361
  • feat: SFT training with DDP loads the model only on the master rank by @lfengad in #359
  • Support video input for qwen3-vl/hf vlm datapacker by @kane-vln in #365
  • Update tests for datapacker by @kane-vln in #367
  • Enable local dataset loading and fetching for Policy and Rollout. by @foreverlms in #354
  • feat: Decoupled loss for async RL by @lfengad in #368
  • Remove prompt_idxs which is not needed now. by @foreverlms in #371
  • Fix: add tp_slice_dim initialization in state dict conversion by @kane-vln in #372
  • [FRC] Couple tokenizer with data packer by @heslami in #311
  • Support Nemotron-Nano SFT by @kane-vln in #373
  • Support sequence packing for HFModel by @kane-vln in #369
  • feat: move rollout filter into the rollout worker for the DAPO case by @lfengad in #387
  • Add expandable segments support for the PyTorch allocator by @yy-code-nv in #388
  • Add the deepep support for Qwen3-MoE models by @yufanhuangNV in #389
  • Fix: resolve version incompatibility between FA3 and TE by @kane-vln in #391
  • Add sanity check for parallelism by @foreverlms in #390
  • fix: Fix hf gradient checking by @lfengad in #394
  • Enable FP4 dynamic quantization of linear layers for policy training by @yufanhuangNV in #374
  • rfc: restructure commonly used logic in parallel map by @lfengad in #395
  • Fix: make FP4 compatible with the Python env by @lfengad in #398
  • fix: qwen2.5 vl case execution fix by @lfengad in #399
  • RFC: Refactor rollout worker part by @lfengad in #396
  • fix: HF buffer handling when resuming from checkpoint by @lfengad in #401
  • Fix qwen2-5 modeling by @yy-code-nv in #404
  • fix: stop issue due to validation by @lfengad in #405
  • Fix qwen3-moe and qwen3-vl-moe safetensors export by @foreverlms in #406
  • Support allgather moe dispatcher by @kane-vln in #402
  • Force FSDP wrap to ensure consistent mixed-precision training behavior; fix Qwen3-MoE DeepEP bug by @yy-code-nv in #409
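
Expandable segments for the PyTorch CUDA caching allocator (#388) are a documented PyTorch option controlled by the `PYTORCH_CUDA_ALLOC_CONF` environment variable, which must be set before CUDA is initialized. A minimal sketch, not the cosmos-rl integration:

```python
import os

# Enable the expandable-segments option of PyTorch's CUDA caching allocator.
# This must be set before torch initializes CUDA, i.e. before `import torch`
# touches the GPU, or the setting is ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # must happen *after* the variable is set
```

Expandable segments let the allocator grow existing memory segments instead of reserving new fixed-size blocks, which can reduce fragmentation for workloads with varying allocation sizes.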

Full Changelog: v0.3.5...v0.3.6

v0.3.5

29 Oct 12:10
1500fdc

Full Changelog: v0.3.4...v0.3.5

v0.3.4

23 Oct 06:24
051ac44

Full Changelog: v0.3.3...v0.3.4

v0.3.3

15 Sep 05:13
8af2e90

What's Changed

  • Fix DDP for SFTTrainer by @kane-vln in #239
  • Unify validation config by @Dinghow in #255
  • Support of passing custom arguments to custom dataset script. by @foreverlms in #253
  • fix: heartbeat optimization for the heavy-CPU case by @lfengad in #257
  • Refactor: reset named_buffers in load_hf_weights by @kane-vln in #252
  • fix: Remove transformer-engine dependency from requirements by @lfengad in #258
  • [cleanup] Remove the old Deepseek-V3 implementation by @heslami in #260
  • fix: reward filter in dynamic sampling; pause generation for outdated rollouts by @lfengad in #263
  • [Fix] Deepseek V3 GRPO bug fix by @heslami in #259
  • feat: Support epoch-level save frequency via save_freq_in_epoch by @lfengad in #264
  • InternVL sft support by @kane-vln in #254
  • Revert "Support of passing custom arguments to custom dataset script.… by @gekurian in #267
  • Fix the bug introduced in PR #252 by @gekurian in #268
  • Support of passing custom arguments to custom dataset script by @foreverlms in #275
  • Fallback to hfmodel pass if build model fails by @kane-vln in #276
  • Fix: sync named_buffer for hfmodel in grpo mode by @kane-vln in #277
  • fix: make controller and reward aware of CPU-intensive situations by @lfengad in #279
  • fix: Lepton cross-node job sync for host preparation before start. by @lfengad in #278
  • feat: SFT validation dataset and packer specification support by @lfengad in #281
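
A hypothetical sketch of the epoch-level save frequency added in #264. Only the field name `save_freq_in_epoch` comes from the PR title; the helper below is invented for illustration and is not the cosmos-rl trainer logic:

```python
def epochs_to_save(total_epochs: int, save_freq_in_epoch: int) -> list[int]:
    """Return the 1-based epochs at which a checkpoint would be written,
    assuming `save_freq_in_epoch` means "save every N epochs".

    Hypothetical helper; the real trainer hook lives in cosmos-rl.
    """
    return [e for e in range(1, total_epochs + 1) if e % save_freq_in_epoch == 0]
```

For instance, a 10-epoch run with `save_freq_in_epoch = 3` would checkpoint after epochs 3, 6, and 9.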

Full Changelog: v0.3.2...v0.3.3

v0.3.2

28 Aug 13:54
874ca24

What's Changed

  • Use AutoModel in case architecture doesn't exist by @kane-vln in #236
  • [1/n] Support Deepseek V3 SFT by @heslami in #190
  • [3/n] Support Deepseek V3 GRPO / Deepseek R1 by @heslami in #240
  • fix: outdated import from leptonai by @xlu451 in #243
  • trtllm-pytorch as the rollout backend. by @foreverlms in #161
  • feat: lora for grpo by @xlu451 in #222
  • Fix: sync_model_vocab corner case by @kane-vln in #242
  • feat: Only sync trainable params in weight sync by @lfengad in #238
  • feat: custom logger support by @lfengad in #245
  • fix: Refine logger to use only the one specified in the data packer script by @lfengad in #246
  • feat: data type control in weight transfer. by @lfengad in #247
  • fix: Remove cosmos-rl dependency in launch_all.py in normal mode. by @lfengad in #248
  • feat: sequence packing in training for optimization by @lfengad in #211
  • fix: min_filter_prefix_tokens corner case by @jcao-ai in #251
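
Sequence packing (#211) concatenates variable-length training samples into fixed-capacity buffers so less compute is wasted on padding. A greedy first-fit sketch of the general technique, not the cosmos-rl implementation:

```python
def pack_sequences(lengths: list[int], capacity: int) -> list[list[int]]:
    """Greedy first-fit packing: place each sequence length into the
    first buffer that still has room, opening a new buffer otherwise.

    Illustrative only; a real packer also tracks token positions and
    attention-mask boundaries between packed sequences.
    """
    bins: list[list[int]] = []
    for n in lengths:
        if n > capacity:
            raise ValueError(f"sequence of length {n} exceeds capacity {capacity}")
        for b in bins:
            if sum(b) + n <= capacity:
                b.append(n)
                break
        else:
            bins.append([n])
    return bins
```

With capacity 8, the lengths [5, 3, 4, 2] pack into two full-or-nearly-full buffers instead of four padded ones.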

Full Changelog: v0.3.1...v0.3.2

v0.3.1

18 Aug 01:41
e3d3ac7

What's Changed

  • Optimization: Quick dataloader advancing after checkpoint resumed by @jcao-ai in #234
  • fix: GRPO ckpt resume by @jcao-ai in #235

Full Changelog: v0.3.0...v0.3.1

v0.3.0

16 Aug 00:44
0cd8bbe

Full Changelog: v0.2.9...v0.3.0

v0.2.9

14 Aug 04:39
3cdde90

Full Changelog: v0.2.8...v0.2.9