vesuvius: add guided Dinovol modes, pixelshuffle pretrained decoder, and MedNeXt architectures #806
giorgioangel wants to merge 59 commits into main from
Conversation
Follow-up fix pushed in
Validation rerun on the clean PR branch:
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 941daa8ed2
```python
if self.guide_freeze:
    with torch.inference_mode():
        frozen_features = self.guide_backbone(x)[0]
return frozen_features.clone()
return self.guide_backbone(x)[0]
```
Fix non-frozen guide feature path
When guide_freeze is false, this function skips the if block so frozen_features is never assigned, but return frozen_features.clone() still runs unconditionally. That raises UnboundLocalError on the first guided forward pass and makes trainable guidance unusable. The following return self.guide_backbone(x)[0] is unreachable, so the intended non-frozen branch currently never executes.
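A minimal control-flow sketch of the fix the review implies (class and attribute names follow the diff; the torch-specific `inference_mode()`/`clone()` details are noted in comments so the sketch stays dependency-free):

```python
class GuidedFeatureSketch:
    """Hypothetical stand-in for the guided model: `guide_backbone` is any
    callable returning a tuple whose first element is the feature map."""

    def __init__(self, guide_backbone, guide_freeze):
        self.guide_backbone = guide_backbone
        self.guide_freeze = guide_freeze

    def guide_features(self, x):
        if self.guide_freeze:
            # Frozen path: the real code wraps this in torch.inference_mode()
            # and returns frozen_features.clone(). The key fix is that this
            # return stays INSIDE the if-block.
            frozen_features = self.guide_backbone(x)[0]
            return frozen_features
        # Trainable path: reachable again once the frozen return is indented.
        return self.guide_backbone(x)[0]
```

With the return moved inside the branch, a run with `guide_freeze: false` no longer hits an `UnboundLocalError` on the first guided forward pass.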
```python
scaler.step(optimizer)
scaler.update()
optimizer_stepped = True
if should_time_optimizer:
```
Update EMA weights after optimizer step
EMA support is wired into config/loading, but after scaler.step(optimizer) there is no call to _update_ema_model(model) (and no other call site in BaseTrainer). In runs with ema_enabled: true, the EMA copy stays at initialization and never tracks training weights, so any EMA validation/checkpoint flow silently uses stale parameters.
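For reference, a minimal sketch of the kind of EMA update the review is asking for after each optimizer step (the function name, decay value, and call placement are illustrative, not taken from the PR):

```python
def ema_update(ema_params, model_params, decay=0.999):
    """In-place exponential moving average: ema <- decay * ema + (1 - decay) * param.
    Called once per optimizer step so the EMA copy tracks the live weights."""
    for i, p in enumerate(model_params):
        ema_params[i] = decay * ema_params[i] + (1.0 - decay) * p

# In BaseTrainer this would run right after scaler.step(optimizer) and
# scaler.update(), guarded by the ema_enabled flag.
```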
Summary

Adds guided volumetric Dinovol segmentation features, a frozen-backbone PixelShuffle decoder, and MedNeXt v1/v2 architectures to vesuvius, while keeping training and inference routed through the existing CLI, NetworkFromConfig, and checkpoint loader paths.

Included

- pretrained_backbone protection from the global InitWeights_He initializer
- model_config.pretrained_decoder_type: pixelshuffle_conv
- model_config.architecture_type: mednext_v1 and mednext_v2
- AdamW epsilon support
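If helpful, the new keys above might sit in a training config roughly like this (a hypothetical sketch; the exact schema is whatever vesuvius's config loader expects):

```yaml
model_config:
  # Assumed layout based on the key names in the PR summary
  architecture_type: mednext_v1        # or mednext_v2
  mednext_model_id: B                  # explicit preset selection
  pretrained_decoder_type: pixelshuffle_conv
```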
Final retained behavior

- Conv -> PixelShuffle -> Conv -> GroupNorm -> GELU
- 3x3x3 -> GroupNorm -> GELU -> 3x3x3 -> 1x1x1 logits
- mednext_v2 is implemented as a paper-derived extension over vendored MedNeXt v1, with explicit preset selection via mednext_model_id
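The PixelShuffle step in the decoder chain is a channel-to-space rearrangement; torch's built-in `nn.PixelShuffle` is 2D-only, so a 3D variant has to rearrange channels into the depth axis as well. A numpy sketch of that rearrangement (illustrative only, not the PR's implementation):

```python
import numpy as np

def pixel_shuffle_3d(x, r):
    """Rearrange (C*r^3, D, H, W) -> (C, D*r, H*r, W*r), the 3D analogue of
    torch.nn.PixelShuffle used between the two convolutions in the
    Conv -> PixelShuffle -> Conv -> GroupNorm -> GELU chain."""
    c_r3, d, h, w = x.shape
    c = c_r3 // (r ** 3)
    x = x.reshape(c, r, r, r, d, h, w)
    # Interleave each upscale factor with its spatial axis: (c, d, r, h, r, w, r)
    x = x.transpose(0, 4, 1, 5, 2, 6, 3)
    return x.reshape(c, d * r, h * r, w * r)
```

Each block of r^3 channels becomes one r x r x r spatial neighborhood, so spatial resolution doubles per axis (for r = 2) while the channel count drops by 8x.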
Validation

```shell
uv run --extra models --extra tests pytest \
  tests/models/build/test_guided_network.py \
  tests/models/build/test_mednext_shapes.py \
  tests/models/build/test_primus_shapes.py \
  tests/models/training/test_guided_trainer.py \
  tests/models/training/test_mednext_trainer.py \
  tests/models/training/test_base_trainer.py \
  tests/models/configuration/test_config_manager.py \
  tests/models/configuration/test_ps256_config_compat.py -q
```

139 passed.

Benchmark snapshots
Guided Dinovol benchmark on the clean PR branch:
- 32^3: baseline train step 61.21 ms; direct segmentation 11.57 ms; feature encoder 14.52 ms; skip concat 23.18 ms; input gating 62.65 ms
- 64^3: baseline train step 16.35 ms; direct segmentation 9.68 ms; feature encoder 16.92 ms; skip concat 24.12 ms; input gating 19.32 ms

MedNeXt benchmark on the clean PR branch:
- 128^3: UNet train step 118.62 ms; mednext_v1 B 248.90 ms; mednext_v2 L 1160.38 ms
- 128^3: mednext_v2 L width2 forward runs but the train step OOMs; mednext_v2 B OOMs at startup on the local RTX 4090
- 192^3: only the UNet baseline remains trainable locally; the current MedNeXt variants OOM in this setup
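Per-step timings like these are typically collected with a warmup-then-average loop; a generic sketch (hypothetical helper, not the PR's benchmark harness):

```python
import time

def time_train_step(step_fn, warmup=3, iters=10):
    """Run a few warmup iterations, then report the mean wall-clock
    milliseconds per call of step_fn. On CUDA you would additionally
    synchronize the device before reading the clock, or the async kernel
    launches would make the numbers meaningless."""
    for _ in range(warmup):
        step_fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        step_fn()
    return (time.perf_counter() - t0) / iters * 1000.0
```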
Caveats

- Do not treat mednext_v2 as upstream-official nnUNet code; it is a paper-derived extension over vendored MedNeXt v1
- notes.md and implementation.md were used to reconcile retained behavior vs reverted experiments, but they are outside the villa git repo and are not part of this PR