@jld-adriano

This pull request contains changes generated by Cursor background composer.

Co-authored-by: adriano <adriano@exa.ai>
devin-ai-integration bot added a commit that referenced this pull request Dec 12, 2025
…nv vars

This fixes bug #1, where overriding nproc_per_node via with_overrides did not
change the number of processes for single-node elastic tasks.

For single-node elastic tasks (task_type='python-task'), the _execute method
reads PET_NPROC_PER_NODE, PET_NNODES, PET_MAX_RESTARTS, and PET_MONITOR_INTERVAL
from environment variables. However, these env vars were never set in the
task template during serialization, so an override never reached the pod.
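
To make the failure mode concrete, here is a minimal sketch of how a single-node launcher might read these four variables. It is not the plugin's actual `_execute` code: the `ElasticSettings` type, the `read_elastic_settings` helper, and the fallback defaults are all assumptions for illustration. If nothing sets the variables in the task template, a reader like this silently falls back to its defaults, which matches the behavior described above.

```python
import os
from typing import Mapping, NamedTuple


class ElasticSettings(NamedTuple):
    """Hypothetical container for the values read from the PET_* variables."""
    nproc_per_node: int
    nnodes: int
    max_restarts: int
    monitor_interval: float


def read_elastic_settings(env: Mapping[str, str] = os.environ) -> ElasticSettings:
    # The default values below are assumptions, not the plugin's real defaults.
    # This sketch also assumes PET_NNODES holds a plain integer.
    return ElasticSettings(
        nproc_per_node=int(env.get("PET_NPROC_PER_NODE", "1")),
        nnodes=int(env.get("PET_NNODES", "1")),
        max_restarts=int(env.get("PET_MAX_RESTARTS", "0")),
        monitor_interval=float(env.get("PET_MONITOR_INTERVAL", "5")),
    )


# With no PET_* variables set, the reader falls back to a single process.
unset = read_elastic_settings({})
# Once the variable is present in the pod environment, the override is seen.
overridden = read_elastic_settings({"PET_NPROC_PER_NODE": "4"})
```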

The fix adds an environment property override to PytorchElasticFunctionTask that
includes the elastic config as environment variables. This ensures that when
task_config is modified via with_overrides, the elastic configuration is
correctly passed to the pod via environment variables.
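
The shape of this fix can be sketched as follows. This is an illustration under stated assumptions, not the actual diff: `ElasticConfig`, `BaseTask`, and `ElasticTaskSketch` are hypothetical stand-ins for the real config and task classes, and only the four `PET_*` names come from the description above. The point is that the environment is computed from `task_config` at read time, so a config replaced after construction is reflected when the task is serialized.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ElasticConfig:
    """Hypothetical stand-in for the elastic task config."""
    nnodes: int = 1
    nproc_per_node: int = 1
    max_restarts: int = 0
    monitor_interval: int = 5


class BaseTask:
    """Hypothetical base class exposing a static environment mapping."""

    def __init__(self, task_config: ElasticConfig,
                 environment: Optional[Dict[str, str]] = None):
        self.task_config = task_config
        self._environment = dict(environment or {})

    @property
    def environment(self) -> Dict[str, str]:
        return self._environment


class ElasticTaskSketch(BaseTask):
    @property
    def environment(self) -> Dict[str, str]:
        # Start from the user-supplied environment, then layer the elastic
        # settings on top, derived from the *current* task_config.
        env = dict(super().environment)
        cfg = self.task_config
        env.update({
            "PET_NNODES": str(cfg.nnodes),
            "PET_NPROC_PER_NODE": str(cfg.nproc_per_node),
            "PET_MAX_RESTARTS": str(cfg.max_restarts),
            "PET_MONITOR_INTERVAL": str(cfg.monitor_interval),
        })
        return env


task = ElasticTaskSketch(ElasticConfig(), environment={"FOO": "bar"})
# Simulate an override that swaps in a new config after construction.
task.task_config = ElasticConfig(nproc_per_node=4)
env_after_override = task.environment
```

Computing the mapping in a property rather than once in `__init__` is the design choice that matters here: a value cached at construction time would still report one process after the override.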

Combined with the previous fix (dynamic task_type property), this now fully
supports:
- Bug #1: single-node (1 proc) -> single-node (multiple procs) override
- Bug #2: single-node (1 proc) -> multi-node (multiple procs) override
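
The two override paths can be illustrated with a small sketch of a dynamic `task_type` property. Only the single-node value `'python-task'` comes from the description above; the class name, the `MULTI_NODE_TASK_TYPE` placeholder, and the rule "more than one node means multi-node" are assumptions made for this example.

```python
# Placeholder label; the real multi-node task type is not stated in this PR text.
MULTI_NODE_TASK_TYPE = "multi-node-elastic"


class DynamicTypeSketch:
    """Hypothetical task whose type is derived from its current config."""

    def __init__(self, nnodes: int = 1, nproc_per_node: int = 1):
        self.nnodes = nnodes
        self.nproc_per_node = nproc_per_node

    @property
    def task_type(self) -> str:
        # Re-evaluated on every access, so a later override of nnodes
        # changes the reported type without rebuilding the task.
        return "python-task" if self.nnodes == 1 else MULTI_NODE_TASK_TYPE


task = DynamicTypeSketch()

# Bug #1 path: stay on one node, raise the process count.
task.nproc_per_node = 4
type_after_bug1_override = task.task_type

# Bug #2 path: move to multiple nodes.
task.nnodes = 2
type_after_bug2_override = task.task_type
```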

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>