(torchx/specs) Allow roles to specify their own workspaces #1139

kiukchung · 2025-10-03T18:57:54Z

Summary:
So far you can only specify the workspace in the runner's API:

runner.dryrun(appdef, cfg, workspace=...)

in which case the workspace applies to ONLY role[0].

This behavior was intentional since multi-role usecases of TorchX typically had a single "main" role that the application owner actually owned and the other roles were prepackaged apps (not part of your project).

This is no longer the case with applications such as reenforcement learning where the project encompasses multiple applications (e.g. trainer, generator, etc) therefore we need a more flexible way to specify a workspace per Role.

For BC this I'm maintaining the following behavior:

If workspace is specified as a runner argument then it takes precedence over role[0].workspace
Non-zero roles (e.g. role[1], role[2], ...) are unaffected by the workspace argument. That is their workspace attributes (e.g. role[1].workspace) are respected as is.
"disabling" workspace (e.g. passing workspace=None from the runner argument) can still build a workspace if the role's workspace attribute is not None.

NOTE: we need to do further optimization for cases where multiple roles have the same "image" and "workspace". In this case we only need to build the image+workspace once. But as it stands we end up building a separate ephemeral per role (even if the ephemeral is the SAME across all the roles). This isn't an issue practically since image builders like Docker are content addressed and caches layers.

Reviewed By: AbishekS

Differential Revision: D83793199

Summary: So far you can only specify the workspace in the runner's API: ``` runner.dryrun(appdef, cfg, workspace=...) ``` in which case the workspace applies to ONLY `role[0]`. This behavior was intentional since multi-role usecases of TorchX typically had a single "main" role that the application owner actually owned and the other roles were prepackaged apps (not part of your project). This is no longer the case with applications such as reenforcement learning where the project encompasses multiple applications (e.g. trainer, generator, etc) therefore we need a more flexible way to specify a workspace per Role. For BC this I'm maintaining the following behavior: 1. If `workspace` is specified as a runner argument then it takes precedence over `role[0].workspace` 2. Non-zero roles (e.g. `role[1], role[2], ...`) are unaffected by the workspace argument. That is their workspace attributes (e.g. `role[1].workspace`) are respected as is. 3. "disabling" workspace (e.g. passing `workspace=None` from the runner argument) can still build a workspace if the role's workspace attribute is not `None`. NOTE: we need to do further optimization for cases where multiple roles have the same "image" and "workspace". In this case we only need to build the image+workspace once. But as it stands we end up building a separate ephemeral per role (even if the ephemeral is the SAME across all the roles). This isn't an issue practically since image builders like Docker are content addressed and caches layers. Reviewed By: AbishekS Differential Revision: D83793199

facebook-github-bot · 2025-10-03T18:58:02Z

@kiukchung has exported this pull request. If you are a Meta employee, you can view the originating Diff in D83793199.

facebook-github-bot · 2025-10-03T18:58:05Z

@kiukchung has exported this pull request. If you are a Meta employee, you can view the originating Diff in D83793199.

codecov-commenter · 2025-10-03T19:01:22Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.64%. Comparing base (1e3df20) to head (b748dcd).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1139      +/-   ##
==========================================
+ Coverage   91.60%   91.64%   +0.04%     
==========================================
  Files          83       83              
  Lines        6431     6441      +10     
==========================================
+ Hits         5891     5903      +12     
+ Misses        540      538       -2

Flag	Coverage Δ
unittests	`91.64% <100.00%> (+0.04%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 3, 2025

facebook-github-bot added fb-exported meta-exported labels Oct 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

(torchx/specs) Allow roles to specify their own workspaces #1139

(torchx/specs) Allow roles to specify their own workspaces #1139

Uh oh!

kiukchung commented Oct 3, 2025

Uh oh!

facebook-github-bot commented Oct 3, 2025

Uh oh!

facebook-github-bot commented Oct 3, 2025

Uh oh!

codecov-commenter commented Oct 3, 2025 •

edited

Loading

Uh oh!

Uh oh!

(torchx/specs) Allow roles to specify their own workspaces #1139

Are you sure you want to change the base?

(torchx/specs) Allow roles to specify their own workspaces #1139

Uh oh!

Conversation

kiukchung commented Oct 3, 2025

Uh oh!

facebook-github-bot commented Oct 3, 2025

Uh oh!

facebook-github-bot commented Oct 3, 2025

Uh oh!

codecov-commenter commented Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

codecov-commenter commented Oct 3, 2025 •

edited

Loading