(torchx/specs) Allow roles to specify their own workspaces #1139
Summary:
So far you can only specify the workspace through the runner's API, in which case the workspace applies ONLY to `role[0]`. This behavior was intentional, since multi-role use cases of TorchX typically had a single "main" role that the application owner actually owned, while the other roles were prepackaged apps (not part of your project).
This is no longer the case with applications such as reinforcement learning, where the project encompasses multiple applications (e.g. trainer, generator, etc.). We therefore need a more flexible way to specify a workspace per Role.
For backwards compatibility (BC), I'm maintaining the following behavior:
1. If `workspace` is specified as a runner argument, it takes precedence over `role[0].workspace`.
2. Non-zero roles (e.g. `role[1]`, `role[2]`, ...) are unaffected by the workspace argument. That is, their workspace attributes (e.g. `role[1].workspace`) are respected as is.
3. "Disabling" the workspace (e.g. passing `workspace=None` as the runner argument) can still build a workspace if the role's workspace attribute is not `None`.

NOTE: we need to do further optimization for cases where multiple roles have the same "image" and "workspace". In this case we only need to build the image+workspace once. As it stands, we end up building a separate ephemeral image per role (even if the ephemeral image is the SAME across all the roles). This isn't an issue in practice, since image builders like Docker are content-addressed and cache layers.
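
To illustrate, here is a minimal, self-contained sketch of the precedence behavior described above, plus one possible shape for the build-once optimization mentioned in the NOTE. It is not the code in this diff: the `Role` dataclass is a hypothetical stand-in (not `torchx.specs.Role`), and `resolve_workspaces`, `group_builds`, and `runner_workspace` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Role:
    """Hypothetical stand-in for a role; NOT the torchx.specs.Role class."""

    name: str
    image: str
    workspace: Optional[str] = None


def resolve_workspaces(
    roles: List[Role], runner_workspace: Optional[str] = None
) -> List[Optional[str]]:
    """Return the effective workspace of each role, in role order."""
    # Start from each role's own workspace attribute.
    resolved = [role.workspace for role in roles]
    # An explicit runner argument overrides role[0].workspace only.
    if roles and runner_workspace is not None:
        resolved[0] = runner_workspace
    # Everything else keeps the role's own attribute, so passing
    # runner_workspace=None does not disable a workspace set on a role.
    return resolved


def group_builds(
    roles: List[Role], workspaces: List[Optional[str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group role names by (image, workspace) so each pair is built once."""
    builds: Dict[Tuple[str, str], List[str]] = {}
    for role, workspace in zip(roles, workspaces):
        if workspace is None:
            continue  # no workspace, so nothing to build for this role
        builds.setdefault((role.image, workspace), []).append(role.name)
    return builds
```

With two roles that share an image and a workspace, `group_builds` returns a single key listing both role names. A builder could use that to build one ephemeral image instead of two.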
Reviewed By: AbishekS
Differential Revision: D83793199