Stage1 #115

Xuezhi-Liang · 2022-04-04T10:32:08Z

Notes:

The first step is to create the Context class and begin moving the core AdaptDL mathematics into it.
AdaptiveDataLoader should be refactored to use the Context class for that AdaptDL logic, so the logic is not duplicated.
The changes to AdaptiveDataLoader are purely internal, though: users of the class shouldn't see any difference.
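A minimal sketch of that delegation, assuming nothing about AdaptDL's real internals. The batch-size decision lives in a framework-agnostic `Context`, and a simplified loader forwards to it while keeping its own public surface. `choose_batch_size`, `autoscale`, and the caller-supplied `goodput_fn` are hypothetical names. This does not reproduce the real AdaptDL goodput model or the real `AdaptiveDataLoader` API.

```python
class Context:
    """Hypothetical framework-agnostic holder for batch-size logic."""

    def __init__(self, batch_size, max_batch_size=None):
        self.initial_batch_size = batch_size
        self.max_batch_size = max_batch_size or batch_size
        self.current_batch_size = batch_size

    def choose_batch_size(self, candidates, goodput_fn):
        # Keep only candidates inside the allowed range, then pick the one
        # with the highest (caller-supplied) goodput estimate.
        valid = [b for b in candidates
                 if self.initial_batch_size <= b <= self.max_batch_size]
        if valid:
            self.current_batch_size = max(valid, key=goodput_fn)
        return self.current_batch_size


class AdaptiveDataLoader:
    """Hypothetical, simplified loader: same surface, logic in Context."""

    def __init__(self, dataset, batch_size, max_batch_size=None):
        self.dataset = dataset
        self._context = Context(batch_size, max_batch_size)

    @property
    def current_batch_size(self):
        return self._context.current_batch_size

    def autoscale(self, candidates, goodput_fn):
        # Pure forwarding: no batch-size math is duplicated in the loader.
        return self._context.choose_batch_size(candidates, goodput_fn)

    def __iter__(self):
        bsz = self._context.current_batch_size
        for i in range(0, len(self.dataset), bsz):
            yield self.dataset[i:i + bsz]
```

Because the loader only forwards calls, the Context can be unit-tested (and reused outside PyTorch) without constructing a loader.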

aurickq · 2022-04-06T19:40:42Z

adaptdl/adaptdl/torch/context.py

+    """
+
+    def __init__(self, batch_size):
+        self._elastic = adaptdl.torch.data.AdaptiveDataLoaderHelper(batch_size)


Should make sure the AdaptiveDLContext class does not depend on the AdaptiveDataLoaderHelper class (or anything related to PyTorch).
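A minimal sketch of what "no dependency on AdaptiveDataLoaderHelper" could look like. The Context owns its own plain-Python state instead of wrapping the helper and writing into `_elastic._state`. The field names mirror the diff (`current_local_bsz`, `accumulation_steps`). `ContextState`, `split_batch`, and the splitting rule are hypothetical, not AdaptDL's actual algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class ContextState:
    # Plain-Python state owned by the Context itself (hypothetical fields).
    current_local_bsz: int = 0
    # Hypothetical convention: number of micro-batches per optimizer step.
    accumulation_steps: int = 0


class Context:
    def __init__(self, max_local_bsz):
        self.max_local_bsz = max_local_bsz
        self.state = ContextState()

    def split_batch(self, global_bsz, num_replicas):
        """Split a global batch into a per-replica size and step count.

        Hypothetical rule: use the fewest accumulation steps that keep each
        replica's local batch within max_local_bsz. Sizes are rounded up, so
        local_bsz * num_replicas * steps may slightly exceed global_bsz.
        """
        steps = math.ceil(global_bsz / (num_replicas * self.max_local_bsz))
        local_bsz = math.ceil(global_bsz / (num_replicas * steps))
        self.state.current_local_bsz = local_bsz
        self.state.accumulation_steps = steps
        return local_bsz, steps
```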

aurickq · 2022-04-06T19:41:50Z

adaptdl/adaptdl/torch/context.py

+                self._elastic._state.accumulation_steps = accum_steps
+        self._elastic._state.current_local_bsz, self._elastic._state.accumulation_steps = \
+            adaptdl.collective.broadcast((self._elastic._state.current_local_bsz,
+                                          self._elastic._state.accumulation_steps))


The AdaptiveDLContext class shouldn't do any cross-replica synchronization itself. Synchronization should happen outside the class, at the call site.
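One way to keep synchronization at the call site, sketched under assumptions. The Context only proposes a value, and the caller passes that proposal through a broadcast so every replica adopts rank 0's result. `FakeCollective` is a hypothetical single-process stand-in. In real code the call site would use a collective such as the `adaptdl.collective.broadcast` seen in the diff; its exact signature is not assumed here.

```python
import math


class Context:
    """Hypothetical Context: computes a proposal, never communicates."""

    def __init__(self, max_local_bsz):
        self.max_local_bsz = max_local_bsz

    def propose_local_bsz(self, global_bsz, num_replicas):
        # Pure, replica-local computation; replicas may disagree here.
        return min(self.max_local_bsz, math.ceil(global_bsz / num_replicas))


class FakeCollective:
    """Single-process stand-in for a broadcast from rank 0 (hypothetical)."""

    def __init__(self):
        self._root_value = None

    def broadcast(self, value, rank):
        # Rank 0 publishes its value and every rank gets rank 0's value back.
        # Assumes rank 0 is called first, as in a sequential simulation.
        if rank == 0:
            self._root_value = value
        return self._root_value


def configure_replica(context, rank, global_bsz, num_replicas, collective):
    # Call site: compute locally, then synchronize explicitly.
    proposal = context.propose_local_bsz(global_bsz, num_replicas)
    return collective.broadcast(proposal, rank)
```

Keeping the broadcast in `configure_replica` means the Context stays deterministic and testable in a single process, and the caller decides when (and whether) replicas must agree.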

aurickq · 2022-04-06T19:43:12Z

adaptdl/adaptdl/torch/context.py

+        return self._elastic.training
+
+    def to_tensorboard(self, writer, global_step, tag_prefix=""):
+        self._elastic.to_tensorboard(writer, global_step, tag_prefix)


Tensorboard isn't core AdaptDL logic so shouldn't be included in this class.
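A sketch of keeping TensorBoard out of the core class, under assumptions. The Context exposes plain numbers through a hypothetical `metrics()` method, and a free `to_tensorboard` function (which could live in a PyTorch-specific module) writes them out. It relies only on the writer having `add_scalar(tag, value, global_step)`, which `torch.utils.tensorboard.SummaryWriter` provides. The tag names are illustrative.

```python
class Context:
    """Hypothetical core Context: exposes numbers, knows nothing of TensorBoard."""

    def __init__(self, batch_size):
        self.current_batch_size = batch_size
        self.accumulation_steps = 1

    def metrics(self):
        return {"batch_size": self.current_batch_size,
                "accumulation_steps": self.accumulation_steps}


def to_tensorboard(context, writer, global_step, tag_prefix=""):
    # Lives outside Context; `writer` only needs an add_scalar method.
    if tag_prefix and not tag_prefix.endswith("/"):
        tag_prefix += "/"
    for name, value in context.metrics().items():
        writer.add_scalar(tag_prefix + name, value, global_step)
```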

aurickq · 2022-04-06T19:44:11Z

adaptdl/adaptdl/torch/context.py

+import adaptdl.env
+from adaptdl.torch._metrics import get_goodput_fn
+import adaptdl.torch.data
+from adaptdl.torch.scaling_rules import ScalingRuleBase


Should make sure this module doesn't depend on anything from PyTorch.
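A small check that could enforce this, sketched with the standard-library `ast` module. It lists absolute imports in a module's source that fall under `torch` or `adaptdl.torch`. Treating the whole `adaptdl.torch` package as off-limits is an assumption: importing any submodule runs that package's `__init__`, which is where PyTorch would likely be pulled in. `forbidden_imports` is a hypothetical helper name, and relative imports are deliberately skipped.

```python
import ast

FORBIDDEN_ROOTS = ("torch", "adaptdl.torch")


def forbidden_imports(source, forbidden=FORBIDDEN_ROOTS):
    """Return absolute imported module names in `source` under `forbidden`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # not an import, or a relative import
        for name in names:
            # Match the root itself or any submodule, but not e.g. "torchvision".
            if any(name == root or name.startswith(root + ".")
                   for root in forbidden):
                found.append(name)
    return found
```

Run against the import block in this diff, it flags the three `adaptdl.torch` imports and leaves `adaptdl.env` alone.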

aurickq · 2022-04-06T19:44:32Z

adaptdl/adaptdl/torch/context.py

+import adaptdl.torch.data
+from adaptdl.torch.scaling_rules import ScalingRuleBase
+
+class AdaptiveDLContext(object):


Maybe rename it to just Context, so it can be used simply as adaptdl.Context.

Xuezhi-Liang · 2022-05-31T12:43:58Z

Closing; superseded by a new PR, #123.

Xuezhi-Liang added 2 commits April 4, 2022 14:26

stage1

a946f7e

stage1

c5513f0

aurickq suggested changes Apr 6, 2022

odp and others added 11 commits April 8, 2022 13:23

handle default value of preemptible flag (petuum#116)

4d8afa5

Support apiextensions.k8s.io/v1 and admissionregistration.k8s.io/v1 (p…

16bc63c

…etuum#118) * first conversion to v1 * disable CR pruning * add adaptdl SA * add status to schema * comply to ray-project/ray#21852

fix adaptdl-ray release ver (petuum#119)

c0bb45f

stage1_1.5

995f910

Merge branch 'stage1' of github.com:Xuezhi-Liang/adaptdl into stage1

969d26c

stage1_1.5

6bd60b7

stage1_1.5

3d74699

Merge branch 'stage1' of github.com:Xuezhi-Liang/adaptdl into stage1

a0493ff

test

8d7e96d

Merge branch 'stage1' of github.com:Xuezhi-Liang/adaptdl into stage1

9c9d91d

test

64fb57b

Xuezhi-Liang closed this May 31, 2022

3 participants
