Integration #54

DachengLi1 · 2020-12-08T17:56:42Z

No description provided.

autodist/autodist.py

autodist/resource_spec.py

autodist/cluster.py

autodist/const.py

zhisbug · 2020-12-28T00:52:14Z

@YLJALDC could you please address the comments above?

DachengLi1 · 2021-01-03T05:02:51Z

I have addressed the comments. Do you have any further suggestions?

zhisbug · 2021-01-04T07:55:56Z

> I have addressed the comments. Do you have any further suggestions?

Will review tomorrow and see whether more changes are needed.

jessezbj · 2021-01-06T20:04:29Z

autodist/autodist.py

+            # At AdaptDL mode, when the worker pass through this before
+            # the chief has created the strategy, this should returns
+            # nothing. Later, when the chief has created the strategy,
+            # it can load it.


Still not quite sure about the purpose of load and what this comment means. In L162-L163 load is always true when IS_ADAPTDL is true. Could you explain more?

DachengLi1 · Jan 11, 2021 (edited)

It is somewhat subtle. Previously, the AutoDist chief ran first and generated the strategy; it spawned the worker instances only after it had built the strategy, set up the cluster, and so on. Now every instance runs through _build and therefore calls _build_or_load_strategy, because Kubernetes launches the instances in parallel. The first time, the worker gets None from this function. The second time, the worker gets the strategy from the chief. By the time the worker calls load the second time, the chief is guaranteed to have generated the strategy, because there are several blocking collective calls in between.
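
The two-pass behaviour described above can be sketched as follows. This is a minimal illustration, not the PR's code: the function `build_or_load_strategy`, the JSON file used as the shared strategy store, and the dictionary standing in for a strategy are all assumptions made for the example. In the real flow the passes are separated by blocking collective calls rather than by sequential function calls.

```python
import json
import os
import tempfile


def build_or_load_strategy(is_chief, path):
    """Hypothetical stand-in for _build_or_load_strategy in AdaptDL mode."""
    if is_chief:
        # The chief builds the strategy and serializes it for the workers.
        strategy = {"id": "demo-strategy"}
        with open(path, "w") as f:
            json.dump(strategy, f)
        return strategy
    if not os.path.exists(path):
        # A worker that arrives before the chief has nothing to load yet.
        return None
    with open(path) as f:
        return json.load(f)


def demo():
    """Simulate a worker arriving early, the chief building, the worker retrying."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "strategy.json")
        first = build_or_load_strategy(is_chief=False, path=path)
        built = build_or_load_strategy(is_chief=True, path=path)
        second = build_or_load_strategy(is_chief=False, path=path)
        return first, built, second
```

Calling `demo()` returns the worker's first result, the chief's strategy, and the worker's second result, in that order.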

jessezbj · 2021-01-06T20:07:27Z

autodist/autodist.py

+                self._coordinator.launch_clients()
+        else:
+            if IS_AUTODIST_CHIEF:
+                self._coordinator = Coordinator(strategy=strategy, cluster=self._cluster)


Would it be better if we create different Coordinator classes based on the cluster mode?

DachengLi1 · Jan 11, 2021

Good suggestion. I tried a format similar to what you suggest, but I think the current version is more readable in autodist.py, even though it is longer. It is easier to maintain this way, since autodist.py is the first file people look at.
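
For reference, the alternative raised in the question could look roughly like the sketch below. The class names `DefaultCoordinator` and `AdaptDLCoordinator`, the factory `make_coordinator`, and the placeholder return values are hypothetical; only `Coordinator`, `launch_clients`, and the `strategy` and `cluster` arguments appear in the diff.

```python
class Coordinator:
    """Base class holding what every mode needs (names taken from the diff)."""

    def __init__(self, strategy, cluster):
        self.strategy = strategy
        self.cluster = cluster

    def launch_clients(self):
        raise NotImplementedError


class DefaultCoordinator(Coordinator):
    """Hypothetical: the original mode, where the chief drives the launch."""

    def launch_clients(self):
        return "default launch"  # placeholder for the real launch logic


class AdaptDLCoordinator(Coordinator):
    """Hypothetical: the AdaptDL mode, where instances start in parallel."""

    def launch_clients(self):
        return "adaptdl launch"  # placeholder for the real launch logic


def make_coordinator(is_adaptdl, strategy, cluster):
    """Pick the coordinator class once, so callers need no mode branches."""
    cls = AdaptDLCoordinator if is_adaptdl else DefaultCoordinator
    return cls(strategy=strategy, cluster=cluster)
```

The trade-off discussed in this thread is that the mode check moves out of autodist.py and into the factory, which shortens the caller but hides the branching from the first file a reader opens.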

jessezbj · 2021-01-06T20:11:17Z

autodist/cluster.py

+        if IS_ADAPTDL:
+            hostname = socket.gethostname()
+            local_ip = socket.gethostbyname(hostname)
+            return local_ip 


Since there is already a class named ADAPTDLCluster inherited from Cluster, is it necessary to insert ADAPTDL related code in the base class?

DachengLi1 · Jan 11, 2021

Thanks for pointing this out. I have moved this code into the ADAPTDLCluster class.
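
A minimal sketch of the change agreed on here, assuming the affected method is called `get_local_address` and that the base class is given an address up front (neither detail is shown in the diff hunk). The two `socket` calls are the ones from the diff.

```python
import socket


class Cluster:
    """Base class: no AdaptDL-specific branches."""

    def __init__(self, local_address):
        # Assumption: the base cluster is told its address up front.
        self._local_address = local_address

    def get_local_address(self):
        return self._local_address


class ADAPTDLCluster(Cluster):
    """AdaptDL mode: resolve the address of the current host at call time."""

    def get_local_address(self):
        hostname = socket.gethostname()
        # May raise socket.gaierror if the hostname cannot be resolved.
        return socket.gethostbyname(hostname)
```

With the override in the subclass, the `if IS_ADAPTDL:` check is no longer needed in the base class, because the choice is made once when the cluster object is constructed.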

DachengLi1 added 8 commits November 14, 2020 22:00

example file (1720324)
scratch (954cc80)
testing (4b55c2d)
finished launching (09058d3)
added from adaptdl (a912b40)
add back original functionality (e472cde)
minor fix (3b3530b)
tested single machine, testing distributed original. (39a15c9)

DachengLi1 requested a review from pengwu22 December 8, 2020 18:19

pengwu22 suggested changes Dec 23, 2020


jgada assigned DachengLi1 Dec 29, 2020

DachengLi1 added 2 commits December 31, 2020 20:26

addressed comments (0fcb5bb)
turn off proxy (89ec0a8)

DachengLi1 requested a review from pengwu22 January 3, 2021 05:02

jessezbj reviewed Jan 6, 2021


DachengLi1 added 4 commits January 11, 2021 21:33

update hostip (6d7cf33)
update staticmethod (0d0ce41)
lint (a2305b1)
avoid chief build twice (65f3649)




5 participants
