This is in reference to the following function in runtime.py:

    def num_iterations(self, loader_size):
        """ Determines number of iterations for this stage
        TODO: don't currently support uneven configurations.
        """
        if self.stage == 0 or self.stage is None:
            return loader_size
        num_iterations = loader_size * self.num_ranks_in_first_stage
        assert num_iterations % self.num_ranks_in_stage == 0
        num_iterations = num_iterations // self.num_ranks_in_stage
        return num_iterations

From my understanding, this function raises an assertion error unless loader_size * num_ranks_in_first_stage is a multiple of num_ranks_in_stage for every stage after the first; in other words, the total number of batches must be divisible by the replication factor of each later stage. However, there is no guarantee that PipeDream's optimizer module assigns replication factors that satisfy this constraint. As a result, the framework is sometimes unable to run training at all.
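To make the constraint concrete, here is a minimal sketch that checks a configuration before launching training. It mirrors only the arithmetic of the assertion above. The function name find_uneven_stages and the ranks_per_stage list (one replication factor per stage, first stage at index 0) are hypothetical and not part of runtime.py.

```python
def find_uneven_stages(loader_size, ranks_per_stage):
    """Return the indices of stages whose assertion in num_iterations would fail.

    ranks_per_stage[i] is assumed to hold the replication factor of stage i
    (a hypothetical input, not something runtime.py exposes in this form).
    """
    # Same product that num_iterations computes for every stage after the first.
    total = loader_size * ranks_per_stage[0]
    return [
        stage
        for stage, num_ranks in enumerate(ranks_per_stage)
        if stage > 0 and total % num_ranks != 0
    ]


# 100 * 4 = 400 is divisible by 2 and by 1, so no stage fails.
print(find_uneven_stages(100, [4, 2, 1]))
# 100 * 3 = 300 is divisible by 2 but not by 7, so stage 2 fails.
print(find_uneven_stages(100, [3, 2, 7]))
```

A check like this could be run on the optimizer's output to reject, or report, a configuration that would otherwise only fail once training has started.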