
Handling uneven number of batches per replicated instance of a layer #59

@siddharth9820

This is in reference to the following function in runtime.py:

def num_iterations(self, loader_size):
        """ Determines number of iterations for this stage
        TODO: don't currently support uneven configurations.
        """
        if self.stage == 0 or self.stage is None:
            return loader_size

        num_iterations = loader_size * self.num_ranks_in_first_stage
        assert num_iterations % self.num_ranks_in_stage == 0
        num_iterations = num_iterations // self.num_ranks_in_stage

        return num_iterations

As I understand it, this function throws an assertion error unless the total number of batches (loader_size * num_ranks_in_first_stage) is a multiple of the replication factor of every stage after the first. However, there is no guarantee that PipeDream's optimizer module assigns replication factors that satisfy this constraint. As a result, the framework is sometimes unable to run training with the configuration the optimizer produces.
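
To make the failure concrete, here is a minimal sketch. The first function is a standalone copy of the check quoted above, with the instance attributes turned into plain arguments. The second, `uneven_iterations`, is a hypothetical helper (not part of PipeDream) that shows one way an uneven split could be expressed, by giving the leftover batches to the lowest-numbered ranks. It assumes the sending stage would route batches to replicas in a matching way, which I have not verified against the runtime.

```python
def num_iterations(loader_size, num_ranks_in_first_stage, num_ranks_in_stage, stage):
    """Standalone copy of the quoted check, without runtime state."""
    if stage == 0 or stage is None:
        return loader_size
    total = loader_size * num_ranks_in_first_stage
    assert total % num_ranks_in_stage == 0
    return total // num_ranks_in_stage


def uneven_iterations(total_batches, num_ranks_in_stage, rank_in_stage):
    """Hypothetical alternative: spread the remainder over the lowest ranks."""
    base, remainder = divmod(total_batches, num_ranks_in_stage)
    return base + (1 if rank_in_stage < remainder else 0)


# 10 batches from a single first-stage rank, later stage replicated 3 times.
try:
    num_iterations(10, 1, 3, stage=1)
    failed = False
except AssertionError:
    failed = True

# Per-rank iteration counts under the hypothetical uneven split.
per_rank = [uneven_iterations(10, 3, r) for r in range(3)]
print(failed, per_rank)
```

With 10 batches and a replication factor of 3 the existing check fails, while the uneven split would give the three replicas 4, 3, and 3 iterations. Any real fix would also need the neighbouring stages to agree on which replica receives each batch.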
