
Gradient Accumulation is wrong #316

@qZhang88

Description


loss = loss / accumulation_steps

When accumulated_samples < 128, the gradients accumulated so far are wiped at the start of the next iteration by self.optimizer.zero_grad(), so the partially accumulated batch never contributes to the parameter update.
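To illustrate the issue, here is a minimal sketch in plain Python (no framework; the mock optimizer, the loss values, and names like `accumulation_steps` are all assumptions for illustration, not the project's actual code). `train_buggy` calls `zero_grad()` at the top of every iteration, as the issue describes, so only the last micro-batch survives; `train_fixed` clears gradients only right after `step()`, which is the usual gradient-accumulation pattern:

```python
class MockOptimizer:
    """Stand-in for an optimizer: tracks one scalar 'gradient'."""
    def __init__(self):
        self.grad = 0.0
        self.applied_grads = []  # gradient seen at each step()

    def zero_grad(self):
        self.grad = 0.0

    def step(self):
        self.applied_grads.append(self.grad)


def train_buggy(optimizer, losses, accumulation_steps):
    # Bug: zero_grad() every iteration erases gradients accumulated
    # on the previous iterations of the current accumulation window.
    for i, loss in enumerate(losses):
        optimizer.zero_grad()                         # clears partial sums too early
        optimizer.grad += loss / accumulation_steps   # stands in for backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()


def train_fixed(optimizer, losses, accumulation_steps):
    # Fix: clear gradients only after an optimizer step, so partial
    # accumulations survive between iterations.
    for i, loss in enumerate(losses):
        optimizer.grad += loss / accumulation_steps   # stands in for backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

With losses `[1.0, 2.0, 3.0, 4.0]` and `accumulation_steps=4`, the buggy loop steps with only the last scaled loss (`1.0`), while the fixed loop steps with the full accumulated mean (`2.5`).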
