Enquiries about Parameter Sharding #98

@keneoneth

Hello there, I have been reading your research paper "Decoupled Model Schedule for Deep Learning Training". In particular, part (3) on tensor parallelism mentions that "Since the output tensor only holds partial results after sharding, we need to conduct all_reduce to aggregate outputs from different device". May I know which part of the source code in this repository performs the all_reduce operation? I am currently looking at build.py, and I believe the code captured below handles the sharded parameters and splits a parameter when a shard is added by the user, but I cannot find where the all_reduce operation is done. Any help would be appreciated. Thank you.

    # Only keep the partition for this device for sharded params.
    tp_rank = sch.rank
    cnt_shard = 0
    for param_name, param in sch.mod.named_parameters(recurse=False):
        is_found = False
        for idx, new_size in enumerate(new_param_shapes[param_name]):
            if new_size != param.shape[idx]:
                assert not is_found, "Cannot have two sharded dimensions!"
                sharded_size = new_size
                axis = idx
                is_found = True
        if is_found:
            cnt_shard += 1
            sharded_param = param.detach().split(sharded_size, dim=axis)[tp_rank]
            sharded_param = sharded_param.contiguous()
            new_param = nn.Parameter(sharded_param)
            sch.mod.register_parameter(param_name, new_param)
            transfor_param_tags(sch, param, new_param)
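For context on what I am asking about, here is a minimal single-process sketch (my own illustration, not code from this repository) of why the aggregation step is needed: when a weight is split along its input (reduction) dimension, as in the `param.detach().split(...)[tp_rank]` logic above, each rank's matmul yields only a partial sum, and the partials must be summed across ranks. In a real run that sum would be a collective such as `torch.distributed.all_reduce`; here I simulate the ranks and the reduction with NumPy:

```python
import numpy as np

# Hypothetical sketch: shard a linear layer's weight along its input
# (reduction) dimension across two simulated ranks, compute per-rank
# partial outputs, then aggregate them (the role of all_reduce).
rng = np.random.default_rng(0)
world_size = 2
x = rng.standard_normal((4, 8))   # activations: batch 4, feature dim 8
w = rng.standard_normal((8, 6))   # full weight: 8 -> 6

# Each simulated rank keeps one slice of x and w, mirroring
# param.detach().split(sharded_size, dim=axis)[tp_rank] above.
x_shards = np.split(x, world_size, axis=1)
w_shards = np.split(w, world_size, axis=0)

# Each rank produces only a partial result of the matmul...
partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]

# ...so the partials must be summed across ranks. In an actual
# tensor-parallel run this sum is what all_reduce performs on every
# rank's output tensor; here sum() stands in for the collective.
output = sum(partials)

assert np.allclose(output, x @ w)
```

This is only to clarify my question about which part of the codebase inserts that aggregation.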
