Problems encountered when using distributed training #1

@WangLedi

Description

Hi, thank you very much for the code. I recently tried to train this model, but I ran into a problem when enabling distributed training. When I run the train_torch.py file with the plain python command, I get the error KeyError: 'LOCAL_RANK'. It seems that LOCAL_RANK, RANK, and WORLD_SIZE are not set in my environment, and when I set them manually I can only use a single GPU. Did you set some additional parameters to enable distributed training when training this model? Or do you have any insight into the problem I encountered? Thank you very much for your time.
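For context, here is a minimal sketch of what I believe is happening (this assumes train_torch.py reads these variables straight from the environment, which I have not verified against the actual script): LOCAL_RANK, RANK, and WORLD_SIZE are normally injected by a distributed launcher such as torchrun, not by a plain python invocation, which would explain the KeyError.

```python
import os

# These variables are set automatically per-process by the PyTorch
# launcher (e.g. `torchrun --nproc_per_node=2 train_torch.py`).
# Under a plain `python train_torch.py` run they are absent, so
# os.environ["LOCAL_RANK"] raises KeyError. Falling back to a
# single-process default reproduces the one-GPU behavior I saw:
local_rank = int(os.environ.get("LOCAL_RANK", 0))
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))

print(f"local_rank={local_rank} rank={rank} world_size={world_size}")
```

Setting the variables by hand gives only one process (hence one GPU), since the launcher is also responsible for spawning one worker per device.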
