Commit 61f8d06
Allow ports to be reused in gloo (#97677)
Summary:
X-link: pytorch/pytorch#97677
Pull Request resolved: #353
ProcessGroupGloo and gloo appear to open and close sockets without allowing the port to be reused. In larger training jobs this surfaces as "Address already in use" errors, which we assume occur because all the ephemeral ports are exhausted.
This diff allows ports to be reused; with it, we see fewer ports in the `TIME_WAIT` state.
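For illustration only: the changed lines are not reproduced in this message, so the sketch below is not the gloo patch itself. It assumes the standard way to permit port reuse on a TCP socket, which is setting `SO_REUSEADDR` before `bind()`, and shows it with Python's `socket` module. The helper names `make_reusable_socket` and `reuse_enabled` are hypothetical.

```python
import socket


def make_reusable_socket(host="127.0.0.1", port=0):
    """Create a TCP socket with SO_REUSEADDR enabled, then bind it.

    Hypothetical helper for illustration; port=0 lets the OS pick a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind() to affect that bind. It lets the
    # bind succeed even if an earlier connection on the same local address is
    # still lingering in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def reuse_enabled(sock):
    """Return True if SO_REUSEADDR is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0


if __name__ == "__main__":
    s = make_reusable_socket()
    print("bound to port", s.getsockname()[1], "reuse:", reuse_enabled(s))
    s.close()
```

The exact semantics of `SO_REUSEADDR` differ between operating systems, so treat this as a sketch of the idea rather than a description of gloo's behavior on every platform.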
context: https://fb.workplace.com/groups/319878845696681/permalink/5988899781205532/
another issue: https://fb.workplace.com/groups/319878845696681/permalink/958768178474408/
Differential Revision: D44029927
fbshipit-source-id: 4a1483d0eceda01ffd02c7747282129f7f4a2efe
1 parent 56b221c commit 61f8d06
1 file changed: +9 -0 (lines 104-112 added)