Commit 112b4f8
btl/uct: complete re-work of the BTL
This commit is large and contains the following changes:
- Disconnect the connection memory domain from the communication domain. This
allows any memory domain to be used for connections. The default is to use
tcp, but it can be disabled, which allows UD and others to be used.
- Move tl attributes off of the tl context structure. In theory tl
attributes do not differ between contexts so query them once when the tl
is created not once per context. This removes the need to allocate the
first context so that code has also been removed.
- Change mca_btl_uct_tl_t uct_dev_contexts member to be an array. The btl
always allocates the maximum number of contexts. This is not a significant
amount of memory. Rather than reduce it to be based on the configured
maximum number of contexts it makes sense to just make it an array and
remove the extra indirection when accessing the contexts.
- Do not call mca_btl_uct_endpoint_set_flag before sending a message on the
connection endpoint. This method may cause the release of the connection
endpoint (cached on the BTL endpoint). If this happens it would lead to a
SEGV.
- Flush the endpoint only when it is being released. There is no need to do so
on every send. Releasing the endpoint without flushing it may lead to it
being destroyed while still processing data.
- Downgrade endpoint lock from recursive. Recursive locks are not needed for
the endpoint lock.
- Move the async context from the module to the tl. There is no real benefit
from sharing the async context between tls. Given this and some other changes
that will be made it makes sense to move it from the module to the tl.
- Connection TLs are only used to form connections for connect-to-endpoint TLs.
They do not need to belong to the same memory domain as the one they are used
with so there is no need to rely on a BTL module. Moved the
pending_connection_reqs to the tl and changed the code to support a NULL
module for the connection tl.
- Put active tls in a list on the mca_btl_uct_md_t structure. This simplifies
the code a bit by moving mca_btl_uct_tl_t ownership to the mca_btl_uct_md_t
class.
- There is an issue with btl/uct which prevents the usage of the standard
btl_uct_ MCA variables (eager limit, flags, etc). Because of the way the btl
was written these values are all determined directly from UCT and cannot be
changed using the MCA variable interface. To address this issue this commit
breaks apart the initialization code and separates out the pieces that are
necessary for discovery only. The discovery pieces now use a new set of
variables whose names include the memory domain name. These variables
directly control the behavior of BTLs on that memory domain, and the
btl_uct variables can be used to control their defaults.
For example, using memory domain irdma0 will create the variables
btl_uct_irdma0_eager_limit, btl_uct_irdma0_max_send_size, etc.
The defaults will be based on what is reported by UCT and the user can set
the values to a subset of what UCT reports. For example, if the max send size
for the hardware is 8192B then it can be set to anything up to and including
that value. The same is true for feature flags: if the hardware supports only
some btl atomics or operations, the user can specify a subset of them (others
will be ignored).
- Move device context code to a new file. There is a specific header for device
contexts so it makes sense to move the context-specific code to a matching C
file. No changes here other than moving code around.
- Use uct_ep_am_short_iov for short messages. The uct_ep_am_short_iov method
should allow for faster short messages than uct_ep_am_short (which can only
take a single buffer). This commit moves btl/uct to the newer method, which
breaks compatibility with some versions of UCT. Since we already no longer
support those versions this change is safe.
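
The per-memory-domain variables described above can be illustrated with a
minimal sketch. It is not the code from this commit: the ex_* names and the
flag bits are hypothetical, and clamping an oversized limit to the hardware
value is an assumption (the message only says values up to and including the
hardware value may be set). Masking the requested flags against the hardware
flags matches the statement that unsupported flags are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capability bits; these are NOT the real MCA_BTL_FLAGS_* values. */
enum {
    EX_FLAG_SEND        = 1u << 0,
    EX_FLAG_PUT         = 1u << 1,
    EX_FLAG_GET         = 1u << 2,
    EX_FLAG_ATOMIC_FOPS = 1u << 3
};

/* Build "btl_uct_<md_name>_<suffix>" into buf.
 * Returns 0 on success, -1 if the name is empty or would be truncated. */
static int ex_md_var_name(char *buf, size_t len, const char *md_name,
                          const char *suffix)
{
    assert(buf != NULL && md_name != NULL && suffix != NULL);
    if (0 == strlen(md_name)) {
        return -1;
    }
    int n = snprintf(buf, len, "btl_uct_%s_%s", md_name, suffix);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Assumption: a request above the hardware maximum is clamped to it. */
static size_t ex_clamp_limit(size_t requested, size_t hw_max)
{
    return requested > hw_max ? hw_max : requested;
}

/* Keep only the requested features the hardware reports; others are ignored. */
static uint32_t ex_clamp_flags(uint32_t requested, uint32_t hw_flags)
{
    return requested & hw_flags;
}
```

With a hardware max send size of 8192, ex_clamp_limit(4096, 8192) keeps the
user's 4096, while a request for 16384 would be reduced to 8192 under the
clamping assumption.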
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
1 parent 3a2e908 commit 112b4f8
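
The change of the uct_dev_contexts member to an array, described in the commit
message, can be pictured with the following sketch. The types, the ex_* names,
and the EX_MAX_CONTEXTS value are hypothetical stand-ins rather than the real
btl/uct definitions. The point is only that a fixed-size array embedded in the
tl needs no separate allocation and no extra pointer dereference to reach a
context slot.

```c
#include <stddef.h>

/* Hypothetical stand-in for the maximum number of contexts the BTL allocates. */
#define EX_MAX_CONTEXTS 8

/* Hypothetical, heavily reduced device context. */
typedef struct ex_device_context {
    int context_id;
} ex_device_context_t;

/* Before (sketch): the tl holds a pointer to a separately allocated table,
 * so reaching a context means loading the table pointer first. */
typedef struct ex_tl_before {
    ex_device_context_t **dev_contexts;
} ex_tl_before_t;

/* After (sketch): the table is embedded in the tl at its maximum size. */
typedef struct ex_tl {
    ex_device_context_t *dev_contexts[EX_MAX_CONTEXTS];
} ex_tl_t;

/* Return the context in slot idx, or NULL if idx is out of range or the
 * slot has not been populated yet. */
static ex_device_context_t *ex_tl_get_context(ex_tl_t *tl, size_t idx)
{
    if (NULL == tl || idx >= EX_MAX_CONTEXTS) {
        return NULL;
    }
    return tl->dev_contexts[idx];
}
```

The cost is EX_MAX_CONTEXTS pointers per tl regardless of how many contexts
are configured, which the commit message describes as not a significant amount
of memory.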
19 files changed (+1669, -956 lines) in opal/mca/btl/uct