worker and endpoint cleanup

Currently we destroy the endpoint with

```
ucs_status_ptr_t ret = ucp_ep_close_nb(m_ep, UCP_EP_CLOSE_MODE_FORCE);
```
We have to use `UCP_EP_CLOSE_MODE_FORCE` instead of `UCP_EP_CLOSE_MODE_FLUSH` because of a cleanup glitch. In summary: if one rank destroys its worker and endpoint, and another rank then tries to close its endpoint with FLUSH, the flush can try to communicate with the already closed remote worker, which causes a segfault. `UCP_EP_CLOSE_MODE_FORCE` avoids the segfault, but the solution suggested by the developers is to place a barrier after the endpoint destructors and only then close the workers.
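A minimal sketch of the suggested ordering, assuming MPI is available for the barrier (the issue does not say which barrier to use). The helper names `close_endpoint_flush` and `shutdown_rank` are hypothetical, and a non-blocking `MPI_Ibarrier` is used here, as an assumption, so that the local worker keeps progressing while other ranks may still be flushing toward it:

```cpp
#include <mpi.h>
#include <ucp/api/ucp.h>

// Hypothetical helper: close the endpoint with FLUSH and wait until the
// close request has completed, progressing the worker in the meantime.
static ucs_status_t close_endpoint_flush(ucp_worker_h worker, ucp_ep_h ep)
{
    ucs_status_ptr_t req = ucp_ep_close_nb(ep, UCP_EP_CLOSE_MODE_FLUSH);
    if (req == NULL) {
        return UCS_OK;                 // closed immediately
    }
    if (UCS_PTR_IS_ERR(req)) {
        return UCS_PTR_STATUS(req);    // close failed
    }

    ucs_status_t status;
    do {
        ucp_worker_progress(worker);
        status = ucp_request_check_status(req);
    } while (status == UCS_INPROGRESS);

    ucp_request_free(req);
    return status;
}

// Hypothetical shutdown sequence for one rank:
// 1. close the endpoint, 2. wait for all ranks, 3. destroy the worker.
static void shutdown_rank(ucp_worker_h worker, ucp_ep_h ep)
{
    ucs_status_t status = close_endpoint_flush(worker, ep);
    (void)status;                      // error handling omitted in this sketch

    // Assumption: MPI_COMM_WORLD spans every rank that holds an endpoint.
    MPI_Request barrier;
    int done = 0;
    MPI_Ibarrier(MPI_COMM_WORLD, &barrier);
    while (!done) {
        ucp_worker_progress(worker);   // stay responsive to remote flushes
        MPI_Test(&barrier, &done, MPI_STATUS_IGNORE);
    }

    // No rank can still be flushing toward this worker at this point.
    ucp_worker_destroy(worker);
}
```

With this ordering every endpoint is closed before any worker is destroyed, so a FLUSH on one rank should never reach a remote worker that has already been torn down.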

worker and endpoint cleanup #118

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

worker and endpoint cleanup #118

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions