
Add AUTO in PTG which dynamically allocates a suitable buffer to receive…#759

Draft
QingleiCao wants to merge 2 commits into ICLDisco:master from
QingleiCao:qinglei/variable_size_receive_buffer

Conversation

@QingleiCao
Contributor

… data

@QingleiCao QingleiCao requested a review from a team as a code owner February 23, 2026 02:21
@QingleiCao QingleiCao marked this pull request as draft February 23, 2026 02:21
Contributor

@bosilca bosilca left a comment


I understand what you are trying to do here, and I think it is a good step in the right direction. However, I have two issues with the proposed approach:

  1. It loses the type of the incoming data. You are forced to drop the type and roll back to receiving the data as packed (potentially breaking heterogeneity). I understand that the arena must be set to AUTO, but in that case I think the remote_type should be set to the expected type of the data instead of allowing it to be void*.
  2. I am really not sure about the way you compute the device_index. Without running the code, I would think you should mostly get 0, as the task's inputs are not set during the get_datatype function.

copy->device_private = NULL;
}

static void remote_dep_auto_gpu_copy_release(parsec_data_copy_t *copy, int device)

The name is confusing, as it implies you are releasing the data copy, but instead you are just releasing the private memory owned by it.

Why do you need the device? A copy is always associated with a device, and it would not make sense to release the memory of a different copy than the one you pass as argument.

output->data.remote.dst_count = remote_size;
output->data.remote.src_displ = 0;
output->data.remote.dst_displ = 0;
output->data.remote.device_index = remote_dep_auto_pick_gpu_device(newcontext);

We have a proper accessor on the task class to identify the device it wants to use. You should be calling newcontext->taskclass.data_affinity instead.

}
parsec_task_t tmp = *task;
tmp.selected_device = NULL;
if( PARSEC_SUCCESS != parsec_select_best_device(&tmp) ) {

Are you sure this returns what you expect? For me, parsec_select_best_device only works once the task is properly initialized and has all its input data copies properly set. Before that, it will basically count the possible incarnations and return the owner of any of the copies. In other words, it is mostly random, or maybe always zero if the input copies are not yet set (which is the most likely).

@QingleiCao
Contributor Author

> I understand what you are trying to do here, and I think it is a good step in the right direction. However, I have two issues with the proposed approach:
>
>   1. It loses the type of the incoming data. You are forced to drop the type and roll back to receiving the data as packed (potentially breaking heterogeneity). I understand that the arena must be set to AUTO, but in that case I think the remote_type should be set to the expected type of the data instead of allowing it to be void*.
>   2. I am really not sure about the way you compute the device_index. Without running the code, I would think you should mostly get 0, as the task's inputs are not set during the get_datatype function.

Thank you for your comments @bosilca. This is still a draft, and I’ll be making updates intermittently. 😄

