I am trying to implement the task-bench benchmark on top of a data-flow paradigm (TTG) and found it to be a rather frustrating exercise, mostly due to the near-total lack of documentation. In `task_graph_t`, only 3 of the 11 fields have any documentation, and only 2 of the 12 `task_graph_*` functions carry a comment, one of which is a FIXME... For me, it really comes down to a guessing game and reading other implementations to figure out whether my implementation is actually correct in the sense of the benchmark.
Particular questions I still don't know the answer to are:
- What is `nb_fields`, and what is its connection to `timesteps`? It appears that `nb_fields` is set to the number of timesteps if not explicitly provided. Why? What does the number of fields have to do with the number of timesteps? Can I ignore it and just use `1`?
- What is the relation between `task_graph_[reverse]_dependencies` and `task_graph_offset_at_timestep`/`task_graph_width_at_timestep`? Do I have to apply a correction for the offset/width to the dependencies provided by `task_graph_[reverse]_dependencies`? If so, why is that not done in `task_graph_[reverse]_dependencies`?
- I couldn't find any run rules. For a data-flow model like TTG it is possible to feed data into the graph and have updates never materialize outside of it (i.e., to never write back updated data). In dependency-based models such as StarPU, PaRSEC, OpenMP, etc., dependencies are described directly on physical memory locations, which is very different from data-flow models. So what is the expectation here? Also, can I add artificial dependencies if that benefits my model (say, between points `x` in consecutive timesteps)? What is the correctness metric? Am I free to distribute the data however I want? What is the minimum number of dependencies I need to support? (Clearly, the models mentioned above couldn't support the all-to-all pattern at scale and chose a seemingly arbitrary limit of 10.)
Given the state of things, I will go ahead and choose the interpretation most favorable to my model that doesn't crash or error out during execution. However, that is not how a benchmark meant to be portable should have to be approached...