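The current allreduce implementation for the CPU algorithm uses AllGather as a hack, but this is inefficient: every rank receives the full buffer from every other rank and then reduces locally, so the data received per rank grows linearly with the number of ranks. We need an efficient allreduce implementation in legate, or a way to hack something more efficient together in legateboost.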
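
To make the cost difference concrete, here is a minimal single-process sketch that simulates both approaches on plain Python lists and counts how many elements each rank receives. It does not use any legate or legateboost API; the function names `allgather_allreduce` and `ring_allreduce` are hypothetical, and ring allreduce (reduce-scatter followed by allgather) is just one well-known candidate, not necessarily what should be implemented here.

```python
def allgather_allreduce(buffers):
    """Naive approach: every rank gathers all buffers, then sums locally.

    `buffers[r]` is the local buffer of simulated rank r.
    Returns (per-rank results, elements received by each rank).
    """
    n = len(buffers)
    size = len(buffers[0])
    # Each rank receives one full buffer from each of its n - 1 peers.
    received_per_rank = (n - 1) * size
    total = [sum(buf[i] for buf in buffers) for i in range(size)]
    return [list(total) for _ in range(n)], received_per_rank


def ring_allreduce(buffers):
    """Ring allreduce: reduce-scatter, then allgather, over n - 1 steps each.

    Returns (per-rank results, max elements received by any rank).
    """
    n = len(buffers)
    size = len(buffers[0])
    data = [list(b) for b in buffers]
    # Split every buffer into n contiguous chunks.
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]
    received = [0] * n

    # Reduce-scatter: at step s, rank r sends chunk (r - s) % n to rank r + 1,
    # which adds it into its own copy of that chunk.
    for step in range(n - 1):
        messages = []
        for r in range(n):
            c = (r - step) % n
            lo, hi = bounds[c]
            messages.append(((r + 1) % n, c, data[r][lo:hi]))
        for dst, c, payload in messages:
            lo, _ = bounds[c]
            for i, value in enumerate(payload):
                data[dst][lo + i] += value
            received[dst] += len(payload)

    # Rank r now holds the fully reduced chunk (r + 1) % n.
    # Allgather: at step s, rank r forwards chunk (r + 1 - s) % n to rank r + 1,
    # which overwrites its copy with the reduced values.
    for step in range(n - 1):
        messages = []
        for r in range(n):
            c = (r + 1 - step) % n
            lo, hi = bounds[c]
            messages.append(((r + 1) % n, c, data[r][lo:hi]))
        for dst, c, payload in messages:
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload
            received[dst] += len(payload)

    return data, max(received)


# Example: 4 simulated ranks, 8 integer elements each.
buffers = [[10 * r + i for i in range(8)] for r in range(4)]
naive_result, naive_recv = allgather_allreduce(buffers)
ring_result, ring_recv = ring_allreduce(buffers)
```

With 4 ranks and 8 elements, each rank receives 24 elements in the AllGather version and 12 in the ring version; in general the ring moves roughly `2 * (n - 1) / n` times the buffer size per rank, independent of the rank count. Note that with floating-point data the ring sums values in a different order on different chunks, so results can differ from the AllGather version in the last bits.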