Exploit batched input in createFromSends
Created by: masterleinad
When calling createFromSends in DistributedSearchTreeImpl::communicateResultsBack, the input is already batched, but we currently don't try to benefit from that. Even for examples that require very little communication, most of the MPI-related communication time is spent in createFromSends (and even more so for communication-heavy applications), so it might be worth exploiting this.
The new overload for sortAndDetermineBufferLayout essentially just computes the correct permutation for the batched ranks and reconstructs the total permutation from that.
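A minimal sketch of the idea, assuming the batched input is described by one destination rank per batch plus CSR-style batch offsets (batch b covers elements [batch_offsets[b], batch_offsets[b+1])). The function name sortAndDetermineBufferLayoutBatched, the BufferLayout struct, and the use of std::vector instead of Kokkos views are hypothetical illustrations, not the overload added in this PR. Only the batches are sorted by rank; the batch order is then expanded into the per-element permutation, where permutation[i] is assumed to be the send-buffer position of input element i:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type (not ArborX's actual interface).
struct BufferLayout
{
  std::vector<int> permutation;  // permutation[i] = send-buffer position of element i
  std::vector<int> unique_ranks; // destination ranks in ascending order
  std::vector<int> counts;       // number of elements sent to each unique rank
  std::vector<int> offsets;      // offsets[k] = buffer start of unique_ranks[k]; size + 1
};

// Hypothetical batched variant: all elements of batch b go to batch_ranks[b].
BufferLayout
sortAndDetermineBufferLayoutBatched(std::vector<int> const &batch_ranks,
                                    std::vector<int> const &batch_offsets)
{
  assert(batch_offsets.size() == batch_ranks.size() + 1);
  std::size_t const n_batches = batch_ranks.size();

  // Sort the batches (not the individual elements) by destination rank.
  // stable_sort keeps batches with the same rank in their original order.
  std::vector<int> batch_order(n_batches);
  std::iota(batch_order.begin(), batch_order.end(), 0);
  std::stable_sort(batch_order.begin(), batch_order.end(),
                   [&](int a, int b) { return batch_ranks[a] < batch_ranks[b]; });

  BufferLayout layout;
  layout.permutation.resize(static_cast<std::size_t>(batch_offsets.back()));

  // Walk the batches in sorted order, assigning contiguous buffer slots and
  // expanding the batch permutation into the full per-element permutation.
  int position = 0;
  for (int b : batch_order)
  {
    int const rank = batch_ranks[b];
    if (layout.unique_ranks.empty() || layout.unique_ranks.back() != rank)
    {
      layout.unique_ranks.push_back(rank);
      layout.counts.push_back(0);
    }
    for (int i = batch_offsets[b]; i < batch_offsets[b + 1]; ++i)
      layout.permutation[i] = position++;
    layout.counts.back() += batch_offsets[b + 1] - batch_offsets[b];
  }

  // Exclusive prefix sum of the counts gives the per-rank buffer offsets.
  layout.offsets.assign(layout.counts.size() + 1, 0);
  std::partial_sum(layout.counts.begin(), layout.counts.end(),
                   layout.offsets.begin() + 1);
  return layout;
}
```

For example, batch_ranks {2, 0, 2, 1} with batch_offsets {0, 3, 5, 6, 8} yields the permutation {4, 5, 6, 0, 1, 7, 2, 3}, unique ranks {0, 1, 2}, counts {2, 2, 4}, and offsets {0, 2, 4, 8}. In this sketch the sort runs over the B batches instead of the N elements, and the expansion afterwards is a single linear pass.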
I still need to run benchmarks.