Parallelize sortAndDetermineBufferLayout
Created by: masterleinad
In combination with the CUDA-aware MPI pull request (#162), we should also be
able to avoid copying permutation_indices to the CPU.