
Parallelize sortAndDetermineBufferLayout

Created by: masterleinad

Combined with the CUDA-aware MPI pull request (#162), this should also let us avoid copying permutation_indices to the CPU.
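For illustration, here is a minimal sketch of how a buffer layout can be computed with only parallel-friendly steps: an atomic histogram of destination ranks, an exclusive scan for the offsets, and an atomic scatter that assigns each item its slot in permutation_indices. This is not the code in this merge request. The function name sortAndDetermineBufferLayoutSketch, the BufferLayout struct, and the use of std::thread as a stand-in for a device execution space are all hypothetical. Within each rank the resulting order is not stable, which is usually acceptable for a send buffer.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical result type: not ArborX's or this MR's actual interface.
struct BufferLayout
{
  std::vector<int> counts;              // number of items sent to each rank
  std::vector<int> offsets;             // exclusive prefix sum of counts
  std::vector<int> permutation_indices; // slot of item i in the send buffer
};

// Hypothetical sketch; std::thread stands in for a GPU execution space.
BufferLayout sortAndDetermineBufferLayoutSketch(std::vector<int> const &ranks,
                                                int comm_size,
                                                unsigned num_threads = 4)
{
  std::size_t const n = ranks.size();

  // Run f(i) for every i in [0, n), striding the indices across threads.
  // Joining all threads separates the phases below.
  auto parallel_for = [&](auto f) {
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t)
      threads.emplace_back([&, t] {
        for (std::size_t i = t; i < n; i += num_threads)
          f(i);
      });
    for (auto &th : threads)
      th.join();
  };

  // Phase 1: histogram of destination ranks using atomic increments.
  std::vector<std::atomic<int>> counts(comm_size);
  for (auto &c : counts)
    c.store(0);
  parallel_for([&](std::size_t i) {
    counts[ranks[i]].fetch_add(1, std::memory_order_relaxed);
  });

  // Phase 2: exclusive scan of the counts gives each rank's offset.
  BufferLayout layout;
  layout.counts.resize(comm_size);
  layout.offsets.resize(comm_size);
  int running = 0;
  for (int r = 0; r < comm_size; ++r)
  {
    layout.counts[r] = counts[r].load();
    layout.offsets[r] = running;
    running += layout.counts[r];
  }

  // Phase 3: each item atomically claims the next free slot of its rank.
  std::vector<std::atomic<int>> cursor(comm_size);
  for (int r = 0; r < comm_size; ++r)
    cursor[r].store(layout.offsets[r]);
  layout.permutation_indices.resize(n);
  parallel_for([&](std::size_t i) {
    layout.permutation_indices[i] =
        cursor[ranks[i]].fetch_add(1, std::memory_order_relaxed);
  });

  return layout;
}
```

Because every phase is either an atomic update or a scan, the same structure maps onto a device kernel. With CUDA-aware MPI, permutation_indices could then stay in device memory instead of being copied to the host.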
