Keep permutation on the device
Created by: masterleinad
The changes here should only affect a CUDA-aware MPI implementation.
- If the `DeviceType`'s memory space is `Kokkos::Host`, we, of course, do everything in host memory space.
- If the `DeviceType`'s memory space is `Kokkos::Cuda` and we enable `KOKKOS_USE_CUDA_AWARE_MPI`, the permutation array is now stored on the device. `Distributor::doPostsAndWaits` no longer allocates memory on the device; it uses the permutation array directly, since the input array is already stored on the device.
- If the `DeviceType`'s memory space is `Kokkos::Cuda` and `KOKKOS_USE_CUDA_AWARE_MPI` is disabled, the input array for `Distributor::doPostsAndWaits` is stored on the host, so we should also create the permutation array on the host.
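
The three cases above amount to a compile-time choice of where the permutation array lives. The following is only a minimal sketch of that decision, not the code in this PR: `HostSpace`, `CudaSpace`, `PermutationSpace`, and `permutation_space_t` are hypothetical stand-ins that use nothing from Kokkos itself, and the only name taken from the description is the `KOKKOS_USE_CUDA_AWARE_MPI` macro.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for the host and CUDA memory spaces.
// These are NOT the real Kokkos types; they only model the decision.
struct HostSpace {};
struct CudaSpace {};

// Mirror the build option as a compile-time boolean.
#ifdef KOKKOS_USE_CUDA_AWARE_MPI
constexpr bool cuda_aware_mpi = true;
#else
constexpr bool cuda_aware_mpi = false;
#endif

// Default: keep the permutation array on the host. This covers the
// host-only case and the CUDA case without CUDA-aware MPI.
template <typename MemorySpace, bool CudaAwareMPI>
struct PermutationSpace
{
  using type = HostSpace;
};

// Only a CUDA memory space combined with CUDA-aware MPI keeps the
// permutation array on the device.
template <>
struct PermutationSpace<CudaSpace, true>
{
  using type = CudaSpace;
};

// Convenience alias that picks up the build option by default.
template <typename MemorySpace, bool CudaAwareMPI = cuda_aware_mpi>
using permutation_space_t =
    typename PermutationSpace<MemorySpace, CudaAwareMPI>::type;

// The three cases from the list above, checked at compile time.
static_assert(
    std::is_same<permutation_space_t<HostSpace, true>, HostSpace>::value,
    "host memory space: everything stays on the host");
static_assert(
    std::is_same<permutation_space_t<CudaSpace, true>, CudaSpace>::value,
    "CUDA + CUDA-aware MPI: permutation array on the device");
static_assert(
    std::is_same<permutation_space_t<CudaSpace, false>, HostSpace>::value,
    "CUDA without CUDA-aware MPI: permutation array on the host");
```

Expressing the choice as a trait keeps the branch out of the communication routine: the caller asks for the permutation array's space once, and the host path is the fallback for every combination except the CUDA-aware one.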