Keep permutation on the device
Created by: masterleinad
The changes here should only affect a CUDA-aware MPI implementation.
- If the `DeviceType`'s memory space is `Kokkos::Host`, we, of course, do everything in host memory space.
- If the `DeviceType`'s memory space is `Kokkos::Cuda` and we enable `KOKKOS_USE_CUDA_AWARE_MPI`, the permutation array is now stored on the device, and `Distributor::doPostsAndWaits` no longer allocates memory on the device; it uses the permutation array directly, since the input array is stored on the device.
- If the `DeviceType`'s memory space is `Kokkos::Cuda` and `KOKKOS_USE_CUDA_AWARE_MPI` is disabled, the input array for `Distributor::doPostsAndWaits` is stored on the host, so we should also create the permutation array on the host.
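
The three cases above amount to a compile-time choice: the permutation array lives wherever the input array for `Distributor::doPostsAndWaits` lives. The following is only a minimal sketch of that rule, not the code in this PR. `HostSpace`, `CudaSpace`, and `PermutationSpace` are hypothetical stand-ins (they are not the Kokkos types), and the `bool` template parameter stands in for whether `KOKKOS_USE_CUDA_AWARE_MPI` is enabled.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for the memory spaces named above.
// These are NOT the Kokkos types; they only make the sketch self-contained.
struct HostSpace {};
struct CudaSpace {};

// Hypothetical trait: given the DeviceType's memory space and whether
// CUDA-aware MPI is enabled (an assumed stand-in for the
// KOKKOS_USE_CUDA_AWARE_MPI switch), pick where the permutation array lives.
template <class DeviceMemorySpace, bool CudaAwareMPI>
struct PermutationSpace {
  // The permutation stays on the device only when the data is on the device
  // AND MPI can read device buffers directly; otherwise it is created on
  // the host, next to the host-side input array.
  using type = typename std::conditional<
      std::is_same<DeviceMemorySpace, CudaSpace>::value && CudaAwareMPI,
      CudaSpace, HostSpace>::type;
};

template <class DeviceMemorySpace, bool CudaAwareMPI>
using PermutationSpace_t =
    typename PermutationSpace<DeviceMemorySpace, CudaAwareMPI>::type;

// Convenience query: does the permutation array end up on the device?
template <class DeviceMemorySpace, bool CudaAwareMPI>
constexpr bool permutationOnDevice() {
  return std::is_same<PermutationSpace_t<DeviceMemorySpace, CudaAwareMPI>,
                      CudaSpace>::value;
}
```

With these stand-ins, `PermutationSpace_t<CudaSpace, true>` is `CudaSpace`, while `PermutationSpace_t<CudaSpace, false>` and both host variants are `HostSpace`, matching the three bullets.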