CUDA+MPI fixes