CUDA-aware MPI

Created by: masterleinad

I still need to clean this up. In particular, the preprocessor variables will not stay; they will be replaced by a CMake option. So far, I have tested this with

  • 60 MPI processes,
  • 10^7 points/MPI process,
  • 10^6 queries/MPI process,
  • a varying number of neighbors for the kNN search,
  • and CUDA

using the distributed_tree_driver benchmark.

neighbors   old     new
10          2.4e0   3.7e0
20          4.9e0   5.8e0
40          1.0e1   1.0e1
80          2.2e1   2.0e1
100         2.8e1   2.5e1
120         3.3e1   3.0e1
140         4.0e1   3.4e1
160         4.6e1   3.9e1
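
To make the trend in the table above easier to read, here is a minimal C++ sketch that computes the old/new ratio per row and finds the first neighbor count at which the new value is strictly smaller. It assumes the two columns are run times where lower is better; the table itself gives no units. The names Row, ratio, first_improvement and print_ratios are made up for this illustration and are not part of the benchmark.

```cpp
#include <array>
#include <cassert>
#include <cstdio>

// One row of the results table. Reading old/new as run times
// (lower is better) is an assumption; the table gives no units.
struct Row
{
  int neighbors;
  double old_value;
  double new_value;
};

// Values copied verbatim from the table.
constexpr std::array<Row, 8> rows{{
    {10, 2.4e0, 3.7e0},
    {20, 4.9e0, 5.8e0},
    {40, 1.0e1, 1.0e1},
    {80, 2.2e1, 2.0e1},
    {100, 2.8e1, 2.5e1},
    {120, 3.3e1, 3.0e1},
    {140, 4.0e1, 3.4e1},
    {160, 4.6e1, 3.9e1},
}};

// Ratio old/new: above 1 means the new value is smaller than the old one.
double ratio(Row const &row)
{
  assert(row.new_value > 0.);
  return row.old_value / row.new_value;
}

// Smallest neighbor count for which new is strictly smaller than old,
// or -1 if there is no such row.
int first_improvement()
{
  for (auto const &row : rows)
    if (row.new_value < row.old_value)
      return row.neighbors;
  return -1;
}

// Print one "neighbors ratio" line per table row.
void print_ratios()
{
  for (auto const &row : rows)
    std::printf("%4d  %.2f\n", row.neighbors, ratio(row));
}
```

With these numbers the ratio rises from about 0.65 at 10 neighbors to about 1.18 at 160 neighbors. The two columns are equal at 40 neighbors, and the new value is smaller from 80 neighbors on.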