CUDA-aware MPI
Created by: masterleinad
This still needs cleanup. In particular, the preprocessor variables are temporary and will be replaced by a CMake
option. So far, I have tested this with
- 60 MPI processes,
- 10^7 points/MPI process,
- 10^6 queries/MPI process,
- a varying number of neighbors for the kNN search,
- and CUDA,

using the `distributed_tree_driver` benchmark.

neighbors | old | new |
---|---|---|
10 | 2.4e0 | 3.7e0 |
20 | 4.9e0 | 5.8e0 |
40 | 1.0e1 | 1.0e1 |
80 | 2.2e1 | 2.0e1 |
100 | 2.8e1 | 2.5e1 |
120 | 3.3e1 | 3.0e1 |
140 | 4.0e1 | 3.4e1 |
160 | 4.6e1 | 3.9e1 |
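
To make the trend in the table easier to read, here is a minimal sketch that computes the new/old ratio per row and finds where the new column first drops below the old one. The numbers are copied from the table; the names `Row`, `rows`, `ratio`, and `first_improvement` are made up for this illustration, and the "lower is better" reading is an assumption, since the units of the columns are not stated above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One row of the benchmark table: neighbor count and the two measurements.
// (Hypothetical helper type, not part of the benchmark driver.)
struct Row {
  int neighbors;
  double old_value;
  double new_value;
};

// Values copied verbatim from the table above.
const std::array<Row, 8> rows = {{
    {10, 2.4e0, 3.7e0},
    {20, 4.9e0, 5.8e0},
    {40, 1.0e1, 1.0e1},
    {80, 2.2e1, 2.0e1},
    {100, 2.8e1, 2.5e1},
    {120, 3.3e1, 3.0e1},
    {140, 4.0e1, 3.4e1},
    {160, 4.6e1, 3.9e1},
}};

// Ratio new/old. Assuming lower is better, a value below 1 means the
// new code path measured better for that row.
double ratio(Row const &r) { return r.new_value / r.old_value; }

// Smallest neighbor count at which "new" is strictly lower than "old",
// or -1 if there is no such row.
int first_improvement() {
  for (std::size_t i = 0; i < rows.size(); ++i)
    if (rows[i].new_value < rows[i].old_value)
      return rows[i].neighbors;
  return -1;
}
```

With the table data, `first_improvement()` returns 80. At 40 neighbors the two columns are tied at the printed precision, and below that the new column is higher.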