CUDA-aware MPI
Created by: masterleinad
I still need to clean this up. In particular, the preprocessor variables will not stay; they will be replaced by a CMake option. So far, I have tested this with the `distributed_tree_driver` benchmark using

- 60 MPI processes,
- 10^7 points per MPI process,
- 10^6 queries per MPI process,
- a varying number of neighbors for the kNN search,
- and CUDA.

Timings without (old) and with (new) CUDA-aware MPI:

| neighbors | old | new |
|---|---|---|
| 10 | 2.4e0 | 3.7e0 |
| 20 | 4.9e0 | 5.8e0 |
| 40 | 1.0e1 | 1.0e1 |
| 80 | 2.2e1 | 2.0e1 |
| 100 | 2.8e1 | 2.5e1 |
| 120 | 3.3e1 | 3.0e1 |
| 140 | 4.0e1 | 3.4e1 |
| 160 | 4.6e1 | 3.9e1 |