Use sortObjects in sortResults.
requested to merge github/fork/masterleinad/parallelize_sortAndDetermineBufferLayout_thrust into master
Created by: masterleinad
On top of #172, this gives the following results for the `distributed_tree_driver` benchmark using
- 60 MPI processes,
- 10^7 points/MPI process,
- 10^6 queries/MPI process,
- a varying number of neighbors for the knn search,
- and CUDA.

neighbors | old | new |
---|---|---|
20 | 8.51e0 | 7.23e0 |
40 | 2.06e1 | 1.16e1 |
80 | 7.38e1 | 2.10e1 |
100 | 1.08e2 | 2.54e1 |
120 | 1.51e2 | 2.98e1 |
140 | 1.97e2 | 3.43e1 |
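
`Kokkos::BinSort::create_permute_vector` was just ridiculously slow here. Per the table, the new version is about 1.2x faster for 20 neighbors and about 5.7x faster for 140 neighbors.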

`sortObjects` currently requires us to copy the keys, and we could possibly reuse scratch, but currently this overhead doesn't show up in the profiling at all.
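
To illustrate why a key copy is needed, here is a minimal host-only sketch in standard C++. It is not the ArborX or Kokkos code: `sortAndGetPermutation`, `applyPermutation`, and `sortResultsByRank` are hypothetical names, and the sketch assumes a sort that reorders its keys in place and returns the permutation, so the caller has to copy the keys if the original order must survive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (NOT the ArborX API): sorts `keys` in place and returns
// the permutation such that sorted keys[i] == original keys[permutation[i]].
std::vector<std::size_t> sortAndGetPermutation(std::vector<int> &keys)
{
  std::vector<std::size_t> permutation(keys.size());
  std::iota(permutation.begin(), permutation.end(), std::size_t{0});
  // Stable sort keeps entries with equal keys in their original relative order.
  std::stable_sort(
      permutation.begin(), permutation.end(),
      [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  // Reorder the keys themselves, which overwrites the caller's ordering.
  std::vector<int> sorted(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i)
    sorted[i] = keys[permutation[i]];
  keys = std::move(sorted);
  return permutation;
}

// Hypothetical helper: gathers `values` according to `permutation`.
template <typename T>
std::vector<T> applyPermutation(std::vector<std::size_t> const &permutation,
                                std::vector<T> const &values)
{
  assert(permutation.size() == values.size());
  std::vector<T> result(values.size());
  for (std::size_t i = 0; i < values.size(); ++i)
    result[i] = values[permutation[i]];
  return result;
}

// Hypothetical stand-in for a sortResults-like routine: the keys are copied
// first because the sort above modifies them, and `ranks` must stay unchanged.
template <typename T>
std::vector<T> sortResultsByRank(std::vector<int> const &ranks,
                                 std::vector<T> const &values)
{
  std::vector<int> keys = ranks; // the copy discussed above
  auto const permutation = sortAndGetPermutation(keys);
  return applyPermutation(permutation, values);
}
```

Calling `sortResultsByRank(ranks, values)` returns the values grouped by ascending rank and leaves `ranks` untouched. In this sketch the copy is a single linear pass, which is small next to the sort itself.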