Performance regression

Created by: aprokop

We seem to have introduced a significant performance regression in the execution space refactoring for Cuda (maybe for other backends too, I have not checked). Comparing 0fbcb17b with 8e272698:

$ compare_bench.py benchmarks 202004141712_0fbcb17.json 202004141709_8e27269.json | grep median
BM_construction<ArborX::BVH<Cuda>>/10000/0/manual_time_median                                    -0.0204         -0.0150          1608          1575          1597          1573
BM_construction<ArborX::BVH<Cuda>>/100000/0/manual_time_median                                   -0.2722         -0.2394          3184          2317          3563          2710
BM_construction<ArborX::BVH<Cuda>>/1000000/0/manual_time_median                                  +0.2564         +0.2191          7952          9991          8432         10279
BM_construction<ArborX::BVH<Cuda>>/10000/1/manual_time_median                                    -0.0452         -0.0316          1662          1587          1652          1600
BM_construction<ArborX::BVH<Cuda>>/100000/1/manual_time_median                                   -0.2217         -0.1972          3289          2560          3676          2951
BM_construction<ArborX::BVH<Cuda>>/1000000/1/manual_time_median                                  +0.3687         +0.3340          8321         11390          8810         11753
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/1/0/2/manual_time_median                         -0.0006         -0.0006          1444          1444          1534          1533
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/1/0/2/manual_time_median                       -0.0005         +0.0039         10598         10593         11011         11054
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/1/0/2/manual_time_median                     +0.0004         +0.0002         96161         96202         97028         97043
BM_knn_search<ArborX::BVH<Cuda>>/10000/10000/10/1/1/3/manual_time_median                         -0.0015         -0.0015          1605          1603          1694          1692
BM_knn_search<ArborX::BVH<Cuda>>/100000/100000/10/1/1/3/manual_time_median                       -0.0003         +0.0001         13720         13716         14166         14168
BM_knn_search<ArborX::BVH<Cuda>>/1000000/1000000/10/1/1/3/manual_time_median                     -0.0001         -0.0002        197086        197066        197942        197894
BM_radius_search<ArborX::BVH<Cuda>>/10000/10000/10/1/0/0/2/manual_time_median                    -0.0019         -0.0020          1105          1103          1195          1192
BM_radius_search<ArborX::BVH<Cuda>>/100000/100000/10/1/0/0/2/manual_time_median                  +0.0299         +0.0259          7098          7310          7506          7700
BM_radius_search<ArborX::BVH<Cuda>>/1000000/1000000/10/1/0/0/2/manual_time_median                +0.0152         +0.0144         60885         61812         61748         62636
BM_radius_search<ArborX::BVH<Cuda>>/10000/10000/10/1/0/1/3/manual_time_median                    -0.0009         -0.0008          1067          1066          1157          1156
BM_radius_search<ArborX::BVH<Cuda>>/100000/100000/10/1/0/1/3/manual_time_median                  +0.0067         -0.0007          3769          3794          4228          4225
BM_radius_search<ArborX::BVH<Cuda>>/1000000/1000000/10/1/0/1/3/manual_time_median                +0.0034         +0.0030         10351         10386         11189         11223

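The regression is all in construction: at 1M points the median time grows by about 26% to 37%, although at 100K points it actually drops by 22% to 27%. The search benchmarks are essentially unchanged (within about 3%). I observed an even worse slowdown for the HACC data, which is larger (36M).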
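For reference, here is a minimal sketch of how the first column of the comparison output can be reproduced from the construction medians. It assumes the usual Google Benchmark comparison layout (relative time change, relative CPU change, old time, new time, old CPU, new CPU) and that the relative change is computed as `(new - old) / old`. The dictionary and function names are made up for illustration, and small last-digit differences from the table are expected because the printed medians are rounded.

```python
# Median manual_time values (old, new) copied from the BM_construction rows above.
# Keys are "<number of points>/<benchmark parameter>" as they appear in the names.
construction = {
    "10000/0": (1608, 1575),
    "100000/0": (3184, 2317),
    "1000000/0": (7952, 9991),
    "10000/1": (1662, 1587),
    "100000/1": (3289, 2560),
    "1000000/1": (8321, 11390),
}


def relative_change(old, new):
    """Return (new - old) / old; positive means the new commit is slower."""
    return (new - old) / old


# Relative change per construction benchmark (assumed convention, see above).
changes = {
    name: relative_change(old, new) for name, (old, new) in construction.items()
}

for name, change in changes.items():
    print(f"BM_construction<ArborX::BVH<Cuda>>/{name}: {change:+.4f}")

# Benchmarks that got slower with the new commit.
regressed = sorted(name for name, change in changes.items() if change > 0)
print("regressed:", regressed)
```

With these numbers only the two 1M-point cases end up in `regressed`, which matches the signs in the table.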

P.S. This also reminds me of a similar problem from #242.