Improve sort

Created by: aprokop

Note that this is different from #60, as that one concerns only scaling.

Here are some results from TIOGA (with CudaUVM) for three variants:

  • Upstream master
  • Using unsigned int for the size_type template parameter of Kokkos' BinSort [here]
  • Using Thrust [here]

[Plots: mesh1_setup, mesh1_search, mesh2_setup, mesh2_search]
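The sketch below is a CPU-only illustration of the kind of operation being compared. It assumes the sort in question is a sort-by-key that also returns the applied permutation. The issue does not say this, so treat it as an assumption. The helper name sort_with_permutation is hypothetical and is not part of Kokkos, Thrust, or ArborX. Thrust's analogous call is thrust::sort_by_key with the values initialized to 0..n-1. The IndexT template parameter plays the role that the size_type parameter plays in the BinSort variant: a 32-bit unsigned int index halves the permutation's memory footprint compared to a 64-bit type, at the cost of limiting n to what fits in 32 bits.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not Kokkos, Thrust, or ArborX API): sorts `keys` in
// place and returns the permutation that was applied, i.e. perm[i] is the
// original position of the element that ends up at position i.
// IndexT is the index type of the permutation; with the default 32-bit
// unsigned int, keys.size() must fit in IndexT or the indices overflow.
template <typename KeyT, typename IndexT = unsigned int>
std::vector<IndexT> sort_with_permutation(std::vector<KeyT> &keys) {
  // Start from the identity permutation 0, 1, ..., n-1.
  std::vector<IndexT> perm(keys.size());
  std::iota(perm.begin(), perm.end(), IndexT{0});

  // Sort the indices by their keys; stable so equal keys keep input order.
  std::stable_sort(perm.begin(), perm.end(),
                   [&keys](IndexT a, IndexT b) { return keys[a] < keys[b]; });

  // Apply the permutation to produce the sorted keys.
  std::vector<KeyT> sorted(keys.size());
  for (std::size_t i = 0; i < perm.size(); ++i) {
    sorted[i] = keys[perm[i]];
  }
  keys.swap(sorted);
  return perm;
}
```

For example, sorting the keys {5, 1, 4, 1} yields {1, 1, 4, 5} and the permutation {1, 3, 2, 0}.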