ArborX issues https://code.ornl.gov/6da/ArborX/-/issues 2020-05-11T15:53:08Z

https://code.ornl.gov/6da/ArborX/-/issues/301 Warnings coming from host access example (2020-05-11T15:53:08Z, Arndt, Daniel)
*Created by: aprokop*
Starting from
```
../examples/access_traits/example_host_access_traits.cpp(30): warning: calling a __host__ function from a __host__ __device__ function is not allowed
```
These are not detected in CI due to #272.

https://code.ornl.gov/6da/ArborX/-/issues/292 Arborx::Sphere support + different kind of geometry for nodes/leaves (2020-04-29T16:14:32Z, Arndt, Daniel)
*Created by: JulienLoiseau*
I am working on SPH simulations using ArborX.
This requires particles represented by spheres, and the interactions are between pairs of spheres.
Currently, the following is not possible in ArborX:
```
std::vector<ArborX::Sphere> ents;
ArborX::BVH<Kokkos::HostSpace> bvh{Kokkos::DefaultHostExecutionSpace{},
ents};
```
The type must decay to ArborX::Box or ArborX::Point.
My understanding is that the whole tree structure is based on the same geometry (nodes and leaves), and that boxes are much faster to search than spheres.
In my case all my objects are spheres, so I would need to transform them into boxes and then refine the search afterwards to find the pairs that actually satisfy the sphere-sphere interaction.
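A minimal sketch of this box-then-refine workaround in plain C++, without ArborX: each sphere is enclosed in its axis-aligned bounding box for the coarse test, and candidate pairs are then refined with the exact sphere-sphere test. The `Sphere`/`Box` structs, the function names, and the brute-force pair loop (standing in for the tree traversal) are hypothetical illustrations, not ArborX types.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the geometry types.
struct Sphere {
  std::array<double, 3> center;
  double radius;
};

struct Box {
  std::array<double, 3> min_corner;
  std::array<double, 3> max_corner;
};

// Smallest axis-aligned box enclosing a sphere.
Box boundingBox(Sphere const &s) {
  Box b;
  for (int d = 0; d < 3; ++d) {
    b.min_corner[d] = s.center[d] - s.radius;
    b.max_corner[d] = s.center[d] + s.radius;
  }
  return b;
}

// Coarse test: do two boxes overlap (touching counts as overlapping)?
bool boxesOverlap(Box const &a, Box const &b) {
  for (int d = 0; d < 3; ++d)
    if (a.max_corner[d] < b.min_corner[d] || b.max_corner[d] < a.min_corner[d])
      return false;
  return true;
}

// Exact test: two spheres intersect iff the distance between their centers
// is at most the sum of their radii (compared squared to avoid a sqrt).
bool spheresIntersect(Sphere const &a, Sphere const &b) {
  double dist2 = 0.;
  for (int d = 0; d < 3; ++d) {
    double const diff = a.center[d] - b.center[d];
    dist2 += diff * diff;
  }
  double const r = a.radius + b.radius;
  return dist2 <= r * r;
}

// Brute-force stand-in for the tree search: coarse box test first,
// then refine each candidate pair with the exact sphere-sphere test.
std::vector<std::pair<std::size_t, std::size_t>>
findInteractions(std::vector<Sphere> const &spheres) {
  std::vector<Box> boxes;
  boxes.reserve(spheres.size());
  for (auto const &s : spheres)
    boxes.push_back(boundingBox(s));

  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  for (std::size_t i = 0; i < spheres.size(); ++i)
    for (std::size_t j = i + 1; j < spheres.size(); ++j)
      if (boxesOverlap(boxes[i], boxes[j]) &&
          spheresIntersect(spheres[i], spheres[j]))
        pairs.emplace_back(i, j);
  return pairs;
}
```

Pairs whose boxes overlap but whose spheres do not (for example, spheres offset diagonally) are exactly the extra candidates that the refinement step discards.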
Would it be possible to:
- Add native support for spheres in the tree (sphere-sphere tests might increase the number of interactions during the traversal, but they are fast to compute)
- Have different types of geometry for the nodes and the leaves in the tree? Could both be specified by the user?
Thank you.

https://code.ornl.gov/6da/ArborX/-/issues/291 Unify sorting utilities (2020-04-29T15:15:05Z, Arndt, Daniel)
*Created by: masterleinad*
Currently, we have sorting utilities both in `src/details/ArborX_DetailsSortUtils.hpp` and in `src/details/ArborX_DetailsBatchedQueries.hpp`. We should check whether they can be unified.

https://code.ornl.gov/6da/ArborX/-/issues/168 Cannot compile CUDA tests with boost version higher than 1.68 (2020-04-28T03:58:56Z, Arndt, Daniel)
*Created by: aprokop*
Tried 1.69, 1.70, 1.71. All fail with
```
../test/tstSequenceContainers.cpp(25): error: identifier "BOOST_TEST_TOOL_UNIV" is undefined
../test/tstSequenceContainers.cpp(25): error: identifier "BOOST_TEST_TOOL_UNIV_EX" is undefined
../test/tstSequenceContainers.cpp(25): error: identifier "CHECK" is undefined
../test/tstSequenceContainers.cpp(25): error: identifier "BOOST_TEST_INVOKE_IF_N_ARGS" is undefined
../test/tstSequenceContainers.cpp(89): error: identifier "BOOST_TEST_TOOL_UNIV" is undefined
../test/tstSequenceContainers.cpp(89): error: identifier "BOOST_TEST_TOOL_UNIV_EX" is undefined
../test/tstSequenceContainers.cpp(89): error: identifier "CHECK" is undefined
../test/tstSequenceContainers.cpp(89): error: identifier "BOOST_TEST_INVOKE_IF_N_ARGS" is undefined
```
1.68 works fine.

https://code.ornl.gov/6da/ArborX/-/issues/161 Friends-of-Friends Query (2020-04-27T21:51:51Z, Arndt, Daniel)
*Created by: sslattery*
In many cosmology applications a Friends-of-Friends (FOF) query is used to identify clustering in point clouds. In general, the algorithm is as follows:
1. Build a tree from a set of input points
2. Establish a fixed neighborhood radius `r`
3. For every point, locate the other points in the tree that are within distance `r`
4. For every neighboring point within distance `r`, find its neighbors within distance `r`, excluding any points already found earlier in the query
5. For each neighbor-of-neighbor repeat step 4 until no more points are found within distance `r`
The end result of each query should be a list of points that are either within distance `r` of the query point or reachable from it through a chain of such neighbors (a neighbor of a neighbor of a neighbor, etc.).
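Followed to completion, the expansion in steps 3 to 5 computes the connected components of the graph whose edges join points within distance `r`. A minimal sketch of that equivalence, assuming a brute-force O(n²) pair loop in place of the tree search and using union-find (a technique swapped in here for illustration, not a proposed ArborX implementation); `UnionFind` and `friendsOfFriends` are hypothetical names:

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using Point = std::array<double, 3>;

// Disjoint-set forest with path halving and union by size.
class UnionFind {
public:
  explicit UnionFind(std::size_t n) : parent_(n), size_(n, 1) {
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
  }
  std::size_t find(std::size_t i) {
    while (parent_[i] != i) {
      parent_[i] = parent_[parent_[i]]; // path halving
      i = parent_[i];
    }
    return i;
  }
  void merge(std::size_t a, std::size_t b) {
    a = find(a);
    b = find(b);
    if (a == b)
      return;
    if (size_[a] < size_[b])
      std::swap(a, b);
    parent_[b] = a; // attach the smaller tree under the larger one
    size_[a] += size_[b];
  }

private:
  std::vector<std::size_t> parent_;
  std::vector<std::size_t> size_;
};

// Returns, for each point, a representative index of its friends-of-friends
// cluster: points linked by a chain of pairs within distance r share a label.
std::vector<std::size_t> friendsOfFriends(std::vector<Point> const &points,
                                          double r) {
  UnionFind uf(points.size());
  double const r2 = r * r;
  // Brute-force stand-in for "locate the points within distance r".
  for (std::size_t i = 0; i < points.size(); ++i)
    for (std::size_t j = i + 1; j < points.size(); ++j) {
      double d2 = 0.;
      for (int d = 0; d < 3; ++d) {
        double const diff = points[i][d] - points[j][d];
        d2 += diff * diff;
      }
      if (d2 <= r2)
        uf.merge(i, j);
    }
  std::vector<std::size_t> labels(points.size());
  for (std::size_t i = 0; i < points.size(); ++i)
    labels[i] = uf.find(i);
  return labels;
}
```

With points at x = 0, 0.5, 1.0 and 5 and `r = 0.6`, the first three share a label even though the first and third are 1.0 apart. Storing one label per point keeps the output linear in the number of points regardless of cluster size, which bears on the output-format question below.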
Some questions:
1. It was mentioned that we could possibly cap the amount of recursion in the algorithm to a fixed depth of neighbors. Does this provide a benefit? If so what are reasonable values?
2. The output of the query could use our standard CSR-like format, where each query returns the set of object ids that satisfy the query predicate. However, many particles belong to the same cluster, and that cluster would be repeated for every point in it, potentially requiring a large amount of memory for the query results depending on the cluster structure. What is the most useful output format for this type of query? Should we return clusters rather than results for individual points? Or return clusters together with a list giving, for each point, the cluster it belongs to?

https://code.ornl.gov/6da/ArborX/-/issues/275 Using ArborX when data does not fit on a GPU (2020-04-20T21:20:51Z, Arndt, Daniel)
*Created by: aprokop*
There are two scenarios here:
1) The primitives data fits, but results do not
2) The primitives data does not fit
This issue is not for an immediate fix; it is just something to keep in mind, and a place for other applications to record their needs.

https://code.ornl.gov/6da/ArborX/-/issues/272 nvcc warnings are not caught in testing (2020-04-17T17:45:34Z, Arndt, Daniel)
*Created by: aprokop*
Detected in #268. Warnings like
```
/var/jenkins/workspace/ArborX_PR-268/src/details/ArborX_Predicates.hpp(34): warning: __device__ annotation is ignored on a function("Nearest") that is explicitly defaulted on its first declaration
```
were not detected in Style. There seems to be a problematic interaction between nvcc_wrapper, nvcc, Jenkins, etc.

https://code.ornl.gov/6da/ArborX/-/issues/145 Improve sort (2020-04-16T13:34:20Z, Arndt, Daniel)
*Created by: aprokop*
Note that this is different from #60, as that one concerns only scaling.
Here are some results from TIOGA (with CudaUVM). Three variants:
- Upstream master
- Using `unsigned int` for `size_type` template parameter in Kokkos' BinSort [[here](https://github.com/aprokop/ArborX/blob/dcc40adcc63f6bc253ec31f6050dd969f6e366c2/src/details/ArborX_DetailsSortUtils.hpp#L60)]
- Using Thrust [[here](https://github.com/aprokop/ArborX/blob/78f9a6f7b4d82b892e751a2ed9eefb0a101e3833/src/details/ArborX_DetailsSortUtils.hpp#L55)]
![mesh1_setup](https://user-images.githubusercontent.com/7297887/66701920-a1e47480-eccf-11e9-8187-0f9f1fe4124b.png)
![mesh1_search](https://user-images.githubusercontent.com/7297887/66701921-a6a92880-eccf-11e9-9603-f81a8e0da2df.png)
![mesh2_setup](https://user-images.githubusercontent.com/7297887/66701922-aad54600-eccf-11e9-8f86-c5621f8ba74f.png)
![mesh2_search](https://user-images.githubusercontent.com/7297887/66701924-af016380-eccf-11e9-829b-71d7847f21aa.png)

https://code.ornl.gov/6da/ArborX/-/issues/68 Tested compilers in jenkins (2020-03-31T18:34:26Z, Arndt, Daniel)
*Created by: Rombur*
I have tried to find all the compilers we would want to use in Jenkins. We probably just want to test a subset of the list.
Compiler | Serial | OpenMP | CUDA | OpenMP/CUDA
-- | -- | -- | -- | --
Clang-CUDA: clang 7 + cuda 9.2 | | | X |
NVCC: 10.1 gcc 7.4 | | | | X
GCC: 5.4 (oldest compiler with C++14 support) | X | | |
GCC: 9.1 (latest compiler) | X | | |
Intel 2019 | | X | |
XL (need access to Power) | | | | X
PGI: 19.4 | | | | X

https://code.ornl.gov/6da/ArborX/-/issues/6 Examine possible use of ArborX in Exawind/TIOGA (2020-03-20T15:08:26Z, Arndt, Daniel)
*Created by: aprokop*
Many ExaWind project simulations operate on overset meshes. Such simulations perform a search operation at every time step to establish the connectivity between meshes. This may take a significant amount of time (according to Shreyas, 20-30% of the run time in some simulations).
From what I understand, the search has three steps:
1. Posing the problem.
   What to search? The boundaries of one mesh. Determination is done in pre-processing.
2. Doing coarse search.
3. Doing fine search.
   Not exactly sure what's going on here, but it involves traversal of local stencils.
The related code seems to be [here](https://github.com/Exawind/nalu-wind/blob/master/src/overset/TiogaSTKIface.C#L120) in Nalu, and [here](https://github.com/jsitaraman/tioga/blob/52199da1617dac0a0363fe83ca088c97f40494f1/src/search.C) in Tioga.
We should investigate the possibility of inserting ArborX into this interplay, especially given that Nalu is starting to get interested in running on GPUs.

https://code.ornl.gov/6da/ArborX/-/issues/165 Use lower-precision data for bounding volumes (2020-03-04T22:07:40Z, Arndt, Daniel)
*Created by: aprokop*
Some things to consider:
- Does the box size have to be aligned with the word size?
- For correctness, the lower-precision AABB bounds must fully enclose the volume of the higher-precision AABB or object
The lower bound of the AABB should be computed by rounding down to the nearest representable single-precision value, and the upper bound by rounding up.
There is also an issue that the range of values represented by `float` is smaller than that represented by `double`. Thus, scaling would be required.
- Floats may not be the final answer
For example, [this paper](https://arxiv.org/abs/1901.08088) considers quantized bounds. The scene bounding box is partitioned into $2^{10}$ bins in each direction, and the bounding boxes are snapped to bin boundaries. This allows storing each bound using only 10 bits, so the bounding volume of a node takes 64 bits (6 × 10 = 60 bits plus 4 unused), i.e. 8 bytes, compared to the 24 bytes required by 6 floats. Together with 2 ints, the node size is 16 bytes.

https://code.ornl.gov/6da/ArborX/-/issues/231 Examine interface and performance implications of having a query index (2020-02-25T17:17:45Z, Arndt, Daniel)
*Created by: aprokop*
Currently, the only way to access the index of a query is to have the user attach it. In many situations, we know the index ourselves and do not need user information to process it, and there are use cases where we need this index. Therefore, we should see whether it makes sense to always have it and handle it ourselves.

https://code.ornl.gov/6da/ArborX/-/issues/201 Capture Use Case: Distance to Wall (2020-02-21T20:30:25Z, Arndt, Daniel)
*Created by: overfelt*
Just capturing the details of a use case at the request of ArborX developers, not a bug or feature request.
I have an application where the distance from points in the volume to the nearest wall is required for a turbulence model. One way we would like to calculate these values is to use ArborX to determine a few points on the surface that are closest to each point in the volume, and then narrow the distance with a detailed calculation. This will be compared to a PDE-based approach at scale and in parallel. In any case, the ArborX approach could provide an initial guess for a PDE solve, which would then be used for small updates to the distance under small mesh motions.
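A minimal 2D sketch of this two-stage approach, assuming the wall is given as a polyline: the coarse stage picks the `k` wall vertices nearest to a volume point (brute force here, where ArborX would run a nearest-neighbor query), and the fine stage computes exact point-to-segment distances for the segments touching those vertices. The polyline representation, `k`, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

using Point2 = std::array<double, 2>;

double squaredDistance(Point2 const &a, Point2 const &b) {
  double const dx = a[0] - b[0];
  double const dy = a[1] - b[1];
  return dx * dx + dy * dy;
}

// Exact distance from p to the segment [a, b].
double pointSegmentDistance(Point2 const &p, Point2 const &a,
                            Point2 const &b) {
  double const ex = b[0] - a[0];
  double const ey = b[1] - a[1];
  double const len2 = ex * ex + ey * ey;
  double t = 0.; // parameter of the closest point along the segment
  if (len2 > 0.)
    t = std::min(1., std::max(0., ((p[0] - a[0]) * ex + (p[1] - a[1]) * ey) / len2));
  Point2 const closest{{a[0] + t * ex, a[1] + t * ey}};
  return std::sqrt(squaredDistance(p, closest));
}

// Distance from p to a wall polyline (vertex i is joined to vertex i + 1).
double distanceToWall(Point2 const &p, std::vector<Point2> const &wall,
                      std::size_t k) {
  std::size_t const n = wall.size();
  k = std::min(k, n);
  if (k == 0)
    return std::numeric_limits<double>::infinity();

  // Coarse stage: indices of the k wall vertices closest to p.
  std::vector<std::size_t> candidates(n);
  std::iota(candidates.begin(), candidates.end(), std::size_t{0});
  std::partial_sort(candidates.begin(),
                    candidates.begin() + static_cast<std::ptrdiff_t>(k),
                    candidates.end(), [&](std::size_t i, std::size_t j) {
                      return squaredDistance(p, wall[i]) <
                             squaredDistance(p, wall[j]);
                    });

  // Fine stage: the nearest vertex gives an upper bound, and each segment
  // touching a candidate vertex can only improve it.
  double best = std::sqrt(squaredDistance(p, wall[candidates[0]]));
  for (std::size_t c = 0; c < k; ++c) {
    std::size_t const v = candidates[c];
    if (v > 0)
      best = std::min(best, pointSegmentDistance(p, wall[v - 1], wall[v]));
    if (v + 1 < n)
      best = std::min(best, pointSegmentDistance(p, wall[v], wall[v + 1]));
  }
  return best;
}
```

The refinement is only exact when the truly closest segment touches one of the `k` candidates; with long wall segments that can fail, which is one reason to treat the result as an initial guess, as described above.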

https://code.ornl.gov/6da/ArborX/-/issues/218 Fix 1D and 2D domain scaling in the distributed benchmark (2020-02-10T17:18:33Z, Arndt, Daniel)
*Created by: aprokop*
Currently, the domains are constructed as `[-a,a]^d`, where `d` is the partition dimension, and `a = std::cbrt(n_values)`. This way, the density of points stays constant in 3D as `n_values` grows. However, in 1D and 2D the density varies with `n_values`, and the same radius would produce different average results. The value of `a` should be set to `n_values` in 1D, and `std::sqrt(n_values)` in 2D.

https://code.ornl.gov/6da/ArborX/-/issues/210 Question about ArborX terminology (2020-01-30T12:50:46Z, Arndt, Daniel)
*Created by: aprokop*
There is some ambiguity surrounding naming of things in ArborX. Specifically, the definitions of "query", "predicate" and "predicate with attachment".
My current understanding of those terms is:
* **predicate** is a boolean function operating on a leaf node/user provided geometry
* **query** is a function that given a predicate(s) returns the results satisfying them.
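To make these definitions concrete, a minimal sketch in plain C++ with hypothetical names (this is not the ArborX interface): the predicate is a boolean function of a single leaf geometry, and the query applies a predicate to the leaves and hands each match to a callback. For brevity the callback here takes only the predicate and the leaf index, without the `Insert` argument used in the signatures below.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 1D "geometry" to keep the sketch short.
struct Interval {
  double lo, hi;
};

// Predicate: a boolean function of a single leaf geometry.
struct Intersects {
  Interval region;
  bool operator()(Interval const &leaf) const {
    return leaf.lo <= region.hi && region.lo <= leaf.hi;
  }
};

// Query: given a predicate, report every leaf that satisfies it.
// A brute-force loop stands in for the tree traversal; the callback
// receives the predicate that evaluated to true and the leaf index.
template <typename Predicate, typename Callback>
void query(std::vector<Interval> const &leaves, Predicate const &predicate,
           Callback const &callback) {
  for (std::size_t i = 0; i < leaves.size(); ++i)
    if (predicate(leaves[i]))
      callback(predicate, static_cast<int>(i));
}
```

In this reading, the object passed to the callback is the predicate, which is the naming question raised next.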
If this language is correct, then we should not use `Query` as an object, and things like callbacks should read
```diff
- template <typename Query, typename Insert>
- KOKKOS_FUNCTION void operator()(Query const &, int index,
- Insert const &insert) const;
+ template <typename Predicate, typename Insert>
+ KOKKOS_FUNCTION void operator()(Predicate const &, int index,
+ Insert const &insert) const;
```
On the other hand, the callback is invoked after the predicate has evaluated to true on that node, so what exactly is the object here?
There is also a question of whether a "predicate with attachment" should be considered a different concept from a "predicate".

https://code.ornl.gov/6da/ArborX/-/issues/50 Set properties on unit tests (2019-10-25T20:33:15Z, Arndt, Daniel)
*Created by: dalg24*
Slave machines used in automated testing get wildly oversubscribed, especially when more than one concurrent build is allowed.
* [ ] Consider specifying how many processors a given unit test requires
* [ ] Must act on MPI unit tests that have OpenMP enabled (set the `OMP_NUM_THREADS` environment variable and specify the number of MPI processes instead of using `MPIEXEC_MAX_NUMPROCS`)

https://code.ornl.gov/6da/ArborX/-/issues/64 Add fsanitize testing (2019-10-25T20:33:02Z, Arndt, Daniel)
*Created by: aprokop*
Found some issues in #63. Plus, we would want to satisfy an optional requirement of xSDK (cf. #58).

https://code.ornl.gov/6da/ArborX/-/issues/58 xSDK policies compatibility (2019-10-15T20:36:33Z, Arndt, Daniel)
*Created by: aprokop*
Template taken from [here](https://github.com/xsdk-project/xsdk-policy-compatibility/blob/37b59a6c6eaa25e4484c58400677a32b3bea9740/template.md).
# xSDK Community Policy Compatibility for ArborX
**Website:** https://github.com/arborx/ArborX
### Mandatory Policies
| Policy |Support| Notes |
|------------------------|-------|-------------------------|
|**M1.** Support xSDK community GNU Autoconf or CMake options. |None| Short-explanation-here; optional link for more extensive details if needed, see below. [M1 details](#m1-details)|
|**M2.** Provide a comprehensive test suite for correctness of installation verification. |Full||
|**M3.** Employ user-provided MPI communicator (no MPI_COMM_WORLD). Don't assume a full MPI 3 implementation without checking. Provide an option to prevent any changes to MPI error-handling if it is changed by default. |Full||
|**M4.** Give best effort at portability to key architectures (standard Linux distributions, GNU, Clang, vendor compilers, and target machines at ALCF, NERSC, OLCF). |Full||
|**M5.** Provide a documented, reliable way to contact the development team. |Full| Have CONTRIBUTING.md |
|**M6.** Respect system resources and settings made by other previously called packages (e.g. signal handling). |Full||
|**M7.** Come with an open source (BSD style) license. |Full||
|**M8.** Provide a runtime API to return the current version number of the software. |Full||
|**M9.** Use a limited and well-defined symbol, macro, library, and include file name space. |Full||
|**M10.** Provide an xSDK team accessible repository (not necessarily publicly available). |Full||
|**M11.** Have no hardwired print or IO statements that cannot be turned off. |Full||
|**M12.** For external dependencies, allow installing, building, and linking against an outside copy of external software. |Full||
|**M13.** Install headers and libraries under \<prefix\>/include and \<prefix\>/lib. |Full||
|**M14.** Be buildable using 64 bit pointers. 32 bit is optional. |Full||
|**M15.** All xSDK compatibility changes should be sustainable. |Full||
|**M16.** The package must support production-quality installation compatible with the xSDK install tool and xSDK metapackage. |Partial|Spack package was merged, but is not part of xSDK metapackage yet.|
M1 details <a id="m1-details"></a>: optional: provide more details about approach to addressing topic M1.
M2 details <a id="m2-details"></a>: optional: provide more details about approach to addressing topic M2.
### Recommended Policies
| Policy |Support|Notes|
|-----------------------|-------|-|
|**R1.** Have a public repository. |Full||
|**R2.** Possible to run test suite under valgrind in order to test for memory corruption issues. |||
|**R3.** Adopt and document consistent system for error conditions/exceptions. |||
|**R4.** Free all system resources acquired as soon as they are no longer needed. ||Need Kokkos initialize/finalize |
|**R5.** Provide a mechanism to export ordered list of library dependencies. |||
|**R6.** Document versions of packages that it works with or depends upon, preferably in machine-readable form. |||
|**R7.** Have README, SUPPORT, LICENSE, and CHANGELOG files in top directory. |Partial| need SUPPORT. |
*M1 CMake policies*
| Item # | Option | Description | Notes |
|-|-|-|-|
|1| USE_XSDK_DEFAULTS=[YES,NO] | Implement the default behavior described below. | Each package can decide whether XSDK mode is the default mode. |
|2| CMAKE_INSTALL_PREFIX=directory | Identify location to install package. | Multiple “versions” of packages, such as debug and release, can be installed by using different prefix directories. |
|3 | CMAKE_CXX_COMPILER, ... | Select compilers and compiler flags |Variable `CPP` not supported by raw CMake |
|4| CMAKE_BUILD_TYPE=[Debug,Release] | Create libraries with debugging information and possible additional error checking | Default in XSDK mode: Debug |
|5| BUILD_SHARED_LIBS=[YES,NO] | Select option used for indicating whether to build shared libraries | Default in XSDK mode: shared |
|6| XSDK_ENABLE_<language>=[YES,NO] | Build interface for a particular additional language. | |
|7| XSDK_PRECISION=[SINGLE,DOUBLE,QUAD] | Determine precision for packages that build only for one precision | Default in XSDK mode: double. Packages that handle all precisions automatically are free to ignore this option. |
|8| XSDK_INDEX_SIZE=[32,64] | Determine index size for packages that build only for one index size | Default in XSDK mode: 32. Packages that handle all index sizes automatically are free to ignore this option. |
|9| TPL_BLAS_LIBRARIES="linkable list of libraries"; TPL_LAPACK_LIBRARIES="linkable list of libraries" (should not use -L or -l flags in the lists) | Set location of BLAS and LAPACK libraries | Default is to locate one on the system automatically. |
|10| TPL_ENABLE_<package>=[YES,NO], TPL_<package>_LIBRARIES="linkable list of libraries" (should not use -L or -l flags), TPL_<package>_INCLUDE_DIRS="/path/to/includes1;/path/to/includes1;..." (Cannot include -I flags) | Determine other package libraries and include directories. | |
|11| | In the XSDK mode, XSDK projects should not rely on users providing any library path information in environmental variables such as LD_LIBRARY_PATH. | |
|12| | After packages are configured, they can be compiled, installed and “smoke” tested with the following commands: make ; [sudo] make install ; make test_install. | |
|13|| After an install the package should provide a machine-readable output to show provenance, that is, what compilers were used and what libraries were linked with, as well as other build configuration information if this information was also stored in the install directory somewhere, so that users with problems can send the information directly to developers.||

https://code.ornl.gov/6da/ArborX/-/issues/77 Check roofline model for ArborX (2019-09-15T12:31:10Z, Arndt, Daniel)
*Created by: aprokop*
Right now, it's unclear where we stand.

https://code.ornl.gov/6da/ArborX/-/issues/60 Sorting Morton indices does not scale for small problem sizes (2019-05-17T16:35:18Z, Arndt, Daniel)
*Created by: aprokop*
The default `bvh_driver` parameters, OpenMP run.
`OMP_NUM_THREADS=1`
```
2.99e-01 sec 10.0% 98.6% 0.0% 96 ArborX:BVH:sort_morton_codes_and_init_leaves [region]
|-> 3.82e-02 sec 1.3% 100.0% 0.0% 96 Kokkos::Sort::BinCount [for]
|-> 1.20e-01 sec 4.0% 100.0% 0.0% 96 Kokkos::Sort::BinBinning [for]
|-> 5.39e-02 sec 1.8% 100.0% 0.0% 96 Kokkos::Sort::BinSort [for]
```
`OMP_NUM_THREADS=2`
```
4.26e-01 sec 14.2% 98.9% 0.0% 93 ArborX:BVH:sort_morton_codes_and_init_leaves [region]
|-> 1.13e-01 sec 3.8% 100.0% 0.0% 93 Kokkos::Sort::BinCount [for]
|-> 2.18e-01 sec 7.3% 100.0% 0.0% 93 Kokkos::Sort::BinBinning [for]
|-> 3.84e-02 sec 1.3% 100.0% 0.0% 93 Kokkos::Sort::BinSort [for]
```
Note: the number of calls is slightly different (96 vs 93).
With two threads, `BinCount` is roughly 3x slower (1.13e-01 vs 3.82e-02 sec) and `BinBinning` nearly 2x slower (2.18e-01 vs 1.20e-01 sec) than with one thread.
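For context, the timed `ArborX:BVH:sort_morton_codes_and_init_leaves` region sorts the primitives by Morton code, as its name suggests. A minimal standalone sketch of that step, assuming points already normalized to the unit cube and 10 bits per coordinate (a common choice, not necessarily ArborX's): each code interleaves the bits of the quantized coordinates, and sorting yields the permutation used to reorder the leaves. `std::stable_sort` stands in for Kokkos' `BinSort`, and all names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Spread the lower 10 bits of v so that two zero bits separate consecutive
// bits (bit k moves to bit 3k).
std::uint32_t expandBits(std::uint32_t v) {
  v = (v * 0x00010001u) & 0xFF0000FFu;
  v = (v * 0x00000101u) & 0x0F00F00Fu;
  v = (v * 0x00000011u) & 0xC30C30C3u;
  v = (v * 0x00000005u) & 0x49249249u;
  return v;
}

// 30-bit Morton code of a point in [0, 1]^3 (x bits land highest).
std::uint32_t mortonCode(std::array<double, 3> const &p) {
  std::uint32_t code = 0;
  for (int d = 0; d < 3; ++d) {
    // Quantize to [0, 1023], clamping values on or outside the boundary.
    double const q = std::min(std::max(p[d] * 1024., 0.), 1023.);
    code |= expandBits(static_cast<std::uint32_t>(q)) << (2 - d);
  }
  return code;
}

// Permutation that orders the points by Morton code (ties keep input order).
std::vector<std::size_t>
sortByMortonCode(std::vector<std::array<double, 3>> const &points) {
  std::vector<std::uint32_t> codes(points.size());
  std::transform(points.begin(), points.end(), codes.begin(), mortonCode);
  std::vector<std::size_t> perm(points.size());
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  std::stable_sort(perm.begin(), perm.end(), [&](std::size_t i, std::size_t j) {
    return codes[i] < codes[j];
  });
  return perm;
}
```

Code generation is embarrassingly parallel; the sort is the step whose scaling the timings above examine.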