# ArborX issues
https://code.ornl.gov/6da/ArborX/-/issues

## Issue 6: Examine possible use of ArborX in Exawind/TIOGA
https://code.ornl.gov/6da/ArborX/-/issues/6 (updated 2020-03-20; Arndt, Daniel)
*Created by: aprokop*
Many of the ExaWind project simulations operate on overset meshes. Such simulations perform a search operation every time step to establish the connectivity between meshes. This may take a significant amount of time (according to Shreyas, 20-30% of the total time in some simulations).
From what I understand, the search has three steps:
1. Posing the problem. What to search? The boundaries of one mesh. Determination is done in pre-processing.
2. Doing a coarse search.
3. Doing a fine search. Not exactly sure what's going on here, but it involves traversal of local stencils.
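The coarse/fine split in steps 2-3 is a standard search pattern; here is a minimal 2D illustration (hypothetical types and helper names, not TIOGA's actual code): the coarse phase filters candidate donor cells with a cheap bounding-box test, the fine phase runs an exact point-in-triangle test.

```cpp
#include <vector>

struct Point { double x, y; };

struct Cell {
  Point a, b, c;                  // triangle vertices
  double xmin, xmax, ymin, ymax;  // precomputed bounding box
};

// Coarse test: AABB containment; cheap, but may return false positives.
bool coarseContains(Cell const &cell, Point const &p) {
  return p.x >= cell.xmin && p.x <= cell.xmax &&
         p.y >= cell.ymin && p.y <= cell.ymax;
}

// Fine test: exact point-in-triangle via signs of cross products.
bool fineContains(Cell const &cell, Point const &p) {
  auto cross = [](Point const &o, Point const &u, Point const &v) {
    return (u.x - o.x) * (v.y - o.y) - (u.y - o.y) * (v.x - o.x);
  };
  double d1 = cross(cell.a, cell.b, p);
  double d2 = cross(cell.b, cell.c, p);
  double d3 = cross(cell.c, cell.a, p);
  bool hasNeg = d1 < 0 || d2 < 0 || d3 < 0;
  bool hasPos = d1 > 0 || d2 > 0 || d3 > 0;
  return !(hasNeg && hasPos);
}

// The coarse linear scan below is where a BVH traversal (e.g. ArborX's)
// would be plugged in to avoid visiting every cell.
int findDonor(std::vector<Cell> const &cells, Point const &p) {
  for (int i = 0; i < (int)cells.size(); ++i)
    if (coarseContains(cells[i], p) && fineContains(cells[i], p))
      return i;
  return -1;
}
```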
The related code seems to be [here](https://github.com/Exawind/nalu-wind/blob/master/src/overset/TiogaSTKIface.C#L120) in Nalu, and [here](https://github.com/jsitaraman/tioga/blob/52199da1617dac0a0363fe83ca088c97f40494f1/src/search.C) in Tioga.
We should investigate the possibility of inserting ArborX into this interplay, especially given that Nalu is starting to get interested in running on GPUs.
## Issue 9: Tree visualization
https://code.ornl.gov/6da/ArborX/-/issues/9 (updated 2019-05-02; Arndt, Daniel)
*Created by: aprokop*
Per @dalg24:
Related to ornl-cees/DataTransferKit#535 and ornl-cees/datatransferkit#538
Writing down here some of the possible improvements so we can share and I don't forget.
* Add composite visitor to be able to apply multiple visitors within a single traversal. Two options here, inheritance (yuk but easy) or type erasure (harder but fun)
* Implement a visitor that directly prints the matrices showing which nodes (columns) have been visited for each query (rows)
* Put more scripts under version control. In particular, I have in mind the ones that produce Graphviz images of the tree structure and movies that show successive traversals.
* Consider running the scripts in the CI. I am not 100% convinced it's a good idea but just in case [here](https://github.com/ORNL-CEES/DataTransferKit/pull/538#issuecomment-475638131) are useful instructions to install missing requirements in our dev/ci container.

## Issue 26: Decide on the minimum chunk of work for OpenMP
https://code.ornl.gov/6da/ArborX/-/issues/26 (updated 2019-05-02; Arndt, Daniel)
*Created by: aprokop*
Right now, strong scaling of an OpenMP run will start increasing the OpenMP time for smaller problems once the ratio of objects per thread hits a certain threshold. We need to figure out the minimal amount of work per thread so that runs won't be penalized.
**Summit 21 threads/smt1 construction filled box**
![summit_construction_filled_box_21_smt_1](https://user-images.githubusercontent.com/7297887/57081247-578fae00-6cc2-11e9-8f7e-3997712903cb.png)
**Summit 168 threads/smt4 construction filled box**
![summit_construction_filled_box_168_smt_4](https://user-images.githubusercontent.com/7297887/57081248-578fae00-6cc2-11e9-8d69-ef3c96734ff4.png)
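One possible mitigation is to stop scaling out once threads would receive less than some minimal grain of work. A sketch of that policy (the grain size is an assumed, empirically tuned parameter, not a value measured in ArborX):

```cpp
#include <algorithm>

// Cap the number of worker threads so each one receives at least
// `min_grain` objects, instead of spreading a small problem over all
// available threads and paying the parallel overhead for nothing.
int effectiveThreadCount(long n_objects, int max_threads, long min_grain) {
  long full_grains = n_objects / min_grain;  // threads that get a full grain
  long capped = std::max(1L, std::min(full_grains, (long)max_threads));
  return (int)capped;
}
```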
## Issue 50: Set properties on unit tests
https://code.ornl.gov/6da/ArborX/-/issues/50 (updated 2019-10-25; Arndt, Daniel)
*Created by: dalg24*
Slave machines used in automated testing get wildly oversubscribed, especially when more than one concurrent build is allowed.
* [ ] Consider specifying how many processors a given unit test requires
* [ ] Must act on MPI unit tests that have OpenMP enabled (set the `OMP_NUM_THREADS` environment variable and specify the number of MPI processes instead of using `MPIEXEC_MAX_NUMPROCS`)

## Issue 58: xSDK policies compatibility
https://code.ornl.gov/6da/ArborX/-/issues/58 (updated 2019-10-15; Arndt, Daniel)
*Created by: aprokop*
Template taken from [here](https://github.com/xsdk-project/xsdk-policy-compatibility/blob/37b59a6c6eaa25e4484c58400677a32b3bea9740/template.md).
# xSDK Community Policy Compatibility for ArborX
**Website:** https://github.com/arborx/ArborX
### Mandatory Policies
| Policy |Support| Notes |
|------------------------|-------|-------------------------|
|**M1.** Support xSDK community GNU Autoconf or CMake options. |None| Short-explanation-here; optional link for more extensive details if needed, see below. [M1 details](#m1-details)|
|**M2.** Provide a comprehensive test suite for correctness of installation verification. |Full||
|**M3.** Employ user-provided MPI communicator (no MPI_COMM_WORLD). Don't assume a full MPI 3 implementation without checking. Provide an option to prevent any changes to MPI error-handling if it is changed by default. |Full||
|**M4.** Give best effort at portability to key architectures (standard Linux distributions, GNU, Clang, vendor compilers, and target machines at ALCF, NERSC, OLCF). |Full||
|**M5.** Provide a documented, reliable way to contact the development team. |Full| Have CONTRIBUTING.md |
|**M6.** Respect system resources and settings made by other previously called packages (e.g. signal handling). |Full||
|**M7.** Come with an open source (BSD style) license. |Full||
|**M8.** Provide a runtime API to return the current version number of the software. |Full||
|**M9.** Use a limited and well-defined symbol, macro, library, and include file name space. |Full||
|**M10.** Provide an xSDK team accessible repository (not necessarily publicly available). |Full||
|**M11.** Have no hardwired print or IO statements that cannot be turned off. |Full||
|**M12.** For external dependencies, allow installing, building, and linking against an outside copy of external software. |Full||
|**M13.** Install headers and libraries under \<prefix\>/include and \<prefix\>/lib. |Full||
|**M14.** Be buildable using 64 bit pointers. 32 bit is optional. |Full||
|**M15.** All xSDK compatibility changes should be sustainable. |Full||
|**M16.** The package must support production-quality installation compatible with the xSDK install tool and xSDK metapackage. |Partial|Spack package was merged, but is not part of xSDK metapackage yet.|
M1 details <a id="m1-details"></a>: optional: provide more details about approach to addressing topic M1.
M2 details <a id="m2-details"></a>: optional: provide more details about approach to addressing topic M2.
### Recommended Policies
| Policy |Support|Notes|
|-----------------------|-------|-|
|**R1.** Have a public repository. |Full||
|**R2.** Possible to run test suite under valgrind in order to test for memory corruption issues. |||
|**R3.** Adopt and document consistent system for error conditions/exceptions. |||
|**R4.** Free all system resources acquired as soon as they are no longer needed. ||Need Kokkos initialize/finalize |
|**R5.** Provide a mechanism to export ordered list of library dependencies. |||
|**R6.** Document versions of packages that it works with or depends upon, preferably in machine-readable form. |||
|**R7.** Have README, SUPPORT, LICENSE, and CHANGELOG files in top directory. |Partial| need SUPPORT. |
*M1 CMake policies*
| Item # | Option | Description | Notes |
|-|-|-|-|
|1| USE_XSDK_DEFAULTS=[YES,NO] | Implement the default behavior described below. | Each package can decide whether XSDK mode is the default mode. |
|2| CMAKE_INSTALL_PREFIX=directory | Identify location to install package. | Multiple “versions” of packages, such as debug and release, can be installed by using different prefix directories. |
|3 | CMAKE_CXX_COMPILER, ... | Select compilers and compiler flags |Variable `CPP` not supported by raw CMake |
|4| CMAKE_BUILD_TYPE=[Debug,Release] | Create libraries with debugging information and possible additional error checking | Default in XSDK mode: Debug |
|5| BUILD_SHARED_LIBS=[YES,NO] | Select option used for indicating whether to build shared libraries | Default in XSDK mode: shared |
|6| XSDK_ENABLE_<language>=[YES,NO] | Build interface for a particular additional language. | |
|7| XSDK_PRECISION=[SINGLE,DOUBLE,QUAD] | Determine precision for packages that build only for one precision | Default in XSDK mode: double. Packages that handle all precisions automatically are free to ignore this option. |
|8| XSDK_INDEX_SIZE=[32,64] | Determine index size for packages that build only for one index size | Default in XSDK mode: 32. Packages that handle all precisions automatically are free to ignore this option. |
|9| TPL_BLAS_LIBRARIES="linkable list of libraries"; TPL_LAPACK_LIBRARIES="linkable list of libraries" (should not use -L or -l flags in the lists) | Set location of BLAS and LAPACK libraries | Default is to locate one on the system automatically |
|10| TPL_ENABLE_<package>=[YES,NO], TPL_<package>_LIBRARIES="linkable list of libraries" (should not use -L or -l flags), TPL_<package>_INCLUDE_DIRS="/path/to/includes1;/path/to/includes2;..." (cannot include -I flags) | Determine other package libraries and include directories. ||
|11| | In the XSDK mode, XSDK projects should not rely on users providing any library path information in environment variables such as LD_LIBRARY_PATH. ||
|12| | After packages are configured, they can be compiled, installed, and "smoke" tested with the following commands: make ; [sudo] make install ; make test_install. ||
|13|| After an install, the package should provide machine-readable output showing provenance: what compilers were used, what libraries were linked with, and other build configuration information, stored somewhere in the install directory, so that users with problems can send the information directly to developers. ||
## Issue 60: Sorting Morton indices does not scale for small problem sizes
https://code.ornl.gov/6da/ArborX/-/issues/60 (updated 2019-05-17; Arndt, Daniel)
*Created by: aprokop*
The default `bvh_driver` parameters, OpenMP run.
`OMP_NUM_THREADS=1`
```
2.99e-01 sec 10.0% 98.6% 0.0% 96 ArborX:BVH:sort_morton_codes_and_init_leaves [region]
|-> 3.82e-02 sec 1.3% 100.0% 0.0% 96 Kokkos::Sort::BinCount [for]
|-> 1.20e-01 sec 4.0% 100.0% 0.0% 96 Kokkos::Sort::BinBinning [for]
|-> 5.39e-02 sec 1.8% 100.0% 0.0% 96 Kokkos::Sort::BinSort [for]
```
`OMP_NUM_THREADS=2`
```
4.26e-01 sec 14.2% 98.9% 0.0% 93 ArborX:BVH:sort_morton_codes_and_init_leaves [region]
|-> 1.13e-01 sec 3.8% 100.0% 0.0% 93 Kokkos::Sort::BinCount [for]
|-> 2.18e-01 sec 7.3% 100.0% 0.0% 93 Kokkos::Sort::BinBinning [for]
|-> 3.84e-02 sec 1.3% 100.0% 0.0% 93 Kokkos::Sort::BinSort [for]
```
Note: the number of calls is slightly different (96 vs 93).
With 2 threads, `BinCount` is about 3x slower, and `BinBinning` about twice as slow.

## Issue 64: Add fsanitize testing
https://code.ornl.gov/6da/ArborX/-/issues/64 (updated 2019-10-25; Arndt, Daniel)
*Created by: aprokop*
Found some issues in #63. Plus we would want to satisfy an optional requirement of xSDK (cf. #58).

## Issue 68: Tested compilers in jenkins
https://code.ornl.gov/6da/ArborX/-/issues/68 (updated 2020-03-31; Arndt, Daniel)
*Created by: Rombur*
I have tried to find all the compilers we would want to use in Jenkins. We probably just want to test a subset of the list
Compiler | Serial | OpenMP | CUDA | OpenMP/CUDA
-- | -- | -- | -- | --
Clang-CUDA: clang 7 + cuda 9.2 | | | X |
NVCC: 10.1 gcc 7.4 | | | | X
GCC: 5.4 (oldest compiler with C++14 support) | X | | |
GCC: 9.1 (latest compiler) | X | | |
Intel 2019 | | X | |
XL (need access to Power) | | | | X
PGI: 19.4 | | | | X
## Issue 77: Check roofline model for ArborX
https://code.ornl.gov/6da/ArborX/-/issues/77 (updated 2019-09-15; Arndt, Daniel)
*Created by: aprokop*
Right now, it's unclear where we are at.

## Issue 145: Improve sort
https://code.ornl.gov/6da/ArborX/-/issues/145 (updated 2020-04-16; Arndt, Daniel)
*Created by: aprokop*
Note that this is different from #60, as that one concerns only scaling.
Here are some results from TIOGA (with CudaUVM). Three variants:
- Upstream master
- Using `unsigned int` for `size_type` template parameter in Kokkos' BinSort [[here](https://github.com/aprokop/ArborX/blob/dcc40adcc63f6bc253ec31f6050dd969f6e366c2/src/details/ArborX_DetailsSortUtils.hpp#L60)]
- Using Thrust [[here](https://github.com/aprokop/ArborX/blob/78f9a6f7b4d82b892e751a2ed9eefb0a101e3833/src/details/ArborX_DetailsSortUtils.hpp#L55)]
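For context, the operation being tuned is a sort of the Morton codes that must also produce the permutation used to reorder the leaves. A minimal serial equivalent (illustrative only; the real code runs Kokkos' BinSort or Thrust on the device):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sort the keys (e.g. Morton codes) in place and return the permutation
// perm such that sorted[i] == original[perm[i]], which tree construction
// uses to reorder the leaves accordingly.
std::vector<int> sortByKey(std::vector<unsigned> &keys) {
  std::vector<int> perm(keys.size());
  std::iota(perm.begin(), perm.end(), 0);
  std::stable_sort(perm.begin(), perm.end(),
                   [&](int a, int b) { return keys[a] < keys[b]; });
  std::vector<unsigned> sorted(keys.size());
  for (int i = 0; i < (int)keys.size(); ++i)
    sorted[i] = keys[perm[i]];
  keys = std::move(sorted);
  return perm;
}
```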
![mesh1_setup](https://user-images.githubusercontent.com/7297887/66701920-a1e47480-eccf-11e9-8187-0f9f1fe4124b.png)
![mesh1_search](https://user-images.githubusercontent.com/7297887/66701921-a6a92880-eccf-11e9-9603-f81a8e0da2df.png)
![mesh2_setup](https://user-images.githubusercontent.com/7297887/66701922-aad54600-eccf-11e9-8f86-c5621f8ba74f.png)
![mesh2_search](https://user-images.githubusercontent.com/7297887/66701924-af016380-eccf-11e9-829b-71d7847f21aa.png)
## Issue 161: Friends-of-Friends Query
https://code.ornl.gov/6da/ArborX/-/issues/161 (updated 2020-04-27; Arndt, Daniel)
*Created by: sslattery*
In many cosmology applications a Friends-of-Friends (FOF) query is used to identify clustering in point clouds. In general, the algorithm is as follows:
1. Build a tree from a set of input points
2. Establish a fixed neighborhood radius `r`
3. For every point, locate the other points in the tree that are within distance `r`
4. For every neighboring point within distance `r`, find its neighboring points that are within distance `r` excluding any neighbors already found previously in the query
5. For each neighbor-of-neighbor repeat step 4 until no more points are found within distance `r`
The end result of each query should be a list of points that are within distance `r` of the query point, or are a neighbor-of-neighbors-of-neighbors-etc... of the query point.
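Steps 3-5 amount to a breadth-first flood fill over the fixed-radius neighbor graph. A minimal serial sketch (hypothetical code; a brute-force scan stands in for the tree query of step 3) that labels each point with a cluster id:

```cpp
#include <queue>
#include <vector>

struct Pt { double x, y, z; };

// Brute-force fixed-radius neighbor search; in ArborX this would be a
// tree query with a within-distance-r predicate per point.
std::vector<int> neighborsWithin(std::vector<Pt> const &pts, int i, double r) {
  std::vector<int> out;
  for (int j = 0; j < (int)pts.size(); ++j) {
    if (j == i) continue;
    double dx = pts[j].x - pts[i].x;
    double dy = pts[j].y - pts[i].y;
    double dz = pts[j].z - pts[i].z;
    if (dx * dx + dy * dy + dz * dz <= r * r) out.push_back(j);
  }
  return out;
}

// Friends-of-friends: every point reachable through a chain of neighbors
// within distance r receives the same cluster id.
std::vector<int> friendsOfFriends(std::vector<Pt> const &pts, double r) {
  std::vector<int> cluster(pts.size(), -1);
  int next_id = 0;
  for (int seed = 0; seed < (int)pts.size(); ++seed) {
    if (cluster[seed] != -1) continue;  // already claimed by a cluster
    cluster[seed] = next_id;
    std::queue<int> frontier;
    frontier.push(seed);
    while (!frontier.empty()) {         // steps 4-5: expand until no new points
      int i = frontier.front();
      frontier.pop();
      for (int j : neighborsWithin(pts, i, r))
        if (cluster[j] == -1) { cluster[j] = next_id; frontier.push(j); }
    }
    ++next_id;
  }
  return cluster;
}
```

Returning one cluster id per point is also one possible answer to the output-format question: it stores each cluster once, rather than repeating the member list for every query point.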
Some questions:
1. It was mentioned that we could possibly cap the amount of recursion in the algorithm to a fixed depth of neighbors. Does this provide a benefit? If so what are reasonable values?
2. The output of the query could be in our standard CSR-like format, where each query returns the set of object ids that satisfied the query predicate. However, many particles will belong to the same cluster, and that cluster will be repeated for each point in it, potentially requiring a large amount of memory for the query results depending on the structure of the cluster. What is the most useful output format for this type of query? Should we return clusters rather than results for individual points? Or return clusters as well as, for each point, the cluster in which it is located?

## Issue 165: Use lower-precision data for bounding volumes
https://code.ornl.gov/6da/ArborX/-/issues/165 (updated 2020-03-04; Arndt, Daniel)
*Created by: aprokop*
Some things to consider:
- Does the box size have to be aligned with the word size?
- For correctness, the lower-precision AABB bounds must fully enclose the volume of the higher-precision AABB or object
The lower bound of the AABB should be computed by rounding down to the nearest representable single-precision value. The upper bound should be computed by rounding up.
There is also an issue that the range of values represented by `float` is smaller than that represented by `double`. Thus, scaling would be required.
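The rounding rule can be sketched with `std::nextafterf` (illustrative helper names; this ignores the range/scaling issue noted above):

```cpp
#include <cmath>

// Conservative narrowing: the float bound must lie on or outside the double
// bound, so if the nearest float falls on the wrong side, step one ulp
// outward with nextafterf.
float boundDown(double v) {
  float f = (float)v;
  return ((double)f > v) ? std::nextafterf(f, -INFINITY) : f;
}

float boundUp(double v) {
  float f = (float)v;
  return ((double)f < v) ? std::nextafterf(f, INFINITY) : f;
}
```

Applying `boundDown` to all lower bounds and `boundUp` to all upper bounds guarantees the single-precision AABB fully encloses the double-precision one.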
- Floats may not be the final answer
For example, [this paper](https://arxiv.org/abs/1901.08088) considers quantized bounds. The scene bounding box is partitioned into $2^{10}$ bins in each direction, and the bounding boxes are snapped to bin boundaries. This allows storing each bound using only 10 bits, so the overall bounding volume of a node takes 64 bits (4 unused), i.e. 8 bytes, compared to the 24 required by 6 floats. Together with 2 ints, the node size is 16 bytes.

## Issue 168: Cannot compile CUDA tests with boost version higher than 1.68
https://code.ornl.gov/6da/ArborX/-/issues/168 (updated 2020-04-28; Arndt, Daniel)
*Created by: aprokop*
Tried 1.69, 1.70, 1.71. All fail with
```
../test/tstSequenceContainers.cpp(25): error: identifier "BOOST_TEST_TOOL_UNIV" is undefined
../test/tstSequenceContainers.cpp(25): error: identifier "BOOST_TEST_TOOL_UNIV_EX" is undefined
../test/tstSequenceContainers.cpp(25): error: identifier "CHECK" is undefined
../test/tstSequenceContainers.cpp(25): error: identifier "BOOST_TEST_INVOKE_IF_N_ARGS" is undefined
../test/tstSequenceContainers.cpp(89): error: identifier "BOOST_TEST_TOOL_UNIV" is undefined
../test/tstSequenceContainers.cpp(89): error: identifier "BOOST_TEST_TOOL_UNIV_EX" is undefined
../test/tstSequenceContainers.cpp(89): error: identifier "CHECK" is undefined
../test/tstSequenceContainers.cpp(89): error: identifier "BOOST_TEST_INVOKE_IF_N_ARGS" is undefined
```
1.68 works fine.

## Issue 201: Capture Use Case: Distance to Wall
https://code.ornl.gov/6da/ArborX/-/issues/201 (updated 2020-02-21; Arndt, Daniel)
*Created by: overfelt*
Just capturing the details of a use case at the request of ArborX developers, not a bug or feature request.
I have an application where the distance from points in the volume to the nearest wall is required for a turbulence model. One way we would like to calculate these values is to use ArborX to determine a few points on the surface that are closest to each point in the volume, and then narrow the distance with a detailed calculation. This will be compared to a PDE-based approach at scale and in parallel. In any case, the ArborX approach could be used as an initial guess for a PDE solve, which would then be used for the small update to the distance given small mesh motions.
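The two-stage approach described above (coarse nearest-candidate search, then a detailed distance calculation) can be sketched as follows. This is a hypothetical stand-in: a brute-force partial sort plays the role of the ArborX k-nearest query, and the "detailed calculation" is reduced to point-to-point distance.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct P3 { double x, y, z; };

double dist(P3 const &a, P3 const &b) {
  return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) +
                   (a.z - b.z) * (a.z - b.z));
}

// Coarse stage: indices of the k surface sample points nearest to q.
// The full scan below is where a tree-based k-nearest query would go.
std::vector<int> kNearest(std::vector<P3> const &surf, P3 const &q, int k) {
  std::vector<int> idx(surf.size());
  for (int i = 0; i < (int)idx.size(); ++i) idx[i] = i;
  std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                    [&](int a, int b) { return dist(surf[a], q) < dist(surf[b], q); });
  idx.resize(k);
  return idx;
}

// Fine stage: exact minimum over the candidates. A real code would measure
// the distance to the surface elements adjacent to each candidate point.
double wallDistance(std::vector<P3> const &surf, P3 const &q, int k) {
  double d = INFINITY;
  for (int i : kNearest(surf, q, k))
    d = std::min(d, dist(surf[i], q));
  return d;
}
```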
## Issue 210: Question about ArborX terminology
https://code.ornl.gov/6da/ArborX/-/issues/210 (updated 2020-01-30; Arndt, Daniel)
*Created by: aprokop*
There is some ambiguity surrounding naming of things in ArborX. Specifically, the definitions of "query", "predicate" and "predicate with attachment".
My current understanding of those terms is:
* **predicate** is a boolean function operating on a leaf node / user-provided geometry
* **query** is a function that, given one or more predicates, returns the results satisfying them.
If this language is correct, then we should not use `Query` as an object, and things like callbacks should read
```diff
- template <typename Query, typename Insert>
- KOKKOS_FUNCTION void operator()(Query const &, int index,
- Insert const &insert) const;
+ template <typename Predicate, typename Insert>
+ KOKKOS_FUNCTION void operator()(Predicate const &, int index,
+ Insert const &insert) const;
```
On the other hand, the callback happens after the predicate evaluated to true on that node, so what exactly is the object here?
There is also a question of whether "predicate with attachment" should be considered a different concept from "predicate".

## Issue 218: Fix 1D and 2D domain scaling in the distributed benchmark
https://code.ornl.gov/6da/ArborX/-/issues/218 (updated 2020-02-10; Arndt, Daniel)
*Created by: aprokop*
Currently, the domains are constructed as `[-a,a]^d`, where `d` is the partition dimension, and `a = std::cbrt(n_values)`. This way, the density of points stays constant in 3D. However, for 1D and 2D the density varies, and the same radius would produce different average results. The value of `a` should be set to `n_values` in 1D, and `std::sqrt(n_values)` in 2D.

## Issue 231: Examine interface and performance implications of having a query index
https://code.ornl.gov/6da/ArborX/-/issues/231 (updated 2020-02-25; Arndt, Daniel)
*Created by: aprokop*
Currently, the only way to access the index of a query is to have the user attach it. In many situations, we know the index itself and do not need user info to process it. There are use cases where we need this index. Therefore, we need to see if it makes sense to always have it and treat it ourselves.

## Issue 272: nvcc warnings are not caught in testing
https://code.ornl.gov/6da/ArborX/-/issues/272 (updated 2020-04-17; Arndt, Daniel)
*Created by: aprokop*
Detected in #268. Warnings like
```
/var/jenkins/workspace/ArborX_PR-268/src/details/ArborX_Predicates.hpp(34): warning: __device__ annotation is ignored on a function("Nearest") that is explicitly defaulted on its first declaration
```
were not detected in the Style check. It seems that there is a problematic interaction between nvcc_wrapper, nvcc, Jenkins, etc.

## Issue 275: Using ArborX when data does not fit on a GPU
https://code.ornl.gov/6da/ArborX/-/issues/275 (updated 2020-04-20; Arndt, Daniel)
*Created by: aprokop*
There are two scenarios here:
1) The primitives data fits, but results do not
2) The primitives data does not fit
This issue is not for an immediate fix, just something to keep in mind, and a place for other applications to record their needs.

## Issue 291: Unify sorting utilities
https://code.ornl.gov/6da/ArborX/-/issues/291 (updated 2020-04-29; Arndt, Daniel)
*Created by: masterleinad*
Currently, we have sorting utilities both in `src/details/ArborX_DetailsSortUtils.hpp` and in `src/details/ArborX_DetailsBatchedQueries.hpp`. We should look into whether we can unify them.