Friends-of-Friends Query
Created by: sslattery
In many cosmology applications a Friends-of-Friends (FOF) query is used to identify clustering in point clouds. In general, the algorithm is as follows:
- Build a tree from a set of input points
- Establish a fixed neighborhood radius
r
- For every point, locate the other points in the tree that are within distance
r
- For every neighboring point within distance
r
, find its neighboring points that are within distancer
excluding any neighbors already found previously in the query - For each neighbor-of-neighbor repeat step 4 until no more points are found within distance
r
The end result of each query should be a list of points that are within distance r
of the query point, or are a neighbor-of-neighbors-of-neighbors-etc... of the query point.
Some questions:
- It was mentioned that we could possibly cap the amount of recursion in the algorithm to a fixed depth of neighbors. Does this provide a benefit? If so what are reasonable values?
- The output of the query could be in our standard structure in a CSR-like format where each query returns a set of object ids that satisfied the query predicate. However, many particles will belong to the same cluster and this cluster will be repeated for each point in it, thus potentially resulting in a large amount of memory needed for the query results depending on the structure of the cluster. What is the most useful output format of this type of query? Should we return clusters rather than results for individual points? Or return clusters as well as a list for each point of the cluster in which it is located?