graph_c_binding/graph_c_binding.h +2 −2

/// @section graph_c_binding_into Introduction
/// This section assumes the reader is already familiar with developing C codes.
/// The simplest method to link framework code into a C code is to create a C++
/// function with @code extern "C" @endcode linkage. First create a header file
/// <tt><i>c_callable</i>.h</tt>
/// @code
/// ...
/// @endcode
///
/// Next create a source file <tt><i>c_callable</i>.c</tt> and add the
/// framework. This example uses the equation of a line example from the
/// @ref tutorial_workflow "making workflows" tutorial.
/// @code
/// // Include the necessary framework headers.
/// ...

graph_docs/compiling.dox +40 −37

 * <hr>
 * @section build_system_user User Guide
 * The following section is for users of the framework.
 *
 * @subsection build_system_user_dependencies Dependencies
 * The graph_framework requires three external dependencies and one optional
 * dependency. <a href="https://llvm.org">LLVM</a> is another dependency, used
 * for generating CPU code; however, it is automatically obtained via the build
 * system. The graph_framework is written using the
 * <a href="https://www.cppreference.com/w/cpp/20.html">C++20</a> standard.
 * The C interface uses
 * <a href="https://www.cppreference.com/w/cpp/compiler_support/17.html">C17</a>
 * and the Fortran interface uses
 * <a href="https://fortranwiki.org/fortran/show/Fortran+2008">Fortran 2008</a>.
 *
 * @subsubsection build_system_user_dependencies_required Required
 * * <a href="http://www.cmake.org">cmake</a> version greater than 3.21.
 * ...
 * * <a href="https://www.doxygen.nl/index.html">Doxygen</a> for generating this documentation.
 *
 * @subsection build_system_clone Obtaining the code
 * The framework code itself can be obtained from the
 * <a href="https://github.com/ORNL-Fusion/graph_framework">graph_framework</a>
 * Github repository.
 * @code
 * ...
 * Where <tt>../</tt> points to the source directory containing the top
 * level <tt>CMakeLists.txt</tt> file.
 *
 * The recommended method is to use the interactive <tt>ccmake</tt> command
 * instead.
 * @code ccmake ../
 * ...
 * <tt>-D</tt> option.
 *
 * @subsubsection build_system_user_options Build system Options
 * Initially, there will be no options. Along the bottom, there are several
 * commands. Use the 'c' command to start the configuration process. Once
 * configured, several options will appear. During this process cmake is cloning
 * the LLVM repository, so this step may take some time initially. Most of the
 * options are for configuring LLVM and can be ignored. The important options
 * are listed below.
 * ...
 * * <tt>MinSizeRel</tt>
 * * <tt>RelWithDebInfo</tt>
 * <tr><td><tt>USE_VERBOSE</tt>           <td>Show verbose information about compute kernels.
 * <tr><td><tt>BUILD_C_BINDING</tt>      <td>Generate the @ref graph_c_binding.h "C language interface".
 * <tr><td><tt>BUILD_Fortran_BINDING</tt><td>Generate the @ref graph_fortran "Fortran language interface".
 * <tr><td><tt>USE_METAL</tt>            <td>Enable the <a href="https://developer.apple.com/metal/">Metal</a> backend (macOS only).
 * <tr><td><tt>USE_CUDA</tt>             <td>Enable the <a href="https://developer.nvidia.com/cuda-zone">Cuda</a> backend (Linux only).
 * <tr><td><tt>USE_HIP</tt>              <td>Enable the <a href="https://www.amd.com/en/products/software/rocm.html">Hip</a> backend (Linux only, Hip branch).
 * <tr><td><tt>USE_SSH</tt>              <td>Use ssh for git instead of https.
 * </table>
 *
 * @note macOS users will need to change the default option for
 * <tt>CMAKE_CXX_COMPILER</tt> to <tt>clang++</tt>. This is due to the way the
 * build system determines default include directories for system libraries.
 * This can be accomplished using the advanced options accessed from the <tt>t</tt>
 * command, or by setting this via the command line.
 * @code cmake -DCMAKE_CXX_COMPILER=clang++ ../ @endcode
 *
 * Any time an option is changed, or a new option becomes available, you need
 * to use the configure <tt>c</tt> command for changes to take effect. Once all
 * options are set, a generate <tt>g</tt> option will appear. Using this option
 * will generate the Makefile.
 *
 * @subsubsection build_system_trouble_shooting Troubleshooting
 * Sometimes, cmake will fail to locate the NetCDF library if it is not
 * ...
 * @code make @endcode
 * command. Note that the build system first starts by pulling the latest
 * revision of LLVM. The build system then has to build LLVM first, which can
 * take a while. It is recommended to use a limited parallel build.
 * @code make -j10 @endcode
 * The <tt>-j<i>num_processes</i></tt> option determines the number of parallel
 * instances to run. The build products will be found in associated directories
 * in the <tt>build</tt> directory.
 *
 * A list of individual components which can be built can be identified using
 * @code
 * make -h
 * make help
 * @endcode
 *
 * @subsection build_system_test Running unit tests
 * Unit tests can be run using the command
 * @code make test ARGS=-j10 @endcode
 * Like the parallel build, the <tt>-j<i>num_processes</i></tt> option
 * determines the number of parallel instances to run.
 *
 * <hr>
 * @section build_system_dev Developer Guide
 * ...
 * The build system defines some macros for defining targets, configuring debug
 * options, and configuring external dependencies.
 *
 * @subsubsection build_system_targets Tool targets
 *
 * <hr>
 * <tt>add_tool_target(target lang)</tt>\n\n
 * ...
 * <tt>[in] <b>target</b></tt> The name of the target.\n
 * <tt>[in] <b>lang</b></tt> File extension for the target (c, cpp, f90).\n\n
 * The target assumes there is a source file defined as <tt>target.lang</tt>.
 * For instance, a C++ source file named <tt>foo.cpp</tt> is configured as
 * @code add_tool_target(foo cpp) @endcode
 * ...
 * Register a sanitizer option.\n\n
 * <b>Parameters</b>\n
 * <tt>[in] <b>name</b></tt> The name of the sanitizer flags.\n\n
 * This adds a new cmake option <tt>SANITIZE_<i>NAME</i></tt> to add
 * <tt>-fsanitize=<i>name</i></tt> to the command line arguments.
 * <hr>
 *
 * @subsubsection build_system_project Register an external project
 * ...
 * In addition to the standard build options, there are several debugging
 * options that can be enabled.
 *
 * @subsubsection build_system_dev_options Build System Options
 * <table>
 * <caption id="build_system_user_cmake_dev_opts">Build options for developers.</caption>
 * <tr><th>Option <th>Description
 * <tr><td><tt>USE_PCH</tt>                <td>Use precompiled headers during compilation. Most users should keep this on.
 * <tr><td><tt>SAVE_KERNEL_SOURCE</tt>     <td>Option to dump the generated compute kernel source code to disk.
 * <tr><td><tt>USE_INPUT_CACHE</tt>        <td>Option to cache registers for the kernel arguments.
 * <tr><td><tt>USE_CONSTANT_CACHE</tt>    <td>Option to use registers to cache constant values; otherwise constants are inlined.
 * <tr><td><tt>SHOW_USE_COUNT</tt>        <td>Generates information on the number of times a register is used.
 * <tr><td><tt>USE_INDEX_CACHE</tt>       <td>Option to use registers to cache array indices.
 * <tr><th colspan="2">Sanitizer Flags
 * ...

graph_docs/general.dox +27 −26

 * <caption id="general_concepts_glossery">Glossary of terms</caption>
 * <tr><th>Concept <th>Definition
 * <tr><td><b>node</b>                 <td>A leaf or branch on the graph tree.
 * <tr><td><b>graph</b>                <td>A data structure connecting nodes.
 * <tr><td><b>reduce</b>               <td>A transformation of the graph to remove nodes.
 * <tr><td><b>auto differentiation</b> <td>A transformation of the graph to build derivatives.
 * <tr><td><b>compiler</b>             <td>A tool for translating from one language to another.
 * <tr><td><b>JIT</b>                  <td>Just-in-time compile.
 * <tr><td><b>kernel</b>               <td>A code function that runs on a batch of data.
 * ...
 * <tr><td><b>safe math</b>            <td>Run time checks to avoid off normal conditions.
 * <tr><td><b>API</b>                  <td>Application programming interface.
 * <tr><td><b>Host</b>                 <td>The side where kernels are launched from.
 * <tr><td><b>Device</b>               <td>The side where kernels are run.
 * </table>
 *
 * <hr>
 * @section general_concepts_graph Graph
 * The graph_framework operates by building a tree structure of math operations.
 * For an example of building expression structures see the
 * @ref tutorial_expression "basic expressions tutorial".
 In tree form it is
 * easy to traverse the nodes in the graph. Take the example of the equation of
 * a line
 * @f{equation}{y=mx + b@f}
 * This equation consists of five nodes. The ends of the tree are classified
 * as either variables @f$x@f$ or constants @f$m,b@f$. These nodes are connected
 * by nodes for the multiply and addition operations. The output @f$y@f$
 * represents the entire graph of operations.
 * @image{} html line_graph.png "The graph structure for y = mx + b."
 * Evaluation of a graph starts from the top most node, in this case the @f$+@f$
 * operation. Evaluation of a node is not performed until all subnodes are
 * evaluated, starting with the left operand. Evaluation starts by recursively
 * evaluating the left operands until the last node, @f$m@f$, is reached.
 * @image{} html line_graph_eval1.png ""
 * Once @f$m@f$ is evaluated, the result is returned to the @f$+@f$, then the
 * right operand is evaluated.
 * @image{} html line_graph_eval2.png ""
 * Evaluation is repeated until every node in the graph is evaluated.
 * ...
 *
 * <hr>
 * @section general_concepts_diff Auto Differentiation
 * The previous @ref general_concepts_graph "section" showed how graphs can be
 * evaluated. This same evaluation can be applied to build graphs of a
 * function's derivative. For an example of taking derivatives see the
 * @ref tutorial_derivatives "auto differentiation tutorial". Let's say that we
 * want to take the derivative @f$\frac{\partial y}{\partial x}@f$.
 This is
 * achieved by evaluating the graph until the bottom left most node is reached.
 * Then a new graph is built starting with @f$\frac{\partial m}{\partial x}=0@f$.
 * Applying the first half of the chain rule, we build a new graph for @f$0x@f$
 * @image{} html line_graph_dydf1.png ""
 * Then we take the derivative of the right operand and apply the second half
 * of the chain rule to build a new graph for @f$0x=0@f$.
 * ...
 *
 * <hr>
 * @section general_concepts_reduction Reduction
 * The final expression for @f$\frac{\partial y}{\partial x}@f$ contains many
 * unnecessary nodes in the graph. Instead of building full graphs, we can
 * simplify and eliminate nodes as we build them. For instance, when the
 * expression @f$0x@f$ is created, it can be immediately reduced to a single
 * node.
 * @image{} html line_graph_reduce1.png ""
 * Applying all possible reductions reduces the final expression to
 * @f$\frac{\partial y}{\partial x}=m@f$.
 * @image{} html line_graph_reduce_final.png ""
 * By reducing graphs as they are built, we can eliminate nodes one by one.
 *
 * <hr>
 * @section general_concepts_compile Compile
 * Once graph expressions are built, they can be compiled to a compute kernel.
 * For an example of compiling expression trees into kernels see the
 * @ref tutorial_workflow "workflow tutorial".
 * Using the same recursive evaluation, we can visit each node of a graph and
 * ...
 * be generated from multiple outputs and maps.
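As an illustration of the recursive visit described above, here is a plain-C++ sketch of a tiny expression tree that can either be evaluated (left operand first, then the right, then the parent operation) or walked to emit kernel-like source text. The `Node`, `eval`, and `emit` names are hypothetical; this is a minimal sketch, not the framework's `graph` node classes.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Minimal expression node: a leaf holding a named value, or a binary branch.
// Hypothetical sketch, not the framework's graph::leaf_node hierarchy.
struct Node {
    char op;                        // '+' or '*' for a branch, 0 for a leaf
    double value;                   // value of a leaf
    std::string name;               // name of a leaf, used when emitting source
    std::shared_ptr<Node> left, right;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(const std::string &name, double value) {
    return std::make_shared<Node>(Node{0, value, name, nullptr, nullptr});
}
NodePtr branch(char op, NodePtr l, NodePtr r) {
    return std::make_shared<Node>(Node{op, 0.0, "", std::move(l), std::move(r)});
}

// Evaluation recurses into the left operand first, then the right operand,
// and only then applies the branch operation -- the order described above.
double eval(const NodePtr &n) {
    if (n->op == 0) {
        return n->value;            // a leaf evaluates to its stored value
    }
    const double l = eval(n->left); // left subtree is fully evaluated first
    const double r = eval(n->right);
    return n->op == '+' ? l + r : l * r;
}

// The same traversal can emit source text instead of a value, which is the
// essence of turning a graph into a compute kernel.
std::string emit(const NodePtr &n) {
    if (n->op == 0) {
        return n->name;
    }
    return "(" + emit(n->left) + n->op + emit(n->right) + ")";
}
```

For @f$y=mx+b@f$ with @f$m=2@f$, @f$x=3@f$, @f$b=1@f$, `eval` returns 7 and `emit` produces `((m*x)+b)`.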
 *
 * @subsection general_concepts_compile_inputs Inputs
 * Inputs are the variable nodes that define the graph. In the line
 * example @f$\frac{\partial y}{\partial x}@f$, the input variable would be the
 * node for @f$x@f$. Some graphs have no inputs. The graph for
 * @f$\frac{\partial y}{\partial x}=m@f$ has eliminated all the variable nodes
 * in the graph.
 * ...
 * are never stored.
 *
 * @subsection general_concepts_compile_maps Maps
 * Maps enable the results of an output node to be stored in an input node. This
 * is used for a wide variety of steps. For instance, take a gradient descent
 * step
 * @f{equation}{y = y + \frac{\partial f}{\partial x}@f}
 * In this case the output of the expression
 * @f$y + \frac{\partial f}{\partial x}@f$
 * ...
 *
 * <hr>
 * @section general_concepts_safe_math Safe Math
 * There are some conditions where, mathematically, a graph should evaluate to a
 * normal number but, when evaluated using floating point precision, the result
 * can be <tt>Inf</tt> or <tt>NaN</tt>. An example of this is the
 * @f$\exp\left(x\right)@f$ function. For large argument values,
 * ...

graph_docs/main.dox +7 −8

 * @tableofcontents
 * @section introduction Introduction
 * The <a href="https://github.com/ORNL-Fusion/graph_framework">graph_framework</a>
 * is a domain specific compiler for translating physics equations to optimized
 * code that runs on GPUs and CPUs.
 The domain
 * specific aspect limits this to classes of problems where the same physics is
 * applied to an ensemble. Examples include RF ray tracing, particle pushing,
 * and field line following.
 *
 * @subsection purpose Purpose
 * The purpose of this framework is to enable domain scientists to write code
 * ...
 * This framework enables:
 * * Portability to Nvidia, AMD, and Apple GPUs and CPUs.
 * * Abstraction of the physics from the compute.
 * * Auto differentiation.
 * * Easy embedding in C, C++, and Fortran codes.
 *
 * <hr>
 * @section tools User guides for tools
 * @subsection rf_ray_tracing RF Ray tracing
 * This section covers user guides to run the RF ray tracing code. To run this
 * code, a user selects an equilibrium, a wave distribution function, a solver
 * method, initial ray conditions, and a power absorption model. To run an
 * example, follow the instructions for the @ref xrays_commandline_example.
 *

graph_docs/tutorial.dox +19 −18

/*!
 * @page tutorial Tutorial
 * @brief Hands on tutorial for building expressions and running workflows.
 * @tableofcontents
 *
 * @section tutorial_introduction Introduction
 * In this tutorial we will put the basic @ref general_concepts of the
 * graph_framework into action.
 This will discuss building trees, generating
 * kernels, and executing workflows.
 *
 * To accomplish this there is a playground tool in the <tt>graph_framework</tt>
 * directory. This playground is a preconfigured executable target which can be
 * ...
}
@endcode
 * To start, create a template function above main and call that function from
 * main. This will allow us to play with different floating point types. For now
 * we will start with a simple floating point type.
 * @code
#include "../graph_framework/jit.hpp"
...
int main(int argc, const char * argv[]) {
    END_GPU
}
@endcode
 * Here @ref jit::float_scalar is a
 * <a href="https://www.cppreference.com/w/cpp/20.html">C++20</a>
 * <a href="https://www.cppreference.com/w/cpp/concepts.html">Concept</a> for
 * valid floating point types allowed by the framework.
 *
 * <hr>
 * @section tutorial_basic Basic Nodes
 * ...
}
@endcode
 * An explicit @ref graph::constant_node is created for <tt>m</tt> while an
 * implicit constant was defined for <tt>b</tt>. Note that in the implicit case,
 * the actual node for <tt>b</tt> is not created until we use it in an
 * expression.
 *
 * <hr>
 * ...
}
@endcode
 * Here we take derivatives using the @ref graph::leaf_node::df method. We can
 * also take several variations of this.
 * @code
auto dydm = y->df(m);
auto dydy = y->df(y);
auto dydb = y->df(m*x);
@endcode
 * The results will be @f$x@f$, @f$1@f$, and @f$0@f$, respectively.
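Outside the framework, the quoted result @f$\frac{\partial y}{\partial m}=x@f$ can be sanity-checked with a central finite difference. This is a hedged plain-C++ sketch; the `fdiff` and `line` helpers are hypothetical, and the `y->df(y)` and `y->df(m*x)` cases follow framework conventions that a finite difference cannot reproduce, so only the ordinary partial derivatives are checked numerically.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// y = m*x + b evaluated as an ordinary function of its inputs.
double line(double m, double x, double b) {
    return m * x + b;
}

// Central finite difference: numerically approximates df/dt at t.
// Used here only to sanity-check symbolic derivative results.
double fdiff(const std::function<double(double)> &f, double t, double h = 1e-6) {
    return (f(t + h) - f(t - h)) / (2.0 * h);
}
```

For @f$m=2@f$, @f$x=3@f$, @f$b=1@f$ this recovers @f$\partial y/\partial m \approx 3 = x@f$ and @f$\partial y/\partial x \approx 2 = m@f$, matching the auto-differentiated graphs.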
 *
 * <hr>
 * @section tutorial_workflow Making workflows.
 * In this section we will build a workflow from the nodes we created. For
 * simplicity, we will decrease the number of elements in the variable so we can
 * set the values more easily. The first thing we do is create a
 * @ref workflow::manager.
 * @code
template<jit::float_scalar T>
void run_tutorial() {
...
work.print(2, {x, y, dydx});
@endcode
 *
 * @subsection tutorial_workflow_iter Iteration
 * In this section we are going to make use of maps to iterate a variable. We
 * want to evaluate the value of @f$y@f$ and set it as the new value of @f$x@f$.
 * We do this by modifying the call to @ref workflow::manager::add_item to
 * define a map. This generates a kernel where, after @f$y@f$ is computed, it is
 * stored
 * ...
@endcode
 *
 * <hr>
 * @section tutorial_workflow_newton Newton's Method
 * In this tutorial we are going to show how we can put all these concepts
 * together to implement Newton's method. Newton's method is defined as
 * @f{equation}{x = x - \frac{f\left(x\right)}{\frac{\partial}{\partial x}f\left(x\right)}@f}
 * From the iteration example, its step update can be handled by a simple map.
 * However, we need a measure of convergence. To do that, we output the value of
 * @f$f\left(x\right)@f$. Let's set up a test function.
 * @code
 * ...
 * @endcode
 *
 * However, there are some things that are not optimal here. We are performing
 * a reduction on the host side and transferring the entire array to the host.
 * To improve this, we can use a converge item instead.
 * @code
// Create a workflow manager.
...
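As a plain-C++ cross-check of the Newton iteration above, the following sketch applies the same update map @f$x \leftarrow x - f(x)/f'(x)@f$ and uses @f$|f(x)|@f$ as the convergence measure, mirroring the converge item. The test function @f$f(x)=x^{2}-2@f$ is a stand-in, since the tutorial's actual test function is not shown in this excerpt.

```cpp
#include <cassert>
#include <cmath>

// One Newton step: x <- x - f(x)/f'(x), the map described in the tutorial.
double newton_step(double x, double (*f)(double), double (*df)(double)) {
    return x - f(x) / df(x);
}

// Iterate until |f(x)| falls below a tolerance (the convergence measure),
// or until a maximum number of steps is reached.
double newton_solve(double x, double (*f)(double), double (*df)(double),
                    double tol = 1e-12, int max_steps = 50) {
    for (int i = 0; i < max_steps && std::fabs(f(x)) > tol; ++i) {
        x = newton_step(x, f, df);
    }
    return x;
}

// Stand-in test function: f(x) = x^2 - 2, whose positive root is sqrt(2).
double f(double x)  { return x * x - 2.0; }
double df(double x) { return 2.0 * x; }
```

Starting from @f$x=1@f$, `newton_solve(1.0, f, df)` converges to @f$\sqrt{2}@f$ in a handful of steps, which is the behavior the workflow version reproduces on the device.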
graph_c_binding/graph_c_binding.h +2 −2 Original line number Diff line number Diff line Loading @@ -9,7 +9,7 @@ /// /// @section graph_c_binding_into Introduction /// This section assumes the reader is already familar with developing C codes. /// The simplist method to link framework code into a C code is to create a c++ /// The simplist method to link framework code into a C code is to create a C++ /// function with @code extern "C" @endcode First create a header file /// <tt><i>c_callable</i>.h</tt> /// @code Loading @@ -19,7 +19,7 @@ /// @endcode /// /// Next create a source file <tt><i>c_callable</i>.c</tt> and add the /// framework. This example uses the line /// framework. This example uses the equation of a line example from the /// @ref tutorial_workflow "making workflows" turorial. /// @code /// // Include the necessary framework headers. Loading
graph_docs/compiling.dox +40 −37 Original line number Diff line number Diff line Loading @@ -9,15 +9,18 @@ * * <hr> * @section build_system_user User Guide * The following section is for users of framework. * The following section is for users of the framework. * * @subsection build_system_user_dependencies Dependencies * The graph_framwork requires three requires external dependencies and one * optional dependency. <a href="https://llvm.org">LLVM</a> is another * dependency that is used for generating CPU code. However this is * automatically obtained via the build system. The graph_frame is written using * the C++20 standard. The C interface using C17 and the fortran interface using * Fortran 2008. * The graph_framwork requires three external dependencies and one optional * dependency. <a href="https://llvm.org">LLVM</a> is another dependency that is * used for generating CPU code. However this is automatically obtained via the * build system. The graph_frame is written using the * <a href="https://www.cppreference.com/w/cpp/20.html">C++20</a> standard. The * C interface uses * <a href="https://www.cppreference.com/w/cpp/compiler_support/17.html">C17</a> * and the fortran interface uses * <a href="https://fortranwiki.org/fortran/show/Fortran+2008">Fortran 2008</a>. * * @subsubsection build_system_user_dependencies_required Required * * <a href="http://www.cmake.org">cmake</a> version greater than 3.21. Loading @@ -28,7 +31,7 @@ * * <a href="https://www.doxygen.nl/index.html">Doxygen</a> for generating this documentation. * * @subsection build_system_clone Obtaining the code * The framework code itself be obtained from the * The framework code itself can be obtained from the * <a href="https://github.com/ORNL-Fusion/graph_framework">graph_framework</a> * Github repository. * @code Loading @@ -53,7 +56,7 @@ * Where <tt>../</tt> points to the source directory containing the top * level <tt>CMakeLists.txt</tt> file. 
* * The recommended method is to use the interatice <tt>ccmake</tt> command * The recommended method is to use the interactive <tt>ccmake</tt> command * instead. * @code ccmake ../ Loading @@ -62,10 +65,10 @@ * <tt>-D</tt> option. * * @subsubsection build_system_user_options Build system Options * Initally, there will be no options. Along the botton, there are several * command. Use the 'c' command to start the configuation process. Once * Initially, there will be no options. Along the botton, there are several * commands. Use the 'c' command to start the configuation process. Once * configured several options will apear. During this process cmake is cloning * the LLVM repository. So this step may take some time initally. Mode of the * the LLVM repository. So this step may take some time initally. Most of the * are various options for configuing LLVM and can be ignored. The important * options are listed below. * Loading @@ -78,27 +81,27 @@ * * <tt>MinSizeRel</tt> * * <tt>RelWithDebInfo</tt> * <tr><td><tt>USE_VERBOSE</tt> <td>Show verbose information about compute kernels. * <tr><td><tt>BUILD_C_BINDING</tt> <td>Generate the C langauge interface. * <tr><td><tt>BUILD_Fortran_BINDING</tt><td>Generate the Fortran language interface. * <tr><td><tt>BUILD_C_BINDING</tt> <td>Generate the @ref graph_c_binding.h "C langauge interface". * <tr><td><tt>BUILD_Fortran_BINDING</tt><td>Generate the @ref graph_fortran "Fortran language interface". * <tr><td><tt>USE_METAL</tt> <td>Enable the <a href="https://developer.apple.com/metal/">Metal</a> backend (macOS only). * <tr><td><tt>USE_CUDA</tt> <td>Enable the <a href="https://developer.nvidia.com/cuda-zone">Cuda</a> backend (Linux only). * <tr><td><tt>USE_HIP</tt> <td>Enable the <a href="https://www.amd.com/en/products/software/rocm.html">Hip</a> backend (Linux only, Hip branch). * <tr><td><tt>USE_SSH</tt> <td>Use ssh for git instead of html. 
* </table> * * @note macOS uses will need to change the default option for * @note macOS users will need to change the default option for * <tt>CMAKE_CXX_COMPILER</tt> to <tt>clang++</tt>. This is due to the way the * build systems determines default include directories for system libraries. * This can be accomplished using the advacned options using the <tt>t</tt> * This can be accomplished using the advacned options accessed from the <tt>t</tt> * command or setting this via the command line. * @code cmake -DCMAKE_CXX_COMPILER=clang++ ../ @endcode * * Every time an option is changed, or a new option is available, you need to * use the configure <tt>c</tt> command for changes to take affect. Once all * options are set, the a generate <tt>g</tt> options will appear. Using this * option will build a make file. * Any time an option is changed, or a new option becomes is available, you need * to use the configure <tt>c</tt> command for changes to take affect. Once all * options are set, a generate <tt>g</tt> options will appear. Using this option * will generate the Makefile. * * @subsubsection build_system_trouble_shooting Trouble Shooting. * Some times, cmake will fail to locate the NetCDF library if it is not Loading @@ -113,28 +116,28 @@ * @code make @endcode * command. Note that due build system first starts by pulling the latest * of LLVM. The build system then has to build LLVM first which can take a * while. It is recommended to use a limited parallel build. * command. Note that the build system first starts by pulling the latest * revision of LLVM. The build system then has to build LLVM first which can * take a while. It is recommended to use a limited parallel build. * @code make -j10 @endcode * The <tt>-j<i>num_processes</i></tt> option determines number of parallel * instances to run. The build products will be found in assocated build * directories in the <tt>build</tt> directory. * instances to run. 
The build products will be found in assocated directories * in the <tt>build</tt> directory. * * A list of individual components which can be build can be identified using * A list of individual components which can be built can be identified using * @code make -h make help @endcode * * @subsection build_system_test Running unit tests. * @subsection build_system_test Running unit tests * Units tests can be run using the command. * @code make test ARGS=-j10 @endcode * Like the parallel build the <tt>-j<i>num_processes</i></tt> option determines * number of parallel instances to run. * the number of parallel instances to run. * * <hr> * @section build_system_dev Developer Guide Loading @@ -144,7 +147,7 @@ * The build system defines some macros for defining targets, configuring debug * options, and configuing external dependences. * * @subsubsection build_system_targets Tool targets. * @subsubsection build_system_targets Tool targets * * <hr> * <tt>add_tool_target(target lang)</tt>\n\n Loading @@ -153,7 +156,7 @@ * <tt>[in] <b>target</b></tt> The name of the target.\n * <tt>[in] <b>lang</b></tt> File extention for the target (c, cpp, f90).\n\n * Target assumes there is a source file defined as <tt>target.lang</tt>. For * instance a C++ source file named <tt>foo.cpp</tt> is configures as * instance a C++ source file named <tt>foo.cpp</tt> is configured as * @code add_tool_target(foo cpp) @endcode Loading @@ -176,8 +179,8 @@ * Register a sanitizer option.\n\n * <b>Parameters</b>\n * <tt>[in] <b>name</b></tt> The name of the sanitizer flags.\n\n * This add new for using the <tt>SANITIZE_<i>NAME</i></tt> cmake option and * add <tt>-fsanitize=<i>name</i></tt> to the command line arguments. * This adds a new cmake option <tt>SANITIZE_<i>NAME</i></tt> to add * <tt>-fsanitize=<i>name</i></tt> to the command line arguments. 
 * <hr>
 *
 * @subsubsection build_system_project Register an external project
 * @@ -203,14 +206,14 @@
 * In addition to the standard build options there are several debugging options
 * that can be enabled.
 *
 * @subsubsection build_system_dev_options Build System Options
 * <table>
 * <caption id="build_system_user_cmake_dev_opts">Build options for developers.</caption>
 * <tr><th>Option <th>Description
 * <tr><td><tt>USE_PCH</tt> <td>Use precompiled headers during compilation. Most users should keep this on.
 * <tr><td><tt>SAVE_KERNEL_SOURCE</tt> <td>Option to dump the generated compute kernel source code to disk.
 * <tr><td><tt>USE_INPUT_CACHE</tt> <td>Option to cache registers for the kernel arguments.
 * <tr><td><tt>USE_CONSTANT_CACHE</tt> <td>Option to use registers to cache constant values otherwise constants are inlined.
 * <tr><td><tt>SHOW_USE_COUNT</tt> <td>Generates information on the number of times a register is used.
 * <tr><td><tt>USE_INDEX_CACHE</tt> <td>Option to use registers to cache array indices.
 * <tr><th colspan="2">Sanitizer Flags
graph_docs/general.dox +27 −26
 * @@ -10,9 +10,9 @@
 * <caption id="general_concepts_glossery">Glossary of terms</caption>
 * <tr><th>Concept <th>Definition
 * <tr><td><b>node</b> <td>A leaf or branch on the graph tree.
 * <tr><td><b>graph</b> <td>A data structure connecting nodes.
 * <tr><td><b>reduce</b> <td>A transformation of the graph to remove leaf_nodes.
 * <tr><td><b>auto differentiation</b><td>A transformation of the graph to build derivatives.
 * <tr><td><b>compiler</b> <td>A tool for translating from one language to another.
 * <tr><td><b>JIT</b> <td>Just-in-time compile.
 * <tr><td><b>kernel</b> <td>A code function that runs on a batch of data.
 * @@ -25,27 +25,27 @@
 * <tr><td><b>safe math</b> <td>Run time checks to avoid off normal conditions.
 * <tr><td><b>API</b> <td>Application programming interface.
 * <tr><td><b>Host</b> <td>The place where kernels are launched from.
 * <tr><td><b>Device</b> <td>The side where kernels are run.
 * </table>
 *
 * <hr>
 * @section general_concepts_graph Graph
 * The graph_framework operates by building a tree structure of math operations.
 * For an example of building expression structures see the
 * @ref tutorial_expression "basic expressions tutorial". In tree form it is
 * easy to traverse nodes in the graph. Take the example of the equation of a
 * line.
 * @f{equation}{y=mx + b@f}
 * This equation consists of five nodes. The ends of the tree are classified
 * as either variables @f$x@f$ or constants @f$m,b@f$. These nodes are connected
 * by nodes for multiply and addition operations. The output @f$y@f$ represents
 * the entire graph of operations.
 * @image{} html line_graph.png "The graph structure for y = mx + b."
 * Evaluation of graphs starts from the top most node, in this case the @f$+@f$
 * operation. Evaluation of a node is not performed until all subnodes are
 * evaluated, starting with the left operand. Evaluation starts by recursively
 * evaluating the left operands until the last node is reached @f$m@f$.
 * @image{} html line_graph_eval1.png ""
 * Once @f$m@f$ is evaluated, the result is returned to the @f$+@f$ and then the
 * right operand is evaluated.
 * @image{} html line_graph_eval2.png ""
 * Evaluation is repeated until every node in the graph is evaluated.
 * @@ -54,13 +54,13 @@
 * <hr>
 * @section general_concepts_diff Auto Differentiation
 * From the previous @ref general_concepts_graph "section", it was shown how
 * graphs can be evaluated. This same evaluation can be applied to build
 * graphs of a function derivative. For an example of taking derivatives see the
 * @ref tutorial_derivatives "auto differentiation tutorial". Let's say that we
 * want to compute the derivative @f$\frac{\partial y}{\partial x}@f$. This is
 * achieved by evaluating the graph until the bottom left most node is reached.
 * Then a new graph is built starting with @f$\frac{\partial m}{\partial x}=0@f$.
 * Applying the first half of the chain rule, we build a new graph for @f$0x@f$
 * @image{} html line_graph_dydf1.png ""
 * Then we take the derivative of the right operand and apply the second half
 * of the chain rule to build a new graph for @f$0x=0@f$.
 * @@ -72,17 +72,18 @@
 * @section general_concepts_reduction Reduction
 * The final expression for @f$\frac{\partial y}{\partial x}@f$ contains many
 * unnecessary nodes in the graph. Instead of building full graphs, we can
 * simplify and eliminate nodes as we build them. For instance, when the
 * expression @f$0x@f$ is created, it can be immediately reduced to a single
 * node.
 * @image{} html line_graph_reduce1.png ""
 * Applying all possible reductions reduces the final expression to
 * @f$\frac{\partial y}{\partial x}=m@f$.
 * @image{} html line_graph_reduce_final.png ""
 * By reducing graphs as they are built, we can eliminate nodes one by one.
 *
 * <hr>
 * @section general_concepts_compile Compile
 * Once graph expressions are built, they can be compiled to a compute kernel.
 * For an example of compiling expression trees into kernels see the
 * @ref tutorial_workflow "workflow tutorial".
 * Using the same recursive evaluation, we can visit each node of a graph and
 * @@ -92,9 +93,9 @@
 * be generated from multiple outputs and maps.
 *
 * @subsection general_concepts_compile_inputs Inputs
 * Inputs are the variable nodes that define the graph. In the line example
 * @f$\frac{\partial y}{\partial x}@f$, the input variable would be the node
 * for @f$x@f$. Some graphs have no inputs. The graph for
 * @f$\frac{\partial y}{\partial x}=m@f$ has eliminated all the variable nodes
 * in the graph.
 * @@ -106,8 +107,8 @@
 * are never stored.
 *
 * @subsection general_concepts_compile_maps Maps
 * Maps enable the results of an output node to be stored in an input node. This
 * is used for a wide variety of steps. For instance, take a gradient descent
 * step.
 * @f{equation}{y = y + \frac{\partial f}{\partial x}@f}
 * In this case the output of the expression
 * @f$y + \frac{\partial f}{\partial x}@f$
 * @@ -120,7 +121,7 @@
 *
 * <hr>
 * @section general_concepts_safe_math Safe Math
 * There are some conditions where, mathematically, a graph should evaluate to a
 * normal number. However, when evaluated using floating point precision, this
 * can lead to <tt>Inf</tt> or <tt>NaN</tt>. An example of this is the
 * @f$\exp\left(x\right)@f$ function. For large argument values, the result
 * overflows to <tt>Inf</tt>.
graph_docs/main.dox +7 −8
 * @@ -3,11 +3,10 @@
 * @tableofcontents
 * @section introduction Introduction
 * The <a href="https://github.com/ORNL-Fusion/graph_framework">graph_framework</a>
 * is a domain specific compiler for translating physics equations to optimized
 * code that runs on GPUs and CPUs. The domain specific aspect limits this to
 * classes of problems where the same physics is applied to an ensemble.
 * Examples include RF ray tracing, particle pushing, and field line following.
 *
 * @subsection purpose Purpose
 * The purpose of this framework is to enable domain scientists to write code
 * @@ -15,15 +14,15 @@
 *
 * This framework enables:
 * * Portability to Nvidia, AMD, and Apple GPUs and CPUs.
 * * Abstraction of the physics from the compute.
 * * Auto differentiation.
 * * Easy embedding in C, C++, and Fortran codes.
 *
 * <hr>
 * @section tools User guides for tools
 * @subsection rf_ray_tracing RF Ray tracing
 * This section covers user guides to run the RF ray tracing code. To run this
 * code, a user selects an equilibrium, a wave distribution function, a solver
 * method, initial ray conditions, and a power absorption model. To run an
 * example follow the instructions for the @ref xrays_commandline_example.
graph_docs/tutorial.dox +19 −18
/*!
 * @page tutorial Tutorial
 * @brief Hands on tutorial for building expressions and running workflows.
 * @tableofcontents
 *
 * @section tutorial_introduction Introduction
 * In this tutorial we will put the basic @ref general_concepts of the
 * graph_framework into action. This will discuss building trees, generating
 * kernels, and executing workflows.
 *
 * To accomplish this there is a playground tool in the <tt>graph_framework</tt>
 * directory. This playground is a preconfigured executable target which can be
 * @@ -27,8 +27,8 @@ int main(int argc, const char * argv[]) { } @endcode
 * To start, create a template function above main and call that function from
 * main. This will allow us to play with different floating point types. For now
 * we will start with a simple floating point type.
 * @code
#include "../graph_framework/jit.hpp"
 * @@ -47,9 +47,10 @@
int main(int argc, const char * argv[]) {
    END_GPU
}
 * @endcode
 * Here @ref jit::float_scalar is a
 * <a href="https://www.cppreference.com/w/cpp/20.html">C++20</a>
 * <a href="https://www.cppreference.com/w/cpp/concepts.html">Concept</a> for
 * valid floating point types allowed by the framework.
 *
 * <hr>
 * @section tutorial_basic Basic Nodes
 * @@ -97,7 +98,7 @@ void run_tutorial() { } @endcode
 * An explicit @ref graph::constant_node is created for <tt>m</tt> while an
 * implicit constant was defined for <tt>b</tt>.
 * Note that in the implicit case, the actual node for <tt>b</tt> is not created
 * until we use it in an expression.
 *
 * <hr>
 * @@ -147,19 +148,19 @@ void run_tutorial() { } @endcode
 * Here we take derivatives using the @ref graph::leaf_node::df method. We can
 * also take several variations of this.
 * @code
auto dydm = y->df(m);
auto dydy = y->df(y);
auto dydb = y->df(m*x);
 * @endcode
 * The results will be @f$x@f$, @f$1@f$, and @f$0@f$ respectively.
 *
 * <hr>
 * @section tutorial_workflow Making workflows
 * In this section we will build a workflow from the nodes we created. For
 * simplicity we will decrease the number of elements in the variable so we can
 * set the values more easily. The first thing we do is create a
 * @ref workflow::manager.
 * @code
template<jit::float_scalar T>
void run_tutorial() {
 * @@ -270,7 +271,7 @@ work.print(2, {x, y, dydx}); @endcode
 *
 * @subsection tutorial_workflow_iter Iteration
 * In this section we are going to make use of maps to iterate a variable. We
 * want to evaluate the value of @f$y@f$ and set it as the new value of @f$x@f$.
 * We do this by modifying the call to @ref workflow::manager::add_item to
 * define a map. This generates a kernel where, after @f$y@f$ is computed, it is
 * stored
 * @@ -306,11 +307,11 @@ for (size_t i = 0; i < 10; i++) { @endcode
 *
 * <hr>
 * @section tutorial_workflow_newton Newton's Method
 * In this tutorial we are going to show how we can put all these concepts
 * together to implement Newton's method. Newton's method is defined as
 * @f{equation}{x = x - \frac{f\left(x\right)}{\frac{\partial}{\partial x}f\left(x\right)}@f}
 * From the iteration example, its step update can be handled by a simple map.
 * However, we need a measure of convergence. To do that we output the value of
 * @f$f\left(x\right)@f$. Let's set up a test function.
 * @code
 * @@ -366,7 +367,7 @@ void run_tutorial() { @endcode
 *
 * However, there are some things that are not optimal here. We are performing
 * a reduction on the host side and transferring the entire array to the host.
 * To improve this we can use a converge item instead.
 * @code
// Create a workflow manager.