Commit bfb9b479 authored by cianciosa's avatar cianciosa Committed by Cianciosa, Mark
Browse files

Add paper for JOSS submission. WIP

parent 513bea6a
Loading
Loading
Loading
Loading
+1 −1
Original line number Diff line number Diff line
@@ -307,7 +307,7 @@ find_package(Doxygen)

if (DOXYGEN_FOUND)
    set (DOXYGEN_PROJECT_NAME "Graph Framework")
    set (DOXYGEN_EXCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/LLVM ${CMAKE_CURRENT_SOURCE_DIR}/build)
    set (DOXYGEN_EXCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/LLVM ${CMAKE_CURRENT_SOURCE_DIR}/build ${CMAKE_CURRENT_SOURCE_DIR}/graph_paper)
    set (DOXYGEN_GENERATE_TREEVIEW YES)
    set (DOXYGEN_USE_MATHJAX YES)
    set (DOXYGEN_IMAGE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/graph_docs)
+12 −0
Original line number Diff line number Diff line
@@ -420,6 +420,8 @@
		C7B676092AA90243005AB34C /* CMakeLists.txt */ = {isa = PBXFileReference; lastKnownFileType = text; path = CMakeLists.txt; sourceTree = "<group>"; };
		C7B677D829E45C9500D3ADC6 /* backend.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; path = backend.hpp; sourceTree = "<group>"; };
		C7B677DA29E464AE00D3ADC6 /* cpu_context.hpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.h; path = cpu_context.hpp; sourceTree = "<group>"; };
		C7C2D6682E81C1D90017C272 /* paper.md */ = {isa = PBXFileReference; lastKnownFileType = net.daringfireball.markdown; path = paper.md; sourceTree = "<group>"; };
		C7C2D66A2E81C4B90017C272 /* paper.bib */ = {isa = PBXFileReference; lastKnownFileType = text; path = paper.bib; sourceTree = "<group>"; };
		C7CEA0042948D02A00F61D09 /* timing.hpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.h; path = timing.hpp; sourceTree = "<group>"; };
		C7CEA0052948EB0F00F61D09 /* cuda_context.hpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.h; path = cuda_context.hpp; sourceTree = "<group>"; };
		C7D12D992DBAB31F00925420 /* random.hpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.h; path = random.hpp; sourceTree = "<group>"; };
@@ -711,6 +713,7 @@
				C78F3D892DC122B1002E3D94 /* graph_korc */,
				C723737C2F5F6707005A5C62 /* graph_pic */,
				C75C42942E5CA60B00B0950B /* graph_docs */,
				C7C2D6692E81C1D90017C272 /* graph_paper */,
				C7167B212AC5CE8500E03131 /* utilities */,
				C717CB8C2A02E361008FBDD8 /* cmake */,
				C79141A722DA9BF200E0BA0D /* Products */,
@@ -823,6 +826,15 @@
			path = graph_fortran_binding;
			sourceTree = "<group>";
		};
		C7C2D6692E81C1D90017C272 /* graph_paper */ = {
			isa = PBXGroup;
			children = (
				C7C2D6682E81C1D90017C272 /* paper.md */,
				C7C2D66A2E81C4B90017C272 /* paper.bib */,
			);
			path = graph_paper;
			sourceTree = "<group>";
		};
		C7DC9EE32E39768300524F6F /* graph_c_binding */ = {
			isa = PBXGroup;
			children = (

graph_paper/paper.bib

0 → 100644
+373 −0

File added.

Preview size limit exceeded, changes collapsed.

graph_paper/paper.md

0 → 100644
+135 −0
Original line number Diff line number Diff line
---
title: 'graph_framework: A Domain Specific Compiler for Building Physics Applications'
tags:
    - C++
    - Autodifferentation
    - GPU
    - RF Ray Tracing
    - Energenic particles
authors:
    - name: M. Cianciosa
      orcid: 0000-0001-6211-5311
      affiliation: "1"
    - name: D. Batchelor
      orcid: 0009-0000-2669-9292
      affiliation: "2"
    - name: W. Elwasif
      orcid: 0000-0003-0554-1036
      affiliation: "1"
affiliations:
    - name: Oak Ridge National Laboratory
      index: 1
    - name: Diditco, Oak Ridge TN 37831
      index: 2
date: 22 Sepember 2025
bibliography: paper.bib
---

# Summary[^1]

Modern supercomputers are increasingly relying on Graphic Processing Units (GPUs) 
and other accelerators to achieve exa-scale performance at reasonable energy 
usage. The challenge of exploiting these accelerators is the incompatibility 
between different vendors. A scientific code written using CUDA will not operate 
on a AMD gpu. Frameworks that can abstract the physics from the accelerator 
kernel code are needed to exploit the current and future hardware. In world of 
machine learning, several auto differentiation frameworks have been developed 
that have the promise of abstracting the math from the compute hardware. However 
in practice, these framework often lag in supporting non-CUDA platforms. Their 
reliance on python makes them challenging to embed within non python based 
applications. In this paper we present the development of a graph computation 
framework which compiles physics equations to optimized kernel code for the 
central processing unit (CPUs), Apple GPUs, and NVidia GPUs. The utility of this 
framework will be demonstrated for a Radio Frequency (RF) ray tracing problems 
in fusion energy.

[^1]:Notice of Copyright This manuscript has been authored by UT-Battelle, LLC 
under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The 
United States Government retains and the publisher, by accepting the article for 
publication, acknowledges that the United States Government retains a 
non-exclusive, paidup, irrevocable, world-wide license to publish or reproduce 
the published form of this manuscript, or allow others to do so, for United 
States Government purposes. The Department of Energy will provide public access 
to these results of federally sponsored research in accordance with the DOE 
Public Access Plan ([http://energy.gov/downloads/doe-public-access-plan](http://energy.gov/downloads/doe-public-access-plan)).

# Statement of need

GPUs offer a offer tremendous processing power that is largly untapped codes 
developed by domain scientists. The goal of the graph_framework is to lower the 
barrier of entry for adopting GPU code. While there are many different solutions
to the problem of performance portable code, different solutions have different
drawbacks or trade offs. With that in mind the graph_framework was developed to
address the specific capabilities of:

- Transparently support multiple CPUs and GPUs including Apple GPUs.
- Use an API that is as simple as writting equations.
- Allow easy embedding in legacy code (Doesn't rely on python).
- Enables automatic differentiation.

With these design goals in mind this framework is limited to the classes of 
problems which the same physics is applied to a large ensemble of particles.
This limitation simplifies the complexity of this framework making future 
extensibility simpler as a need arises for a new problem domain.

# Background

| Framework       | Languauge          | Cuda Support       | Metal Support        | RocM Support       | Auto Differentation |
|:---------------:|:------------------:|:------------------:|:--------------------:|:------------------:|:-------------------:|
| graph_framework | C++, C, Fortran    | Offical            | Offical              | Preliminary        | Yes                 |
|-----------------|--------------------|--------------------|----------------------|--------------------|---------------------|
| Cuda            | C                  | Offical            | None                 | None               | No                  |
| Metal           | Objective C, Swift | None               | Offical              | Depricated         | No                  |
| Kokkos          | C++                | Offical            | None                 | Offical            | No                  |
| OpenACC         | C, C++, Fortran    | Offical            | None                 | None               | No                  |
| OpenMP          | C, C++, Fortran    | Compiler Dependent | None                 | Compiler Dependent | No                  |
| OpenCL          | C                  | Offical            | Depricated           | Offical            | No                  |
| Vulcan          | C                  | Offical            | Unoffical            | Offical            | No                  |
| HIP             | C                  | Offical            | None                 | Offical            | No                  |
| tensorflow      | Python, C++        | Offical            | Unoffical/Incomplete | Unoffical          | Yes                 |
| JAX             | Python             | Offical            | Unoffical/Incomplete | Offical            | Yes                 |
| pytorch         | Python, C++, Java  | Offical            | Offical              | Offical            | Yes                 |
| mlx             | Python, C++, Swift | Offical            | Offical              | Experimental       | Yes                 |
Table: Overview of GPU capable frameworks. \label{frameworks}

Standardized programming languages such as Fortran[@Backus], C[@Ritchie], 
C++[@Stroustrup], have simplified the 
development if cross platform programs. Scientific codes have relied on the 
ability to write source code which can operate on multiple processor 
architectures and operating systems (OSs) with no or littel changes given an
appropriate compiler. However, modern super computers rely on graphical 
processing units (GPUs) to achieve exa-scale 
performace[@Hines],[@Yang],[@Schneider] with reasonable energy usage. Unlike 
central processing units (CPUs), the instruction sets of GPUs are proprietary 
information. Additionally, since accelerators typically are hardware 
accessories, an OS requires device drivers which are also proprietary. NVidia
GPUs are best programmed using CUDA[@Cuda] while Apple GPUs use Metal[@Metal] 
and AMD GPUs use HIP[@Hip].

There are many potential solutions to cross performance portable support. Low 
level cross platform frameworks general purpose GPU (GPGPU) programming 
frameworks such as OpenCL[@Munshi] and Vulkan[@Vulkan] requires 
direct vendor support. HIP can support NVidia GPUs by abstracting the driver API
and rewitting kernel code. However these frameworks are the lowest level and
require GPU programming expertize to utilize them effectively that a domain 
scientist may not have. A higher level approch used in 
OpenACC[@Farber] and OpenMP[@OpenMP] use source code antonation to
transform loops and code blocks into GPU kernels. The drawback of this approche
is that source code written for CPUs is results in poor GPU performance. 
Kokkos[@Edwards] is a collection of performance portable array operations for 
for building device adnostic applications.

With the advent of Machine learning, several machine learning frameworks have
been created such as Tensorflow[@Abadi], 
JAX[@Bradbury], PyTorch[@Paszke], and MLX[@Hannun]. These 
frameworks build a graph representation operations that can be 
auto-differentiated and compiled to GPUs. These frameworks are intended to be 
used through a python interface which lower the one barrier to useing but also
introduces new barriers. For instance, it's not straight forward to embed these 
frameworks in non-python codes and non-python API's don't always support all the
features or are as well documented as python API's. Addtitionally performance is
not garrentteed. It is not always straight forward to understand what the 
framework is doing. Additionally cross platform support is often unoffical and
can be incomplete. Table \ref{frameworks} shows an overview of these frameworks.

# Discription