Simple test of whether kernels in a HIP implementation can accept an unused object reference.
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Repository of GPU performance optimizations
Mirror of https://github.com/celeritas-project/celeritas
This project isolates an issue related to CUDA separable compilation that occurs in legacy CMake with CUDA as a TPL. The issue is fixed when using modern CMake with CUDA enabled as a language.
This repository contains example OpenACC programs to test the OpenARC compiler.
Simple tester for multi-architecture, domain-decomposed particle transport
Simple "Hello World"-style program used to test the layout of resources on a Summit node using jsrun.
Python Package for Electron Scattering/Microscopy Simulations with MPI + JIT-compiled CUDA C/C++
Adaptive Sparse Grid Discretization solver.
Miniapp version of a ray-tracing code used to simulate X-ray lasers.