Explore projects
Seer is an intelligent system for extreme heterogeneous architectures
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Simple test of whether kernels in a HIP implementation can accept an unused object reference.
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Strategies to distribute simplex-shaped workload across thousands of GPUs through mathematical mapping and dynamic scheduling
PyTorch-based large-scale ptychography for determining atom trajectories
This project isolates an issue related to CUDA separable compilation that occurs in legacy CMake with CUDA as a TPL. The issue is fixed when using modern CMake with CUDA enabled as a language.
This repository contains example OpenACC programs to test the OpenARC compiler.
Simple tester for multi-architecture, domain-decomposed particle transport
Simple "Hello World" type program used to test the layout of resources on a Summit node using jsrun.