Seer is an intelligent system for extreme heterogeneous architectures
Ongoing research on training transformer language models at scale, including BERT and GPT-2
Simple test of whether a HIP implementation's kernels can accept an unused object reference.
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Strategies for distributing simplex-shaped workloads across thousands of GPUs through mathematical mapping and dynamic scheduling
PyTorch-based large-scale ptychography for determining atom trajectories
Repository for GPU performance optimization
Mirror of https://github.com/celeritas-project/celeritas
VTK-m mirror which runs in OLCF CI
This project isolates an issue with CUDA separable compilation that occurs in legacy CMake when CUDA is treated as a third-party library (TPL). The issue is fixed when using modern CMake with CUDA enabled as a first-class language.
This repository contains example OpenACC programs to test the OpenARC compiler.
Simple tester for multi-architecture domain decomposed particle transport
Simple "Hello World"-style program used to test the layout of resources on a Summit node using jsrun.
ELM Kernel Library