Adaptive Sparse Grid Discretization solver.
Mirror of https://github.com/celeritas-project/celeritas
Simple tester for multi-architecture, domain-decomposed particle transport
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Strategies to distribute simplex-shaped workload across thousands of GPUs through mathematical mapping and dynamic scheduling
ELM Kernel Library
Repository of GPU performance optimization
Simple "Hello World" type program used to test the layout of resources on a Summit node using jsrun.
Simple test of whether kernels in a HIP implementation can accept an unused object reference.
Ongoing research on training transformer language models at scale, including BERT and GPT-2
Memory manager with events that can handle delayed allocations and deallocations in multiple memory spaces.