[mlir][NVGPUToNVVM] Support BF16 mma.sync lowering (#194203)
Let NVGPUToNVVM recognize BF16 MMA operand element types.
Pack `vector<2xbf16>` fragments to `i32` before emitting
`nvvm.mma.sync`.
This matches the PTX operand encoding for `m16n8k16` BF16 MMA
instructions.
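
As an illustration only (this is not code from the patch), the packing can be sketched in Python. The helper names `f32_to_bf16_bits` and `pack_bf16x2` are hypothetical. The sketch assumes little-endian lane order, so element 0 lands in the low 16 bits, and it truncates f32 to bf16 for brevity, whereas real conversions usually round to nearest even.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return a 16-bit bf16 pattern for x.

    Simplification: keeps the top 16 bits of the f32 encoding
    (truncation), rather than rounding to nearest even.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def pack_bf16x2(elem0: float, elem1: float) -> int:
    """Pack two bf16 values into one 32-bit word.

    Assumption: little-endian lane order, i.e. elem0 occupies the
    low half and elem1 the high half, as a bitcast of a 2-element
    vector to i32 would produce on a little-endian target.
    """
    return (f32_to_bf16_bits(elem1) << 16) | f32_to_bf16_bits(elem0)


if __name__ == "__main__":
    # Print the packed word for the pair (1.0, 2.0) in hex.
    print(hex(pack_bf16x2(1.0, 2.0)))
```

Each `vector<2xbf16>` fragment maps to one such 32-bit word, which is the shape the `i32` operands take.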
Add a conversion test for `nvgpu.mma.sync` `bf16xbf16` to `f32`
lowering.
Co-authored-by: Hao Ren <rhao8608@gmail.com>