Adds the following intrinsics for narrow-fp to bf16 conversions
introduced in PTX 9.2:
- `llvm.nvvm.{e4m3x2/e5m2x2}.to.bf16x2.rn{.relu}{.satfinite}.scale.n2.ue8m0`
- `llvm.nvvm.{e2m3x2/e3m2x2}.to.bf16x2.rn{.relu}{.satfinite}.scale.n2.ue8m0`
- `llvm.nvvm.e2m1x2.to.bf16x2.rn{.relu}{.satfinite}.scale.n2.ue8m0`
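
For readers who want the full names spelled out, the brace notation above can be expanded mechanically. The following is only a sketch of how the notation reads, assuming `{a/b}` denotes alternatives and a single braced modifier such as `{.relu}` is optional. The helper `expand` is hypothetical and not part of this patch, and the set of intrinsics actually defined may be narrower than the expansion, so check the definitions in the patch before relying on any particular name.

```python
import re
from itertools import product

# Name patterns copied verbatim from the list above.
PATTERNS = [
    "llvm.nvvm.{e4m3x2/e5m2x2}.to.bf16x2.rn{.relu}{.satfinite}.scale.n2.ue8m0",
    "llvm.nvvm.{e2m3x2/e3m2x2}.to.bf16x2.rn{.relu}{.satfinite}.scale.n2.ue8m0",
    "llvm.nvvm.e2m1x2.to.bf16x2.rn{.relu}{.satfinite}.scale.n2.ue8m0",
]


def expand(pattern):
    """Expand one brace pattern into concrete names (hypothetical helper).

    Assumed reading of the notation, not confirmed by the patch:
    "{a/b}" means "a or b", and "{.x}" means ".x" may be present or absent.
    """
    # re.split with a capturing group alternates literal text and group bodies.
    parts = re.split(r"\{([^{}]*)\}", pattern)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            choices.append([part])             # literal text, kept as-is
        elif "/" in part:
            choices.append(part.split("/"))    # alternatives
        else:
            choices.append(["", part])         # optional modifier
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for pattern in PATTERNS:
        for name in expand(pattern):
            print(name)
```

Under this reading, each two-format pattern yields eight names and the `e2m1x2` pattern yields four.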
The tests have been verified with `ptxas-13.2`.
PTX ISA Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cvt