Commit 16ff91eb authored by Serge Pavlov's avatar Serge Pavlov
Browse files

Introduce intrinsic llvm.isnan

Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
parent 2f002817
Loading
Loading
Loading
Loading
+4 −24
Original line number Diff line number Diff line
@@ -3068,37 +3068,17 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
    // ZExt bool to int type.
    return RValue::get(Builder.CreateZExt(LHS, ConvertType(E->getType())));
  }
  case Builtin::BI__builtin_isnan: {
    CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
    Value *V = EmitScalarExpr(E->getArg(0));
    llvm::Type *Ty = V->getType();
    const llvm::fltSemantics &Semantics = Ty->getFltSemantics();
    if (!Builder.getIsFPConstrained() ||
        Builder.getDefaultConstrainedExcept() == fp::ebIgnore ||
        !Ty->isIEEE()) {
      V = Builder.CreateFCmpUNO(V, V, "cmp");
      return RValue::get(Builder.CreateZExt(V, ConvertType(E->getType())));
    }
    if (Value *Result = getTargetHooks().testFPKind(V, BuiltinID, Builder, CGM))
      return RValue::get(Result);
    // NaN has all exp bits set and a non zero significand. Therefore:
    // isnan(V) == ((exp mask - (abs(V) & exp mask)) < 0)
    unsigned bitsize = Ty->getScalarSizeInBits();
    llvm::IntegerType *IntTy = Builder.getIntNTy(bitsize);
    Value *IntV = Builder.CreateBitCast(V, IntTy);
    APInt AndMask = APInt::getSignedMaxValue(bitsize);
    Value *AbsV =
        Builder.CreateAnd(IntV, llvm::ConstantInt::get(IntTy, AndMask));
    APInt ExpMask = APFloat::getInf(Semantics).bitcastToAPInt();
    Value *Sub =
        Builder.CreateSub(llvm::ConstantInt::get(IntTy, ExpMask), AbsV);
    // V = sign bit (Sub) <=> V = (Sub < 0)
    V = Builder.CreateLShr(Sub, llvm::ConstantInt::get(IntTy, bitsize - 1));
    if (bitsize > 32)
      V = Builder.CreateTrunc(V, ConvertType(E->getType()));
    return RValue::get(V);
    Function *F = CGM.getIntrinsic(Intrinsic::isnan, V->getType());
    Value *Call = Builder.CreateCall(F, V);
    return RValue::get(Builder.CreateZExt(Call, ConvertType(E->getType())));
  }
  case Builtin::BI__builtin_matrix_transpose: {
+17 −20
Original line number Diff line number Diff line
@@ -17,7 +17,7 @@ int printf(const char *, ...);
// CHECK-NEXT:    store i32 [[X:%.*]], i32* [[X_ADDR]], align 4
// CHECK-NEXT:    [[TMP0:%.*]] = load i8*, i8** [[STR_ADDR]], align 8
// CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* [[X_ADDR]], align 4
// CHECK-NEXT:    [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) [[ATTR4:#.*]]
// CHECK-NEXT:    [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) #[[ATTR3:[0-9]+]]
// CHECK-NEXT:    ret void
//
void p(char *str, int x) {
@@ -29,13 +29,13 @@ void p(char *str, int x) {
// CHECK-LABEL: @test_long_double_isinf(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
// CHECK-NEXT:    store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT:    [[SHL1:%.*]] = shl i80 [[BITCAST]], 1
// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i80 [[SHL1]], -18446744073709551616
// CHECK-NEXT:    [[RES:%.*]] = zext i1 [[CMP]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT:    [[TMP1:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT:    [[TMP2:%.*]] = shl i80 [[TMP1]], 1
// CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i80 [[TMP2]], -18446744073709551616
// CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
// CHECK-NEXT:    ret void
//
void test_long_double_isinf(long double ld) {
@@ -47,13 +47,13 @@ void test_long_double_isinf(long double ld) {
// CHECK-LABEL: @test_long_double_isfinite(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
// CHECK-NEXT:    store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT:    [[SHL1:%.*]] = shl i80 [[BITCAST]], 1
// CHECK-NEXT:    [[CMP:%.*]] = icmp ult i80 [[SHL1]], -18446744073709551616
// CHECK-NEXT:    [[RES:%.*]] = zext i1 [[CMP]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT:    [[TMP1:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT:    [[TMP2:%.*]] = shl i80 [[TMP1]], 1
// CHECK-NEXT:    [[TMP3:%.*]] = icmp ult i80 [[TMP2]], -18446744073709551616
// CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
// CHECK-NEXT:    ret void
//
void test_long_double_isfinite(long double ld) {
@@ -65,14 +65,11 @@ void test_long_double_isfinite(long double ld) {
// CHECK-LABEL: @test_long_double_isnan(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
// CHECK-NEXT:    store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
// CHECK-NEXT:    [[ABS:%.*]] = and i80 [[BITCAST]], 604462909807314587353087
// CHECK-NEXT:    [[TMP1:%.*]] = sub i80 604453686435277732577280, [[ABS]]
// CHECK-NEXT:    [[ISNAN:%.*]] = lshr i80 [[TMP1]], 79
// CHECK-NEXT:    [[RES:%.*]] = trunc i80 [[ISNAN]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.isnan.f80(x86_fp80 [[TMP0]]) #[[ATTR3]]
// CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.3, i64 0, i64 0), i32 [[TMP2]]) #[[ATTR3]]
// CHECK-NEXT:    ret void
//
void test_long_double_isnan(long double ld) {
+18 −20
Original line number Diff line number Diff line
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
// RUN: %clang_cc1 %s -emit-llvm -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -o - -triple arm64-none-linux-gnu | FileCheck %s

// Test that the constrained intrinsics are picking up the exception
@@ -15,7 +16,7 @@ int printf(const char *, ...);
// CHECK-NEXT:    store i32 [[X:%.*]], i32* [[X_ADDR]], align 4
// CHECK-NEXT:    [[TMP0:%.*]] = load i8*, i8** [[STR_ADDR]], align 8
// CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* [[X_ADDR]], align 4
// CHECK-NEXT:    [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]])  [[ATTR4:#.*]]
// CHECK-NEXT:    [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) #[[ATTR3:[0-9]+]]
// CHECK-NEXT:    ret void
//
void p(char *str, int x) {
@@ -27,13 +28,13 @@ void p(char *str, int x) {
// CHECK-LABEL: @test_long_double_isinf(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca fp128, align 16
// CHECK-NEXT:    store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT:    [[SHL1:%.*]] = shl i128 [[BITCAST]], 1
// CHECK-NEXT:    [[CMP:%.*]] = icmp eq i128 [[SHL1]], -10384593717069655257060992658440192
// CHECK-NEXT:    [[RES:%.*]] = zext i1 [[CMP]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT:    [[TMP1:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT:    [[TMP2:%.*]] = shl i128 [[TMP1]], 1
// CHECK-NEXT:    [[TMP3:%.*]] = icmp eq i128 [[TMP2]], -10384593717069655257060992658440192
// CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
// CHECK-NEXT:    ret void
//
void test_long_double_isinf(long double ld) {
@@ -45,13 +46,13 @@ void test_long_double_isinf(long double ld) {
// CHECK-LABEL: @test_long_double_isfinite(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca fp128, align 16
// CHECK-NEXT:    store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT:    [[SHL1:%.*]] = shl i128 [[BITCAST]], 1
// CHECK-NEXT:    [[CMP:%.*]] = icmp ult i128 [[SHL1]], -10384593717069655257060992658440192
// CHECK-NEXT:    [[RES:%.*]] = zext i1 [[CMP]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
// CHECK-NEXT:    [[TMP1:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT:    [[TMP2:%.*]] = shl i128 [[TMP1]], 1
// CHECK-NEXT:    [[TMP3:%.*]] = icmp ult i128 [[TMP2]], -10384593717069655257060992658440192
// CHECK-NEXT:    [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
// CHECK-NEXT:    ret void
//
void test_long_double_isfinite(long double ld) {
@@ -63,14 +64,11 @@ void test_long_double_isfinite(long double ld) {
// CHECK-LABEL: @test_long_double_isnan(
// CHECK-NEXT:  entry:
// CHECK-NEXT:    [[LD_ADDR:%.*]] = alloca fp128, align 16
// CHECK-NEXT:    store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
// CHECK-NEXT:    [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
// CHECK-NEXT:    [[ABS:%.*]] = and i128 [[BITCAST]], 170141183460469231731687303715884105727
// CHECK-NEXT:    [[TMP1:%.*]] = sub i128 170135991163610696904058773219554885632, [[ABS]]
// CHECK-NEXT:    [[ISNAN:%.*]] = lshr i128 [[TMP1]], 127
// CHECK-NEXT:    [[RES:%.*]] = trunc i128 [[ISNAN]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]])
// CHECK-NEXT:    [[TMP1:%.*]] = call i1 @llvm.isnan.f128(fp128 [[TMP0]]) #[[ATTR3]]
// CHECK-NEXT:    [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
// CHECK-NEXT:    call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.3, i64 0, i64 0), i32 [[TMP2]]) #[[ATTR3]]
// CHECK-NEXT:    ret void
//
void test_long_double_isnan(long double ld) {
+72 −80

File changed.

Preview size limit exceeded, changes collapsed.

+46 −0
Original line number Diff line number Diff line
@@ -20985,6 +20985,52 @@ return any value and uses platform-independent representation of IEEE rounding
modes.
Floating Point Test Intrinsics
------------------------------
These functions get properties of floating point values.
'``llvm.isnan``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax:
"""""""
::
      declare i1 @llvm.isnan(<fptype> <op>)
      declare <N x i1> @llvm.isnan(<vector-fptype> <op>)
Overview:
"""""""""
The '``llvm.isnan``' intrinsic returns a boolean value or vector of boolean
values depending on whether the value is NaN.
If the operand is a floating-point scalar, then the result type is a
boolean (:ref:`i1 <t_integer>`).
If the operand is a floating-point vector, then the result type is a
vector of boolean with the same number of elements as the operand.
Arguments:
""""""""""
The argument to the '``llvm.isnan``' intrinsic must be
:ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
of floating-point values.
Semantics:
""""""""""
The function tests if ``op`` is NaN. If ``op`` is a vector, then the
check is made element by element. Each test yields an :ref:`i1 <t_integer>`
result, which is ``true``, if the value is NaN. The function never raises
floating point exceptions.
General Intrinsics
------------------
Loading