This project is mirrored from https://github.com/llvm-doe-org/llvm-project.git.
  1. 28 Jan, 2020 2 commits
  2. 22 Jan, 2020 1 commit
    • Add support for (expressing) vscale. · 67d4c992
      Sander de Smalen authored
      In LLVM IR, vscale can be represented with an intrinsic. For some targets,
      this is equivalent to the constexpr:
      
        getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1
      
      This can be used to propagate the value in CodeGenPrepare.
      
      In ISel we add a node that can be legalized to one or more
      instructions to materialize the runtime vector length.
      
      This patch also adds SVE CodeGen support for VSCALE, which maps this
      node to RDVL instructions (for scaled multiples of 16 bytes) or CNT[HSD]
      instructions (for scaled multiples of 2, 4, or 8 bytes, respectively).
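      
      For illustration, a minimal IR sketch (not taken from the patch itself;
      the i32-returning form of the intrinsic is assumed) that materializes
      the byte size of one SVE vector:
      
        declare i32 @llvm.vscale.i32()
      
        define i32 @bytes_per_sve_vector() {
          ; one SVE vector is vscale x 16 bytes
          %vs = call i32 @llvm.vscale.i32()
          %bytes = mul i32 %vs, 16
          ret i32 %bytes
        }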
      
      Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner
      
      Reviewed by: efriedma
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D68203
  3. 16 Jan, 2020 1 commit
    • [IR] Mark memset.* intrinsics as IntrWriteMem. · 0b21d552
      Florian Hahn authored
      llvm.memset intrinsics only write memory, but they are missing
      IntrWriteMem, so doesNotReadMemory() returns false for them.
      
      The test change is due to the test checking the function attribute IDs
      at the call sites; these IDs were bumped because a new attribute
      combination with writeonly now appears in the test file.
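      
      A sketch of the effect (typed-pointer declaration shape assumed for
      this era of IR): with IntrWriteMem the destination is writeonly, so
      analyses may conclude a memset call never reads memory.
      
        declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)
      
        define void @zero16(i8* %p) {
          ; writes 16 zero bytes; reads nothing
          call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 16, i1 false)
          ret void
        }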
      
      Reviewers: jdoerfert, reames, efriedma, nlopes, lebedev.ri
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D72789
  4. 13 Jan, 2020 1 commit
  5. 08 Jan, 2020 1 commit
    • [Intrinsic] Add fixed point division intrinsics. · 8e2b44f7
      Bevin Hansson authored
      Summary:
      This patch adds intrinsics and ISelDAG nodes for
      signed and unsigned fixed-point division:
      
        llvm.sdiv.fix.*
        llvm.udiv.fix.*
      
      These intrinsics perform scaled division on two
      integers or vectors of integers. They are required
      to implement Embedded C fixed-point
      arithmetic in Clang.
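      
      A minimal usage sketch (signature as described above; the scale
      operand must be a constant):
      
        declare i32 @llvm.sdiv.fix.i32(i32, i32, i32 immarg)
      
        define i32 @fixdiv(i32 %a, i32 %b) {
          ; divide two fixed-point values carrying 16 fractional bits
          %r = call i32 @llvm.sdiv.fix.i32(i32 %a, i32 %b, i32 16)
          ret i32 %r
        }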
      
      Patch by: ebevhan
      
      Reviewers: bjope, leonardchan, efriedma, craig.topper
      
      Reviewed By: craig.topper
      
      Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D70007
  6. 28 Dec, 2019 1 commit
  7. 18 Dec, 2019 1 commit
  8. 17 Dec, 2019 2 commits
  9. 12 Dec, 2019 1 commit
    • [Matrix] Add first set of matrix intrinsics and initial lowering pass. · 526244b1
      Florian Hahn authored
      This is the first patch adding an initial set of matrix intrinsics and a
      corresponding lowering pass. This has been discussed on llvm-dev:
      http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
      
      The first patch introduces four new intrinsics (transpose, multiply,
      columnwise load and store) and a LowerMatrixIntrinsics pass, that
      lowers those intrinsics to vector operations.
      
      Matrices are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
      embedded in a <16 x float> vector) and the intrinsics take the dimension
      information as parameters. Those parameters must be ConstantInts.
      For the memory layout, we initially assume column-major, but in the RFC
      we also described how to extend the intrinsics to support row-major as
      well.
      
      For the initial lowering, we split the input of the intrinsics into a
      set of column vectors, transform those column vectors and concatenate
      the result columns to a flat result vector.
      
      This allows us to lower the intrinsics without any shape propagation, as
      mentioned in the RFC. In follow-up patches, we plan to submit the
      following improvements:
       * Shape propagation to eliminate the embedding/splitting for each
         intrinsic.
       * Fused & tiled lowering of multiply and other operations.
       * Optimization remarks highlighting matrix expressions and costs.
       * Generate loops for operations on large matrices.
       * More general block processing for operation on large vectors,
         exploiting shape information.
      
      We would like to add dedicated transpose, columnwise load and store
      intrinsics, even though they are not strictly necessary. For example, we
      could emit a large shufflevector instruction instead of the
      transpose. But we expect that to
        (1) become unwieldy for larger matrices (even for 16x16 matrices,
            the resulting shufflevector masks would be huge), and
        (2) risk instcombine making small changes that cause us to fail to
            detect the transpose, preventing better lowerings.
      
      For the load/store, we are additionally planning on exploiting the
      intrinsics for better alias analysis.
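      
      A minimal usage sketch (intrinsic name and mangling assumed from the
      in-tree definition): transposing a 4 x 4 float matrix embedded in a
      flat <16 x float> vector, with the dimensions passed as constants:
      
        declare <16 x float> @llvm.matrix.transpose.v16f32(<16 x float>, i32 immarg, i32 immarg)
      
        define <16 x float> @transpose4x4(<16 x float> %m) {
          %t = call <16 x float> @llvm.matrix.transpose.v16f32(<16 x float> %m, i32 4, i32 4)
          ret <16 x float> %t
        }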
      
      Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin
      
      Reviewed By: anemet
      
      Differential Revision: https://reviews.llvm.org/D70456
  10. 07 Dec, 2019 1 commit
    • [FPEnv] Constrained FCmp intrinsics · 9db13b5a
      Ulrich Weigand authored
      This adds support for constrained floating-point comparison intrinsics.
      
      Specifically, we add:
      
            declare <ty2>
            @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
                                                metadata <condition code>,
                                                metadata <exception behavior>)
            declare <ty2>
            @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
                                                 metadata <condition code>,
                                                 metadata <exception behavior>)
      
      The first variant implements an IEEE "quiet" comparison (i.e. we only
      get an invalid FP exception if either argument is a SNaN), while the
      second variant implements an IEEE "signaling" comparison (i.e. we get
      an invalid FP exception if either argument is any NaN).
      
      The condition code is implemented as a metadata string.  The same set
      of predicates as for the fcmp instruction is supported (except for the
      "true" and "false" predicates).
      
      These new intrinsics are mapped by SelectionDAG codegen onto two new
      ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
      representing quiet vs. signaling comparison operations.  Otherwise
      those nodes look like SETCC nodes, with an additional chain argument
      and result as usual for strict FP nodes.  The patch includes support
      for the common legalization operations for those nodes.
      
      The patch also includes full SystemZ back-end support for the new
      ISD nodes, mapping them to all available SystemZ instruction to
      fully implement strict semantics (scalar and vector).
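      
      A minimal usage sketch (f64 overload assumed): a quiet "ordered less
      than" comparison with strict exception semantics:
      
        declare i1 @llvm.experimental.constrained.fcmp.f64(double, double, metadata, metadata)
      
        define i1 @lt(double %a, double %b) strictfp {
          %r = call i1 @llvm.experimental.constrained.fcmp.f64(
                   double %a, double %b,
                   metadata !"olt", metadata !"fpexcept.strict")
          ret i1 %r
        }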
      
      Differential Revision: https://reviews.llvm.org/D69281
  11. 08 Nov, 2019 1 commit
  12. 14 Oct, 2019 1 commit
  13. 07 Oct, 2019 1 commit
  14. 02 Oct, 2019 1 commit
  15. 27 Sep, 2019 1 commit
    • hwasan: Compatibility fixes for short granules. · c336557f
      Peter Collingbourne authored
      We can't use short granules with stack instrumentation when targeting older
      API levels because the rest of the system won't understand the short granule
      tags stored in shadow memory.
      
      Moreover, we need to be able to let old binaries (which won't understand
      short granule tags) run on a new system that supports short granule
      tags. Such binaries will call the __hwasan_tag_mismatch function when their
      outlined checks fail. We can compensate for the binary's lack of support
      for short granules by implementing the short granule part of the check in
      the __hwasan_tag_mismatch function. Unfortunately we can't do anything about
      inline checks, but I don't believe that we can generate these by default on
      aarch64, nor did we do so when the ABI was fixed.
      
      A new function, __hwasan_tag_mismatch_v2, is introduced that lets code
      targeting the new runtime avoid redoing the short granule check. Because tag
      mismatches are rare this isn't important from a performance perspective; the
      main benefit is that it introduces a symbol dependency that prevents binaries
      targeting the new runtime from running on older (i.e. incompatible) runtimes.
      
      Differential Revision: https://reviews.llvm.org/D68059
      
      llvm-svn: 373035
  16. 20 Sep, 2019 1 commit
    • [IntrinsicEmitter] Add overloaded types for SVE intrinsics (Subdivide2 & Subdivide4) · 22a8f35c
      Kerry McLaughlin authored
      Summary:
      Both overloaded types match the type of another vector-typed intrinsic parameter, but with each element subdivided to form a vector with more elements of a smaller type.
      
      Subdivide2Argument allows intrinsics such as the following to be defined:
       - declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 8 x i16>)
      
      Subdivide4Argument allows intrinsics such as:
       - declare <vscale x 4 x i32> @llvm.something.nxv4i32(<vscale x 16 x i8>)
      
      Tests are included in follow-up patches which add intrinsics using these types.
      
      Reviewers: sdesmalen, SjoerdMeijer, greened, rovka
      
      Reviewed By: sdesmalen
      
      Subscribers: rovka, tschuett, jdoerfert, cfe-commits, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D67549
      
      llvm-svn: 372380
  17. 07 Sep, 2019 1 commit
    • [Intrinsic] Add the llvm.umul.fix.sat intrinsic · 5e331e4c
      Bjorn Pettersson authored
      Summary:
      Add an intrinsic that takes two unsigned integers, with
      their scale provided as the third argument, and
      performs fixed-point multiplication on them. The
      result is saturated and clamped between the largest and
      smallest representable values of the first two operands.
      
      This is a part of implementing fixed point arithmetic
      in clang where some of the more complex operations
      will be implemented as intrinsics.
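      
      A minimal usage sketch (the scale operand must be a constant):
      
        declare i32 @llvm.umul.fix.sat.i32(i32, i32, i32 immarg)
      
        define i32 @mulsat(i32 %a, i32 %b) {
          ; multiply two unsigned fixed-point values with 16 fractional
          ; bits, saturating instead of wrapping on overflow
          %r = call i32 @llvm.umul.fix.sat.i32(i32 %a, i32 %b, i32 16)
          ret i32 %r
        }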
      
      Patch by: leonardchan, bjope
      
      Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel
      
      Reviewed By: leonardchan
      
      Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D57836
      
      llvm-svn: 371308
  18. 28 Aug, 2019 1 commit
    • [FPEnv] Add fptosi and fptoui constrained intrinsics. · ddf13c00
      Kevin P. Neal authored
      This implements constrained floating point intrinsics for FP to signed and
      unsigned integers.
      
      Quoting from D32319:
      The purpose of the constrained intrinsics is to force the optimizer to
      respect the restrictions that will be necessary to support things like the
      STDC FENV_ACCESS ON pragma without interfering with optimizations when
      these restrictions are not needed.
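      
      A minimal usage sketch (i32/f64 overload assumed; these conversions
      take only the exception-behavior metadata, no rounding mode):
      
        declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata)
      
        define i32 @to_int(double %x) strictfp {
          %r = call i32 @llvm.experimental.constrained.fptosi.i32.f64(
                   double %x, metadata !"fpexcept.strict")
          ret i32 %r
        }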
      
      Reviewed by:	Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
      Approved by:	Craig Topper
      Differential Revision:	http://reviews.llvm.org/D63782
      
      llvm-svn: 370228
  19. 15 Aug, 2019 1 commit
    • Add ptrmask intrinsic · de1d6c82
      Florian Hahn authored
      This patch adds a ptrmask intrinsic which allows masking out bits of a
      pointer that must be zero when accessing it, because of ABI alignment
      requirements or a restriction of the meaningful bits of a pointer
      through the data layout.
      
      This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged
      pointers) and allows us to not lose information about the underlying
      object.
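      
      A minimal usage sketch (typed-pointer mangling assumed for this era of
      IR): clearing the low four tag bits of a pointer without a
      ptrtoint/inttoptr round trip:
      
        declare i8* @llvm.ptrmask.p0i8.i64(i8*, i64)
      
        define i8* @untag(i8* %p) {
          %r = call i8* @llvm.ptrmask.p0i8.i64(i8* %p, i64 -16)
          ret i8* %r
        }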
      
      Reviewers: nlopes, efriedma, hfinkel, sanjoy, jdoerfert, aqjune
      
      Reviewed by: sanjoy, jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D59065
      
      llvm-svn: 368986
  20. 14 Aug, 2019 4 commits
    • [Intrinsics] Add a 'NoAlias' intrinsic property; annotate llvm.memcpy · bb519c62
      David Bolvansky authored
      Reviewers: jdoerfert
      
      Reviewed By: jdoerfert
      
      Differential Revision: https://reviews.llvm.org/D66158
      
      llvm-svn: 368810
    • Add intrinsics for doing frame-bound dynamic allocations within a coroutine. · 62a5dde0
      John McCall authored
      These rely on having an allocator provided to the coroutine and thus,
      for now, only work in retcon lowerings.
      
      llvm-svn: 368791
    • Generalize llvm.coro.suspend.retcon to allow an arbitrary number of arguments to be passed back to the continuation function. · 38292141
      John McCall authored
      
      llvm-svn: 368789
    • Extend coroutines to support a "returned continuation" lowering. · 94010b2b
      John McCall authored
      A quick contrast of this ABI with the currently-implemented ABI:
      
      - Allocation is implicitly managed by the lowering passes, which is fine
        for frontends that are fine with assuming that allocation cannot fail.
        This assumption is necessary to implement dynamic allocas anyway.
      
      - The lowering attempts to fit the coroutine frame into an opaque,
        statically-sized buffer before falling back on allocation; the same
        buffer must be provided to every resume point.  A buffer must be at
        least pointer-sized.
      
      - The resume and destroy functions have been combined; the continuation
        function takes a parameter indicating whether it has succeeded.
      
      - Conversely, every suspend point begins its own continuation function.
      
      - The continuation function pointer is directly returned to the caller
        instead of being stored in the frame.  The continuation can therefore
        directly destroy the frame when exiting the coroutine instead of having
        to leave it in a defunct state.
      
      - Other values can be returned directly to the caller instead of going
        through a promise allocation.  The frontend provides a "prototype"
        function declaration from which the type, calling convention, and
        attributes of the continuation functions are taken.
      
      - On the caller side, the frontend can generate natural IR that directly
        uses the continuation functions as long as it prevents IPO with the
        coroutine until lowering has happened.  In combination with the point
        above, the frontend is almost totally in charge of the ABI of the
        coroutine.
      
      - Unique-yield coroutines are given some special treatment.
      
      llvm-svn: 368788
  21. 30 Jul, 2019 1 commit
    • [FunctionAttrs] Annotate "willreturn" for AssumeLikeInst · 6e2be4ea
      Hideto Ueno authored
      Summary:
      In D37215, assume-like instructions are regarded as `willreturn`. In this patch, the annotation is added to those which do not yet have `willreturn` (`sideeffect`, `object_size`, `experimental_widenable_condition`).
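      
      A sketch of the result for one of these intrinsics (the exact set of
      surrounding attributes is assumed):
      
        declare void @llvm.sideeffect() #0
        attributes #0 = { inaccessiblememonly nounwind willreturn }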
      
      Reviewers: jdoerfert, nikic, sstefan1
      
      Reviewed By: nikic
      
      Subscribers: hiraditya, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D65455
      
      llvm-svn: 367342
  22. 28 Jul, 2019 1 commit
    • [FunctionAttrs] Annotate "willreturn" for intrinsics · cc0a4cdc
      Hideto Ueno authored
      Summary:
      In D62801, the new function attribute `willreturn` was introduced. In short, a function with `willreturn` is guaranteed to return to the call site (a more precise definition is in the LangRef).
      
      In this patch, willreturn is annotated for LLVM intrinsics.
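      
      As a sketch, after this change the attribute group of a typical
      intrinsic such as llvm.fabs includes willreturn (the surrounding
      attributes shown are assumed):
      
        declare double @llvm.fabs.f64(double) #0
        attributes #0 = { nounwind readnone speculatable willreturn }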
      
      Reviewers: jdoerfert
      
      Reviewed By: jdoerfert
      
      Subscribers: jvesely, nhaehnle, sstefan1, llvm-commits
      
      Tags: #llvm
      
      Differential Revision: https://reviews.llvm.org/D64904
      
      llvm-svn: 367184
  23. 25 Jul, 2019 1 commit
  24. 22 Jul, 2019 1 commit
  25. 17 Jul, 2019 1 commit
  26. 12 Jul, 2019 1 commit
  27. 09 Jul, 2019 1 commit
    • [BPF] add new intrinsics preserve_{array,union,struct}_access_index · e3919c6b
      Yonghong Song authored
      For background of BPF CO-RE project, please refer to
        http://vger.kernel.org/bpfconf2019.html
      
      
      In summary, BPF CO-RE intends to compile BPF programs that are
      adjustable to struct/union layout changes, so the same program can
      run on multiple kernels, with adjustments made before loading based
      on the native kernel structures.
      
      In order to do this, we need to keep track of the base and result
      debuginfo types of GEP (getelementptr) instructions, so we can adjust
      on the host based on kernel BTF info. Capturing such information as
      an IR optimization is hard, as various optimizations may have tweaked
      the GEP, and once a union is replaced by a structure it is impossible
      to track the field index for union member accesses.
      
      Three intrinsic functions, preserve_{array,union,struct}_access_index,
      are introduced.
        addr = preserve_array_access_index(base, index, dimension)
        addr = preserve_union_access_index(base, di_index)
        addr = preserve_struct_access_index(base, gep_index, di_index)
      here,
        base: the base pointer for the array/union/struct access.
        index: the last access index for array, the same for IR/DebugInfo layout.
        dimension: the array dimension.
        gep_index: the access index based on IR layout.
        di_index: the access index based on user/debuginfo types.
      
      For example, given the following source:
        $ cat test.c
        struct sk_buff {
           int i;
           int b1:1;
           int b2:2;
           union {
             struct {
               int o1;
               int o2;
             } o;
             struct {
               char flags;
               char dev_id;
             } dev;
             int netid;
           } u[10];
        };
      
        static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
            = (void *) 4;
      
        #define _(x) (__builtin_preserve_access_index(x))
      
        int bpf_prog(struct sk_buff *ctx) {
          char dev_id;
          bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
          return dev_id;
        }
        $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
          test.c >& log
      
      The generated IR looks like below:
      
        ...
        define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
          %2 = alloca %struct.sk_buff*, align 8
          %3 = alloca i8, align 1
          store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
          call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
          call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
          call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
          %4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
          %5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
          %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
               %struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
          %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
               [10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
          %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
               %union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
          %9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
          %10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
               %struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
          %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
          %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
          %13 = sext i8 %12 to i32, !dbg !54
          call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
          ret i32 %13, !dbg !57
        }
      
        !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
        !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
        !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)
      
      Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
      attached to instructions to provide struct/union debuginfo type information.
      
      For &ctx->u[5].dev.dev_id,
        . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
        . The "%7 = ..." represents array subscript "5".
        . The "%8 = ..." represents union member "dev" with index 1 for DI layout.
        . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.
      
      Basically, by traversing the use-def chain recursively for the third argument of bpf_probe_read() and
      examining all preserve_*_access_index calls, the debuginfo struct/union/array access indices
      can be recovered.
      
      The intrinsics also contain enough information to regenerate code for the IR layout.
      For the array and structure intrinsics, the proper GEP can be constructed.
      For the union intrinsics, replacing all uses of "addr" with "base" should be enough.
      
      The test case ThinLTO/X86/lazyload_metadata.ll is adjusted to reflect the
      new addition of the metadata.
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      
      Differential Revision: https://reviews.llvm.org/D61810
      
      llvm-svn: 365423
  28. 08 Jul, 2019 2 commits
    • Revert "[BPF] add new intrinsics preserve_{array,union,struct}_access_index" · 0d566dbb
      Yonghong Song authored
      This reverts commit r365352.
      
      Test ThinLTO/X86/lazyload_metadata.ll failed. Revert the commit
      and at the same time fix the issue.
      
      llvm-svn: 365360
    • [BPF] add new intrinsics preserve_{array,union,struct}_access_index · 75c2a670
      Yonghong Song authored
      For background of BPF CO-RE project, please refer to
        http://vger.kernel.org/bpfconf2019.html
      
      
      In summary, BPF CO-RE intends to compile BPF programs that are
      adjustable to struct/union layout changes, so the same program can
      run on multiple kernels, with adjustments made before loading based
      on the native kernel structures.
      
      In order to do this, we need to keep track of the base and result
      debuginfo types of GEP (getelementptr) instructions, so we can adjust
      on the host based on kernel BTF info. Capturing such information as
      an IR optimization is hard, as various optimizations may have tweaked
      the GEP, and once a union is replaced by a structure it is impossible
      to track the field index for union member accesses.
      
      Three intrinsic functions, preserve_{array,union,struct}_access_index,
      are introduced.
        addr = preserve_array_access_index(base, index, dimension)
        addr = preserve_union_access_index(base, di_index)
        addr = preserve_struct_access_index(base, gep_index, di_index)
      here,
        base: the base pointer for the array/union/struct access.
        index: the last access index for array, the same for IR/DebugInfo layout.
        dimension: the array dimension.
        gep_index: the access index based on IR layout.
        di_index: the access index based on user/debuginfo types.
      
      For example, given the following source:
        $ cat test.c
        struct sk_buff {
           int i;
           int b1:1;
           int b2:2;
           union {
             struct {
               int o1;
               int o2;
             } o;
             struct {
               char flags;
               char dev_id;
             } dev;
             int netid;
           } u[10];
        };
      
        static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
            = (void *) 4;
      
        #define _(x) (__builtin_preserve_access_index(x))
      
        int bpf_prog(struct sk_buff *ctx) {
          char dev_id;
          bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
          return dev_id;
        }
        $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
          test.c >& log
      
      The generated IR looks like below:
      
        ...
        define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
          %2 = alloca %struct.sk_buff*, align 8
          %3 = alloca i8, align 1
          store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
          call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
          call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
          call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
          %4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
          %5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
          %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
               %struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
          %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
               [10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
          %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
               %union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
          %9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
          %10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
               %struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
          %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
          %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
          %13 = sext i8 %12 to i32, !dbg !54
          call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
          ret i32 %13, !dbg !57
        }
      
        !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
        !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
        !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)
      
      Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
      attached to instructions to provide struct/union debuginfo type information.
      
      For &ctx->u[5].dev.dev_id,
        . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
        . The "%7 = ..." represents array subscript "5".
        . The "%8 = ..." represents union member "dev" with index 1 for DI layout.
        . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.
      
      Basically, by traversing the use-def chain recursively for the third argument of bpf_probe_read() and
      examining all preserve_*_access_index calls, the debuginfo struct/union/array access indices
      can be recovered.
      
      The intrinsics also contain enough information to regenerate code for the IR layout.
      For the array and structure intrinsics, the proper GEP can be constructed.
      For the union intrinsics, replacing all uses of "addr" with "base" should be enough.
      
      Signed-off-by: Yonghong Song <yhs@fb.com>
      
      Differential Revision: https://reviews.llvm.org/D61810
      
      llvm-svn: 365352
  29. 28 Jun, 2019 1 commit
  30. 13 Jun, 2019 1 commit
    • Improve reduction intrinsics by overloading result value. · 51c2fa0e
      Sander de Smalen authored
      This patch uses the mechanism from D62995 to strengthen the
      definitions of the reduction intrinsics by letting the scalar
      result/accumulator type be overloaded from the vector element type.
      
      For example:
      
        ; The LLVM LangRef specifies that the scalar result must equal the
        ; vector element type, but this is not checked/enforced by LLVM.
        declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a)
      
      This patch changes that into:
      
        declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a)
      
      This makes the type constraint more explicit and causes LLVM to check
      the result type against the vector element type.
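      
      A usage sketch with the new naming, where the scalar result type is
      tied to the element type of the operand:
      
        declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32>)
      
        define i32 @reduce_or(<4 x i32> %a) {
          %r = call i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a)
          ret i32 %r
        }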
      
      Reviewers: RKSimon, arsenm, rnk, greened, aemerson
      
      Reviewed By: arsenm
      
      Differential Revision: https://reviews.llvm.org/D62996
      
      llvm-svn: 363240
  31. 11 Jun, 2019 1 commit
    • Change semantics of fadd/fmul vector reductions. · cbeb563c
      Sander de Smalen authored
      This patch changes how LLVM handles the accumulator/start value
      in the reduction, by never ignoring it regardless of the presence of
      fast-math flags on callsites. This change introduces the following
      new intrinsics to replace the existing ones:
      
        llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
        llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul
      
      and adds functionality to auto-upgrade existing LLVM IR and bitcode.
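      
      A usage sketch of the new form (f32/v4f32 overload assumed), where the
      start value %acc is always folded into the result:
      
        declare float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float, <4 x float>)
      
        define float @sum(float %acc, <4 x float> %v) {
          %r = call float @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(
                   float %acc, <4 x float> %v)
          ret float %r
        }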
      
      Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson
      
      Reviewed By: nikic
      
      Differential Revision: https://reviews.llvm.org/D60261
      
      llvm-svn: 363035
  32. 07 Jun, 2019 1 commit
    • [CodeGen] Generic Hardware Loop Support · c5ef502e
      Sam Parker authored
          
      Patch which introduces a target-independent framework for generating
      hardware loops at the IR level. Most of the code has been taken from
      PowerPC CTRLoops and PowerPC has been ported over to use this generic
      pass. The target dependent parts have been moved into
      TargetTransformInfo, via isHardwareLoopProfitable, with
      HardwareLoopInfo introduced to transfer information from the backend.
          
      Three generic intrinsics have been introduced (a usage sketch follows
      the list):
      - void @llvm.set_loop_iterations
        Takes a single operand: the number of iterations to be executed.
      - i1 @llvm.loop_decrement(anyint)
        Takes the maximum number of elements processed in an iteration of
        the loop body and subtracts this from the total count. Returns
        false when the loop should exit.
      - anyint @llvm.loop_decrement_reg(anyint, anyint)
        Takes the number of elements remaining to be processed as well as
        the maximum number of elements processed in an iteration of the loop
        body. Returns the updated number of elements remaining.
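      
      A minimal sketch of the loop structure the pass emits (IR-level
      intrinsic names use dots; i32 counters assumed):
      
        declare void @llvm.set.loop.iterations.i32(i32)
        declare i1 @llvm.loop.decrement.i32(i32)
      
        define void @hwloop(i32 %n) {
        entry:
          call void @llvm.set.loop.iterations.i32(i32 %n)
          br label %body
        body:
          ; ... loop body ...
          %cont = call i1 @llvm.loop.decrement.i32(i32 1)
          br i1 %cont, label %body, label %exit
        exit:
          ret void
        }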
      
      llvm-svn: 362774
  33. 28 May, 2019 1 commit
    • [CodeGen] Add lrint/llrint builtins · 6d7bf5e8
      Adhemerval Zanella authored
      This patch adds the ISD::LRINT and ISD::LLRINT opcodes along with new
      intrinsics.  The changes are straightforward, as for other
      floating-point rounding functions, with just some adjustments
      required to handle the return value being an integer.
      
      The idea is to optimize lrint/llrint generation for AArch64
      in a subsequent patch.  For now, the semantics simply route the
      intrinsic to the libm symbol.
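      
      A minimal usage sketch (an i64 result is assumed for a 64-bit long):
      
        declare i64 @llvm.lrint.i64.f64(double)
      
        define i64 @round_to_long(double %x) {
          %r = call i64 @llvm.lrint.i64.f64(double %x)
          ret i64 %r
        }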
      
      Reviewed By: craig.topper
      
      Differential Revision: https://reviews.llvm.org/D62017
      
      llvm-svn: 361875
  34. 21 May, 2019 1 commit
    • [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic · 0bada7ce
      Leonard Chan authored
      Add an intrinsic that takes two signed integers, with their scale provided
      as the third argument, and performs fixed-point multiplication on them. The
      result is saturated and clamped between the largest and smallest representable
      values of the first two operands.
      
      This is a part of implementing fixed point arithmetic in clang where some of
      the more complex operations will be implemented as intrinsics.
      
      Differential Revision: https://reviews.llvm.org/D55720
      
      llvm-svn: 361289