This project is mirrored from https://github.com/llvm-doe-org/llvm-project.git.
  1. 17 Jun, 2020 1 commit
  2. 15 Jun, 2020 1 commit
  3. 06 Jun, 2020 1 commit
  4. 29 May, 2020 1 commit
    • [TTI] New target hook emitGetActiveLaneMask · 7480ccbf
      Sjoerd Meijer authored
      This is split off from D79100 and adds a new target hook emitGetActiveLaneMask
      that can be queried to check if the intrinsic @llvm.get.active.lane.mask() is
      supported by the backend and if it should be emitted for a given loop.
      
      See also commit rG7fb8a40e and its commit message for more details/context
      on this new intrinsic.
      
      Differential Revision: https://reviews.llvm.org/D80597
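For orientation, here is a minimal scalar model of the mask the intrinsic produces. It assumes the semantics documented in the current LLVM LangRef, where lane i is active when base + i is less than n (unsigned, evaluated without wrapping). The function name `activeLaneMask` and the use of `std::vector<bool>` are illustrative only and are not LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of @llvm.get.active.lane.mask for a vector with `lanes` elements.
// Assumption: lane i is active iff base + i < n, computed without overflow.
std::vector<bool> activeLaneMask(uint64_t base, uint64_t n, unsigned lanes) {
  assert(lanes > 0 && "a vector has at least one lane");
  std::vector<bool> mask(lanes, false);
  for (unsigned i = 0; i < lanes; ++i) {
    // Rewritten as (n - base) > i so that base + i can never wrap around.
    mask[i] = base < n && (n - base) > i;
  }
  return mask;
}
```

With a trip count of 10 and 4 lanes, the final vector iteration starts at base 8, so only the first two lanes are active and the remaining two are masked off.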
  5. 28 May, 2020 1 commit
    • InferAddressSpaces: Handle ptrmask intrinsic · d6671ee9
      Matt Arsenault authored
      This one is slightly odd since it counts as an address expression,
      which previously could never fail. Allow the existing TTI hook to
      return the value to use, and re-use it to decide how to handle
      ptrmask.
      
      Handles the no-op addrspacecasts for AMDGPU. We could probably do
      something better based on analysis of the mask value based on the
      address space, but leave that for now.
  6. 26 May, 2020 1 commit
    • [CostModel] Unify Intrinsic Costs. · 871556a4
      Sam Parker authored
      Recommitting most of the remaining changes from
      259eb619, but excluding the call to
      getUserCost from getInstructionThroughput. Though there are still no
      test changes, I doubt that this is an NFC...
      
      With the two getIntrinsicInstrCosts folded into one, now fold in the
      scalar/code-size orientated getIntrinsicCost. The remaining scalar
      intrinsics were memcpy, cttz and ctlz which now have special handling
      in the BasicTTI implementation.
      
      This required a change in the AMDGPU backend for fabs as it
      should always be 'free'. I've also changed the X86 backend to return
      the BaseT implementation when the CostKind isn't RecipThroughput.
      
      Differential Revision: https://reviews.llvm.org/D80012
  7. 21 May, 2020 3 commits
    • Revert "[CostModel] Unify Intrinsic Costs." · 259eb619
      Sam Parker authored
      This reverts commit de71def3.
      
      This is causing some very large changes, so I'm first going to break
      this patch down and re-commit in parts.
    • [CostModel] Unify Intrinsic Costs. · de71def3
      Sam Parker authored
      With the two getIntrinsicInstrCosts folded into one, now fold in the
      scalar/code-size orientated getIntrinsicCost. This involved sinking
      cost of the TTIImpl into the base implementation, as it performs no
      target checks. The opcodes remaining were memcpy, cttz and ctlz which
      now have special handling in the BasicTTI implementation.
      getInstructionThroughput can now directly return the result of
      getUserCost.
      
      This required a change in the AMDGPU backend for fabs, as it should
      always be 'free'. I've also changed the X86 backend to return '1' for
      any intrinsic when the CostKind isn't RecipThroughput.
      
      Though this is intended to be a non-functional change, there are many
      paths being combined here so I would be very surprised if this didn't
      have an effect.
      
      Differential Revision: https://reviews.llvm.org/D80012
    • [CostModel] Remove getExtCost · fb3ba380
      Sam Parker authored
      This has not been implemented by any backends, which appear to cover
      the functionality through getCastInstrCost. Sink what there is in the
      default implementation into BasicTTI.
      
      Differential Revision: https://reviews.llvm.org/D78922
  8. 20 May, 2020 1 commit
    • [NFCI][CostModel] Refactor getIntrinsicInstrCost · 8cc911fa
      Sam Parker authored
      Combine the two API calls into one by introducing a structure to hold
      the relevant data. This has the added benefit of moving the boilerplate
      code for arguments and flags into the constructors. This is
      intended to be a non-functional change, but the complicated web of
      logic involved here makes it very hard to guarantee.
      
      Differential Revision: https://reviews.llvm.org/D79941
  9. 19 May, 2020 1 commit
  10. 13 May, 2020 1 commit
  11. 05 May, 2020 2 commits
  12. 29 Apr, 2020 1 commit
    • [TTI] Add DemandedElts to getScalarizationOverhead · 090cae84
      Simon Pilgrim authored
      The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization rising higher than expected. This is particularly noticeable on pre-SSE4 targets, where the availability of legal INSERT_VECTOR_ELT ops is more limited.
      
      This patch does 2 things:
      1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of an ISD::BUILD_VECTOR pattern.
      2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.
      
      This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.
      
      A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.
      
      Reviewed By: @craig.topper
      
      Differential Revision: https://reviews.llvm.org/D78216
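The effect of a DemandedElts mask can be sketched with a toy cost function that charges only the elements actually needed. This is a simplified model, not the LLVM implementation: the mask is a plain 64-bit word instead of an APInt, and the `ElementCosts` struct and its values are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-element costs; real targets derive these from cost tables.
struct ElementCosts {
  unsigned insertCost;
  unsigned extractCost;
};

// Toy scalarization overhead: charge only the lanes set in `demandedElts`.
// Bit i of the mask corresponds to vector element i.
unsigned scalarizationOverhead(unsigned numElts, uint64_t demandedElts,
                               bool insert, bool extract, ElementCosts costs) {
  assert(numElts <= 64 && "mask is modelled as a single 64-bit word");
  unsigned total = 0;
  for (unsigned i = 0; i < numElts; ++i) {
    if (!((demandedElts >> i) & 1))
      continue; // Element not demanded: contributes no cost.
    if (insert)
      total += costs.insertCost;
    if (extract)
      total += costs.extractCost;
  }
  return total;
}
```

A caller building a vector from only two of four scalars would pass a mask with two bits set and pay for two insertions instead of four, which is the kind of saving a gather-cost computation can benefit from.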
  13. 28 Apr, 2020 1 commit
    • [TTI] Add TargetCostKind argument to getUserCost · e9c9329a
      Sam Parker authored
      There are several different types of cost that TTI tries to provide
      explicit information for: throughput, latency, code size along with
      a vague 'intersection of code-size cost and execution cost'.
      
      The vectorizer is a keen user of RecipThroughput and there's at least
      'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
      help with this cost. The latency cost has a single use and a single
      implementation. The intersection cost appears to cover most of the
      rest of the API.
      
      getUserCost is explicitly called from within TTI when the user has
      been explicit in wanting the code size (also only one use) as well
      as a few passes which are concerned with a mixture of size and/or
      a relative cost. In many cases these costs are closely related, such
      as when multiple instructions are required, but one evident diverging
      cost in this function is for div/rem.
      
      This patch adds an argument so that the cost required is explicit,
      so that we can make the important distinction when necessary.
      
      Differential Revision: https://reviews.llvm.org/D78635
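The div/rem divergence mentioned above can be illustrated with a small sketch in which the caller states which cost it wants. The enumerator names echo the cost kinds listed in the message, but the `Opcode` enum, the `userCost` function, and every numeric cost here are made up for illustration.

```cpp
#include <cassert>

// Cost kinds named after those in the commit message; values are illustrative.
enum class CostKind { RecipThroughput, Latency, CodeSize };
enum class Opcode { Add, UDiv, URem };

// Toy cost query: the same instruction can be cheap in size but slow to run.
unsigned userCost(Opcode op, CostKind kind) {
  bool isDivRem = op == Opcode::UDiv || op == Opcode::URem;
  unsigned cost = 1; // Simple arithmetic: one unit under every cost kind.
  if (isDivRem && kind != CostKind::CodeSize)
    cost = 4; // Hypothetical: division is a single instruction, but expensive.
  assert(cost > 0 && "every modelled instruction has a non-zero cost");
  return cost;
}
```

Without the explicit cost kind, a single return value would have to pick one of these answers for division, which is the ambiguity the patch sets out to remove.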
  14. 21 Apr, 2020 1 commit
    • [TTI] Remove getOperationCost · ee959ddc
      Sam Parker authored
      This API call has been used recently with the very valid expectation
      that it would do something useful, but it doesn't actually query any
      backend information. So, remove this method and merge its
      functionality into getUserCost. As well as that, also use
      getCastInstrCost to get a proper cost from the backend for the
      concerned instructions though we only currently return the answer if
      it's considered free. The default implementation now also checks
      int/ptr conversions too, as well as truncs and bitcasts.
      
      Differential Revision: https://reviews.llvm.org/D76124
  15. 20 Apr, 2020 1 commit
  16. 19 Apr, 2020 1 commit
    • [TTI] Clean up includes (NFC). · a7aaadc1
      Florian Hahn authored
      Remove some unnecessary includes, replace some with forward
      declarations.
      
      This also exposed a few places that were missing some includes.
  17. 15 Apr, 2020 1 commit
  18. 02 Apr, 2020 2 commits
    • [LoopDataPrefetch + SystemZ] Let target decide on prefetching for each loop. · 36d4421f
      Jonas Paulsson authored
      This patch adds
      
      - New arguments to getMinPrefetchStride() to let the target decide on a
        per-loop basis if software prefetching should be done even with a stride
        within the limit of the hw prefetcher.
      
      - New TTI hook enableWritePrefetching() to let a target do write prefetching
        by default (defaults to false).
      
      - In LoopDataPrefetch:
      
        - A search through the whole loop to gather information before emitting any
          prefetches. This way the target can get information via new arguments to
          getMinPrefetchStride() and emit prefetches more selectively. Collected
          information includes: Does the loop have a call, how many memory
          accesses, how many of them are strided, how many prefetches will cover
          them. This is NFC as long as the target does not change its
          definition of getMinPrefetchStride().
      
        - If a previous access to the same exact address was 'read', and the
          current one is 'write', make it a 'write' prefetch.
      
        - If two accesses that are covered by the same prefetch do not dominate
          each other, put the prefetch in a block that dominates both of them.
      
        - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
      
      - A SystemZ implementation of getMinPrefetchStride().
      
      Review: Ulrich Weigand, Michael Kruse
      
      Differential Revision: https://reviews.llvm.org/D70228
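Two of the LoopDataPrefetch rules listed above, the read-to-write upgrade and the trip-count check, can be sketched in isolation. This is a simplified model rather than the pass itself: `Access`, `Prefetch`, `skipLoop`, and `planPrefetches` are hypothetical names, addresses are plain integers, and a trip count of 0 is assumed to mean "unknown".

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Access { uint64_t address; bool isWrite; };
struct Prefetch { uint64_t address; bool isWrite; };

// Skip the loop when a known constant max trip count is below itersAhead.
// Assumption: 0 encodes "trip count unknown", in which case we do not skip.
bool skipLoop(unsigned constantMaxTripCount, unsigned itersAhead) {
  return constantMaxTripCount != 0 && constantMaxTripCount < itersAhead;
}

// Emit one prefetch per distinct address, in first-seen order. A later write
// to an address first seen as a read upgrades that prefetch to a write.
std::vector<Prefetch> planPrefetches(const std::vector<Access> &accesses) {
  std::vector<Prefetch> plan;
  std::map<uint64_t, std::size_t> indexByAddress;
  for (const Access &a : accesses) {
    auto it = indexByAddress.find(a.address);
    if (it == indexByAddress.end()) {
      indexByAddress[a.address] = plan.size();
      plan.push_back({a.address, a.isWrite});
    } else if (a.isWrite) {
      plan[it->second].isWrite = true; // Read followed by write: write prefetch.
    }
  }
  return plan;
}
```

Gathering all accesses before emitting anything is what makes the upgrade possible: the decision for the first access depends on accesses that appear later in the loop.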
  19. 01 Apr, 2020 1 commit
  20. 19 Mar, 2020 1 commit
  21. 16 Mar, 2020 1 commit
  22. 11 Mar, 2020 1 commit
    • [TTI][ARM][MVE] Refine gather/scatter cost model · a6d3bec8
      Anna Welker authored
      Refines the gather/scatter cost model, but also changes the TTI
      function getIntrinsicInstrCost to accept an additional parameter
      which is needed for the gather/scatter cost evaluation.
      This did require trivial changes in some non-ARM backends to
      adopt the new parameter.
      Extending gathers and truncating scatters are now priced cheaper.
      
      Differential Revision: https://reviews.llvm.org/D75525
  23. 02 Mar, 2020 1 commit
    • [Loop Peeling] Add possibility to enable peeling on loop nests. · 3dcaf296
      Arkady Shlykov authored
      Summary:
      The current peeling implementation bails out in case of loop nests.
      The patch introduces a field in the TargetTransformInfo structure that
      certain targets can use to relax the constraints if it's
      profitable (disabled by default).
      An additional option is also added to enable peeling manually for
      experimenting and testing purposes.
      
      Reviewers: fhahn, lebedev.ri, xbolva00
      
      Reviewed By: xbolva00
      
      Subscribers: RKSimon, xbolva00, hiraditya, zzheng, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D70304
  24. 24 Jan, 2020 1 commit
  25. 22 Jan, 2020 2 commits
  26. 16 Jan, 2020 1 commit
  27. 15 Jan, 2020 1 commit
    • [Loop Peeling] Add possibility to enable peeling on loop nests. · 3f3017e1
      Arkady Shlykov authored
      Summary:
      The current peeling implementation bails out in case of loop nests.
      The patch introduces a field in the TargetTransformInfo structure that
      certain targets can use to relax the constraints if it's
      profitable (disabled by default).
      An additional option is also added to enable peeling manually for
      experimenting and testing purposes.
      
      Reviewers: fhahn, lebedev.ri, xbolva00
      
      Reviewed By: xbolva00
      
      Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
      
      Differential Revision: https://reviews.llvm.org/D70304
  28. 18 Dec, 2019 1 commit
  29. 12 Dec, 2019 2 commits
    • [IR] Split out target specific intrinsic enums into separate headers · 5d986953
      Reid Kleckner authored
      This has two main effects:
      - Optimizes debug info size by saving 221.86 MB of obj file size in a
        Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
        object file size.
      - Incremental step towards decoupling target intrinsics.
      
      The enums are still compact, so adding and removing a single
      target-specific intrinsic will trigger a rebuild of all of LLVM.
      Assigning distinct target id spaces is potential future work.
      
      Part of PR34259
      
      Reviewers: efriedma, echristo, MaskRay
      
      Reviewed By: echristo, MaskRay
      
      Differential Revision: https://reviews.llvm.org/D71320
    • Rename TTI::getIntImmCost for instructions and intrinsics · 85ba5f63
      Reid Kleckner authored
      Soon Intrinsic::ID will be a plain integer, so this overload will not be
      possible.
      
      Rename both overloads to ensure that downstream targets observe this as
      a build failure instead of a runtime failure.
      
      Split off from D71320
      
      Reviewers: efriedma
      
      Differential Revision: https://reviews.llvm.org/D71381
  30. 09 Dec, 2019 1 commit
    • [ARM] Teach the Arm cost model that a Shift can be folded into other instructions · be7a1070
      David Green authored
      This attempts to teach the cost model in Arm that code such as:
        %s = shl i32 %a, 3
        %r = and i32 %s, %b
      Can under Arm or Thumb2 become:
        and r0, r1, r2, lsl #3
      
      So the cost of the shift can essentially be free. To do this without
      trying to artificially adjust the cost of the "and" instruction, it
      needs to get the users of the shl and check if they are a type of
      instruction that the shift can be folded into. And so it needs to have
      access to the actual instruction in getArithmeticInstrCost, which if
      available is added as an extra parameter much like getCastInstrCost.
      
      We otherwise limit it to shifts with a single user, which should
      hopefully handle most of the cases. The list of instructions that the
      shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
      ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
      ICmp.
      
      Differential Revision: https://reviews.llvm.org/D70966
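The single-user rule described above can be sketched as a small predicate over the opcodes that use the shift. This is a toy model, not the ARM backend code: the `Op` enum, `canFoldShiftInto`, and `shiftCost` are hypothetical, and the costs 0 and 1 stand in for "free" and "one instruction".

```cpp
#include <vector>

// IR-level opcodes relevant to the example; Mul and Store are non-foldable users.
enum class Op { Add, Sub, And, Or, Xor, ICmp, Mul, Store };

// True for the IR operations the message lists as able to absorb a shift.
bool canFoldShiftInto(Op user) {
  switch (user) {
  case Op::Add:
  case Op::Sub:
  case Op::And:
  case Op::Or:
  case Op::Xor:
  case Op::ICmp:
    return true;
  default:
    return false;
  }
}

// Cost of a shift given the opcodes of the instructions that use its result.
// The shift is free only when it has exactly one user and that user can fold it.
unsigned shiftCost(const std::vector<Op> &users) {
  if (users.size() == 1 && canFoldShiftInto(users.front()))
    return 0;
  return 1;
}
```

Restricting the rule to a single user keeps the model conservative: with several users the shifted value may still need to be materialised in a register.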
  31. 06 Nov, 2019 1 commit
    • [TTI][LV] preferPredicateOverEpilogue · 6c2a4f5f
      Sjoerd Meijer authored
      We have two ways to steer creating a predicated vector body over creating a
      scalar epilogue. To force this, we have 1) a command line option and 2) a
      pragma available. This adds a third: a target hook to TargetTransformInfo that
      can be queried to check whether predication is preferred, which allows the
      vectoriser to make the decision without forcing it.
      
      While this change behaves as a non-functional change for now, it shows the
      required TTI plumbing, usage of this new hook in the vectoriser, and the
      beginning of an ARM MVE implementation. I will follow up on this with:
      - a complete MVE implementation, see D69845.
      - a patch to disable this, i.e. we should respect "vector_predicate(disable)"
        and its corresponding loophint.
      
      Differential Revision: https://reviews.llvm.org/D69040
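One way to picture the three inputs is as a small decision function in which explicit requests win and the target hook is consulted only when nothing was forced. The precedence shown (command line, then pragma, then target) is an assumption for illustration and is not stated in the commit message; the `Request` enum and `usePredicatedBody` are hypothetical names. The message also notes that honouring a "disable" pragma is follow-up work, so the `Disable` state models the intended end result rather than this commit.

```cpp
// Tri-state for an explicit user request.
enum class Request { Unset, Enable, Disable };

// Decide between a predicated vector body (true) and a scalar epilogue (false).
// Assumed precedence: command-line option, then pragma, then the target hook.
bool usePredicatedBody(Request cmdLine, Request pragma, bool targetPrefers) {
  if (cmdLine != Request::Unset)
    return cmdLine == Request::Enable; // Forced on the command line.
  if (pragma != Request::Unset)
    return pragma == Request::Enable;  // Forced by a loop pragma.
  return targetPrefers;                // Nothing forced: ask the target.
}
```

The point of the hook is the last line: when nothing is forced, the vectoriser can still choose predication because the target says it is profitable.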
  32. 31 Oct, 2019 1 commit
  33. 25 Oct, 2019 1 commit
  34. 14 Oct, 2019 1 commit