  1. 24 Jun, 2020 2 commits
  2. 18 Jun, 2020 1 commit
  3. 17 Jun, 2020 1 commit
  4. 16 Jun, 2020 1 commit
    • [SVE] Remove invalid calls to VectorType::getNumElements from BasicTTIImpl · b3e77c6d
      Christopher Tetreault authored
      Most of these operations are reasonable for scalable vectors. Due to
      this, we have decided not to change the interface to specifically take
      FixedVectorType despite the fact that the current implementations make
      fixed width assumptions. Instead, we cast to FixedVectorType and assert
      in the body. If a developer makes some change in the future that causes
      one of these asserts to fire, they should either change their code or
      make the function they are trying to call handle scalable vectors.
      Reviewers: efriedma, samparker, RKSimon, craig.topper, sdesmalen, c-rhodes
      Reviewed By: efriedma
      Subscribers: tschuett, rkruppe, psnobl, llvm-commits
      Tags: #llvm
      Differential Revision:
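The "cast and assert in the body" pattern described in this commit can be sketched outside of LLVM as follows. This is a minimal illustration, not the BasicTTIImpl code: `VecTy`, `FixedVecTy`, `ScalableVecTy`, `castToFixed` and `scalarizationCost` are hypothetical stand-ins invented for the example.

```cpp
#include <cassert>

// Hypothetical miniature vector-type hierarchy (not LLVM's real classes).
struct VecTy {
  virtual ~VecTy() = default;
  virtual bool isScalable() const = 0;
};

struct FixedVecTy : VecTy {
  explicit FixedVecTy(unsigned N) : NumElts(N) {}
  bool isScalable() const override { return false; }
  unsigned getNumElements() const { return NumElts; }

private:
  unsigned NumElts;
};

struct ScalableVecTy : VecTy {
  explicit ScalableVecTy(unsigned MinN) : MinElts(MinN) {}
  bool isScalable() const override { return true; }
  unsigned getMinNumElements() const { return MinElts; }

private:
  unsigned MinElts;
};

// Checked downcast: fires an assert (in builds with asserts enabled) when a
// scalable vector reaches code that assumes a fixed element count.
inline const FixedVecTy &castToFixed(const VecTy &Ty) {
  assert(!Ty.isScalable() && "expected a fixed-width vector");
  return static_cast<const FixedVecTy &>(Ty);
}

// The interface still accepts any vector type; only the body makes the
// fixed-width assumption explicit. The cost formula is made up.
inline unsigned scalarizationCost(const VecTy &Ty, unsigned PerEltCost) {
  const FixedVecTy &FVTy = castToFixed(Ty);
  return FVTy.getNumElements() * PerEltCost;
}
```

Keeping the general type in the signature means a later change can teach `scalarizationCost` about scalable vectors without touching its callers.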
  5. 15 Jun, 2020 1 commit
    • [CostModel] getCFInstrCost in getUserCost. · 2596da31
      Sam Parker authored
      Have BasicTTI call the base implementation so that both agree on the
      default behaviour, with the default being a cost of '1'. This has
      required an X86-specific implementation as it seems to be very
      reliant on those instructions being free. Changes are also made to
      AMDGPU so that their implementations distinguish between cost kinds,
      so that the unrolling isn't affected. PowerPC also has its own
      implementation to prevent changes to the reg-usage vectorizer test.
      The cost model test changes now reflect that ret instructions are not
      generally free.
      Differential Revision:
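A rough sketch of the layering described above: a base default of 1, a middle layer that defers to it, and a target override that distinguishes between cost kinds. All class names here are hypothetical, and the override policy (control flow free for throughput queries only) is an assumption chosen for illustration, not a statement of what the X86, AMDGPU or PowerPC backends actually return.

```cpp
#include <cassert>

enum class CostKind { RecipThroughput, Latency, CodeSize };
enum class Opcode { Br, Ret, PHI };

// Lowest layer: every control-flow instruction costs 1 by default.
struct BaseCostModel {
  virtual ~BaseCostModel() = default;
  virtual int getCFInstrCost(Opcode, CostKind) const { return 1; }
};

// Middle layer: calls the base implementation so both agree on the default.
struct BasicCostModel : BaseCostModel {
  int getCFInstrCost(Opcode Op, CostKind Kind) const override {
    return BaseCostModel::getCFInstrCost(Op, Kind);
  }
};

// Hypothetical target that relies on control flow being free when asked
// about throughput, and falls back to the shared default otherwise.
struct FreeBranchTarget : BasicCostModel {
  int getCFInstrCost(Opcode Op, CostKind Kind) const override {
    if (Kind == CostKind::RecipThroughput)
      return 0; // assumed policy for this sketch only
    return BasicCostModel::getCFInstrCost(Op, Kind);
  }
};
```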
  6. 10 Jun, 2020 1 commit
    • [CostModel] Unify getArithmeticInstrCost · fa8bff0c
      Sam Parker authored
      Add the remaining arithmetic opcodes into the generic implementation
      of getUserCost and then call this from getInstructionThroughput. Most
      of the backends have been modified to return the base implementation
      for cost kinds other than RecipThroughput. The outlier here is AMDGPU
      which already uses getArithmeticInstrCost for all the cost kinds.
      This change means that most of the opcodes can be removed from that
      backend's implementation of getUserCost.
      Differential Revision:
  7. 09 Jun, 2020 1 commit
    • [NFCI][CostModel] Unify getCmpSelInstrCost · 37289615
      Sam Parker authored
      Add cases for icmp, fcmp and select into the switch statement of the
      generic getUserCost implementation with getInstructionThroughput then
      calling into it. The BasicTTI and backend implementations have been set
      to return a default value (1) when a cost other than throughput is
      being queried.
      Differential Revision:
  8. 08 Jun, 2020 1 commit
    • [PPC] Try to fix buildbots · 772349de
      Sam Parker authored
      Attempt to handle unsupported types, such as structs, in
      getMemoryOpCost. The backend now checks for a supported type and
      calls into BasicTTI as a fallback. BasicTTI will now also perform
      the same check and will default to an expensive cost of 4 for 'Other'
      Differential Revision:
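The fallback chain in this commit can be sketched as below. Only the "expensive cost of 4 for 'Other'" comes from the commit message; the type enum, the class names and every other cost value are placeholders invented for the example.

```cpp
#include <cassert>

// Stand-in for a simple value type; 'Other' models anything unsupported,
// such as a struct.
enum class SimpleTy { I32, F64, V4I32, Other };

// Generic layer: performs the supported-type check itself and returns an
// expensive default of 4 for 'Other'.
struct BasicModel {
  int getMemoryOpCost(SimpleTy Ty) const {
    if (Ty == SimpleTy::Other)
      return 4;
    return 1; // placeholder cost for supported types
  }
};

// Hypothetical backend: checks for a supported type first and calls into
// the generic layer as a fallback.
struct TargetModel : BasicModel {
  static bool isSupported(SimpleTy Ty) { return Ty != SimpleTy::Other; }

  int getMemoryOpCost(SimpleTy Ty) const {
    if (!isSupported(Ty))
      return BasicModel::getMemoryOpCost(Ty);
    return Ty == SimpleTy::V4I32 ? 2 : 1; // made-up target costs
  }
};
```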
  9. 05 Jun, 2020 1 commit
    • [CostModel] Unify getMemoryOpCost · 9303546b
      Sam Parker authored
      Use getMemoryOpCost from the generic implementation of getUserCost
      and have getInstructionThroughput return the result of that for loads
      and stores.
      This also means that the X86 implementation of getUserCost can be
      removed with the functionality folded into its getMemoryOpCost.
      Differential Revision:
  10. 29 May, 2020 1 commit
    • [TTI] New target hook emitGetActiveLaneMask · 7480ccbf
      Sjoerd Meijer authored
      This is split off from D79100 and adds a new target hook emitGetActiveLaneMask
      that can be queried to check if the intrinsic is
      supported by the backend and if it should be emitted for a given loop.
      See also commit rG7fb8a40e and its commit message for more details/context
      on this new intrinsic.
      Differential Revision:
  11. 28 May, 2020 1 commit
    • InferAddressSpaces: Handle ptrmask intrinsic · d6671ee9
      Matt Arsenault authored
      This one is slightly odd since it counts as an address expression,
      which previously could never fail. Allow the existing TTI hook to
      return the value to use, and re-use it for handling how to handle
      Handles the no-op addrspacecasts for AMDGPU. We could probably do
      something better based on analysis of the mask value based on the
      address space, but leave that for now.
  12. 26 May, 2020 3 commits
    • [FPEnv] Intrinsic llvm.roundeven · 4d20e31f
      Serge Pavlov authored
      This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
      and performs rounding to the nearest integer value, rounding halfway
      cases to even. The intrinsic covers the one IEEE-754 rounding
      operation that was previously missing, so LLVM now provides full
      support for the rounding operations defined by the standard.
      Differential Revision:
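To make the "halfway cases to even" behaviour concrete, here is a small C++ sketch that gets the same numeric result from the standard library by rounding under the to-nearest rounding mode. It illustrates the semantics only and says nothing about how the intrinsic is lowered; `roundEvenSketch` is a name made up for this example.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Round to the nearest integer value, with halfway cases going to the even
// neighbour. std::nearbyint honours the current rounding mode, so the mode
// is set to FE_TONEAREST for the call and restored afterwards.
inline double roundEvenSketch(double X) {
  const int Saved = std::fegetround();
  std::fesetround(FE_TONEAREST);
  const double R = std::nearbyint(X);
  std::fesetround(Saved);
  return R;
}
```

For comparison, `std::round` rounds halfway cases away from zero, so `std::round(2.5)` is 3.0 while the sketch above yields 2.0.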
    • [CostModel] Unify Intrinsic Costs. · 871556a4
      Sam Parker authored
      Recommitting most of the remaining changes from
      259eb619, but excluding the call to
      getUserCost from getInstructionThroughput. Though there are still no
      test changes, I doubt that this is an NFC...
      With the two getIntrinsicInstrCosts folded into one, now fold in the
      scalar/code-size orientated getIntrinsicCost. The remaining scalar
      intrinsics were memcpy, cttz and ctlz which now have special handling
      in the BasicTTI implementation.
      This required a change in the AMDGPU backend for fabs, as it
      should always be 'free'. I've also changed the X86 backend to return
      the BaseT implementation when the CostKind isn't RecipThroughput.
      Differential Revision:
    • [CostModel] Check for free intrinsics in BasicTTI · 1f72d588
      Sam Parker authored
      Recommitting part of "[CostModel] Unify Intrinsic Costs."
      Now that the 'free' intrinsic information has been sunk to the lowest
      level, query the base implementation in BasicTTI before doing
      anything else. I suspect this is the change that was causing the main
      changes, particularly the large effects on debug builds.
      Differential Revision:
  13. 21 May, 2020 3 commits
    • Revert "[CostModel] Unify Intrinsic Costs." · 259eb619
      Sam Parker authored
      This reverts commit de71def3.
      This is causing some very large changes, so I'm first going to break
      this patch down and re-commit in parts.
    • [CostModel] Unify Intrinsic Costs. · de71def3
      Sam Parker authored
      With the two getIntrinsicInstrCosts folded into one, now fold in the
      scalar/code-size orientated getIntrinsicCost. This involved sinking
      cost of the TTIImpl into the base implementation, as it performs no
      target checks. The opcodes remaining were memcpy, cttz and ctlz which
      now have special handling in the BasicTTI implementation.
      getInstructionThroughput can now directly return the result of
      This required a change in the AMDGPU backend for fabs, as it is
      always 'free'. I've also changed the X86 backend to return '1' for
      any intrinsic when the CostKind isn't RecipThroughput.
      Though this is intended to be a non-functional change, there are many
      paths being combined here so I would be very surprised if this didn't
      have an effect.
      Differential Revision:
    • [CostModel] Remove getExtCost · fb3ba380
      Sam Parker authored
      This has not been implemented by any backends which appear to cover
      the functionality through getCastInstrCost. Sink what there is in the
      default implementation into BasicTTI.
      Differential Revision:
  14. 20 May, 2020 1 commit
    • [NFCI][CostModel] Refactor getIntrinsicInstrCost · 8cc911fa
      Sam Parker authored
      Combine the two API calls into one by introducing a structure to hold
      the relevant data. This has the added benefit of moving the
      boilerplate code for arguments and flags into the constructors. This is
      intended to be a non-functional change, but the complicated web of
      logic involved here makes it very hard to guarantee.
      Differential Revision:
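The refactoring idea, two overloads collapsed into one entry point that takes a small attribute structure, can be sketched as follows. The struct layout, field names and cost formula are assumptions made for this example and are not taken from the LLVM sources.

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum class TypeId { I32, F32 }; // stand-in for an IR type
struct Value {                  // stand-in for an IR value
  TypeId Ty;
};

// Hypothetical bundle of everything a cost query needs.
struct IntrinsicCostAttrs {
  unsigned IntrinsicID;
  TypeId RetTy;
  std::vector<TypeId> ParamTys;
  unsigned Lanes = 1;

  // Built from parameter types only.
  IntrinsicCostAttrs(unsigned ID, TypeId Ret, std::vector<TypeId> Tys)
      : IntrinsicID(ID), RetTy(Ret), ParamTys(std::move(Tys)) {}

  // Built from argument values: the boilerplate that derives the parameter
  // types lives here once, instead of being repeated at every call site.
  IntrinsicCostAttrs(unsigned ID, TypeId Ret, const std::vector<Value> &Args,
                     unsigned NumLanes)
      : IntrinsicID(ID), RetTy(Ret), Lanes(NumLanes) {
    ParamTys.reserve(Args.size());
    for (const Value &A : Args)
      ParamTys.push_back(A.Ty);
  }
};

// Single entry point replacing the two former overloads. The formula (one
// unit per parameter per lane) is invented for illustration.
inline unsigned getIntrinsicInstrCost(const IntrinsicCostAttrs &ICA) {
  return static_cast<unsigned>(ICA.ParamTys.size()) * ICA.Lanes;
}
```

Both construction paths end in the same query, which is the kind of merge that is easy to state and hard to prove non-functional.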
  15. 13 May, 2020 2 commits
  16. 05 May, 2020 2 commits
  17. 02 May, 2020 1 commit
  18. 29 Apr, 2020 1 commit
    • [TTI] Add DemandedElts to getScalarizationOverhead · 090cae84
      Simon Pilgrim authored
      The improvements to the x86 vector insert/extract element costs in D74976 caused the estimated costs for vector initialization and scalarization to rise higher than expected. This is particularly noticeable on pre-SSE4 targets, where the availability of legal INSERT_VECTOR_ELT ops is more limited.
      This patch does 2 things:
      1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of an ISD::BUILD_VECTOR pattern.
      2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.
      This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.
      A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.
      Reviewed By: @craig.topper
      Differential Revision:
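A minimal sketch of what a DemandedElts mask buys a generic scalarization estimate: only lanes whose bit is set contribute insert and/or extract cost. The signature, the 64-lane limit and the flat per-element costs are simplifications assumed for this example; they are not the X86TTIImpl or BasicTTI implementation.

```cpp
#include <cassert>
#include <cstdint>

// Sum the per-element insert/extract costs over the demanded lanes only.
// Bit I of DemandedElts says whether lane I is actually needed.
inline unsigned scalarizationOverhead(unsigned NumElts,
                                      std::uint64_t DemandedElts, bool Insert,
                                      bool Extract, unsigned InsertCost,
                                      unsigned ExtractCost) {
  assert(NumElts <= 64 && "sketch supports at most 64 lanes");
  unsigned Cost = 0;
  for (unsigned I = 0; I < NumElts; ++I) {
    if (((DemandedElts >> I) & 1u) == 0)
      continue; // lane not demanded: contributes nothing
    if (Insert)
      Cost += InsertCost;
    if (Extract)
      Cost += ExtractCost;
  }
  return Cost;
}
```

A caller building a gather of a 4-element vector where only lanes 0 and 2 need inserting would pass a mask of 0x5 and be charged for two inserts rather than four.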
  19. 27 Apr, 2020 1 commit
  20. 24 Apr, 2020 2 commits
  21. 21 Apr, 2020 1 commit
    • [TTI] Remove getOperationCost · ee959ddc
      Sam Parker authored
      This API call has been used recently with a very valid expectation
      that it would do something useful, but it doesn't actually query any
      backend information. So, remove this method and merge its
      functionality into getUserCost. In addition, use
      getCastInstrCost to get a proper cost from the backend for the
      concerned instructions though we only currently return the answer if
      it's considered free. The default implementation now also checks
      int/ptr conversions too, as well as truncs and bitcasts.
      Differential Revision:
  22. 20 Apr, 2020 1 commit
  23. 16 Apr, 2020 1 commit
  24. 13 Apr, 2020 1 commit
  25. 10 Apr, 2020 1 commit
  26. 02 Apr, 2020 1 commit
    • [LoopDataPrefetch + SystemZ] Let target decide on prefetching for each loop. · 36d4421f
      Jonas Paulsson authored
      This patch adds
      - New arguments to getMinPrefetchStride() to let the target decide on a
        per-loop basis if software prefetching should be done even with a stride
        within the limit of the hw prefetcher.
      - New TTI hook enableWritePrefetching() to let a target do write prefetching
        by default (defaults to false).
      - In LoopDataPrefetch:
        - A search through the whole loop to gather information before emitting any
          prefetches. This way the target can get information via new arguments to
          getMinPrefetchStride() and emit prefetches more selectively. Collected
          information includes: Does the loop have a call, how many memory
          accesses, how many of them are strided, how many prefetches will cover
          them. This is NFC as long as the target does not change its
          definition of getMinPrefetchStride().
        - If a previous access to the same exact address was 'read', and the
          current one is 'write', make it a 'write' prefetch.
        - If two accesses that are covered by the same prefetch do not dominate
          each other, put the prefetch in a block that dominates both of them.
        - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
      - A SystemZ implementation of getMinPrefetchStride().
      Review: Ulrich Weigand, Michael Kruse
      Differential Revision:
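Two of the LoopDataPrefetch rules listed above (upgrading a prefetch to 'write' when a read of the same address is followed by a write, and skipping loops whose constant maximum trip count is below ItersAhead) can be sketched as a small planning function. The data structures are hypothetical, addresses are modelled as plain integers, and treating a trip count of 0 as "unknown" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Access {
  std::int64_t Addr;
  bool IsWrite;
};

struct Prefetch {
  std::int64_t Addr;
  bool IsWrite;
};

// Gather information over the whole loop first, then decide what to emit.
inline std::vector<Prefetch> planPrefetches(const std::vector<Access> &Accesses,
                                            unsigned ConstantMaxTripCount,
                                            unsigned ItersAhead) {
  std::vector<Prefetch> Plan;

  // Skip the loop when it is known to finish before a prefetch could help.
  // A count of 0 stands for "unknown" here (assumption).
  if (ConstantMaxTripCount != 0 && ConstantMaxTripCount < ItersAhead)
    return Plan;

  for (const Access &A : Accesses) {
    bool Covered = false;
    for (Prefetch &P : Plan) {
      if (P.Addr != A.Addr)
        continue;
      // Earlier access to the same address was a read and this one is a
      // write: make the existing prefetch a write prefetch.
      if (A.IsWrite)
        P.IsWrite = true;
      Covered = true;
      break;
    }
    if (!Covered)
      Plan.push_back({A.Addr, A.IsWrite});
  }
  return Plan;
}
```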
  27. 30 Mar, 2020 1 commit
  28. 16 Mar, 2020 1 commit
  29. 11 Mar, 2020 1 commit
    • [TTI][ARM][MVE] Refine gather/scatter cost model · a6d3bec8
      Anna Welker authored
      Refines the gather/scatter cost model, but also changes the TTI
      function getIntrinsicInstrCost to accept an additional parameter
      which is needed for the gather/scatter cost evaluation.
      This did require trivial changes in some non-ARM backends to
      adopt the new parameter.
      Extending gathers and truncating scatters are now priced cheaper.
      Differential Revision:
  30. 14 Feb, 2020 1 commit
  31. 28 Jan, 2020 1 commit
  32. 24 Jan, 2020 1 commit