  1. 17 Jun, 2020 1 commit
  2. 12 Jun, 2020 1 commit
    • [ARM] Always use reductions intrinsics under MVE · 46529978
      David Green authored
      Similar to a recent change to the X86 backend, this changes things so
      that we always produce reduction intrinsics for all reduction types,
      not just the legal ones. This gives a better chance in the backend to
      custom lower them to something more suitable for MVE. Especially for
      something like fadd the in-order reduction produced during DAG lowering
      is already better than the shuffles produced in the midend, and we can
      do even better with a bit of custom lowering.
      Differential Revision:
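      The difference between the two reduction styles can be seen with a small
      scalar model (illustrative C++ only, not LLVM code): the shuffle sequence
      the midend would produce computes a pairwise tree, while the in-order form
      the backend can emit for fadd accumulates lanes sequentially, which matters
      because float addition is not associative.

      ```cpp
      #include <vector>

      // In-order reduction: accumulate lanes sequentially, as the MVE
      // backend's fadd lowering does.
      float reduceInOrder(const std::vector<float> &V) {
        float Acc = 0.0f;
        for (float X : V) // lane 0, lane 1, ... in vector order
          Acc += X;
        return Acc;
      }

      // Tree reduction: the result a log2(N) shuffle+add sequence computes.
      // Assumes a power-of-two lane count, as vector shuffles would.
      float reduceTree(std::vector<float> V) {
        for (size_t Stride = V.size() / 2; Stride > 0; Stride /= 2)
          for (size_t I = 0; I < Stride; ++I)
            V[I] += V[I + Stride];
        return V[0];
      }
      ```

      The two agree on exactly representable inputs but can round differently in
      general, which is one reason keeping the reduction as an intrinsic until
      lowering gives the backend more freedom.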
  3. 10 Jun, 2020 1 commit
    • [CostModel] Unify getArithmeticInstrCost · fa8bff0c
      Sam Parker authored
      Add the remaining arithmetic opcodes into the generic implementation
      of getUserCost and then call this from getInstructionThroughput. Most
      of the backends have been modified to return the base implementation
      for cost kinds other than RecipThroughput. The outlier here is AMDGPU
      which already uses getArithmeticInstrCost for all the cost kinds.
      This change means that most of the opcodes can be removed from that
      backend's implementation of getUserCost.
      Differential Revision:
  4. 09 Jun, 2020 1 commit
    • [NFCI][CostModel] Unify getCmpSelInstrCost · 37289615
      Sam Parker authored
      Add cases for icmp, fcmp and select into the switch statement of the
      generic getUserCost implementation with getInstructionThroughput then
      calling into it. The BasicTTI and backend implementations have been set
      to return a default value (1) when a cost other than throughput is
      being queried.
      Differential Revision:
  5. 08 Jun, 2020 1 commit
  6. 05 Jun, 2020 1 commit
    • [CostModel] Unify getMemoryOpCost · 9303546b
      Sam Parker authored
      Use getMemoryOpCost from the generic implementation of getUserCost
      and have getInstructionThroughput return the result of that for loads
      and stores.
      This also means that the X86 implementation of getUserCost can be
      removed with the functionality folded into its getMemoryOpCost.
      Differential Revision:
  7. 29 May, 2020 1 commit
    • [TTI] New target hook emitGetActiveLaneMask · 7480ccbf
      Sjoerd Meijer authored
      This is split off from D79100 and adds a new target hook emitGetActiveLaneMask
      that can be queried to check if the intrinsic is
      supported by the backend and if it should be emitted for a given loop.
      See also commit rG7fb8a40e and its commit message for more details/context
      on this new intrinsic.
      Differential Revision:
  8. 26 May, 2020 1 commit
    • [CostModel] Unify getCastInstrCost · 8aaabade
      Sam Parker authored
      Add the remaining cast instruction opcodes to the base implementation
      of getUserCost and directly return the result. This allows
      getInstructionThroughput to return getUserCost for the casts. This
      has required changes to PPC and SystemZ because they implement
      getUserCost and/or getCastInstrCost with adjustments for vector
      operations. Adjustments have also been made in the remaining backends
      that implement the method so that they still produce a cost of zero
      or one for cost kinds other than throughput.
      Differential Revision:
  9. 23 May, 2020 1 commit
    • [Align] Remove operations on MaybeAlign that asserted that it had a defined value. · 7392820f
      Craig Topper authored
      If the caller needs to be responsible for making sure the MaybeAlign
      has a value, then we should just make the caller convert it to an Align
      with operator*.
      I explicitly deleted the relational comparison operators that
      were being inherited from Optional. It's unclear what the meaning of
      comparing two MaybeAligns, where one is defined and the other isn't,
      should be. So make the caller responsible for defining the behavior.
      I left the ==/!= operators from Optional, but that exposed a weird
      quirk: ==/!= between Align and MaybeAlign required the MaybeAlign to
      be defined. Now we use the operator== from Optional that takes an
      Optional and a value.
      Differential Revision:
  10. 15 May, 2020 1 commit
  11. 13 May, 2020 1 commit
  12. 12 May, 2020 1 commit
    • [ARM][CostModel] Improve getCastInstrCost · b4a8091a
      Sam Parker authored
      - Specifically check for sext/zext users which have 'long' form NEON
        instructions.
      - Add more entries to the table for sext/zexts so that we can report
        more accurately the number of vmovls required for NEON.
      - Pass the instruction to the pass implementation.
      Differential Revision:
  13. 05 May, 2020 2 commits
  14. 04 May, 2020 1 commit
  15. 28 Apr, 2020 1 commit
    • [TTI] Add TargetCostKind argument to getUserCost · e9c9329a
      Sam Parker authored
      There are several different types of cost that TTI tries to provide
      explicit information for: throughput, latency, code size along with
      a vague 'intersection of code-size cost and execution cost'.
      The vectorizer is a keen user of RecipThroughput and there's at least
      'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
      help with this cost. The latency cost has a single use and a single
      implementation. The intersection cost appears to cover most of the
      rest of the API.
      getUserCost is explicitly called from within TTI when the user has
      been explicit in wanting the code size (also only one use) as well
      as a few passes which are concerned with a mixture of size and/or
      a relative cost. In many cases these costs are closely related, such
      as when multiple instructions are required, but one evident diverging
      cost in this function is for div/rem.
      This patch adds an argument so that the cost required is explicit,
      so that we can make the important distinction when necessary.
      Differential Revision:
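      The shape of the change can be sketched as follows: a simplified model
      using names after LLVM's TargetCostKind enum, where the cost numbers and
      dispatch logic are purely illustrative, not the real implementation.

      ```cpp
      // Simplified model of the explicit-cost-kind API. The enumerator names
      // mirror LLVM's TTI::TargetCostKind; everything else is illustrative.
      enum TargetCostKind {
        TCK_RecipThroughput, // reciprocal throughput, used by the vectorizer
        TCK_Latency,         // latency of the instruction
        TCK_CodeSize,        // code size, e.g. for size optimisation
        TCK_SizeAndLatency   // the vague 'intersection' cost most callers used
      };

      // With the kind explicit, costs like div/rem can diverge: expensive in
      // throughput terms, but a single instruction for code size.
      int getUserCost(bool IsDivRem, TargetCostKind Kind) {
        if (!IsDivRem)
          return 1;
        return Kind == TCK_CodeSize ? 1 : 4; // illustrative numbers only
      }
      ```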
  16. 22 Apr, 2020 1 commit
  17. 20 Apr, 2020 1 commit
  18. 09 Apr, 2020 1 commit
  19. 11 Mar, 2020 1 commit
    • [TTI][ARM][MVE] Refine gather/scatter cost model · a6d3bec8
      Anna Welker authored
      Refines the gather/scatter cost model, but also changes the TTI
      function getIntrinsicInstrCost to accept an additional parameter
      which is needed for the gather/scatter cost evaluation.
      This did require trivial changes in some non-ARM backends to
      adopt the new parameter.
      Extending gathers and truncating scatters are now priced cheaper.
      Differential Revision:
  20. 03 Feb, 2020 1 commit
  21. 31 Jan, 2020 1 commit
  22. 24 Jan, 2020 1 commit
  23. 22 Jan, 2020 1 commit
    • [ARM] Basic gather scatter cost model · e9c19827
      David Green authored
      This is a very basic MVE gather/scatter cost model, based roughly on the
      code that we will currently produce. It does not handle truncating
      scatters or extending gathers correctly yet, as it is difficult to tell
      that they are going to be correctly extended/truncated from the limited
      information in the cost function.
      This can be improved as we extend support for these in the future.
      Based on code originally written by David Sherwood.
      Differential Revision:
  24. 20 Jan, 2020 1 commit
    • [ARM] Favour post inc for MVE loops · 5e51f755
      David Green authored
      We were previously not necessarily favouring postinc for the MVE loads
      and stores, leading to extra code prior to the loop to set up the
      preinc. MVE in general can benefit from postinc (as we don't have
      unrolled loops), and certain instructions like VLD2 are only available
      in post-inc versions.
      Differential Revision:
  25. 09 Jan, 2020 1 commit
  26. 08 Jan, 2020 1 commit
  27. 12 Dec, 2019 1 commit
  28. 09 Dec, 2019 3 commits
    • [ARM] Enable MVE masked loads and stores · b1aba037
      David Green authored
      With the extra optimisations we have done, these should now be fine to
      enable by default. Which is what this patch does.
      Differential Revision:
    • [ARM] Teach the Arm cost model that a Shift can be folded into other instructions · be7a1070
      David Green authored
      This attempts to teach the cost model in Arm that code such as:
        %s = shl i32 %a, 3
        %c = and i32 %s, %b
      Can under Arm or Thumb2 become:
        and r0, r1, r2, lsl #3
      So the cost of the shift can essentially be free. To do this without
      trying to artificially adjust the cost of the "and" instruction, it
      needs to get the users of the shl and check if they are a type of
      instruction that the shift can be folded into. And so it needs to have
      access to the actual instruction in getArithmeticInstrCost, which if
      available is added as an extra parameter much like getCastInstrCost.
      We otherwise limit it to shifts with a single user, which should
      hopefully handle most of the cases. The list of instructions that the
      shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
      ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
      Differential Revision:
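      The single-user check described above can be sketched with a toy model
      (hypothetical types for illustration, not the LLVM API): a shift is
      treated as free when it has exactly one user and that user's opcode is
      one the Arm/Thumb2 shifted-operand forms can absorb.

      ```cpp
      #include <vector>

      // Toy IR model: an opcode plus the instructions that use the result.
      enum Opcode { Shl, Add, Sub, And, Or, Xor, Mul };

      struct Inst {
        Opcode Op;
        std::vector<const Inst *> Users;
      };

      // The shift folds only if its single user maps to an instruction with
      // a shifted-operand form (ADD/SUB/AND/ORR/EOR etc. with an lsl operand).
      bool shiftIsFoldable(const Inst &Shift) {
        if (Shift.Op != Shl || Shift.Users.size() != 1)
          return false;
        switch (Shift.Users[0]->Op) {
        case Add: case Sub: case And: case Or: case Xor:
          return true;
        default:
          return false; // e.g. Mul has no shifted-operand form
        }
      }
      ```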
    • [ARM] Additional tests and minor formatting. NFC · f008b5b8
      David Green authored
      This adds some extra cost model tests for shifts, and does some minor
      adjustments to some Neon code to make it clear what it applies to.
      Both NFC.
  29. 19 Nov, 2019 1 commit
    • [ARM] MVE interleaving load and stores. · 882f23ca
      David Green authored
      Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
      for MVE. This works the same way as Neon, recognising the load/shuffles
      combination and converting them into intrinsics in a pre-isel pass,
      which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
      and lowerInterleavedStore.
      The main difference to Neon is that we do not have a VLD3 instruction.
      Otherwise most of the code works very similarly, with just some minor
      differences in the form of the intrinsics to work around. VLD3 is
      disabled by making isLegalInterleavedAccessType return false for those
      cases.
      We may need some other future adjustments, such as VLD4 taking up half
      the available registers, so it should maybe cost more. This patch should get the
      basics in though.
      Differential Revision:
  30. 15 Nov, 2019 1 commit
  31. 13 Nov, 2019 1 commit
    • [ARM][MVE] canTailPredicateLoop · d90804d2
      Sjoerd Meijer authored
      This implements TTI hook 'preferPredicateOverEpilogue' for MVE.  This is a
      first version and it operates on single block loops only. With this change, the
      vectoriser will now determine if tail-folding scalar remainder loops is
      possible/desired, which is the first step to generate MVE tail-predicated
      vector loops.
      This is disabled by default for now. I.e., this depends on option
      -disable-mve-tail-predication, which is off by default.
      I will follow up on this soon with a patch for the vectoriser to respect loop
      hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled,
      we don't want to tail-fold and we shouldn't query this TTI hook, which is
      done in D70125.
      Differential Revision:
  32. 06 Nov, 2019 1 commit
    • [TTI][LV] preferPredicateOverEpilogue · 6c2a4f5f
      Sjoerd Meijer authored
      We have two ways to steer creating a predicated vector body over creating a
      scalar epilogue. To force this, we have 1) a command line option and 2) a
      pragma available. This adds a third: a target hook to TargetTransformInfo that
      can be queried whether predication is preferred or not, which allows the
      vectoriser to make the decision without forcing it.
      While this change behaves as a non-functional change for now, it shows the
      required TTI plumbing, usage of this new hook in the vectoriser, and the
      beginning of an ARM MVE implementation. I will follow up on this with:
      - a complete MVE implementation, see D69845.
      - a patch to disable this, i.e. we should respect "vector_predicate(disable)"
        and its corresponding loophint.
      Differential Revision:
  33. 25 Oct, 2019 1 commit
  34. 17 Oct, 2019 1 commit
  35. 14 Oct, 2019 1 commit
  36. 15 Sep, 2019 1 commit
    • [ARM] Masked loads and stores · b325c057
      David Green authored
      Masked loads and store fit naturally with MVE, the instructions being easily
      predicated. This adds lowering for the simple cases of masked loads and stores.
      It does not yet deal with widening/narrowing or pre/post inc, and so is
      currently behind an option.
      The llvm masked load intrinsic will accept a "passthru" value, dictating the
      values used for the zero masked lanes. In MVE the instructions write 0 to the
      zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
      with a select instruction to pull in the correct data after the load.
      Differential Revision:
      llvm-svn: 371932
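      The passthru/select matching can be modelled at the scalar level
      (illustrative C++ only, not the actual lowering code): the predicated
      MVE load writes 0 to masked-off lanes, so a masked load with a non-zero
      passthru behaves like a zeroing load followed by a select.

      ```cpp
      #include <vector>

      // Scalar model of a masked load with passthru. MVE's predicated VLDR
      // zeroes masked-off lanes, so the passthru is recovered by a select
      // after the load rather than by the load itself.
      std::vector<int> maskedLoad(const std::vector<int> &Mem,
                                  const std::vector<bool> &Mask, int Passthru) {
        std::vector<int> ZeroingLoad(Mem.size());
        for (size_t I = 0; I < Mem.size(); ++I)
          ZeroingLoad[I] = Mask[I] ? Mem[I] : 0; // what the hardware load does
        // the select pulls the passthru back in for masked-off lanes
        std::vector<int> Result(Mem.size());
        for (size_t I = 0; I < Mem.size(); ++I)
          Result[I] = Mask[I] ? ZeroingLoad[I] : Passthru;
        return Result;
      }
      ```

      With a zero (or undef) passthru the select is a no-op and the plain
      predicated load suffices, which is the case the lowering handles directly.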
  37. 13 Sep, 2019 1 commit