  1. 24 Jun, 2020 3 commits
  2. 23 Jun, 2020 2 commits
    • Eli Friedman's avatar
      [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem · e9d4e34a
      Eli Friedman authored
      Implement them on top of sdiv/udiv, similar to what we do for integer
      Potential future work: implementing i8/i16 srem/urem, optimizations for
      constant divisors, optimizing the mul+sub to mls.
      Differential Revision:
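A minimal per-lane model of the expansion described above, written independently of the actual SelectionDAG code: each remainder is computed from a quotient as `a - (a / b) * b`. C++ integer division truncates toward zero, like `sdiv`/`udiv`, so the result matches `srem`/`urem`. The helper names `remViaDiv` and `vecRem` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Remainder built from division: q = a / b (sdiv/udiv), then a - q * b.
// The mul+sub pair is what could later be fused into an MLS.
// Division by zero and, for signed types, MIN / -1 are excluded,
// as they are for the IR operations.
template <typename T>
T remViaDiv(T a, T b) {
  assert(b != 0 && "remainder by zero is undefined");
  T q = a / b;      // division, truncating toward zero
  T prod = q * b;   // multiply
  return a - prod;  // subtract
}

// Apply the expansion lane by lane to a fixed-size "vector".
template <typename T, std::size_t N>
std::array<T, N> vecRem(const std::array<T, N>& a, const std::array<T, N>& b) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = remViaDiv(a[i], b[i]);
  return out;
}
```

The sign of a signed result follows the dividend, e.g. `remViaDiv<std::int32_t>(-7, 2)` is `-1`, the same as `srem`.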
    • Michael Liao's avatar
      [SDAG] Add new AssertAlign ISD node. · b1360caa
      Michael Liao authored
      - The AssertAlign node records the guaranteed alignment on its source
        node, where these alignments are retrieved from alignment attributes
        in LLVM IR. These tracked alignments could help DAG combining and
        lowering generate more efficient code.
      - In this patch, basic support for the AssertAlign node is added. So far,
        we only generate AssertAlign nodes on return values from intrinsic
      - Addressing selection in AMDGPU is revised accordingly to capture the
        new (base + offset) patterns.
      Reviewers: arsenm, bogner
      Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
      Tags: #llvm
      Differential Revision:
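One generic way a tracked alignment can pay off, sketched under assumptions. This is a plain known-bits argument and not necessarily what the AMDGPU selection changes do. If a base pointer is known to be aligned to a power-of-two `align`, its low `log2(align)` bits are zero. An OR with an offset that only touches those bits then behaves like an addition and can be matched as a (base + offset) address. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if x is a nonzero power of two.
inline bool isPowerOf2(std::uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Low bits that must be zero in any address aligned to `align` bytes.
inline std::uint64_t knownZeroMask(std::uint64_t align) {
  assert(isPowerOf2(align) && "alignment must be a power of two");
  return align - 1;
}

// With those bits known zero, OR-ing in an offset that only touches them
// cannot carry, so (base | off) can be selected as (base + off).
inline bool orActsAsAdd(std::uint64_t align, std::uint64_t off) {
  return (off & ~knownZeroMask(align)) == 0;
}
```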
  3. 22 Jun, 2020 2 commits
  4. 21 Jun, 2020 1 commit
    • David Green's avatar
      [CGP] Convert phi types · 730ecb63
      David Green authored
      If a collection of interconnected phi nodes is only ever loaded, stored
      or bitcast then we can convert the whole set to the bitcast type,
      potentially helping to reduce the number of register moves needed as the
      phis are passed across basic block boundaries. This has to be done in
      CodegenPrepare as it naturally straddles basic blocks.
      The algorithm starts from phi nodes, looking at uses and operands for
      a collection of nodes that, taken together, are bitcast between float
      and integer types. We record visited phi nodes so that we do not
      process them more than once. The whole subgraph is then replaced with
      the new type. Loads and stores are bitcast to the correct type, which
      should then be folded into the load/store, changing its type.
      This comes up in the biquad testcase due to the way MVE needs to keep
      values in integer registers. I have also seen it come up from aarch64
      partner example code, where a complicated set of sroa/inlining produced
      integer phis, where float would have been a better choice.
      I also added undef and extract element handling which increased the
      potency in some cases.
      This adds it behind an option that defaults to off, and it is disabled
      for 32-bit X86 due to potential issues around canonicalizing NaNs.
      Differential Revision:
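The phi-web search described above can be sketched as a worklist over a toy IR. The `Node`/`Kind` types and `collectConvertiblePhis` are hypothetical stand-ins, not CodeGenPrepare's code. The sketch only checks that the connected phis are fed by loads, bitcasts or undef, and used by stores, bitcasts or other phis. It omits the extract-element handling and the profitability check (whether the web really is bitcast between float and integer types).

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy IR node kinds; illustrative only.
enum class Kind { Phi, Load, Store, BitCast, Undef, Other };

struct Node {
  Kind kind;
  std::vector<Node*> operands;  // incoming values (for a phi)
  std::vector<Node*> users;
};

// Starting from one phi, gather the connected web of phis. Returns true
// (with `web` filled) if every value flowing in and every use flowing out
// is something whose type could simply be changed.
bool collectConvertiblePhis(Node* start, std::unordered_set<Node*>& web) {
  assert(start->kind == Kind::Phi);
  web.clear();
  std::vector<Node*> worklist{start};
  while (!worklist.empty()) {
    Node* phi = worklist.back();
    worklist.pop_back();
    if (!web.insert(phi).second) continue;  // already visited
    for (Node* op : phi->operands) {
      switch (op->kind) {
        case Kind::Phi: worklist.push_back(op); break;
        case Kind::Load: case Kind::BitCast: case Kind::Undef: break;
        default: return false;
      }
    }
    for (Node* user : phi->users) {
      switch (user->kind) {
        case Kind::Phi: worklist.push_back(user); break;
        // Assumes a store user stores the phi (not uses it as the address).
        case Kind::Store: case Kind::BitCast: break;
        default: return false;
      }
    }
  }
  return true;
}
```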
  5. 19 Jun, 2020 1 commit
    • David Sherwood's avatar
      [SVE] Fall back on DAG ISel at -O0 when encountering scalable types · 584d0d5c
      David Sherwood authored
      At the moment we use Global ISel by default at -O0; however, it is
      currently not capable of dealing with scalable vectors for two
      reasons:
      1. The register banks know nothing about SVE registers.
      2. The LLT (Low Level Type) class knows nothing about scalable
         vectors.
      For now, the easiest way to avoid users hitting issues when using
      the SVE ACLE is to fall back on normal DAG ISel when encountering
      instructions that operate on scalable vector types.
      I've added a couple of RUN lines to existing SVE tests to ensure
      we can compile at -O0. I've also added some new tests to
      that demonstrate we correctly fall back to DAG ISel at -O0 when
      lowering formal arguments or translating instructions that involve
      scalable vector types.
      Differential Revision:
  6. 18 Jun, 2020 8 commits
    • Matt Arsenault's avatar
      GlobalISel: Fix some artifact combiner worklist inconsistencies · 2ec1267e
      Matt Arsenault authored
      In one case, UpdateDefs was not getting set and a dead SmallVector was
      constructed. In another, it was adding new vreg defs to the updated
      set which should be unnecessary. This also wasn't considering the
      multiple defs of G_UNMERGE_VALUES.
      Also increase the small vector sizes for merge/unmerge operands to the
      usual semi-arbitrary 8. While debugging these, I'm usually seeing
      merges and unmerges with at least 4 uses/defs.
      I haven't run into an actual problem from any of these though.
    • Matt Arsenault's avatar
      GlobalISel: Pass LegalizerHelper to custom legalize callbacks · 7f8b2e1b
      Matt Arsenault authored
      This was passing in all the parameters needed to construct a
      LegalizerHelper in the custom legalization, when it's simpler to just
      pass in the existing helper.
      This is slightly more annoying to use in the common case where you
      don't need the legalizer helper, but we could add the common
      parameters back in addition to the helper.
      I didn't propagate this to all the internal target changes that this
      logically implies, but did update a sample one for
      This is in preparation for moving AMDGPU load/store legalization
      entirely into custom lowering. The current set of legalization actions
      is very constraining and not really capable of expressing all the
      actions needed to legalize loads/stores. In particular there's no way
      to express when the memory access itself needs to change size vs. the
      result type. There's also a lot of redundancy since the same
      split/widen actions need to be applied in both vector and scalar
      cases. All of the sub-cases logically belong as steps in the legalizer
      helper, but it will be easier to consider everything at once in custom
    • Michael Liao's avatar
      [TTI] Expose isNoopAddrSpaceCast in TTI. · 2defe557
      Michael Liao authored
      Reviewers: arsenm
      Subscribers: wdng, hiraditya, llvm-commits
      Tags: #llvm
      Differential Revision:
    • Lucas Prates's avatar
      [ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend · a255931c
      Lucas Prates authored
      Half-precision floating point arguments and returns are currently
      promoted to either float or int32 in clang's CodeGen and there's
      no existing support for the lowering of `half` arguments and returns
      from IR in AArch32's backend.
      Such frontend coercions, implemented as coercion through memory
      in clang, can cause a series of issues in argument lowering, such as
      arguments being stored in the wrong bits on big-endian architectures
      and missing overflow detection in the return of certain
      This patch introduces handling of half-precision arguments and returns in
      the backend using the actual "half" type in the IR. Using the "half"
      type, the backend is able to properly enforce the AAPCS rules for
      those arguments, making sure they are stored in the proper bits of the
      registers and performing the necessary floating-point conversions.
      Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer
      Reviewed By: ostannard
      Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits
      Tags: #clang, #llvm
      Differential Revision:
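This small model illustrates the big-endian problem with coercion through memory. It is an assumption-laden sketch, not clang's or the backend's code. A 16-bit half value is written to the start of a zero-filled 4-byte slot and reloaded as a 32-bit integer. On a little-endian target the bits land in the low half of the word. On a big-endian target they land in the high half, whereas the sketch assumes the AAPCS expects a half passed in a core register in the least significant 16 bits. The function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Read four bytes of memory as a 32-bit value in the given byte order.
std::uint32_t loadWord(const std::array<std::uint8_t, 4>& mem, bool bigEndian) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    // Visit bytes from most significant to least significant.
    int idx = bigEndian ? i : 3 - i;
    v = (v << 8) | mem[idx];
  }
  return v;
}

// Store the half's bits at the start of a 4-byte slot (in the target's
// byte order), then reload the whole slot as an i32.
std::uint32_t coerceThroughMemory(std::uint16_t halfBits, bool bigEndian) {
  std::array<std::uint8_t, 4> mem{};  // zero-filled slot
  auto hi = static_cast<std::uint8_t>(halfBits >> 8);
  auto lo = static_cast<std::uint8_t>(halfBits & 0xff);
  mem[0] = bigEndian ? hi : lo;
  mem[1] = bigEndian ? lo : hi;
  return loadWord(mem, bigEndian);
}
```

With `0x3C00` (the half-precision encoding of 1.0), the little-endian reload yields `0x00003C00` while the big-endian reload yields `0x3C000000`.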
    • David Sherwood's avatar
      [CodeGen] Fix warnings in isPow2VectorType and getPow2VectorType · ae22e841
      David Sherwood authored
      We should either call getVectorMinNumElements() or
      Differential Revision:
    • David Sherwood's avatar
      [CodeGen] Fix warnings in getVectorElementCount() · 3ffb44b4
      David Sherwood authored
      In EVT::getVectorElementCount() when the type is not simple we
      should return getExtendedVectorElementCount() from the function
      instead of constructing the ElementCount object manually.
      I discovered this warning in an existing test:
      Differential Revision:
    • Kristof Beyls's avatar
      [IndirectThunks] Tiny comment fix · f7455da2
      Kristof Beyls authored
    • Kristof Beyls's avatar
      [IndirectThunks] Make generated MF structure as expected by all instruction selectors. · 832cfc76
      Kristof Beyls authored
      This also enables running the AArch64 SLSHardening pass with GlobalISel,
      so add a test for that.
      Differential Revision:
  7. 17 Jun, 2020 2 commits
    • Ian Levesque's avatar
      [xray] Option to omit the function index · 7c7c8e0d
      Ian Levesque authored
      Add a flag to omit the xray_fn_idx to cut size overhead and relocations
      roughly in half at the cost of reduced performance for single-function
      patching. Minor additions to compiler-rt support per-function patching
      without the index.
      Reviewers: dberris, MaskRay, johnislarry
      Subscribers: hiraditya, arphaman, cfe-commits, #sanitizers, llvm-commits
      Tags: #clang, #sanitizers, #llvm
      Differential Revision:
    • Sjoerd Meijer's avatar
      [TTI] Refactor emitGetActiveLaneMask · 20835cff
      Sjoerd Meijer authored
      Refactor TTI hook emitGetActiveLaneMask and remove the unused arguments
      as suggested in D79100.
  8. 16 Jun, 2020 3 commits
    • Christopher Tetreault's avatar
      [SVE] Remove invalid calls to VectorType::getNumElements from BasicTTIImpl · b3e77c6d
      Christopher Tetreault authored
      Most of these operations are reasonable for scalable vectors. Due to
      this, we have decided not to change the interface to specifically take
      FixedVectorType despite the fact that the current implementations make
      fixed width assumptions. Instead, we cast to FixedVectorType and assert
      in the body. If a developer makes some change in the future that causes
      one of these asserts to fire, they should either change their code or
      make the function they are trying to call handle scalable vectors.
      Reviewers: efriedma, samparker, RKSimon, craig.topper, sdesmalen, c-rhodes
      Reviewed By: efriedma
      Subscribers: tschuett, rkruppe, psnobl, llvm-commits
      Tags: #llvm
      Differential Revision:
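The "cast and assert" approach can be illustrated with a toy type hierarchy. The classes below and `checkedCast` are hypothetical stand-ins. In LLVM itself this role is played by `cast<FixedVectorType>`, which asserts on an invalid cast in builds with assertions enabled. A caller that passes a scalable type trips the assertion instead of silently getting a wrong answer.

```cpp
#include <cassert>

// Toy vector type hierarchy (not LLVM's classes).
struct VectorTy {
  virtual ~VectorTy() = default;
};
struct FixedVectorTy : VectorTy {
  explicit FixedVectorTy(unsigned n) : numElts(n) {}
  unsigned numElts;
};
struct ScalableVectorTy : VectorTy {
  explicit ScalableVectorTy(unsigned minN) : minNumElts(minN) {}
  unsigned minNumElts;  // actual count is minNumElts * vscale
};

// Checked downcast: asserts that the dynamic type really is `To`.
template <typename To, typename From>
To* checkedCast(From* p) {
  To* r = dynamic_cast<To*>(p);
  assert(r && "checkedCast to incompatible type");
  return r;
}

// Keeps the general VectorTy parameter, but enforces the fixed-width
// assumption inside the body.
unsigned scalarizationOverhead(VectorTy* ty, unsigned costPerElt) {
  FixedVectorTy* fixed = checkedCast<FixedVectorTy>(ty);
  return fixed->numElts * costPerElt;
}
```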
    • Matt Arsenault's avatar
      GlobalISel: Make special case handling clearer · 91bec1d3
      Matt Arsenault authored
      The special case here is really G_UNMERGE_VALUES, not G_EXTRACT. The
      other opcodes can hardcode index 1 like G_EXTRACT.
    • Matt Arsenault's avatar
      GlobalISel: Use Register · d98a7c3c
      Matt Arsenault authored
  9. 15 Jun, 2020 3 commits
    • Jessica Paquette's avatar
      [GlobalISel] Simplify G_ADD when it has (0-X) on the LHS or RHS · 1ac8451a
      Jessica Paquette authored
      This implements the following combines:
      ((0-A) + B) -> B-A
      (A + (0-B)) -> A-B
      Porting over the basic algebraic combines from the DAGCombiner. There are
      several combines which fold adds away into subtracts. This is just the simplest
      I noticed that add combines are some of the most commonly hit across CTMark,
      (via print statements when they fire), so I'm porting over some of the obvious
      This gives some minor code size improvements on CTMark at -O3 on AArch64.
      Differential Revision:
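The two combines can be sketched on a tiny expression tree. `Expr`, `combineAdd` and the other helpers are hypothetical and do not reflect GlobalISel's MachineInstr or combiner APIs. The sketch only shows the match on a `0 - X` operand and the rewrite to a subtraction.

```cpp
#include <memory>
#include <utility>

// Toy expression node: a constant, a named variable, or a binary add/sub.
struct Expr {
  enum Op { Const, Var, Add, Sub } op = Const;
  long value = 0;  // used when op == Const
  char name = 0;   // used when op == Var
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprP = std::shared_ptr<Expr>;

ExprP mkConst(long v) { auto e = std::make_shared<Expr>(); e->value = v; return e; }
ExprP mkVar(char n) {
  auto e = std::make_shared<Expr>();
  e->op = Expr::Var;
  e->name = n;
  return e;
}
ExprP mkBin(Expr::Op op, ExprP l, ExprP r) {
  auto e = std::make_shared<Expr>();
  e->op = op;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Matches (0 - X).
bool isNegation(const ExprP& e) {
  return e->op == Expr::Sub && e->lhs->op == Expr::Const && e->lhs->value == 0;
}

ExprP combineAdd(const ExprP& e) {
  if (e->op != Expr::Add) return e;
  if (isNegation(e->lhs)) return mkBin(Expr::Sub, e->rhs, e->lhs->rhs);  // ((0-A) + B) -> B-A
  if (isNegation(e->rhs)) return mkBin(Expr::Sub, e->lhs, e->rhs->rhs);  // (A + (0-B)) -> A-B
  return e;
}

// Evaluate with variable 'a' bound to `a` and any other variable bound to `b`.
long eval(const ExprP& e, long a, long b) {
  switch (e->op) {
    case Expr::Const: return e->value;
    case Expr::Var: return e->name == 'a' ? a : b;
    case Expr::Add: return eval(e->lhs, a, b) + eval(e->rhs, a, b);
    case Expr::Sub: return eval(e->lhs, a, b) - eval(e->rhs, a, b);
  }
  return 0;
}
```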
    • Sam Parker's avatar
      [CostModel] getCFInstrCost in getUserCost. · 2596da31
      Sam Parker authored
      Have BasicTTI call the base implementation so that both agree on the
      default behaviour, with the default being a cost of '1'. This has
      required an X86-specific implementation, as it seems to be very
      reliant on those instructions being free. Changes are also made to
      AMDGPU so that its implementations distinguish between cost kinds,
      so that unrolling isn't affected. PowerPC also has its own
      implementation to prevent changes to the reg-usage vectorizer test.
      The cost model test changes now reflect that ret instructions are not
      generally free.
      Differential Revision:
    • Chen Zheng's avatar
      [PowerPC] fma chain break to expose more ILP · bd7096b9
      Chen Zheng authored
      This patch tries to reassociate two patterns related to FMA to expose
      more ILP on PowerPC.
      // Pattern 1:
      //   A =  FADD X,  Y          (Leaf)
      //   B =  FMA  A,  M21,  M22  (Prev)
      //   C =  FMA  B,  M31,  M32  (Root)
      // -->
      //   A =  FMA  X,  M21,  M22
      //   B =  FMA  Y,  M31,  M32
      //   C =  FADD A,  B
      // Pattern 2:
      //   A =  FMA  X,  M11,  M12  (Leaf)
      //   B =  FMA  A,  M21,  M22  (Prev)
      //   C =  FMA  B,  M31,  M32  (Root)
      // -->
      //   A =  FMUL M11,  M12
      //   B =  FMA  X,  M21,  M22
      //   D =  FMA  A,  M31,  M32
      //   C =  FADD B,  D
      Reviewed By: jsji
      Differential Revision:
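The two rewrites can be checked numerically with `std::fma`. In this sketch (helper names hypothetical), `FMA A, M1, M2` is read as `A + M1 * M2`. That is the reading under which the before and after sequences in both patterns compute the same sum.

In Pattern 1, the dependent chain shrinks from three operations to two. In Pattern 2, the FMUL and the FMA on X are independent, shortening the path from X to the result. Reassociation can change floating-point rounding, so a transform like this is normally only legal when fast-math flags permit reassociation. The test inputs are small integers, for which every version is exact.

```cpp
#include <cmath>

// "FMA a, m1, m2" from the patterns, read as a + m1 * m2 (single rounding).
double fmaAdd(double a, double m1, double m2) { return std::fma(m1, m2, a); }

// Pattern 1 before: A = FADD X, Y; B = FMA A, M21, M22; C = FMA B, M31, M32.
double p1Before(double x, double y, double m21, double m22, double m31, double m32) {
  double a = x + y;
  double b = fmaAdd(a, m21, m22);
  return fmaAdd(b, m31, m32);
}

// Pattern 1 after: the two FMAs are independent; one FADD joins them.
double p1After(double x, double y, double m21, double m22, double m31, double m32) {
  double a = fmaAdd(x, m21, m22);
  double b = fmaAdd(y, m31, m32);
  return a + b;
}

// Pattern 2 before: three chained FMAs.
double p2Before(double x, double m11, double m12, double m21, double m22,
                double m31, double m32) {
  double a = fmaAdd(x, m11, m12);
  double b = fmaAdd(a, m21, m22);
  return fmaAdd(b, m31, m32);
}

// Pattern 2 after: the FMUL and the FMA on X can issue in parallel.
double p2After(double x, double m11, double m12, double m21, double m22,
               double m31, double m32) {
  double a = m11 * m12;
  double b = fmaAdd(x, m21, m22);
  double d = fmaAdd(a, m31, m32);
  return b + d;
}
```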
  10. 14 Jun, 2020 1 commit
  11. 12 Jun, 2020 1 commit
  12. 11 Jun, 2020 1 commit
  13. 10 Jun, 2020 3 commits
    • Matt Arsenault's avatar
      GlobalISel: Move LegalizerHelper members around · 601b8a0d
      Matt Arsenault authored
      MIRBuilder was in the middle of a bunch of methods and not grouped
      with the other member variables, which made it harder to see what
      state this carries around. Move these to the top as is the usual
    • Matt Arsenault's avatar
      GlobalISel: Make default implementation of legalizeCustom unreachable · 0f2af15c
      Matt Arsenault authored
      If the target explicitly requested custom legalization, it should be
      required to implement this. Also move default legalizeIntrinsic
      implementation into the header so it's next to the related
    • Sam Parker's avatar
      [CostModel] Unify getArithmeticInstrCost · fa8bff0c
      Sam Parker authored
      Add the remaining arithmetic opcodes into the generic implementation
      of getUserCost and then call this from getInstructionThroughput. Most
      of the backends have been modified to return the base implementation
      for cost kinds other than RecipThroughput. The outlier here is AMDGPU
      which already uses getArithmeticInstrCost for all the cost kinds.
      This change means that most of the opcodes can be removed from that
      backend's implementation of getUserCost.
      Differential Revision:
  14. 09 Jun, 2020 9 commits
    • diggerlin's avatar
      [AIX] supporting the visibility attribute for aix assembly · edd819c7
      diggerlin authored
      AIX assembly has no .hidden or .protected directives.
      In current LLVM, if a function or variable has a visibility attribute, we emit something like .hidden or .protected, which the AIX assembler (as) cannot recognize.
      In AIX assembly, visibility is instead expressed as an optional operand of pseudo-ops such as:
      .extern Name [ , Visibility ]
      .globl Name [, Visibility ]
      .weak Name [, Visibility ]
      In this patch, we implement the visibility attribute for global variables, functions, and extern functions.
      For example:
      extern __attribute__ ((visibility ("hidden"))) int
        bar(int* ip);
      __attribute__ ((visibility ("hidden"))) int b = 0;
      __attribute__ ((visibility ("hidden"))) int
        foo(int* ip){
         return (*ip)++;
      }
      Visibility for .comm linkage is not supported yet; we will add it in a separate patch.
      The unsupported cases ("default" and "internal") will also be implemented in a separate patch.
      Reviewers: Jason Liu, hubert.reinterpretcast, James Henderson
      Differential Revision:
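This small sketch shows the directive form described above. The visibility becomes the optional second operand of `.globl`, `.extern` or `.weak` rather than a separate `.hidden`/`.protected` line. The enums and `aixSymbolDirective` are hypothetical, and the exact operand spacing is an assumption. Emitting no operand for default visibility is also an assumption of this sketch, and "internal" is not modeled.

```cpp
#include <string>

enum class Linkage { Global, Extern, Weak };
enum class Visibility { Default, Hidden, Protected };

// Build e.g. ".globl b,hidden" instead of ".globl b" followed by ".hidden b".
std::string aixSymbolDirective(const std::string& name, Linkage linkage,
                               Visibility vis) {
  std::string dir;
  switch (linkage) {
    case Linkage::Global: dir = ".globl "; break;
    case Linkage::Extern: dir = ".extern "; break;
    case Linkage::Weak:   dir = ".weak "; break;
  }
  dir += name;
  // Default visibility: no operand (assumption for this sketch).
  if (vis == Visibility::Hidden) dir += ",hidden";
  else if (vis == Visibility::Protected) dir += ",protected";
  return dir;
}
```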
    • Matt Arsenault's avatar
      GlobalISel: Improve MachineIRBuilder construction · b94c9e3b
      Matt Arsenault authored
      The current relationship between LegalizerHelper and MachineIRBuilder
      confuses me, because the LegalizerHelper modifies the MachineIRBuilder
      which it does not own. Constructing a LegalizerHelper destroys the
      insert point, since the constructor calls setMF, which clears all the
      fields. Try to separate these functions, so it's possible to construct
      a LegalizerHelper from an existing MachineIRBuilder without losing the
      insert point/debug loc.
    • Matt Arsenault's avatar
      GlobalISel: Move some trivial MIRBuilder methods into the header · babbf444
      Matt Arsenault authored
      The construction APIs for MachineIRBuilder don't make much sense, and
      it's been annoying to sort through it with these trivial functions
      separate from the declaration.
    • Guillaume Chatelet's avatar
    • Guillaume Chatelet's avatar
      [Alignment][NFC] TargetLowering::allowsMisalignedMemoryAccesses · 3b6196c9
      Guillaume Chatelet authored
      Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMisalignedMemoryAccesses` without marking it override.
      This patch is part of a series to introduce an Alignment type.
      See this thread for context:
      See this patch for the introduction of the type:
      Reviewers: courbet
      Subscribers: hiraditya, llvm-commits
      Tags: #llvm
      Differential Revision:
    • Guillaume Chatelet's avatar
      [Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess · f21c5266
      Guillaume Chatelet authored
      Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMemoryAccess` without marking it override.
      This patch is part of a series to introduce an Alignment type.
      See this thread for context:
      See this patch for the introduction of the type:
      Reviewers: courbet
      Subscribers: hiraditya, llvm-commits
      Tags: #llvm
      Differential Revision:
    • Guillaume Chatelet's avatar
      [Alignment] Fix deprecation message · 49dd8e79
      Guillaume Chatelet authored
    • Kang Zhang's avatar
      [MachineVerifier] Add TiedOpsRewritten flag to fix verify two-address error · 1b660227
      Kang Zhang authored
      Currently, MachineVerifier will attempt to verify that tied operands
      satisfy register constraints as soon as the function is no longer in
      SSA form. However, PHIElimination will take the function out of SSA
      form while TwoAddressInstructionPass will actually rewrite tied operands
      to match the constraints. PHIElimination runs first in the pipeline.
      Therefore, whenever the MachineVerifier is run after PHIElimination,
      it will encounter verification errors on any tied operands.
      This patch adds a function property called TiedOpsRewritten that will be
      set by TwoAddressInstructionPass and will control when the verifier checks
      tied operands.
      Reviewed By: nemanjai
      Differential Revision:
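The ordering problem and the fix can be modeled with a small property set. The names `IsSSA` and `TiedOpsRewritten` follow the commit message, but `FunctionProps` and the two predicates are a hypothetical sketch, not the real MachineVerifier logic. The old rule checks tied operands as soon as the function leaves SSA form. The new rule waits for the flag set by TwoAddressInstructionPass.

```cpp
#include <bitset>
#include <cstddef>

enum class Prop : std::size_t { IsSSA, TiedOpsRewritten, NumProps };

struct FunctionProps {
  std::bitset<static_cast<std::size_t>(Prop::NumProps)> bits;
  bool has(Prop p) const { return bits.test(static_cast<std::size_t>(p)); }
  void set(Prop p) { bits.set(static_cast<std::size_t>(p)); }
  void reset(Prop p) { bits.reset(static_cast<std::size_t>(p)); }
};

// Old behaviour: verify tied operands as soon as the function is not in SSA.
bool verifyTiedOpsOld(const FunctionProps& p) { return !p.has(Prop::IsSSA); }

// New behaviour: verify only after TwoAddressInstructionPass has rewritten them.
bool verifyTiedOpsNew(const FunctionProps& p) { return p.has(Prop::TiedOpsRewritten); }
```

Between PHIElimination and TwoAddressInstructionPass, the old predicate is already true (and the check fails), while the new one is still false.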
    • David Sherwood's avatar
      [CodeGen] Ensure callers of CreateStackTemporary use sensible alignments · cc887240
      David Sherwood authored
      In two places where we call CreateStackTemporary, we were sometimes
      promoting alignments beyond the stack alignment. I have introduced a new function
      called getReducedAlign that will return the alignment for the broken
      down parts of illegal vector types. For example, on NEON a <32 x i8>
      type is made up of two <16 x i8> types - in this case the sensible
      alignment is 16 bytes, not 32.
      In the legalization code wherever we create stack temporaries I have
      started using the reduced alignments instead for illegal vector types.
      I added a test to
      that tries to insert an element into an illegal fixed vector type
      that involves creating a temporary stack object.
      Differential Revision:
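This is a rough model of the idea behind getReducedAlign, using byte sizes instead of EVTs. The helper is hypothetical, not the LLVM function's signature. When an illegal vector type is broken into legal parts, a stack temporary only needs the alignment of one part. Capping the result at the stack alignment is an extra assumption of this sketch, reflecting the problem the commit describes.

```cpp
#include <algorithm>
#include <cstdint>

// Largest power of two <= x (0 for x == 0).
std::uint64_t powerOf2Floor(std::uint64_t x) {
  if (x == 0) return 0;
  std::uint64_t p = 1;
  while (p <= x / 2) p <<= 1;
  return p;
}

// typeBytes: size of the (possibly illegal) vector type, e.g. 32 for <32 x i8>.
// legalPartBytes: size of the largest legal vector register, e.g. 16 on NEON.
std::uint64_t reducedAlign(std::uint64_t typeBytes, std::uint64_t legalPartBytes,
                           std::uint64_t stackAlign) {
  std::uint64_t partBytes = std::min(typeBytes, legalPartBytes);
  return std::min(powerOf2Floor(partBytes), stackAlign);
}
```

For the NEON example, `reducedAlign(32, 16, 16)` gives 16 bytes rather than 32.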