This project is mirrored from https://github.com/llvm-doe-org/llvm-project.git.
  1. Feb 02, 2025
  2. Feb 01, 2025
    • [ELF] Replace inExpr with lexState. NFC · 5c3c0a8c
      Fangrui Song authored
      We may add another state State::Wild to behave more like GNU ld.
    • [Kaleidoscope] Fix typo (#125366) · 14776c6d
      FantasqueX authored
      Remove duplicate word.
    • test: correct a typo in the check identifier (NFCI) · b798679c
      Saleem Abdulrasool authored
      This corrects a swapped order of the spelling of blocks in the check.
      This enables the correct forward declarations which were previously
      disabled.
    • [VPlan] Check VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof. · 75b922dc
      Florian Hahn authored
      When VPWidenIntrinsicRecipe was changed to inherit from VPRecipeWithIRFlags,
      VPRecipeWithIRFlags::classof wasn't updated accordingly. Also check for
      VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof.
      
      Fixes https://github.com/llvm/llvm-project/issues/125301.
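
The failure mode described here is typical of LLVM-style RTTI: `classof` enumerates subclass IDs by hand, so a newly derived class stays invisible to `isa`/`cast` checks until its ID is listed. The following is a minimal, self-contained sketch of that pattern. `Recipe`, `RecipeWithFlags`, `WidenIntrinsicRecipe`, `isaSketch`, and `castSketch` are hypothetical stand-ins introduced for illustration, not the actual VPlan classes or LLVM casting utilities.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for the VPlan recipe hierarchy.
struct Recipe {
  enum Kind { WidenSC, WidenIntrinsicSC, BranchSC };
  explicit Recipe(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct RecipeWithFlags : Recipe {
  using Recipe::Recipe;
  // classof must name every kind whose class derives from RecipeWithFlags.
  // Omitting WidenIntrinsicSC here would make isaSketch<RecipeWithFlags>
  // return false for a WidenIntrinsicRecipe.
  static bool classof(const Recipe *R) {
    return R->getKind() == WidenSC || R->getKind() == WidenIntrinsicSC;
  }
};

struct WidenIntrinsicRecipe : RecipeWithFlags {
  WidenIntrinsicRecipe() : RecipeWithFlags(WidenIntrinsicSC) {}
};

// Simplified analogue of isa<>: defer to the target class's classof.
template <typename To> bool isaSketch(const Recipe *R) {
  return To::classof(R);
}

// Simplified analogue of cast<>: assert the kind check, then downcast.
template <typename To> const To *castSketch(const Recipe *R) {
  assert(isaSketch<To>(R) && "castSketch<To>() argument of incompatible type");
  return static_cast<const To *>(R);
}
```

In this sketch, removing `WidenIntrinsicSC` from `RecipeWithFlags::classof` would make the assertion in `castSketch` fire for a `WidenIntrinsicRecipe`, which is the kind of mismatch the commit message describes.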
    • [SLP]Reduce number of alternate instruction, where possible · d5a7a483
      Alexey Bataev authored
      Patch tries to remove wide alternate operations.
      Currently SLP vectorizer emits something like this:
      ```
      %0 = add i32
      %1 = sub i32
      %2 = add i32
      %3 = sub i32
      %4 = add i32
      %5 = sub i32
      %6 = add i32
      %7 = sub i32
      
      transforms to
      
      %v1 = add <8 x i32>
      %v2 = sub <8 x i32>
      %res = shuffle %v1, %v2, <0, 9, 2, 11, 4, 13, 6, 15>
      ```
      i.e. half of the results are simply unused. This increases register
      pressure and potentially doubles the number of operations.
      
      Patch introduces SplitVectorize mode, where it splits the operations by
      opcodes and produces instead something like this:
      ```
      %v1 = add <4 x i32>
      %v2 = sub <4 x i32>
      %res = shuffle %v1, %v2, <0, 4, 1, 5, 2, 6, 3, 7>
      ```
      This improves performance by reducing the number of operations. It also
      enables other improvements, such as improved graph reordering.
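
The two shuffle masks quoted above follow a simple pattern, which can be sketched as below. This is only an illustration of the lane arithmetic for the add/sub example in this message, not the SLP vectorizer's implementation; `alternateMask` and `splitMask` are hypothetical helper names.

```cpp
#include <cassert>
#include <vector>

// Alternate-opcode form: both shuffle inputs have `Lanes` elements.
// Even result lanes come from the first input (add), odd lanes from the
// second input (sub), whose elements are numbered starting at `Lanes`.
std::vector<int> alternateMask(int Lanes) {
  assert(Lanes > 0 && Lanes % 2 == 0 && "expected an even lane count");
  std::vector<int> Mask(Lanes);
  for (int I = 0; I < Lanes; ++I)
    Mask[I] = (I % 2 == 0) ? I : Lanes + I;
  return Mask;
}

// Split form: each input has only Lanes / 2 elements (one per opcode),
// and the shuffle interleaves them back into the original order.
std::vector<int> splitMask(int Lanes) {
  assert(Lanes > 0 && Lanes % 2 == 0 && "expected an even lane count");
  const int Half = Lanes / 2;
  std::vector<int> Mask(Lanes);
  for (int I = 0; I < Lanes; ++I)
    Mask[I] = (I % 2 == 0) ? I / 2 : Half + I / 2;
  return Mask;
}
```

For eight result lanes, `alternateMask(8)` yields the mask of the wide form and `splitMask(8)` the mask of the split form. In the split form every computed lane is used, whereas the wide form discards half of each input.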
      
      -O3+LTO, AVX512
      Metric: size..text
      Program                                                                         size..text
                                                                                                  results     results0    diff
                 test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   277800.00   280536.00  1.0%
                                test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test    81802.00    82426.00  0.8%
                              test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   790552.00   790952.00  0.1%
                                   test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   383795.00   383987.00  0.1%
                 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  2075541.00  2076501.00  0.0%
                  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  2075541.00  2076501.00  0.0%
                                        test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   312702.00   312766.00  0.0%
                       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12569783.00 12569751.00 -0.0%
                         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2049374.00  2049358.00 -0.0%
                          test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1091836.00  1091772.00 -0.0%
                                   test-suite :: MultiSource/Applications/JM/lencod/lencod.test   852339.00   852211.00 -0.0%
                                      test-suite :: MultiSource/Applications/oggenc/oggenc.test   190651.00   190523.00 -0.1%
                      test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test    44203.00    44155.00 -0.1%
      test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test    12997.00    12981.00 -0.1%
                           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   668971.00   658427.00 -1.6%
                            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   668971.00   658427.00 -1.6%
      
      Prolangs-C/TimberWolfMC/timberwolfmc - small variations, some code not
      inlined
      FreeBench/pifft - extra stores <8 x double> vectorized, some other extra
      vectorizations
      CINT2006/464.h264ref - some smaller code + changes similar to x264
      JM/ldecod - changes similar to x264
      CINT2017speed/600.perlbench_s
      CINT2017rate/500.perlbench_r - significantly compact vector code
      Benchmarks/Bullet - small variations
      CFP2017rate/526.blender_r - small variations
      CFP2017rate/510.parest_r - small variations
      CINT2006/400.perlbench - extra vector code
      JM/lencod - extra store <16 x i32> and other changes similar to x264
      Applications/oggenc - extra store <16 x i8>, small variations
      DOE-ProxyApps-C/miniGMG - small variations
      Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - better vector code
      CINT2017speed/625.x264_s
      CINT2017rate/525.x264_r - the number of instructions increased, but
      looks like they are more performant. E.g., for function
      x264_pixel_satd_8x8, llvm-mca reports better throughput - 84 for the
      current version and 59 for the new version.
      
      -O3+LTO, march=rva32u64
      
      CINT2017rate/525.x264_r - similar to x86, extra code in pixel_hadamard_ac
      function vectorized, idct4x4dc stopped being vectorized (looks like
      issue with shuffles cost)
      CINT2006/400.perlbench - better vector code
      CINT2006/445.gobmk - some variations in vector code
      CINT2006/464.h264ref - extra code vectorized
      CINT2017rate/500.perlbench_r - small variations
      
      -O3+LTO, mcpu=sifive-p470
      
      Metric: size..text
      
      Program                                                                                                                                                size..text
                                                                                     results    results0   diff
                   test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  587336.00  587668.00  0.1%
                        test-suite :: MultiSource/Applications/JM/lencod/lencod.test  643308.00  643614.00  0.0%
                          test-suite :: MultiSource/Applications/d/make_dparser.test   79678.00   79710.00  0.0%
                             test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  277322.00  277420.00  0.0%
               test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  933660.00  933682.00  0.0%
            test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9497722.00 9497682.00 -0.0%
       test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 1767806.00 1767772.00 -0.0%
      test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 1767806.00 1767772.00 -0.0%
       test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  148038.00  148024.00 -0.0%
                        test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  283036.00  283008.00 -0.0%
         test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    4776.00    4772.00 -0.1%
                 test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  540582.00  511772.00 -5.3%
                test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  540582.00  511772.00 -5.3%
      
      CINT2006/464.h264ref - extra vector code in find_sad_16x16
      JM/lencod - extra vector code in find_sad_16x16
      d/make_dparser - smaller vector code
      Benchmarks/Bullet - small variations
      CINT2006/400.perlbench - smaller vector code
      CFP2017rate/526.blender_r - small variations, extra store <8 x float> in
      the loop, extra store <8 x i8> in loop
      CINT2017rate/500.perlbench_r
      CINT2017speed/600.perlbench_s - small variations
      MiBench/consumer-lame - small variations
      JM/ldecod - extra vector code
      mediabench/g721/g721encode - small variations
      CINT2017rate/525.x264_r
      CINT2017speed/625.x264_s - reduced number of wide operations and
      shuffles, saving the registers, similar to X86, extra code in
      pixel_hadamard_ac vectorized, idct4x4dc not vectorized (issue with some
      TTI costs)
      
      Reviewers: RKSimon, hiraditya
      
      Reviewed By: RKSimon
      
      Pull Request: https://github.com/llvm/llvm-project/pull/123360
    • [RISCV] Simplify usage of SplatPat_simm5_plus1. NFC (#125340) · 5cba1f12
      Craig Topper authored
      Make SplatPat_simm5_plus1 responsible for decrementing the immediate
      instead of requiring DecImm SDNodeXForm to be used after. This allows
      better sharing of tablegen classes.
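
As a rough illustration of the idea, the sketch below models a matcher that both checks the range and hands back the already-decremented value, so callers need no separate decrement step. It assumes that `simm5` means a signed 5-bit immediate in [-16, 15], so the "plus one" form accepts [-15, 16]. `matchSimm5Plus1` is a hypothetical helper written for this note, not the TableGen pattern itself.

```cpp
#include <cstdint>
#include <optional>

// Assumption: simm5 is a signed 5-bit immediate, i.e. the range [-16, 15].
constexpr bool isSimm5(std::int64_t Imm) { return Imm >= -16 && Imm <= 15; }

// Hypothetical matcher: accepts Imm when Imm - 1 fits in simm5 and returns
// the decremented value directly, so no separate transform is needed.
constexpr std::optional<std::int64_t> matchSimm5Plus1(std::int64_t Imm) {
  // Checking the shifted range first avoids computing Imm - 1 on values
  // where the subtraction could overflow.
  if (Imm < -15 || Imm > 16)
    return std::nullopt;
  return Imm - 1;
}

static_assert(isSimm5(*matchSimm5Plus1(16)), "16 - 1 fits in simm5");
static_assert(!matchSimm5Plus1(-16).has_value(), "-16 - 1 does not fit");
```

Folding the decrement into the matcher means every user of the pattern gets the adjusted immediate for free, which is the kind of sharing the message refers to.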
    • [MachineScheduler] Fix physreg dependencies of ExitSU (#123541) · ff9c041d
      Sergei Barannikov authored
      Providing the correct operand index allows addPhysRegDataDeps to compute
      the correct latency.
      
      Pull Request: https://github.com/llvm/llvm-project/pull/123541
    • [RISCV] Simplify MIPS CCMov patterns. NFC (#125318) · 15336823
      Craig Topper authored
      We have ComplexPatterns that reduce 3 patterns to 1, by handling the
      ==/!= 0, imm, and register cases. These are used for XTHeadCondMove,
      Zicond, XVentanaCondOps, and our basic seteq/setne patterns.
    • Reapply "[clang][bytecode] Stack-allocate bottom function frame" (#125325) (#125349) · 06130ed3
      Timm Baeder authored
      Move the BottomFrame to InterpState instead.
    • [CostModel][RISCV] vp-intrinsics.ll - add common check prefix for ARGBASED + TYPEBASED test coverage (#125245) · 2791843b
      Simon Pilgrim authored
      Inspired by #125223 - helps identify when the cost models are relying on arg data (or failures in getTypeBasedIntrinsicInstrCost)
    • [TTI] getTypeBasedIntrinsicInstrCost - add basic handling for strided load/store intrinsics (#125223) (REAPPLIED) · 71d05ac6
      Simon Pilgrim authored
      As noted on #124499 - this is currently missing for type-only analysis and was falling back to scalarization for fixed vectors (and failing entirely for scalable vectors)
    • [CodeGen] Migrate away from PointerUnion::dyn_cast (NFC) (#125336) · e11e65f0
      Kazu Hirata authored
      Note that PointerUnion::dyn_cast has been soft deprecated in
      PointerUnion.h:
      
        // FIXME: Replace the uses of is(), get() and dyn_cast() with
        //        isa<T>, cast<T> and the llvm::dyn_cast<T>
      
      Literal migration would result in dyn_cast_if_present (see the
      definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
      because we expect E to be nonnull.
    • [AST] Migrate away from PointerUnion::dyn_cast (NFC) (#125335) · 657dc6d0
      Kazu Hirata authored
      Note that PointerUnion::dyn_cast has been soft deprecated in
      PointerUnion.h:
      
        // FIXME: Replace the uses of is(), get() and dyn_cast() with
        //        isa<T>, cast<T> and the llvm::dyn_cast<T>
      
      Literal migration would result in dyn_cast_if_present (see the
      definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
      because we expect InVectors.front() and P to be nonnull.
    • [MLIR] Extend MPI dialect (#123255) · 48f88651
      Sergio Sánchez Ramírez authored
      cc @tobiasgrosser @wsmoses
      
      This PR adds some new ops and types to the MLIR MPI dialect. The goal is
      to get the minimum required ops here to get a project of ours working and,
      if everything works well, to continue adding ops to the MPI dialect in
      subsequent PRs until we achieve some level of compliance with the MPI
      standard.
      
      ---
      
      Things left to do in subsequent PRs:
      
      - Add back the `mpi.comm` type and add it as an optional argument of the
      currently implemented ops that should support it (i.e. `send`, `recv`,
      `isend`, `irecv`, `allreduce`, `barrier`).
      - Support defining custom `MPI_Op`s (the MPI operations, not the
      tablegen `MPI_Op`) as regions.
      - Add more ops.
    • [InstCombine] Check nowrap flags when folding comparison of GEPs with the same base pointer (#121892) · 9725595f
      Yingwei Zheng authored
      Alive2: https://alive2.llvm.org/ce/z/P5XbMx
      Closes https://github.com/llvm/llvm-project/issues/121890
      
      TODO: It is still safe to perform this transform without nowrap flags if
      the corresponding scale factor is 1 byte:
      https://alive2.llvm.org/ce/z/J-JCJd
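
To see why wrapping matters for this kind of fold, consider a toy model, assumed purely for illustration: an 8-bit address space in which `toyGep` (a hypothetical helper, not an LLVM API) computes base + index * scale modulo 256. With a scale of 4, two different indices can land on the same address once the scaled offset wraps, so comparing the indices instead of the addresses would give the wrong answer. With a scale of 1, distinct 8-bit indices always give distinct addresses, which matches the intuition behind the TODO above.

```cpp
#include <cstdint>

// Toy model: all address arithmetic is performed modulo 256.
// toyGep is a hypothetical helper for illustration, not an LLVM API.
constexpr std::uint8_t toyGep(std::uint8_t Base, std::uint8_t Index,
                              std::uint8_t Scale) {
  return static_cast<std::uint8_t>(Base + Index * Scale);
}

// Scale 4: index 64 gives an offset of 256, which wraps to 0, so indices
// 0 and 64 produce the same address even though the indices differ.
static_assert(toyGep(16, 0, 4) == toyGep(16, 64, 4),
              "scaled offset wraps around");

// Scale 1: the offset is the index itself, so different indices still
// produce different addresses.
static_assert(toyGep(16, 0, 1) != toyGep(16, 64, 1),
              "no wrap in the scaling step");
```

In this model, rewriting an address equality test into an index equality test is only valid when the scaled offset is known not to wrap, which is the role the nowrap flags play in the real transform.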
    • [clang] NFC, add a "continue" bailout in the for-loop of · 7612dcc6
      Haojian Wu authored
      DeclareImplicitDeductionGuidesForTypeAlias.
      
      This improves the code readability.
    • [Analyzer][CFG] Correctly handle rebuilt default arg and default init expression (#117437) · 44aa618e
      yronglin authored
      Clang currently supports extending the lifetime of objects bound to reference
      members of aggregates that are created from default member initializers.
      This PR addresses this change and updates CFG and ExprEngine.

      This PR reapplies https://github.com/llvm/llvm-project/pull/91879.
      Fixes https://github.com/llvm/llvm-project/issues/93725

      ---------

      Signed-off-by: yronglin <yronglin777@gmail.com>
    • [msan][NFCI] Add tests for Arm NEON saturating extract narrow (#125331) · 69905810
      Thurston Dang authored
      Forked from llvm/test/CodeGen/AArch64/arm64-vmovn.ll
      
      Unknown intrinsics which are currently incorrectly handled by
      visitInstruction:
      - llvm.aarch64.neon.sqxtn
      - llvm.aarch64.neon.sqxtun
      - llvm.aarch64.neon.uqxtn