This project is mirrored from https://github.com/llvm-doe-org/llvm-project.git.
- Feb 02, 2025
-
-
Michał Górny authored
Use the test compiler ID to verify whether tests can be run rather than the host compiler. This makes it possible to run tests (with Clang) while the library itself was built with GCC.
-
Michał Górny authored
Use `gnu::format` attribute only when compiling with Clang, as using it against variadic template functions is a Clang extension and is not supported by GCC. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77958 Fixes #119069
-
Andrzej Warzyński authored
This is PR 1 in a series of N patches aimed at improving "VectorEmulateNarrowType.cpp". This is mainly minor refactoring; no major functional changes are made/added.

This PR renames:

* `srcBits`/`dstBits` + `oldElementType`/`newElementType` to improve naming consistency within the file.

This is illustrated below:

```cpp
// Extracted from VectorEmulateNarrowType.cpp

// BEFORE (mixing old/new and src/dst):
// Type oldElementType = op.getType().getElementType();
// Type newElementType = convertedType.getElementType();
// int srcBits = oldElementType.getIntOrFloatBitWidth();
// int dstBits = newElementType.getIntOrFloatBitWidth();

// AFTER (consistently using emulated/container):
Type emulatedElemType = op.getType().getElementType();
Type containerElemType = convertedType.getElementType();
int emulatedBits = emulatedElemType.getIntOrFloatBitWidth();
int containerBits = containerElemType.getIntOrFloatBitWidth();
```

Also adds some comments and unifies related "rewriter notification" messages.

**GitHub issue to track this work:**

* https://github.com/llvm/llvm-project/issues/123630
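The emulated/container terminology can be illustrated with a small scalar model. This is a sketch only, assuming 4-bit emulated elements packed into 8-bit containers with the lowest index in the low bits; the helpers `pack` and `extract` are hypothetical and are not the code in VectorEmulateNarrowType.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model: narrow "emulated" elements stored inside wider
// "container" elements. The bit widths below are assumptions for illustration.
constexpr int emulatedBits = 4;   // width of the type being emulated (e.g. i4)
constexpr int containerBits = 8;  // width of the type actually stored (e.g. i8)
static_assert(containerBits % emulatedBits == 0,
              "container width must be a multiple of the emulated width");
constexpr std::size_t emulatedPerContainer = containerBits / emulatedBits;

// Pack emulated values (only their low emulatedBits are kept) into containers.
std::vector<std::uint8_t> pack(const std::vector<std::uint8_t> &emulated) {
  const std::size_t numContainers =
      (emulated.size() + emulatedPerContainer - 1) / emulatedPerContainer;
  std::vector<std::uint8_t> containers(numContainers, 0);
  const unsigned mask = (1u << emulatedBits) - 1;
  for (std::size_t i = 0; i < emulated.size(); ++i) {
    const std::size_t shift = (i % emulatedPerContainer) * emulatedBits;
    containers[i / emulatedPerContainer] |=
        static_cast<std::uint8_t>((emulated[i] & mask) << shift);
  }
  return containers;
}

// Read emulated element `index` back out of its container.
std::uint8_t extract(const std::vector<std::uint8_t> &containers,
                     std::size_t index) {
  assert(index / emulatedPerContainer < containers.size());
  const unsigned mask = (1u << emulatedBits) - 1;
  const std::size_t shift = (index % emulatedPerContainer) * emulatedBits;
  return static_cast<std::uint8_t>(
      (containers[index / emulatedPerContainer] >> shift) & mask);
}
```

With these widths each container holds `containerBits / emulatedBits == 2` emulated elements, which is the ratio the renamed `emulatedBits`/`containerBits` variables make explicit.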
-
Andrzej Warzyński authored
For context, `tensor.insert_slice` is vectorized using a `vector.transfer_read` + `vector.transfer_write` pair. An unmasked example is shown below:

```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice %slice into %dest[0, %c2] [5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>

// AFTER VECTORIZATION
%read = vector.transfer_read %slice[%c0, %c0], %pad : tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2] : vector<8x1xi32>, tensor<5x3xi32>
```

This PR refactors `InsertSliceVectorizePattern` (which is used to vectorize `tensor.insert_slice`) to enable masked vectorization. At the moment, only `vector.transfer_read` is masked. If `vector.transfer_write` also requires masking, the vectorizer will bail out. This will be addressed in a subsequent PR.

Summary of changes:

* Added an argument to specify vector sizes (behavior remains unchanged if vector sizes are not specified).
* Renamed `InsertSliceVectorizePattern` to `vectorizeAsInsertSliceOp` and integrated it into `linalg::vectorize` (alongside the other vectorization hooks).
* Removed `populateInsertSliceVectorizationPatterns`, as `InsertSliceVectorizePattern` was its only pattern.
* Updated `vectorizeAsInsertSliceOp` to support masking for the "read" operation.
* Updated `@pad_and_insert_slice_dest` in "vectorization-pad-patterns.mlir" to reflect the removal of `populateInsertSliceVectorizationPatterns` from `ApplyPadVectorizationPatternsOps`.
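To see why the read needs a mask in this example, note that it produces a `vector<8x1xi32>` from a `tensor<5x1xi32>`, so only 5 of the 8 lanes correspond to real elements. The following is a hypothetical scalar model of a masked 1-D read, assuming a mask shaped like `vector.create_mask` with a single bound: active lanes load from the source and inactive lanes receive the padding value. It is not MLIR's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a masked 1-D read. Lanes [0, maskBound) are
// active and load from `source`; lanes [maskBound, VecSize) get `pad`.
template <std::size_t VecSize>
std::array<int, VecSize> maskedRead(const std::vector<int> &source,
                                    std::size_t maskBound, int pad) {
  assert(maskBound <= source.size() && maskBound <= VecSize);
  std::array<int, VecSize> result{};
  for (std::size_t lane = 0; lane < VecSize; ++lane)
    result[lane] = lane < maskBound ? source[lane] : pad;
  return result;
}
```

For the 5-element slice above, `maskedRead<8>(slice, 5, pad)` never touches memory past the fifth element, and lanes 5 to 7 hold the padding value.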
-
Martin Storsjö authored
This reverts commit d5a7a483. That commit triggers failed asserts, see https://github.com/llvm/llvm-project/pull/123360 for details.
-
Florian Hahn authored
Nothing in VPlan.h directly depends on VPTransformState, VPCostContext, VPFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate header to reduce the size of widely used VPlan.h. This is a first step towards more cleanly separating declarations in VPlan. Besides reducing VPlan.h's size, this also allows including additional VPlan-related headers in VPlanHelpers.h for use there. An example is using VPDominatorTree in VPTransformState (https://github.com/llvm/llvm-project/pull/117138). PR: https://github.com/llvm/llvm-project/pull/124104
-
Timm Baeder authored
The Expr and its Type were unused otherwise.
-
Timm Baeder authored
Some function types are special to us, so add an enum and determine the function kind once when creating the function, instead of looking at the Decl every time we need the information.
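The "classify once, cache the result" idea can be sketched as follows. The enumerators, the `FunctionDecl` stand-in, and the classification rules are all hypothetical and chosen only for illustration; they are not the actual types in the bytecode interpreter.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical kinds; the real enum may differ.
enum class FunctionKind { Normal, LambdaCallOperator, Destructor, CopyOrMoveOperator };

// Hypothetical stand-in for the declaration a function is created from.
struct FunctionDecl {
  std::string name;
  bool isLambdaCallOperator = false;
  bool isDestructor = false;
};

class Function {
public:
  explicit Function(FunctionDecl decl)
      : Decl(std::move(decl)), Kind(classify(Decl)) {}

  // O(1) queries: no need to re-inspect Decl each time.
  FunctionKind getKind() const { return Kind; }
  bool isLambdaCallOperator() const {
    return Kind == FunctionKind::LambdaCallOperator;
  }

private:
  static FunctionKind classify(const FunctionDecl &D) {
    if (D.isLambdaCallOperator) return FunctionKind::LambdaCallOperator;
    if (D.isDestructor) return FunctionKind::Destructor;
    if (D.name == "operator=") return FunctionKind::CopyOrMoveOperator;
    return FunctionKind::Normal;
  }

  FunctionDecl Decl;  // declared before Kind, so it is initialized first
  FunctionKind Kind;
};
```

The member order matters: `Kind` is computed from `Decl` in the constructor's initializer list, which is only safe because `Decl` is declared (and therefore initialized) first.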
-
Yingwei Zheng authored
Compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=d09b521624f263b5f1296f8d4771836b97e600cb&to=e437ba2cb83bb965e13ef00727671896f03ff84f&stat=instructions:u IR diff looks acceptable. Closes https://github.com/llvm/llvm-project/issues/115574
-
Timm Baeder authored
These were missing here and are used in a few libc++ tests.
-
Yingwei Zheng authored
This patch extends https://github.com/llvm/llvm-project/commit/f6bb156fb10cd83953a34f75b78835cdf399ee8b to handle minmax intrinsics. Motivating case: https://alive2.llvm.org/ce/z/JFKbYn Addresses a regression caused by https://github.com/llvm/llvm-project/pull/121958. It also works for `*.sat` intrinsics, but no real-world benefit is observed.
-
Craig Topper authored
-
David Green authored
-
Craig Topper authored
-
Owen Pan authored
-
Roland McGrath authored
While GCC's -Wdeprecated is on by default and doesn't do much, Clang's -Wdeprecated enables many more things. More apply in C++20, so switch a test file that tickled one to using that. In future, C++20 should probably be made the baseline for compiling all the libc code.
-
Justin Fargnoli authored
Note: [lower-args.ll](https://github.com/llvm/llvm-project/compare/main...justinfargnoli:dev/jf/ptxas?expand=1#diff-649d37d1f897d829fb809025437ba5df2e0c8da8395bbac7be713cd8f5bd8237) and [kernel-param-align.ll](https://github.com/llvm/llvm-project/compare/main...justinfargnoli:dev/jf/ptxas?expand=1#diff-31f196478b41b95b51298eb8e2efccc8a6f1156f13b648c07db27dd09579f74e) fail because `ptxas` doesn't support constant pointers in separate compilation mode (`-c`).
-
- Feb 01, 2025
-
-
Fangrui Song authored
We may add another state, State::Wild, to behave more like GNU ld.
-
FantasqueX authored
Remove duplicate word.
-
Saleem Abdulrasool authored
This corrects a swapped order of the spelling of blocks in the check. This enables the correct forward declarations which were previously disabled.
-
Florian Hahn authored
When VPWidenIntrinsicRecipe was changed to inherit from VPRecipeWithIRFlags, VPRecipeWithIRFlags::classof wasn't updated accordingly. Also check for VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof. Fixes https://github.com/llvm/llvm-project/issues/125301.
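Why a stale `classof` breaks things can be shown with a miniature of LLVM-style RTTI, where `isa<>` dispatches to the target class's static `classof`. All class names and kind IDs below are hypothetical simplifications, not the real VPlan recipe hierarchy.

```cpp
#include <cassert>

// Hypothetical miniature of LLVM-style RTTI: each object stores a kind ID, and
// a base class's classof() must accept the IDs of every subclass derived from it.
struct Recipe {
  enum Kind { WidenSC, WidenCallSC, WidenIntrinsicSC, BlendSC };
  explicit Recipe(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  Kind TheKind;
};

struct RecipeWithFlags : Recipe {
  using Recipe::Recipe;
  static bool classof(const Recipe *R) {
    // Omitting WidenIntrinsicSC here would make isa<RecipeWithFlags> return
    // false for intrinsic recipes even though they inherit from this class.
    return R->getKind() == WidenSC || R->getKind() == WidenCallSC ||
           R->getKind() == WidenIntrinsicSC;
  }
};

struct WidenIntrinsicRecipe : RecipeWithFlags {
  WidenIntrinsicRecipe() : RecipeWithFlags(WidenIntrinsicSC) {}
};

// Simplified isa<>: asks the target type's classof.
template <typename To, typename From> bool isa(const From *V) {
  return To::classof(V);
}
```

Because `classof` is the single source of truth for `isa`, `dyn_cast` and `cast`, every new subclass ID has to be added to each ancestor's `classof`, which is exactly the update the commit makes.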
-
Florian Hahn authored
-
Alexey Bataev authored
Patch tries to remove wide alternate operations. Currently, the SLP vectorizer emits something like this:

```
%0 = add i32
%1 = sub i32
%2 = add i32
%3 = sub i32
%4 = add i32
%5 = sub i32
%6 = add i32
%7 = sub i32

transforms to

%v1 = add <8 x i32>
%v2 = sub <8 x i32>
%res = shuffle %v1, %v2, <0, 9, 2, 11, 4, 13, 6, 15>
```

i.e. half of the results are just unused. This leads to increased register pressure and potentially doubles the number of operations.

The patch introduces SplitVectorize mode, where it splits the operations by opcode and instead produces something like this:

```
%v1 = add <4 x i32>
%v2 = sub <4 x i32>
%res = shuffle %v1, %v2, <0, 4, 1, 5, 2, 6, 3, 7>
```

This improves performance by reducing the number of operations. It also enables some other improvements, like improved graph reordering.

-O3+LTO, AVX512

Metric: size..text

| Program | results | results0 | diff |
|---|---|---|---|
| test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test | 277800.00 | 280536.00 | 1.0% |
| test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test | 81802.00 | 82426.00 | 0.8% |
| test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test | 790552.00 | 790952.00 | 0.1% |
| test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test | 383795.00 | 383987.00 | 0.1% |
| test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test | 2075541.00 | 2076501.00 | 0.0% |
| test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test | 2075541.00 | 2076501.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Bullet/bullet.test | 312702.00 | 312766.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test | 12569783.00 | 12569751.00 | -0.0% |
| test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test | 2049374.00 | 2049358.00 | -0.0% |
| test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test | 1091836.00 | 1091772.00 | -0.0% |
| test-suite :: MultiSource/Applications/JM/lencod/lencod.test | 852339.00 | 852211.00 | -0.0% |
| test-suite :: MultiSource/Applications/oggenc/oggenc.test | 190651.00 | 190523.00 | -0.1% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test | 44203.00 | 44155.00 | -0.1% |
| test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test | 12997.00 | 12981.00 | -0.1% |
| test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test | 668971.00 | 658427.00 | -1.6% |
| test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test | 668971.00 | 658427.00 | -1.6% |

- Prolangs-C/TimberWolfMC/timberwolfmc - small variations, some code not inlined
- FreeBench/pifft - extra stores <8 x double> vectorized, some other extra vectorizations
- CINT2006/464.h264ref - some smaller code + changes similar to x264
- JM/ldecod - changes similar to x264
- CINT2017speed/600.perlbench_s, CINT2017rate/500.perlbench_r - significantly more compact vector code
- Benchmarks/Bullet - small variations
- CFP2017rate/526.blender_r - small variations
- CFP2017rate/510.parest_r - small variations
- CINT2006/400.perlbench - extra vector code
- JM/lencod - extra store <16 x i32> and other changes similar to x264
- Applications/oggenc - extra store <16 x i8>, small variations
- DOE-ProxyApps-C/miniGMG - small variations
- Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw - better vector code
- CINT2017speed/625.x264_s, CINT2017rate/525.x264_r - the number of instructions increased, but the code appears to be more performant. E.g., for function x264_pixel_satd_8x8, llvm-mca reports better throughput: 84 for the current version and 59 for the new version.

-O3+LTO, march=rva23u64

- CINT2017rate/525.x264_r - similar to x86, extra code in the pixel_hadamard_ac function vectorized, idct4x4dc stopped being vectorized (looks like an issue with shuffle costs)
- CINT2006/400.perlbench - better vector code
- CINT2006/445.gobmk - some variations in vector code
- CINT2006/464.h264ref - extra code vectorized
- CINT2017rate/500.perlbench_r - small variations

-O3+LTO, mcpu=sifive-p470

Metric: size..text

| Program | results | results0 | diff |
|---|---|---|---|
| test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test | 587336.00 | 587668.00 | 0.1% |
| test-suite :: MultiSource/Applications/JM/lencod/lencod.test | 643308.00 | 643614.00 | 0.0% |
| test-suite :: MultiSource/Applications/d/make_dparser.test | 79678.00 | 79710.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Bullet/bullet.test | 277322.00 | 277420.00 | 0.0% |
| test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test | 933660.00 | 933682.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test | 9497722.00 | 9497682.00 | -0.0% |
| test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test | 1767806.00 | 1767772.00 | -0.0% |
| test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test | 1767806.00 | 1767772.00 | -0.0% |
| test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test | 148038.00 | 148024.00 | -0.0% |
| test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test | 283036.00 | 283008.00 | -0.0% |
| test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test | 4776.00 | 4772.00 | -0.1% |
| test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test | 540582.00 | 511772.00 | -5.3% |
| test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test | 540582.00 | 511772.00 | -5.3% |

- CINT2006/464.h264ref - extra vector code in find_sad_16x16
- JM/lencod - extra vector code in find_sad_16x16
- d/make_dparser - smaller vector code
- Benchmarks/Bullet - small variations
- CINT2006/400.perlbench - smaller vector code
- CFP2017rate/526.blender_r - small variations, extra store <8 x float> in the loop, extra store <8 x i8> in the loop
- CINT2017rate/500.perlbench_r, CINT2017speed/600.perlbench_s - small variations
- MiBench/consumer-lame - small variations
- JM/ldecod - extra vector code
- mediabench/g721/g721encode - small variations
- CINT2017rate/525.x264_r, CINT2017speed/625.x264_s - reduced number of wide operations and shuffles, saving registers; similar to X86, extra code in pixel_hadamard_ac vectorized, idct4x4dc not vectorized (issue with some TTI costs)

Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/123360
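The two shuffle masks in the description can be checked with a scalar model of an 8-lane alternating add/sub sequence. This is a hypothetical sketch: it assumes the operands for every lane are already available and ignores how SLP gathers them into the narrower vectors (which may itself cost shuffles); it only shows that both lowerings compute the same result while the split form does half-width arithmetic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;
using Vec8 = std::array<int, N>;

// Current lowering: full-width add and sub, then a blend with mask
// <0, 9, 2, 11, 4, 13, 6, 15> (index < N selects from add, else from sub).
Vec8 alternateWide(const Vec8 &a, const Vec8 &b) {
  Vec8 add{}, sub{};
  for (std::size_t i = 0; i < N; ++i) {
    add[i] = a[i] + b[i];
    sub[i] = a[i] - b[i];
  }
  Vec8 res{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t m = (i % 2 == 0) ? i : N + i;  // 0,9,2,11,4,13,6,15
    res[i] = m < N ? add[m] : sub[m - N];
  }
  return res;
}

// Split lowering: half-width add over the even lanes and sub over the odd
// lanes, then an interleave with mask <0, 4, 1, 5, 2, 6, 3, 7>.
Vec8 alternateSplit(const Vec8 &a, const Vec8 &b) {
  constexpr std::size_t H = N / 2;
  std::array<int, H> add{}, sub{};
  for (std::size_t j = 0; j < H; ++j) {
    add[j] = a[2 * j] + b[2 * j];
    sub[j] = a[2 * j + 1] - b[2 * j + 1];
  }
  Vec8 res{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t m = (i % 2 == 0) ? i / 2 : H + i / 2;  // 0,4,1,5,2,6,3,7
    res[i] = m < H ? add[m] : sub[m - H];
  }
  return res;
}
```

In the wide form, 8 of the 16 computed lanes are discarded by the blend; in the split form, every computed lane ends up in the result.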
-
Craig Topper authored
Make SplatPat_simm5_plus1 responsible for decrementing the immediate instead of requiring DecImm SDNodeXForm to be used after. This allows better sharing of tablegen classes.
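A "simm5 plus 1" immediate is one whose value minus one fits in a signed 5-bit field. One typical use, for example, is rewriting `x < C` as `x <= C - 1`. The function below is a hypothetical stand-in (not the RISC-V backend's selector) showing a matcher that both checks the range and returns the already-decremented value, which is what removes the need for a separate `DecImm` transform afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical matcher: accepts C such that C - 1 fits in simm5 ([-16, 15]),
// i.e. C in [-15, 16], and returns C - 1 directly.
std::optional<std::int64_t> matchSimm5Plus1(std::int64_t imm) {
  if (imm < -15 || imm > 16)
    return std::nullopt;
  return imm - 1;  // guaranteed to fit in a signed 5-bit immediate
}
```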
-
Sergei Barannikov authored
Providing the correct operand index allows addPhysRegDataDeps to compute the correct latency. Pull Request: https://github.com/llvm/llvm-project/pull/123541
-
Craig Topper authored
We have ComplexPatterns that reduce 3 patterns to 1, by handling the ==/!= 0, imm, and register cases. These are used for XTHeadCondMove, Zicond, XVentanaCondOps, and our basic seteq/setne patterns.
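The three seteq/setne cases share one idea: produce a value that is zero exactly when the operands are equal, so a single conditional-zero or compare-with-zero instruction can consume it. The following is a hypothetical scalar model of that idea; it is not the backend's ComplexPattern, and the instruction used for the immediate case in practice may differ.

```cpp
#include <cassert>
#include <cstdint>

enum class RhsKind { Zero, Immediate, Register };

// Returns a value that is zero iff lhs == rhs (unsigned, so arithmetic wraps).
std::uint64_t zeroIffEqual(std::uint64_t lhs, RhsKind kind, std::uint64_t rhs) {
  switch (kind) {
  case RhsKind::Zero:
    return lhs;        // x == 0: use x directly
  case RhsKind::Immediate:
    return lhs - rhs;  // x == imm: e.g. add the negated immediate
  case RhsKind::Register:
    return lhs ^ rhs;  // x == y: xor is zero iff the operands are equal
  }
  return lhs ^ rhs;    // not reached; silences missing-return warnings
}
```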
-
Timm Baeder authored
…5325) Move the BottomFrame to InterpState instead.
-
Simon Pilgrim authored
[CostModel][RISCV] vp-intrinsics.ll - add common check prefix for ARGBASED + TYPEBASED test coverage (#125245) Inspired by #125223 - helps identify when the cost models are relying on arg data (or failures in getTypeBasedIntrinsicInstrCost)
-
Simon Pilgrim authored
[TTI] getTypeBasedIntrinsicInstrCost - add basic handling for strided load/store intrinsics (#125223) (REAPPLIED) As noted on #124499 - this is currently missing for type-only analysis and was falling back to scalarization for fixed vectors (and failing entirely for scalable vectors)
-
Kazu Hirata authored
-
Kazu Hirata authored
Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect E to be nonnull.
-
Kazu Hirata authored
Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  // isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect InVectors.front() and P to be nonnull.
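The choice between `dyn_cast` and `dyn_cast_if_present` in the PointerUnion migrations above comes down to null handling: `llvm::dyn_cast` asserts that its input is present, while `dyn_cast_if_present` returns null for a null input. The sketch below models that difference with a hypothetical two-member union; `UnionAB`, `dynCastB` and `dynCastIfPresentB` are illustrative names, not LLVM's `PointerUnion` API.

```cpp
#include <cassert>

struct A {};
struct B {};

// Hypothetical stand-in for a two-member pointer union.
struct UnionAB {
  void *Ptr = nullptr;
  bool HoldsB = false;
  UnionAB() = default;
  UnionAB(A *P) : Ptr(P), HoldsB(false) {}
  UnionAB(B *P) : Ptr(P), HoldsB(true) {}
  bool isNull() const { return Ptr == nullptr; }
};

// dyn_cast-like: the caller promises the union is non-null.
B *dynCastB(const UnionAB &U) {
  assert(!U.isNull() && "dyn_cast on a non-existent value");
  return U.HoldsB ? static_cast<B *>(U.Ptr) : nullptr;
}

// dyn_cast_if_present-like: a null union simply yields nullptr.
B *dynCastIfPresentB(const UnionAB &U) {
  return U.isNull() ? nullptr : dynCastB(U);
}
```

Using the asserting form where the value is known to be non-null documents that expectation and catches violations in debug builds.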
-
Balazs Benics authored
-
Sergio Sánchez Ramírez authored
cc @tobiasgrosser @wsmoses

This PR adds some new ops and types to the MLIR MPI dialect. The goal is to get the minimum required ops here to get a project of ours working and, if everything works well, to continue adding ops to the MPI dialect in subsequent PRs until we achieve some level of compliance with the MPI standard.

---

Things left to do in subsequent PRs:

- Add back the `mpi.comm` type and add it as an optional argument to the currently implemented ops that should support it (i.e. `send`, `recv`, `isend`, `irecv`, `allreduce`, `barrier`).
- Support defining custom `MPI_Op`s (the MPI operations, not the tablegen `MPI_Op`) as regions.
- Add more ops.
-
Yingwei Zheng authored
[InstCombine] Check nowrap flags when folding comparison of GEPs with the same base pointer (#121892) Alive2: https://alive2.llvm.org/ce/z/P5XbMx Closes https://github.com/llvm/llvm-project/issues/121890 TODO: It is still safe to perform this transform without nowrap flags if the corresponding scale factor is 1 byte: https://alive2.llvm.org/ce/z/J-JCJd
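Why the nowrap flags matter: folding `(base + i) < (base + j)` to `i < j` is only valid when the address computations are known not to wrap. The hypothetical model below uses an 8-bit address space so wrap-around is easy to trigger; it illustrates the failure mode, not the exact flag conditions proved in the Alive2 links.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit address space: pointer arithmetic wraps modulo 256,
// like a GEP with no nowrap guarantees.
std::uint8_t gep(std::uint8_t base, std::uint8_t offset) {
  return static_cast<std::uint8_t>(base + offset);
}

// Unsigned comparison of the two addresses, before folding.
bool originalCompare(std::uint8_t base, std::uint8_t i, std::uint8_t j) {
  return gep(base, i) < gep(base, j);
}

// The folded form: compare offsets only.
bool foldedCompare(std::uint8_t i, std::uint8_t j) { return i < j; }
```

With `base = 250`, `i = 10`, `j = 1`, the first address wraps to 4 while the second is 251, so the original comparison is true but the folded one is false.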
-
Haojian Wu authored
DeclareImplicitDeductionGuidesForTypeAlias. This improves the code readability.
-
yronglin authored
Clang currently supports extending the lifetime of objects bound to reference members of aggregates that are created from a default member initializer. This PR updates CFG and ExprEngine to account for this. This PR reapplies https://github.com/llvm/llvm-project/pull/91879. Fixes https://github.com/llvm/llvm-project/issues/93725.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
-
Thurston Dang authored
Forked from llvm/test/CodeGen/AArch64/arm64-vmovn.ll

Unknown intrinsics which are currently incorrectly handled by visitInstruction:

- llvm.aarch64.neon.sqxtn
- llvm.aarch64.neon.sqxtun
- llvm.aarch64.neon.uqxtn
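For reference, each of these intrinsics narrows every lane to half its width with saturation rather than plain truncation. The scalar models below show the per-lane semantics for 16-bit to 8-bit lanes; they are illustrative only and say nothing about how MemorySanitizer should propagate shadow for them.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// sqxtn: signed -> signed, saturating.
std::int8_t sqxtn(std::int16_t v) {
  if (v > std::numeric_limits<std::int8_t>::max())
    return std::numeric_limits<std::int8_t>::max();
  if (v < std::numeric_limits<std::int8_t>::min())
    return std::numeric_limits<std::int8_t>::min();
  return static_cast<std::int8_t>(v);
}

// sqxtun: signed -> unsigned, saturating (negative inputs clamp to 0).
std::uint8_t sqxtun(std::int16_t v) {
  if (v < 0)
    return 0;
  if (v > std::numeric_limits<std::uint8_t>::max())
    return std::numeric_limits<std::uint8_t>::max();
  return static_cast<std::uint8_t>(v);
}

// uqxtn: unsigned -> unsigned, saturating.
std::uint8_t uqxtn(std::uint16_t v) {
  return v > std::numeric_limits<std::uint8_t>::max()
             ? std::numeric_limits<std::uint8_t>::max()
             : static_cast<std::uint8_t>(v);
}
```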
-
Ben Shi authored
-
Andreas Jonson authored
Proof: https://alive2.llvm.org/ce/z/Jncqb2
-