Commits · doe · llvm-doe / llvm-project

This project is mirrored from https://github.com/llvm-doe-org/llvm-project.git.

Feb 02, 2025

[offload] [test] Use test compiler ID rather than host (#124408) · 689ef5fd

Michał Górny authored 1 week ago

Use the test compiler ID to verify whether tests can be run rather than
the host compiler. This makes it possible to run tests (with Clang)
while the library itself was built with GCC.

689ef5fd

[offload] `gnu::format` with variadic template functions is Clang-only (#124406) · 359a9131

Michał Górny authored 1 week ago

Use `gnu::format` attribute only when compiling with Clang, as using it
against variadic template functions is a Clang extension and is not
supported by GCC.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77958

Fixes #119069

359a9131

[mlir][Vector] Update VectorEmulateNarrowType.cpp (1/N) (#123526) · 3e5640b2

Andrzej Warzyński authored 1 week ago

This is PR 1 in a series of N patches aimed at improving
"VectorEmulateNarrowType.cpp". This is mainly a minor refactoring; no
major functional changes are made.

This PR renames `srcBits`/`dstBits` and `oldElementType`/`newElementType`
so that the file consistently uses "emulated"/"container" naming instead of
mixing "old/new" and "src/dst". This is illustrated below:

```cpp
  // Extracted from VectorEmulateNarrowType.cpp

  // BEFORE (mixing old/new and src/dst):
  // Type oldElementType = op.getType().getElementType();
  // Type newElementType = convertedType.getElementType();

  // int srcBits = oldElementType.getIntOrFloatBitWidth();
  // int dstBits = newElementType.getIntOrFloatBitWidth();

  // AFTER (consistently using emulated/container):
  Type emulatedElemTy = op.getType().getElementType();
  Type containerElemTy = convertedType.getElementType();

  int emulatedBits = emulatedElemTy.getIntOrFloatBitWidth();
  int containerBits = containerElemTy.getIntOrFloatBitWidth();
```

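To make the "emulated"/"container" terminology concrete, here is a minimal standalone C++ sketch (not MLIR code and not taken from VectorEmulateNarrowType.cpp) that stores 4-bit emulated elements inside 8-bit container elements. The helper names `packNibbles` and `extractNibble`, and the lowest-bits-first layout, are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative assumption: i4 values emulated inside i8 containers.
constexpr int emulatedBits = 4;
constexpr int containerBits = 8;
// Number of emulated elements per container element (here: 2).
constexpr int emulatedPerContainer = containerBits / emulatedBits;
constexpr unsigned emulatedMask = (1u << emulatedBits) - 1;

// Hypothetical helper: packs emulated values, lowest bits first.
std::vector<uint8_t> packNibbles(const std::vector<uint8_t> &vals) {
  std::vector<uint8_t> out(
      (vals.size() + emulatedPerContainer - 1) / emulatedPerContainer, 0);
  for (std::size_t i = 0; i < vals.size(); ++i) {
    unsigned shift = (i % emulatedPerContainer) * emulatedBits;
    out[i / emulatedPerContainer] |=
        static_cast<uint8_t>((vals[i] & emulatedMask) << shift);
  }
  return out;
}

// Hypothetical helper: reads back emulated element `i`.
uint8_t extractNibble(const std::vector<uint8_t> &packed, std::size_t i) {
  unsigned shift = (i % emulatedPerContainer) * emulatedBits;
  return static_cast<uint8_t>(
      (packed[i / emulatedPerContainer] >> shift) & emulatedMask);
}
```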
Also adds some comments and unifies related "rewriter notification"
messages.

**GitHub issue to track this work:**
* https://github.com/llvm/llvm-project/issues/123630

3e5640b2

[mlir][linalg] Add support for masked vectorization of `tensor.insert_slice` (1/N) (#122927) · d68a4b93

Andrzej Warzyński authored 1 week ago

For context, `tensor.insert_slice` is vectorized using a
`vector.transfer_read` + `vector.transfer_write` pair.
An unmasked example is shown below:

```mlir
// BEFORE VECTORIZATION
%res = tensor.insert_slice
  %slice into %dest[0, %c2]
  [5, 1] [1, 1] : tensor<5x1xi32> into tensor<5x3xi32>

// AFTER VECTORIZATION
%read = vector.transfer_read %slice[%c0, %c0], %pad
  : tensor<5x1xi32>, vector<8x1xi32>
%res = vector.transfer_write %read, %dest[%c0, %c2]
  : vector<8x1xi32>, tensor<5x3xi32>
```

This PR refactors `InsertSliceVectorizePattern` (which is used to
vectorize `tensor.insert_slice`) to enable masked vectorization. At the
moment, only `vector.transfer_read` is masked. If `vector.transfer_write`
also requires masking, the vectorizer will bail out. This will be
addressed in a subsequent PR.

Summary of changes:
  * Added an argument to specify vector sizes (behavior remains
    unchanged if vector sizes are not specified).
  * Renamed `InsertSliceVectorizePattern` to `vectorizeAsInsertSliceOp`
    and integrated it into `linalg::vectorize` (alongside the other
    vectorization hooks).
  * Removed `populateInsertSliceVectorizationPatterns`, as
    `InsertSliceVectorizePattern` was its only pattern.
  * Updated `vectorizeAsInsertSliceOp` to support masking for the
    "read" operation.
  * Updated `@pad_and_insert_slice_dest` in
    "vectorization-pad-patterns.mlir" to reflect the removal of
    `populateInsertSliceVectorizationPatterns` from
    `ApplyPadVectorizationPatternsOps`.

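The effect of masking the read can be modelled in scalar C++. The sketch below only illustrates the semantics and is not vectorizer code: lanes whose index is within the source size load from the source, and the remaining lanes take the padding value, which is roughly what wrapping the `vector.transfer_read` above in `vector.mask` (with a mask from `vector.create_mask`) expresses. The function name `maskedRead` is hypothetical, and the 1-D shape is a simplification of the 2-D example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a masked 1-D vector.transfer_read.
// VecSize plays the role of the user-specified vector size (e.g. 8), while
// source.size() is the actual, possibly smaller, tensor size (e.g. 5).
template <std::size_t VecSize>
std::array<int, VecSize> maskedRead(const std::vector<int> &source, int pad) {
  std::array<int, VecSize> result{};
  for (std::size_t lane = 0; lane < VecSize; ++lane) {
    // Mask bit: true only for in-bounds lanes.
    const bool active = lane < source.size();
    result[lane] = active ? source[lane] : pad;
  }
  return result;
}
```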
d68a4b93

Revert "[SLP]Reduce number of alternate instruction, where possible" · d00579be

Martin Storsjö authored 1 week ago

This reverts commit d5a7a483.

That commit triggers assertion failures; see
https://github.com/llvm/llvm-project/pull/123360 for details.

d00579be

[VPlan] Move auxiliary declarations out of VPlan.h (NFC). (#124104) · 50082773

Florian Hahn authored 1 week ago

Nothing in VPlan.h directly depends on VPTransformState, VPCostContext,
VFRange, VPlanPrinter or VPSlotTracker. Move them out to a separate
header, VPlanHelpers.h, to reduce the size of the widely used VPlan.h.

This is a first step towards more cleanly separating declarations in
VPlan.

Besides reducing VPlan.h's size, this also allows including additional
VPlan-related headers in VPlanHelpers.h for use there. An example is
using VPDominatorTree in VPTransformState
(https://github.com/llvm/llvm-project/pull/117138).

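Such types can move out of a header because of the usual C++ rule that code referring to a type only by pointer or reference needs just a forward declaration; only code that touches the type's members needs the full definition. A minimal single-file sketch, with hypothetical `...Sketch` names standing in for the real VPlan classes and comments marking which part could live in which header:

```cpp
#include <cassert>

// --- Part that could stay in a widely included header (cf. VPlan.h) ---
// A forward declaration is enough: the state is only taken by reference.
struct TransformStateSketch;

struct RecipeSketch {
  void execute(TransformStateSketch &State) const;
};

// --- Part that could move to a separate helpers header (cf. VPlanHelpers.h) ---
struct TransformStateSketch {
  unsigned NumExecuted = 0;
};

// --- Implementation: the only place that needs the complete type ---
inline void RecipeSketch::execute(TransformStateSketch &State) const {
  ++State.NumExecuted;
}
```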
PR: https://github.com/llvm/llvm-project/pull/124104

50082773

[clang][bytecode][NFC] Only get expr when checking for UB (#125397) · 642e84f0

Timm Baeder authored 1 week ago

The Expr and its Type were unused otherwise.

642e84f0

[clang][bytecode][NFC] Add a FunctionKind enum (#125391) · cf893baf

Timm Baeder authored 1 week ago

Some function types are special to us, so add an enum and determine the
function kind once when creating the function, instead of looking at the
Decl every time we need the information.

cf893baf

[InstSimplify] Add additional checks when substituting pointers (#125385) · 1af627b5

Yingwei Zheng authored 1 week ago

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=d09b521624f263b5f1296f8d4771836b97e600cb&to=e437ba2cb83bb965e13ef00727671896f03ff84f&stat=instructions:u
IR diff looks acceptable.
Closes https://github.com/llvm/llvm-project/issues/115574

1af627b5

[clang][bytecode] Ignore Namespace{Using,Alias}Decls (#125387) · 00bdce1c

Timm Baeder authored 1 week ago

These were missing here and are used in a few libc++ tests.

00bdce1c

[InstCombine] Extend `foldSelectInstWithICmpConst` to handle minmax (#125346) · caeefe7b

Yingwei Zheng authored 1 week ago

This patch extends
https://github.com/llvm/llvm-project/commit/f6bb156fb10cd83953a34f75b78835cdf399ee8b
to handle minmax intrinsics.
Motivating case: https://alive2.llvm.org/ce/z/JFKbYn 
Addresses a regression caused by
https://github.com/llvm/llvm-project/pull/121958.

It also works for the `*.sat` intrinsics, but no real-world benefit has been observed.

caeefe7b

[TableGen] Use range-based for loop. NFC · 2eabcb73

Craig Topper authored 1 week ago

2eabcb73

[GlobalISel] Add brackets around || in assert. NFC · 58033355

David Green authored 1 week ago

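The assert touched by this commit is not shown here, so the following is only a generic illustration, with made-up conditions, of why such brackets are added. In `assert(A || B && "message")`, `&&` binds tighter than `||`, so the condition parses as `A || (B && "message")`. Because a string literal always converts to `true`, both groupings evaluate the same (consistent with the change being NFC), but compilers warn about the mixed operators (for example under `-Wparentheses`), and the bracketed form states the intent explicitly.

```cpp
#include <cassert>

// How `A || B && "message"` is actually parsed.
bool ungrouped(bool A, bool B) { return A || (B && "message"); }

// The intended grouping, as written once brackets are added.
bool grouped(bool A, bool B) { return (A || B) && "message"; }
```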
58033355

[MIParser] Don't use Register to hold Dwarf register numbers. NFC (#125263) · b2ef23cd

Craig Topper authored 1 week ago

b2ef23cd

[clang-format] Fix a bug in annotating ClassHeadName (#125326) · 6980d979

Owen Pan authored 1 week ago

6980d979

[libc] Build with -Wdeprecated, fix some warnings (#125373) · 648981f9

Roland McGrath authored 1 week ago

While GCC's -Wdeprecated is on by default and doesn't do much,
Clang's -Wdeprecated enables many more warnings. More of them apply in
C++20, so switch a test file that triggered one of them to C++20. In
the future, C++20 should probably be made the baseline for compiling
all the libc code.

648981f9

[NVPTX] Fix `ptxas` failures (NFC) (#125147) · 312055d1

Justin Fargnoli authored 1 week ago

Note:
[lower-args.ll](https://github.com/llvm/llvm-project/compare/main...justinfargnoli:dev/jf/ptxas?expand=1#diff-649d37d1f897d829fb809025437ba5df2e0c8da8395bbac7be713cd8f5bd8237)
and
[kernel-param-align.ll](https://github.com/llvm/llvm-project/compare/main...justinfargnoli:dev/jf/ptxas?expand=1#diff-31f196478b41b95b51298eb8e2efccc8a6f1156f13b648c07db27dd09579f74e)
fail because `ptxas` doesn't support constant pointers in separate
compilation mode (`-c`).

312055d1

Feb 01, 2025

[ELF] Replace inExpr with lexState. NFC · 5c3c0a8c

Fangrui Song authored 1 week ago

We may add another state, State::Wild, to behave more like GNU ld.

5c3c0a8c

[Kaleidoscope] Fix typo (#125366) · 14776c6d

FantasqueX authored 1 week ago

Remove duplicate word.

14776c6d

test: correct a typo in the check identifier (NFCI) · b798679c

Saleem Abdulrasool authored 1 week ago

This corrects a swapped order of the spelling of blocks in the check.
This enables the correct forward declarations which were previously
disabled.

b798679c

[VPlan] Check VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof. · 75b922dc

Florian Hahn authored 1 week ago

When VPWidenIntrinsicRecipe was changed to inherit from VPRecipeWithIRFlags,
VPRecipeWithIRFlags::classof wasn't updated accordingly. Also check for
VPWidenIntrinsicSC in VPRecipeWithIRFlags::classof.

Fixes https://github.com/llvm/llvm-project/issues/125301.

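The bug class here comes from LLVM-style RTTI: `isa<>`/`dyn_cast<>` call the target class's static `classof`, so a base class's `classof` must accept the ID of every subclass. The self-contained sketch below models this with hypothetical names (`Recipe`, `RecipeWithIRFlags`, `RecipeID`) and a hand-written `isa`; it is not the real VPlan code, and its case list is illustrative only.

```cpp
#include <cassert>

// Hypothetical, simplified model of LLVM-style RTTI; not the real VPlan classes.
enum class RecipeID { WidenSC, WidenCastSC, WidenIntrinsicSC, BlendSC };

struct Recipe {
  explicit Recipe(RecipeID Kind) : ID(Kind) {}
  RecipeID getID() const { return ID; }

private:
  RecipeID ID;
};

// Base class for recipes that carry IR flags. classof must accept the IDs of
// all subclasses; otherwise isa<> wrongly returns false for them.
struct RecipeWithIRFlags : Recipe {
  using Recipe::Recipe;
  static bool classof(const Recipe *R) {
    switch (R->getID()) {
    case RecipeID::WidenSC:
    case RecipeID::WidenCastSC:
    case RecipeID::WidenIntrinsicSC: // the kind of case that was missing
      return true;
    default:
      return false;
    }
  }
};

// A subclass whose ID must be listed in RecipeWithIRFlags::classof.
struct WidenIntrinsicRecipe : RecipeWithIRFlags {
  WidenIntrinsicRecipe() : RecipeWithIRFlags(RecipeID::WidenIntrinsicSC) {}
};

// Minimal isa<>: dispatches to the target class's classof.
template <typename To> bool isa(const Recipe *R) { return To::classof(R); }
```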
75b922dc

[VPlan] Use Operands to create VPReplicateRecipe for invar store. (NFC) · 4f381d0b

Florian Hahn authored 1 week ago

4f381d0b

[SLP]Reduce number of alternate instruction, where possible · d5a7a483

Alexey Bataev authored 1 week ago

This patch tries to remove wide alternate operations.
Currently, the SLP vectorizer emits something like this:
```
%0 = add i32
%1 = sub i32
%2 = add i32
%3 = sub i32
%4 = add i32
%5 = sub i32
%6 = add i32
%7 = sub i32

transforms to

%v1 = add <8 x i32>
%v2 = sub <8 x i32>
%res = shuffle %v1, %v2, <0, 9, 2, 11, 4, 13, 6, 15>
```
i.e. half of the results are unused. This leads to increased register
pressure and potentially doubles the number of operations.

This patch introduces a SplitVectorize mode, which splits the operations
by opcode and instead produces something like this:
```
%v1 = add <4 x i32>
%v2 = sub <4 x i32>
%res = shuffle %v1, %v2, <0, 4, 1, 5, 2, 6, 3, 7>
```
This improves performance by reducing the number of operations. It also
enables some other improvements, such as better graph reordering.

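The two shuffle masks above can be checked with a scalar C++ model. This is a sketch, not SLP vectorizer code: `std::array` stands in for `<N x i32>` vectors, the hypothetical `shuffle` helper mimics a two-operand `shufflevector` (mask indices below N select from the first operand, the rest from the second), and the inputs are assumed to follow the even-add/odd-sub pattern of the example. `alternateMode` computes full-width add and sub and discards half of each, while `splitMode` computes each opcode only on the lanes that need it and then interleaves; both produce the same lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<int32_t, 4>;
using V8 = std::array<int32_t, 8>;

// Models a two-operand shufflevector producing 8 lanes:
// Mask[I] < N picks A[Mask[I]], otherwise B[Mask[I] - N].
template <std::size_t N>
V8 shuffle(const std::array<int32_t, N> &A, const std::array<int32_t, N> &B,
           const std::array<int, 8> &Mask) {
  V8 R{};
  for (std::size_t I = 0; I < 8; ++I) {
    const std::size_t Idx = static_cast<std::size_t>(Mask[I]);
    R[I] = Idx < N ? A[Idx] : B[Idx - N];
  }
  return R;
}

// Alternate mode: full-width add and sub; half of each result is discarded.
V8 alternateMode(const V8 &X, const V8 &Y) {
  V8 Add{}, Sub{};
  for (std::size_t I = 0; I < 8; ++I) {
    Add[I] = X[I] + Y[I];
    Sub[I] = X[I] - Y[I];
  }
  return shuffle<8>(Add, Sub, {0, 9, 2, 11, 4, 13, 6, 15});
}

// Split mode: add only on even lanes, sub only on odd lanes, then interleave.
V8 splitMode(const V8 &X, const V8 &Y) {
  V4 Add{}, Sub{};
  for (std::size_t I = 0; I < 4; ++I) {
    Add[I] = X[2 * I] + Y[2 * I];
    Sub[I] = X[2 * I + 1] - Y[2 * I + 1];
  }
  return shuffle<4>(Add, Sub, {0, 4, 1, 5, 2, 6, 3, 7});
}
```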
-O3+LTO, AVX512

Metric: size..text

| Program | results | results0 | diff |
|---|---:|---:|---:|
| test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test | 277800.00 | 280536.00 | 1.0% |
| test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test | 81802.00 | 82426.00 | 0.8% |
| test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test | 790552.00 | 790952.00 | 0.1% |
| test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test | 383795.00 | 383987.00 | 0.1% |
| test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test | 2075541.00 | 2076501.00 | 0.0% |
| test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test | 2075541.00 | 2076501.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Bullet/bullet.test | 312702.00 | 312766.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test | 12569783.00 | 12569751.00 | -0.0% |
| test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test | 2049374.00 | 2049358.00 | -0.0% |
| test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test | 1091836.00 | 1091772.00 | -0.0% |
| test-suite :: MultiSource/Applications/JM/lencod/lencod.test | 852339.00 | 852211.00 | -0.0% |
| test-suite :: MultiSource/Applications/oggenc/oggenc.test | 190651.00 | 190523.00 | -0.1% |
| test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test | 44203.00 | 44155.00 | -0.1% |
| test-suite :: SingleSource/UnitTests/Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw.test | 12997.00 | 12981.00 | -0.1% |
| test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test | 668971.00 | 658427.00 | -1.6% |
| test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test | 668971.00 | 658427.00 | -1.6% |

- Prolangs-C/TimberWolfMC/timberwolfmc: small variations; some code is not
  inlined.
- FreeBench/pifft: extra <8 x double> stores vectorized, plus some other
  extra vectorizations.
- CINT2006/464.h264ref: some smaller code, plus changes similar to x264.
- JM/ldecod: changes similar to x264.
- CINT2017speed/600.perlbench_s, CINT2017rate/500.perlbench_r:
  significantly more compact vector code.
- Benchmarks/Bullet: small variations.
- CFP2017rate/526.blender_r: small variations.
- CFP2017rate/510.parest_r: small variations.
- CINT2006/400.perlbench: extra vector code.
- JM/lencod: an extra <16 x i32> store and other changes similar to x264.
- Applications/oggenc: an extra <16 x i8> store, small variations.
- DOE-ProxyApps-C/miniGMG: small variations.
- Vector/AVX512BWVL/Vector-AVX512BWVL-mask_set_bw: better vector code.
- CINT2017speed/625.x264_s, CINT2017rate/525.x264_r: the number of
  instructions increased, but the new code appears to be more performant.
  For example, for the function x264_pixel_satd_8x8, llvm-mca reports better
  throughput: 84 for the current version and 59 for the new version.

-O3+LTO, march=rva32u64

- CINT2017rate/525.x264_r: similar to x86; extra code in the
  pixel_hadamard_ac function is vectorized, while idct4x4dc stopped being
  vectorized (likely an issue with shuffle costs).
- CINT2006/400.perlbench: better vector code.
- CINT2006/445.gobmk: some variations in vector code.
- CINT2006/464.h264ref: extra code vectorized.
- CINT2017rate/500.perlbench_r: small variations.

-O3+LTO, mcpu=sifive-p470

Metric: size..text

| Program | results | results0 | diff |
|---|---:|---:|---:|
| test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test | 587336.00 | 587668.00 | 0.1% |
| test-suite :: MultiSource/Applications/JM/lencod/lencod.test | 643308.00 | 643614.00 | 0.0% |
| test-suite :: MultiSource/Applications/d/make_dparser.test | 79678.00 | 79710.00 | 0.0% |
| test-suite :: MultiSource/Benchmarks/Bullet/bullet.test | 277322.00 | 277420.00 | 0.0% |
| test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test | 933660.00 | 933682.00 | 0.0% |
| test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test | 9497722.00 | 9497682.00 | -0.0% |
| test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test | 1767806.00 | 1767772.00 | -0.0% |
| test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test | 1767806.00 | 1767772.00 | -0.0% |
| test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test | 148038.00 | 148024.00 | -0.0% |
| test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test | 283036.00 | 283008.00 | -0.0% |
| test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test | 4776.00 | 4772.00 | -0.1% |
| test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test | 540582.00 | 511772.00 | -5.3% |
| test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test | 540582.00 | 511772.00 | -5.3% |

- CINT2006/464.h264ref: extra vector code in find_sad_16x16.
- JM/lencod: extra vector code in find_sad_16x16.
- d/make_dparser: smaller vector code.
- Benchmarks/Bullet: small variations.
- CINT2006/400.perlbench: smaller vector code.
- CFP2017rate/526.blender_r: small variations; an extra <8 x float> store
  in a loop and an extra <8 x i8> store in a loop.
- CINT2017rate/500.perlbench_r, CINT2017speed/600.perlbench_s: small
  variations.
- MiBench/consumer-lame: small variations.
- JM/ldecod: extra vector code.
- mediabench/g721/g721encode: small variations.
- CINT2017rate/525.x264_r, CINT2017speed/625.x264_s: fewer wide operations
  and shuffles, saving registers, similar to x86; extra code in
  pixel_hadamard_ac is vectorized, while idct4x4dc is not vectorized (an
  issue with some TTI costs).

Reviewers: RKSimon, hiraditya

Reviewed By: RKSimon

Pull Request: https://github.com/llvm/llvm-project/pull/123360

d5a7a483

[RISCV] Simplify usage of SplatPat_simm5_plus1. NFC (#125340) · 5cba1f12

Craig Topper authored 1 week ago

Make SplatPat_simm5_plus1 responsible for decrementing the immediate
instead of requiring DecImm SDNodeXForm to be used after. This allows
better sharing of tablegen classes.

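For context, and as an assumption about the RISC-V backend rather than something stated in this commit, the `simm5_plus1` operand class accepts immediates in [-15, 16], i.e. values whose decrement fits the signed 5-bit `.vi` immediate range [-16, 15]; this is what allows, for example, `x < imm` to be selected as `x <= imm - 1` with `vmsle.vi`. The C++ sketch below (hypothetical function names, not TableGen) shows the idea of the change: the matcher itself returns the already-decremented immediate, so no separate decrement transform is needed afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// simm5 covers [-16, 15]; simm5_plus1 covers [-15, 16], so Imm - 1 fits simm5.
bool isSImm5Plus1(int64_t Imm) { return Imm >= -15 && Imm <= 16; }

// Hypothetical matcher mirroring the new responsibility: it both checks the
// range and hands back the decremented immediate.
std::optional<int64_t> matchSImm5Plus1(int64_t Imm) {
  if (!isSImm5Plus1(Imm))
    return std::nullopt;
  return Imm - 1;
}
```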
5cba1f12

[MachineScheduler] Fix physreg dependencies of ExitSU (#123541) · ff9c041d

Sergei Barannikov authored 1 week ago

Providing the correct operand index allows addPhysRegDataDeps to compute
the correct latency.

Pull Request: https://github.com/llvm/llvm-project/pull/123541

ff9c041d

[RISCV] Simplify MIPS CCMov patterns. NFC (#125318) · 15336823

Craig Topper authored 1 week ago

We have ComplexPatterns that reduce 3 patterns to 1, by handling the
==/!= 0, imm, and register cases. These are used for XTHeadCondMove,
Zicond, XVentanaCondOps, and our basic seteq/setne patterns.

15336823

Reapply "[clang][bytecode] Stack-allocate bottom function frame" (#12… (#125349) · 06130ed3

Timm Baeder authored 1 week ago

…5325)

Move the BottomFrame to InterpState instead.

06130ed3

[CostModel][RISCV] vp-intrinsics.ll - add common check prefix for ARGBASED +... · 2791843b

Simon Pilgrim authored 1 week ago

[CostModel][RISCV] vp-intrinsics.ll - add common check prefix for ARGBASED + TYPEBASED test coverage (#125245)

Inspired by #125223 - helps identify when the cost models are relying on arg data (or failures in getTypeBasedIntrinsicInstrCost)

2791843b

[TTI] getTypeBasedIntrinsicInstrCost - add basic handling for strided... · 71d05ac6

Simon Pilgrim authored 1 week ago

[TTI] getTypeBasedIntrinsicInstrCost - add basic handling for strided load/store intrinsics (#125223) (REAPPLIED)

As noted on #124499 - this is currently missing for type-only analysis and was falling back to scalarization for fixed vectors (and failing entirely for scalable vectors)

71d05ac6

[SandboxIR] Avoid repeated hash lookups (NFC) (#125337) · 8266eedf

Kazu Hirata authored 1 week ago

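The specific lookups changed in SandboxIR are not shown here; as a generic illustration of this kind of cleanup, the sketch below uses `std::unordered_map` (LLVM code typically uses `DenseMap`, which supports the same `try_emplace` idiom). The function names are hypothetical. The first version hashes a new key three times (`count`, then `operator[]` to insert, then `operator[]` to read); the second does a single `try_emplace`, which inserts only when the key is absent and returns an iterator to the entry either way.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Before: count(), then operator[] to insert, then operator[] to read.
int getOrAssignIdSlow(std::unordered_map<std::string, int> &Ids,
                      const std::string &Key) {
  if (!Ids.count(Key)) {
    int Next = static_cast<int>(Ids.size());
    Ids[Key] = Next;
  }
  return Ids[Key];
}

// After: one try_emplace. Ids.size() is evaluated before the call, so a new
// key gets the next free id and an existing key keeps its id.
int getOrAssignId(std::unordered_map<std::string, int> &Ids,
                  const std::string &Key) {
  return Ids.try_emplace(Key, static_cast<int>(Ids.size())).first->second;
}
```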
8266eedf

[CodeGen] Migrate away from PointerUnion::dyn_cast (NFC) (#125336) · e11e65f0

Kazu Hirata authored 1 week ago

Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect E to be nonnull.

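The choice between the two casts is about null inputs: `llvm::dyn_cast` requires a present (non-null) value, whereas `dyn_cast_if_present` maps null to null. The sketch below models only that difference on a tiny hand-written two-member pointer union; `AOrB`, `dynCast` and `dynCastIfPresent` are hypothetical stand-ins, not LLVM's `PointerUnion` or casting APIs.

```cpp
#include <cassert>
#include <type_traits>

struct A { int X; };
struct B { int Y; };

// Hypothetical two-member pointer union (not llvm::PointerUnion).
struct AOrB {
  void *Ptr = nullptr;
  bool HoldsB = false;
  AOrB() = default;
  AOrB(A *P) : Ptr(P), HoldsB(false) {}
  AOrB(B *P) : Ptr(P), HoldsB(true) {}
  // True if the stored pointer has type T* (T must be A or B).
  template <typename T> bool is() const { return HoldsB == std::is_same_v<T, B>; }
};

// Models llvm::dyn_cast: the value must be present (non-null);
// a type mismatch yields nullptr.
template <typename T> T *dynCast(const AOrB &U) {
  assert(U.Ptr && "dyn_cast on a non-existent value");
  return U.is<T>() ? static_cast<T *>(U.Ptr) : nullptr;
}

// Models llvm::dyn_cast_if_present: a null value is allowed and maps to nullptr.
template <typename T> T *dynCastIfPresent(const AOrB &U) {
  return U.Ptr ? dynCast<T>(U) : nullptr;
}
```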
e11e65f0

[AST] Migrate away from PointerUnion::dyn_cast (NFC) (#125335) · 657dc6d0

Kazu Hirata authored 1 week ago

Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect InVectors.front() and P to be nonnull.

657dc6d0

[CodeGen][NFC] Remove redundant map lookup (#125342) · 16d4453f

Balazs Benics authored 1 week ago

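The redundant lookup removed here is not shown; a common shape of this cleanup is replacing a `count`-then-lookup sequence with a single `find` whose iterator is reused. A hypothetical before/after using `std::unordered_map`:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Before: count() hashes Key, then at() hashes it again.
int priorityTwoLookups(const std::unordered_map<std::string, int> &Prio,
                       const std::string &Key) {
  if (Prio.count(Key))
    return Prio.at(Key);
  return -1;
}

// After: a single find() yields an iterator that is reused.
int priorityOneLookup(const std::unordered_map<std::string, int> &Prio,
                      const std::string &Key) {
  auto It = Prio.find(Key);
  return It != Prio.end() ? It->second : -1;
}
```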
16d4453f

[MLIR] Extend MPI dialect (#123255) · 48f88651

Sergio Sánchez Ramírez authored 1 week ago

cc @tobiasgrosser @wsmoses

This PR adds some new ops and types to the MLIR MPI dialect. The goal is
to get the minimum required ops in place to get a project of ours working
and, if everything works well, to continue adding ops to the MPI dialect
in subsequent PRs until we achieve some level of compliance with the MPI
standard.

---

Things left to do in subsequent PRs:

- Add back the `mpi.comm` type and add as optional argument of current
implemented ops that should support it (i.e. `send`, `recv`, `isend`,
`irecv`, `allreduce`, `barrier`).
- Support defining custom `MPI_Op`s (the MPI operations, not the
tablegen `MPI_Op`) as regions.
- Add more ops.

48f88651

[InstCombine] Check nowrap flags when folding comparison of GEPs with the same... · 9725595f

Yingwei Zheng authored 1 week ago

[InstCombine] Check nowrap flags when folding comparison of GEPs with the same base pointer (#121892)

Alive2: https://alive2.llvm.org/ce/z/P5XbMx
Closes https://github.com/llvm/llvm-project/issues/121890

TODO: It is still safe to perform this transform without nowrap flags if
the corresponding scale factor is 1 byte:
https://alive2.llvm.org/ce/z/J-JCJd

9725595f

[clang] NFC, add a "continue" bailout in the for-loop of · 7612dcc6

Haojian Wu authored 1 week ago

DeclareImplicitDeductionGuidesForTypeAlias.

This improves code readability.

7612dcc6

[Analyzer][CFG] Correctly handle rebuilt default arg and default init expression (#117437) · 44aa618e

yronglin authored 1 week ago

Clang currently supports extending the lifetime of objects bound to
reference members of aggregates that are created from default member
initializers. This PR addresses this change and updates the CFG and
ExprEngine.

This PR reapplies https://github.com/llvm/llvm-project/pull/91879.
Fixes https://github.com/llvm/llvm-project/issues/93725

---------

Signed-off-by: yronglin <yronglin777@gmail.com>

44aa618e

[msan][NFCI] Add tests for Arm NEON saturating extract narrow (#125331) · 69905810

Thurston Dang authored 1 week ago

Forked from llvm/test/CodeGen/AArch64/arm64-vmovn.ll

Unknown intrinsics which are currently incorrectly handled by
visitInstruction:
- llvm.aarch64.neon.sqxtn
- llvm.aarch64.neon.sqxtun
- llvm.aarch64.neon.uqxtn

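For reference, these three intrinsics narrow each lane to half its width with saturation: `sqxtn` clamps a signed value to the signed half-width range, `sqxtun` clamps a signed value to the unsigned half-width range, and `uqxtn` clamps an unsigned value to the unsigned half-width range. A scalar C++ model of one 16-bit to 8-bit lane (function names chosen to mirror the intrinsics; this is not MSan code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// sqxtn: signed -> signed, saturating to [-128, 127].
int8_t sqxtn(int16_t V) {
  return static_cast<int8_t>(std::clamp<int16_t>(V, INT8_MIN, INT8_MAX));
}

// sqxtun: signed -> unsigned, saturating to [0, 255].
uint8_t sqxtun(int16_t V) {
  return static_cast<uint8_t>(std::clamp<int16_t>(V, 0, UINT8_MAX));
}

// uqxtn: unsigned -> unsigned, saturating to [0, 255].
uint8_t uqxtn(uint16_t V) {
  return static_cast<uint8_t>(std::min<uint16_t>(V, UINT8_MAX));
}
```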
69905810

[clang][StaticAnalyzer][NFC] Fix a typo in comments (#125339) · bfa7edcc

Ben Shi authored 1 week ago

bfa7edcc

[InstSimplify] Handle trunc to i1 in Select with bit test folds. (#122944) · 9399a1dd

Andreas Jonson authored 1 week ago

Proof: https://alive2.llvm.org/ce/z/Jncqb2

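`trunc X to i1` keeps only the lowest bit of `X`, so it acts as the bit test `(X & 1) != 0`. As one illustrative instance of a select-with-bit-test fold (an assumption, not necessarily one of the exact cases the patch covers), `select (trunc X to i1), (X & ~1), X` always equals `X & ~1`: if the low bit is set, the true arm clears it, and if it is clear, `X` already equals `X & ~1`. A C++ model over `uint8_t` with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// trunc X to i1: only the lowest bit survives.
bool truncToI1(uint8_t X) { return (X & 1) != 0; }

// select (trunc X to i1), (X & ~1), X
uint8_t selectForm(uint8_t X) {
  return truncToI1(X) ? static_cast<uint8_t>(X & ~1u) : X;
}

// Simplified form: X & ~1
uint8_t simplifiedForm(uint8_t X) { return static_cast<uint8_t>(X & ~1u); }
```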
9399a1dd