This project is mirrored from https://github.com/llvm-doe-org/llvm-project.git.
- 17 Jun, 2020 1 commit
-
-
Sjoerd Meijer authored
Refactor TTI hook emitGetActiveLaneMask and remove the unused arguments as suggested in D79100.
-
- 15 Jun, 2020 1 commit
-
-
Sam Parker authored
Move the cost modelling, along with the reduction pattern matching, from getInstructionThroughput into the generic TTIImpl::getUserCost. The modelling in the AMDGPU backend can now be removed. Differential Revision: https://reviews.llvm.org/D81643
-
- 06 Jun, 2020 1 commit
-
-
dfukalov authored
Summary: In some cases inner loops may not get boosts, so try to analyze them deeper.
Reviewers: rampitec, mzolotukhin
Reviewed By: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81204
-
- 29 May, 2020 1 commit
-
-
Sjoerd Meijer authored
This is split off from D79100 and adds a new target hook emitGetActiveLaneMask that can be queried to check if the intrinsic @llvm.get.active.lane.mask() is supported by the backend and if it should be emitted for a given loop. See also commit rG7fb8a40e and its commit message for more details/context on this new intrinsic. Differential Revision: https://reviews.llvm.org/D80597
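The per-lane semantics of @llvm.get.active.lane.mask() can be modeled with a small scalar sketch (the function name and the scalar std::vector<bool> return type are illustrative; the real intrinsic produces a vector of i1):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of @llvm.get.active.lane.mask(%base, %n):
// lane I of the result is true iff (Base + I) < N (unsigned compare),
// i.e. the lanes of a vector iteration that fall inside the trip count.
std::vector<bool> getActiveLaneMask(uint64_t Base, uint64_t N,
                                    unsigned NumLanes) {
  std::vector<bool> Mask(NumLanes);
  for (unsigned I = 0; I < NumLanes; ++I)
    Mask[I] = (Base + I) < N;
  return Mask;
}
```

For example, a 4-lane iteration starting at index 8 of a 10-iteration loop enables only the first two lanes, which is why the vectoriser can use the mask to predicate the final partial iteration.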
-
- 28 May, 2020 1 commit
-
-
Matt Arsenault authored
This one is slightly odd since it counts as an address expression, which previously could never fail. Allow the existing TTI hook to return the value to use, and reuse it to decide how to handle ptrmask. Handles the no-op addrspacecasts for AMDGPU. We could probably do something better based on an analysis of the mask value for the address space, but leave that for now.
-
- 26 May, 2020 1 commit
-
-
Sam Parker authored
Recommitting most of the remaining changes from 259eb619, but excluding the call to getUserCost from getInstructionThroughput. Though there are still no test changes, I doubt that this is an NFC... With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size oriented getIntrinsicCost. The remaining scalar intrinsics were memcpy, cttz and ctlz, which now have special handling in the BasicTTI implementation. This required a change in the AMDGPU backend for fabs, as it should always be 'free'. I've also changed the X86 backend to return the BaseT implementation when the CostKind isn't RecipThroughput. Differential Revision: https://reviews.llvm.org/D80012
-
- 21 May, 2020 3 commits
-
-
Sam Parker authored
This reverts commit de71def3. This is causing some very large changes, so I'm first going to break this patch down and re-commit in parts.
-
Sam Parker authored
With the two getIntrinsicInstrCosts folded into one, now fold in the scalar/code-size oriented getIntrinsicCost. This involved sinking the cost of the TTIImpl into the base implementation, as it performs no target checks. The opcodes remaining were memcpy, cttz and ctlz, which now have special handling in the BasicTTI implementation. getInstructionThroughput can now directly return the result of getUserCost. This required a change in the AMDGPU backend for fabs, as it's always 'free'. I've also changed the X86 backend to return '1' for any intrinsic when the CostKind isn't RecipThroughput. Though this is intended to be a non-functional change, there are many paths being combined here, so I would be very surprised if this didn't have an effect. Differential Revision: https://reviews.llvm.org/D80012
-
Sam Parker authored
This has not been implemented by any backends, which appear to cover the functionality through getCastInstrCost. Sink what there is in the default implementation into BasicTTI. Differential Revision: https://reviews.llvm.org/D78922
-
- 20 May, 2020 1 commit
-
-
Sam Parker authored
Combine the two API calls into one by introducing a structure to hold the relevant data. This has the added benefit of moving the boilerplate code for arguments and flags into the constructors. This is intended to be a non-functional change, but the complicated web of logic involved here makes it very hard to guarantee. Differential Revision: https://reviews.llvm.org/D79941
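The shape of that refactoring can be sketched as follows. This is a hypothetical simplification, not LLVM's actual IntrinsicCostAttributes: the struct and cost model below are illustrative, but they show the pattern of one attributes object, built once, whose constructor absorbs the argument/flag boilerplate that two separate API calls previously duplicated.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical attributes object: instead of two cost entry points each
// taking long argument lists, the relevant data is gathered once here.
struct IntrinsicCostAttrs {
  unsigned ID;                       // intrinsic identifier (illustrative)
  std::vector<std::string> ArgTypes; // operand types, collected up front
  bool HasSideEffects;

  IntrinsicCostAttrs(unsigned Id, std::vector<std::string> Args,
                     bool SideEffects = false)
      : ID(Id), ArgTypes(std::move(Args)), HasSideEffects(SideEffects) {}
};

// A single cost entry point consuming the structure. The cost numbers are
// placeholders, only the API shape matters.
unsigned getIntrinsicCost(const IntrinsicCostAttrs &A) {
  return A.HasSideEffects ? 10 : 1 + static_cast<unsigned>(A.ArgTypes.size());
}
```

Callers then construct the attributes once and pass the same object to whichever cost query they need, which is what makes the merged API hard to get wrong.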
-
- 19 May, 2020 1 commit
-
-
Eli Friedman authored
-
- 13 May, 2020 1 commit
-
-
Pierre-vh authored
This patch adds a new TTI hook to allow targets to tell LSR that a chain including some instruction is already profitable and should not be optimized. This patch also adds an implementation of this TTI hook for ARM so LSR doesn't optimize chains that include the VCTP intrinsic. Differential Revision: https://reviews.llvm.org/D79418
-
- 05 May, 2020 2 commits
-
-
Simon Pilgrim authored
getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions). Follow-up to D78357.
Reviewed By: @samparker
Differential Revision: https://reviews.llvm.org/D79341
-
Sam Parker authored
Make the kind of cost explicit throughout the cost model, which, apart from making the cost clear, will allow the generic parts to calculate better costs. It will also allow some backends to approximate and correlate the different costs if they wish. Another benefit is that it will help simplify the cost model around immediate and intrinsic costs, where we currently have multiple APIs. RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html Differential Revision: https://reviews.llvm.org/D79002
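The four cost kinds this change makes explicit can be modeled as a plain enum, with a backend answering differently per kind. The enum spelling and the numbers below are illustrative (LLVM's real enumerators are the TCK_-prefixed members of TargetTransformInfo), but they show why one implicit cost was not enough: a divide, for instance, is very expensive for throughput yet small in code size.

```cpp
#include <cassert>

// Illustrative model of the cost kinds named in the commit.
enum class TargetCostKind {
  RecipThroughput, // reciprocal throughput
  Latency,         // instruction latency
  CodeSize,        // encoding size
  SizeAndLatency   // combined size/latency heuristic
};

// Toy per-kind cost for a divide; the values are made up, the point is
// that the same instruction gets a different answer per cost kind.
unsigned getDivCost(TargetCostKind Kind) {
  switch (Kind) {
  case TargetCostKind::RecipThroughput: return 20;
  case TargetCostKind::Latency:         return 20;
  case TargetCostKind::CodeSize:        return 1;
  case TargetCostKind::SizeAndLatency:  return 10;
  }
  return 1;
}
```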
-
- 29 Apr, 2020 1 commit
-
-
Simon Pilgrim authored
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing more than should be expected. This is particularly noticeable on pre-SSE4 targets, where the availability of legal INSERT_VECTOR_ELT ops is more limited. This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of an ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.
This fixes PR45418, where a v4i8 (zext'd to v4i32) was no longer vectorizing. A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.
Reviewed By: @craig.topper
Differential Revision: https://reviews.llvm.org/D78216
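The effect of the DemandedElts mask can be sketched like this (a simplification with made-up per-lane costs; LLVM's real API takes an APInt mask and queries the target for each lane's cost): the overhead is summed only over the lanes the caller actually demands, rather than over all NumElts.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a DemandedElts-driven scalarization overhead. A bitmask
// selects which lanes are actually inserted/extracted, so the overhead
// is accumulated only for demanded lanes. Per-lane costs are placeholders.
unsigned getScalarizationOverhead(uint64_t DemandedElts, unsigned NumElts,
                                  bool Insert, bool Extract) {
  const unsigned InsertCost = 1, ExtractCost = 1; // toy per-lane costs
  unsigned Cost = 0;
  for (unsigned I = 0; I < NumElts; ++I) {
    if (!(DemandedElts & (uint64_t(1) << I)))
      continue; // lane not demanded: no insert/extract needed
    if (Insert)
      Cost += InsertCost;
    if (Extract)
      Cost += ExtractCost;
  }
  return Cost;
}
```

This is what lets a gather-cost computation ask for "the two lanes I will actually insert" instead of accumulating raw vector insertion costs lane by lane.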
-
- 28 Apr, 2020 1 commit
-
-
Sam Parker authored
There are several different types of cost that TTI tries to provide explicit information for: throughput, latency, and code size, along with a vague 'intersection of code-size cost and execution cost'. The vectorizer is a keen user of RecipThroughput, and there are at least 'getInstructionThroughput' and 'getArithmeticInstrCost' designed to help with this cost. The latency cost has a single use and a single implementation. The intersection cost appears to cover most of the rest of the API. getUserCost is explicitly called from within TTI when the user has been explicit in wanting the code size (also only one use), as well as from a few passes which are concerned with a mixture of size and/or a relative cost. In many cases these costs are closely related, such as when multiple instructions are required, but one cost in this function that evidently diverges is that for div/rem. This patch adds an argument so that the cost required is explicit, so that we can make the important distinction when necessary. Differential Revision: https://reviews.llvm.org/D78635
-
- 21 Apr, 2020 1 commit
-
-
Sam Parker authored
This API call has been used recently with a very valid expectation that it would do something useful, but it doesn't actually query any backend information. So, remove this method and merge its functionality into getUserCost. As well as that, also use getCastInstrCost to get a proper cost from the backend for the instructions concerned, though we currently only return the answer if it's considered free. The default implementation now also checks int/ptr conversions, as well as truncs and bitcasts. Differential Revision: https://reviews.llvm.org/D76124
-
- 20 Apr, 2020 1 commit
-
-
Sam Parker authored
The API for shuffles and reductions uses generic Type parameters, instead of VectorType, and so assertions and casts are used a lot. This patch makes those types explicit, which means that the clients can't be lazy, but results in less ambiguity, and that can only be a good thing. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562 Differential Revision: https://reviews.llvm.org/D78357
-
- 19 Apr, 2020 1 commit
-
-
Florian Hahn authored
Remove some unnecessary includes, replace some with forward declarations. This also exposed a few places that were missing some includes.
-
- 15 Apr, 2020 1 commit
-
-
Simon Moll authored
-
- 02 Apr, 2020 2 commits
-
-
Jonas Paulsson authored
-
Jonas Paulsson authored
This patch adds:
- New arguments to getMinPrefetchStride() to let the target decide, on a per-loop basis, if software prefetching should be done even with a stride within the limit of the hw prefetcher.
- A new TTI hook enableWritePrefetching() to let a target do write prefetching by default (defaults to false).
- In LoopDataPrefetch:
  - A search through the whole loop to gather information before emitting any prefetches. This way the target can get information via the new arguments to getMinPrefetchStride() and emit prefetches more selectively. Collected information includes: does the loop have a call, how many memory accesses there are, how many of them are strided, and how many prefetches will cover them. This is NFC relative to before, as long as the target does not change its definition of getMinPrefetchStride().
  - If a previous access to the same exact address was a 'read', and the current one is a 'write', make it a 'write' prefetch.
  - If two accesses that are covered by the same prefetch do not dominate each other, put the prefetch in a block that dominates both of them.
  - If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
- A SystemZ implementation of getMinPrefetchStride().
Review: Ulrich Weigand, Michael Kruse
Differential Revision: https://reviews.llvm.org/D70228
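The per-loop decision described above can be sketched as follows. Everything here is illustrative: the struct, the function name, and the 2048-byte hardware-prefetcher threshold are made-up stand-ins for what a target would plug into the new getMinPrefetchStride() arguments.

```cpp
#include <cassert>

// Hypothetical summary of what the whole-loop scan collects before any
// prefetches are emitted (a subset of the information listed above).
struct LoopInfoSummary {
  unsigned NumMemAccesses;
  unsigned NumStridedAccesses;
  bool HasCall;
};

// Illustrative decision: software prefetching is normally skipped for
// strides the hardware prefetcher already covers, but the extra loop
// information lets a target override that. All thresholds are made up.
bool shouldSoftwarePrefetch(unsigned StrideBytes, const LoopInfoSummary &LI) {
  const unsigned HWPrefetcherMaxStride = 2048; // hypothetical HW limit
  if (StrideBytes > HWPrefetcherMaxStride)
    return true; // the HW prefetcher cannot follow this stride
  // Within the HW limit, a target may still choose to prefetch, e.g. when
  // calls inside the loop are likely to disturb the hardware prefetcher.
  return LI.HasCall && LI.NumStridedAccesses > 0;
}
```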
-
- 01 Apr, 2020 1 commit
-
-
Sam Parker authored
getCallCost is only used within the different layers of TTI, with no backend implementing it so fold the base implementation into getUserCost. I think this is an NFC. Differential Revision: https://reviews.llvm.org/D77050
-
- 19 Mar, 2020 1 commit
-
-
Simon Moll authored
Summary: This patch adds IR intrinsics for vector-predicated integer arithmetic. It is subpatch #1 of the [integer slice](https://reviews.llvm.org/D57504#1732277) of [LLVM-VP](https://reviews.llvm.org/D57504). LLVM-VP is a larger effort to bring native vector predication to LLVM.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D69891
-
- 16 Mar, 2020 1 commit
-
-
Matt Arsenault authored
-
- 11 Mar, 2020 1 commit
-
-
Anna Welker authored
Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper. Differential Revision: https://reviews.llvm.org/D75525
-
- 02 Mar, 2020 1 commit
-
-
Arkady Shlykov authored
Summary: The current peeling implementation bails out in the case of loop nests. The patch introduces a field in the TargetTransformInfo structure that certain targets can use to relax the constraints if it's profitable (disabled by default). An additional option is also added to enable peeling manually, for experimenting and testing purposes.
Reviewers: fhahn, lebedev.ri, xbolva00
Reviewed By: xbolva00
Subscribers: RKSimon, xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D70304
-
- 24 Jan, 2020 1 commit
-
-
Austin Kerbow authored
Summary: Enable the new divergence analysis by default for AMDGPU. Resubmitted with test updates, since GPUDA was causing failures on Windows.
Reviewers: rampitec, nhaehnle, arsenm, thakis
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73315
-
- 22 Jan, 2020 2 commits
-
-
Nico Weber authored
This reverts commit a90a6502. Broke tests on Windows: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/13808
-
Austin Kerbow authored
Summary: Enable the new divergence analysis by default for AMDGPU.
Reviewers: rampitec, nhaehnle, arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73049
-
- 16 Jan, 2020 1 commit
-
-
Arkady Shlykov authored
This reverts commit 3f3017e1 because there's a failure on peel-loop-nests.ll with LLVM_ENABLE_EXPENSIVE_CHECKS on. Differential Revision: https://reviews.llvm.org/D70304
-
- 15 Jan, 2020 1 commit
-
-
Arkady Shlykov authored
Summary: The current peeling implementation bails out in the case of loop nests. The patch introduces a field in the TargetTransformInfo structure that certain targets can use to relax the constraints if it's profitable (disabled by default). An additional option is also added to enable peeling manually, for experimenting and testing purposes.
Reviewers: fhahn, lebedev.ri, xbolva00
Reviewed By: xbolva00
Subscribers: xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D70304
-
- 18 Dec, 2019 1 commit
-
-
Anna Welker authored
Add an extra parameter so alignment can be taken under consideration in gather/scatter legalization. Differential Revision: https://reviews.llvm.org/D71610
-
- 12 Dec, 2019 2 commits
-
-
Reid Kleckner authored
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7 MB of object file size.
- An incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
-
Reid Kleckner authored
Soon Intrinsic::ID will be a plain integer, so this overload will not be possible. Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure. Split off from D71320 Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D71381
-
- 09 Dec, 2019 1 commit
-
-
David Green authored
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %r = and i32 %s, %b
can, under Arm or Thumb2, become:
  and r0, r1, r2, lsl #3
so the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which, if available, is added as an extra parameter, much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966
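The single-user folding check can be sketched like this. The opcode strings, function names and the base cost of 1 are illustrative, not the ARM backend's actual implementation; the point is the shape of the rule: a shift with exactly one user, where that user is in the foldable set, is priced at zero.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The commit's foldable set, in IR terms: Add, Sub, And, Or, Xor and ICmp.
bool userCanFoldShift(const std::string &UserOpcode) {
  static const std::vector<std::string> Foldable = {
      "add", "sub", "and", "or", "xor", "icmp"};
  for (const auto &Op : Foldable)
    if (UserOpcode == Op)
      return true;
  return false;
}

// Toy shift cost: free when the shift's single user can fold a shifted
// operand (e.g. "and r0, r1, r2, lsl #3"), otherwise a base cost of 1.
unsigned getShiftCost(unsigned NumUsers, const std::string &SingleUserOpcode) {
  if (NumUsers == 1 && userCanFoldShift(SingleUserOpcode))
    return 0; // folded into the user instruction
  return 1;
}
```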
-
- 06 Nov, 2019 1 commit
-
-
Sjoerd Meijer authored
We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook in TargetTransformInfo that can be queried as to whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, the usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845;
- a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loop hint.
Differential Revision: https://reviews.llvm.org/D69040
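The priority among the three steering mechanisms can be sketched as follows. The enum and function names are illustrative, not LLVM's API; the sketch only encodes the ordering implied above: explicit forcing (command line, then pragma) wins, and the new target hook is consulted only when neither forces a choice.

```cpp
#include <cassert>

// Illustrative three-state choice for each explicit steering mechanism.
enum class PredicationChoice { Default, Force, Disable };

// Decide whether to create a predicated (tail-folded) vector body.
bool preferPredicatedBody(PredicationChoice CmdLine, PredicationChoice Pragma,
                          bool TTIHookPrefersPredication) {
  // 1) the command line option and 2) the pragma force the decision...
  if (CmdLine != PredicationChoice::Default)
    return CmdLine == PredicationChoice::Force;
  if (Pragma != PredicationChoice::Default)
    return Pragma == PredicationChoice::Force;
  // 3) ...otherwise ask the new target hook, without forcing anything.
  return TTIHookPrefersPredication;
}
```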
-
- 31 Oct, 2019 1 commit
-
-
Hiroshi Yamauchi authored
Summary: (Split off of D67120) TargetLowering/TargetTransformInfo/SwitchLoweringUtils changes for profile-guided size optimization.
Reviewers: davidxl
Subscribers: eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69580
-
- 25 Oct, 2019 1 commit
-
-
Guillaume Chatelet authored
Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69307
-
- 14 Oct, 2019 1 commit
-
-
Sam Parker authored
Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763
-