forked from llvm/llvm-project
[LV] Pick VPlan cost improvements #11963
Merged: fhahn merged 33 commits into swiftlang:stable/21.x from fhahn:pick-vplan-cost-improvements on Dec 10, 2025
Compute the cost of non-intrinsic, single-scalar calls directly in VPReplicateRecipe::computeCost. This starts moving call cost computations to VPlan, handling the simplest case first. (Cherry-picked from 79be94c)
…lvm#153361) A number of recipes compute costs for the same opcodes for scalars or vectors, depending on the recipe. Move the common logic out to a helper in VPRecipeWithIRFlags, that is then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction. This makes it easier to cover all relevant opcodes, without duplication. PR: llvm#153361 (cherry picked from commit 35be64a)
…(NFC) We iterate over InstsToScalarize when printing costs, and currently the iteration order is not deterministic. Currently no tests check the output with multiple instructions in InstsToScalarize, but those will come soon. (cherry picked from commit c300a99)
We iterate over the scalar costs of instructions when printing costs, and currently the iteration order is not deterministic. Currently no tests check the output with multiple instructions in the map, but those will come soon.
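The non-determinism comes from iterating a hash-based container whose order can vary between runs or hosts. A minimal sketch of the usual remedy, in the spirit of `llvm::MapVector`: keep a hash index for lookups plus a vector that records insertion order, and iterate only over the vector. `OrderedCostMap` and `printCosts` are hypothetical names, standard containers stand in for LLVM's ADTs, and this is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered map: a hash index for O(1) lookup plus a
// vector holding entries in insertion order, so iteration (and therefore any
// printed cost report) does not depend on hash-bucket order.
class OrderedCostMap {
public:
  // Inserts or updates the cost for Name; the first insertion fixes its slot.
  void set(const std::string &Name, int Cost) {
    auto It = Index.find(Name);
    if (It != Index.end()) {
      Entries[It->second].second = Cost;
      return;
    }
    Index.emplace(Name, Entries.size());
    Entries.emplace_back(Name, Cost);
  }

  // Iteration always follows insertion order.
  const std::vector<std::pair<std::string, int>> &entries() const {
    return Entries;
  }

private:
  std::unordered_map<std::string, std::size_t> Index;
  std::vector<std::pair<std::string, int>> Entries;
};

// Renders the map the way a debug cost dump might (format is illustrative).
std::string printCosts(const OrderedCostMap &M) {
  std::string Out;
  for (const auto &[Name, Cost] : M.entries())
    Out += Name + "=" + std::to_string(Cost) + ";";
  return Out;
}
```

Updating an existing key keeps its original position, so repeated cost updates do not reshuffle the printed output.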
…lvm#151940) We need to reject plans that contain recipes with invalid costs. LICM can move recipes with invalid costs out of the loop region, which then get missed by the main cost computation. Extend the logic that checks recipes for invalid costs, which currently only covers the middle block, to include all skeleton blocks. Fixes llvm#144358 Fixes llvm#151664 PR: llvm#151940 (cherry picked from commit 95c32bf)
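A simplified model of this check, assuming costs are represented as `std::optional<int>` with `std::nullopt` meaning "invalid" (LLVM itself uses `InstructionCost` for this). `Block`, `hasInvalid`, `loopRegionHasInvalidCost` and `planHasInvalidCost` are hypothetical; the contrast shows why scanning only the loop region misses a recipe that was hoisted into the preheader.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical cost type: std::nullopt models an invalid cost, i.e. the
// target cannot lower the operation at the chosen VF.
using Cost = std::optional<int>;

// Hypothetical plan block: a name, whether it sits inside the vector loop
// region, and the costs of its recipes.
struct Block {
  std::string Name;
  bool InLoopRegion;
  std::vector<Cost> Recipes;
};

bool hasInvalid(const Block &B) {
  for (const Cost &C : B.Recipes)
    if (!C)
      return true;
  return false;
}

// Old behaviour (simplified): only recipes inside the loop region are checked,
// so an invalid recipe hoisted into the preheader slips through.
bool loopRegionHasInvalidCost(const std::vector<Block> &Blocks) {
  for (const Block &B : Blocks)
    if (B.InLoopRegion && hasInvalid(B))
      return true;
  return false;
}

// Extended check: every block of the plan, including skeleton blocks.
bool planHasInvalidCost(const std::vector<Block> &Blocks) {
  for (const Block &B : Blocks)
    if (hasInvalid(B))
      return true;
  return false;
}
```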
…). (llvm#154126) Remove the ArrayRef<const Value*> Args operand from getOperandsScalarizationOverhead and require that the callers de-duplicate arguments and filter constant operands. Removing the Value * based Args argument enables callers where no Value * operands are available to use the function in a follow-up: computing the scalarization cost directly for a VPlan recipe. It also allows more accurate cost-estimates in the future: for example, when vectorizing a loop, we could also skip operands that are live-ins, as those also do not require scalarization. PR: llvm#154126 (cherry picked from commit 4e6c88b)
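A sketch of the caller-side filtering the new contract asks for: drop constant operands and duplicate operands before asking for the operand scalarization overhead. `Operand`, `uniqueNonConstantOperands` and `operandsScalarizationOverhead` are hypothetical, and the overhead formula (one extract per lane per remaining operand, each at a fixed `ExtractCost`) is a toy assumption rather than any target's real cost.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical operand description: an SSA id and whether it is a constant.
struct Operand {
  int Id;
  bool IsConstant;
};

// Caller-side filtering: constants need no per-lane extraction, and a
// repeated operand only needs to be extracted once. First-occurrence order
// is preserved.
std::vector<Operand> uniqueNonConstantOperands(const std::vector<Operand> &Ops) {
  std::set<int> Seen;
  std::vector<Operand> Result;
  for (const Operand &Op : Ops)
    if (!Op.IsConstant && Seen.insert(Op.Id).second)
      Result.push_back(Op);
  return Result;
}

// Toy overhead model (assumption): every remaining vector operand needs VF
// lane extracts, each costing ExtractCost.
int operandsScalarizationOverhead(const std::vector<Operand> &Filtered,
                                  unsigned VF, int ExtractCost) {
  return static_cast<int>(Filtered.size() * VF) * ExtractCost;
}
```

Skipping live-ins, as the message suggests for the future, would be one more condition in the filter.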
Implement computing the scalarization overhead for replicating calls in VPlan, matching the legacy cost model. Depends on llvm#154126. PR: llvm#154291 (cherry picked from commit c3470d1)
Refactor to prepare for llvm#154617. (cherry picked from commit 5e32f72)
…CI). (llvm#154617) Handle intrinsic calls in VPReplicateRecipe::computeCost. There are some pseudo intrinsics for which the computed cost is known to be zero, so we handle those up front. Depends on llvm#154291. PR: llvm#154617 (cherry-picked from df09879)
Check for scalarized calls in needsExtract to fix a divergence between the legacy and VPlan-based cost models. The legacy cost model was missing a check for scalarized calls in needsExtract, which meant it incorrectly assumed the result of a scalarized call needs extracting. Exposed by llvm#154617. Fixes llvm#156091. (cherry picked from commit 0aac227)
The second operand when using a safe divisor will always be a select in the loop, so it won't be invariant; don't treat it as such. This fixes a divergence between the legacy and VPlan-based cost models. Fixes llvm#156066. (cherry picked from commit e0f00bd)
Extract the logic to compute the scalarization overhead to a helper for easy re-use in the future. (cherry-picked from commit 30e9cba)
getCostForRecipeWithOpcode must only be called with supported opcodes. Directly return the cost, and add llvm_unreachable to catch unhandled cases. (cherry picked from commit fb60d03)
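To illustrate the shared-helper pattern from this change and the earlier llvm#153361 refactoring, a sketch in which two recipe kinds reuse one per-opcode cost function and an unsupported opcode aborts. `Opcode`, `getCostForOpcode`, `replicateCost` and `widenCost` are hypothetical, the cost values are toy numbers, and `std::abort()` stands in for `llvm_unreachable`.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical opcode set; Load is deliberately unsupported by the helper.
enum class Opcode { Add, Mul, UDiv, Load };

// Hypothetical shared helper: cost of Op at the given VF (VF == 1 means the
// scalar form). Toy numbers only, not real target costs.
int getCostForOpcode(Opcode Op, unsigned VF) {
  switch (Op) {
  case Opcode::Add:
  case Opcode::Mul:
    return 1; // one scalar or one vector instruction
  case Opcode::UDiv:
    return VF == 1 ? 4 : 4 * static_cast<int>(VF); // toy: no vector divide
  case Opcode::Load:
    break;
  }
  // Mirrors the llvm_unreachable guard: callers must only pass supported
  // opcodes, so reaching this point is a programming error.
  std::abort();
}

// A replicating recipe pays the scalar cost once per lane.
int replicateCost(Opcode Op, unsigned VF) {
  return static_cast<int>(VF) * getCostForOpcode(Op, 1);
}

// A widening recipe asks for the vector form directly.
int widenCost(Opcode Op, unsigned VF) { return getCostForOpcode(Op, VF); }
```

Keeping the opcode switch in one place means a newly supported opcode is costed consistently for every recipe kind, and the unreachable default surfaces any caller that drifts out of sync.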
Account for predicated UDiv, SDiv, URem, SRem in VPReplicateRecipe::computeCost: compute costs of extra phis and apply getPredBlockCostDivisor. Fixes llvm#158660 (cherry picked from commit 1858532)
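A back-of-the-envelope version of this computation, under stated assumptions: each of the VF lanes pays one scalar division plus one phi to merge its result out of the predicated block, and the total is divided by a predicated-block divisor assumed here to be 2, in line with the legacy model's heuristic that such a block runs about half the time. Insert/extract overhead is ignored for brevity. `predBlockCostDivisor` and `predicatedDivRemCost` are hypothetical names, not LLVM's `getPredBlockCostDivisor`.

```cpp
#include <cassert>

// Hypothetical divisor: assumes a predicated block executes about half the
// time, so it contributes half of its cost.
int predBlockCostDivisor() { return 2; }

// Hypothetical cost of a scalarized, predicated udiv/sdiv/urem/srem.
int predicatedDivRemCost(unsigned VF, int ScalarDivCost, int PhiCost) {
  // Per lane: one scalar division plus one phi merging the predicated result.
  int PerLane = ScalarDivCost + PhiCost;
  int Total = static_cast<int>(VF) * PerLane;
  return Total / predBlockCostDivisor();
}
```

Leaving out the phis would undercount the predicated form (20 instead of 22 for VF = 4, a division cost of 10 and a phi cost of 1), which is the kind of gap between the two cost models this change closes.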
Loads of addresses are scalarized and have their costs computed w/o scalarization overhead. Consistently apply this logic also to non-uniform loads that are already scalarized, to ensure their costs are consistent with other scalarized loads that are used as addresses. (cherry picked from commit 7dd9b3d)
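A toy model of this rule, assuming a replicated load pays one scalar load per lane plus one insert per lane to assemble a vector, except when its lanes feed other scalarized memory operations as addresses, in which case each scalar is consumed directly. `replicatedLoadCost` and its parameters are hypothetical.

```cpp
#include <cassert>

// Hypothetical cost of a load replicated across VF lanes.
int replicatedLoadCost(unsigned VF, int ScalarLoadCost, int InsertCost,
                       bool UsedAsAddress) {
  int Cost = static_cast<int>(VF) * ScalarLoadCost;
  // When the loaded values are only used as addresses of other scalarized
  // memory ops, each lane's scalar is used as-is and no vector is built, so
  // the insert-element overhead is skipped.
  if (!UsedAsAddress)
    Cost += static_cast<int>(VF) * InsertCost;
  return Cost;
}
```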
For UDiv/SDiv with invariant divisors, the created selects will be hoisted out. Don't compute their cost for each iteration, to match the more accurate VPlan-based cost modeling. Fixes llvm#159402. (cherry picked from commit addfdb5)
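A sketch of both halves of the safe-divisor idea, assuming the common lowering in which inactive lanes get a divisor of 1: `safeDivide` shows the per-lane semantics of the select, and `safeDivCostPerIteration` charges the select only when it stays inside the loop. Both functions and the `SelectIsHoisted` flag are hypothetical; deciding whether the select can actually be hoisted (invariant divisor and invariant guarding conditions) is left to the caller.

```cpp
#include <cassert>

// Hypothetical per-lane semantics: the safe-divisor select replaces the
// divisor of an inactive lane with 1, so no lane can divide by zero.
unsigned safeDivide(unsigned X, unsigned D, bool Active) {
  unsigned SafeD = Active ? D : 1u; // the safe-divisor select
  return X / SafeD;
}

// Hypothetical per-iteration cost of the guarded division. A hoisted select
// runs once in the preheader, so it adds nothing per iteration.
int safeDivCostPerIteration(int DivCost, int SelectCost, bool SelectIsHoisted) {
  return DivCost + (SelectIsHoisted ? 0 : SelectCost);
}
```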
This ensures each scalarized member has an accurate cost, matching the cost it would have had if it had not been considered for an interleave group. (cherry picked from commit 49605a4)
In some cases, safe-divisor selects can be hoisted out of the vector loop. Catching all cases in the legacy cost model isn't possible, in particular checking if all conditions guarding a division are loop invariant. Instead, check in planContainsAdditionalSimplifications if there are any hoisted safe-divisor selects. If so, don't compare to the less accurate legacy cost model. Fixes llvm#160354. Fixes llvm#160356. (cherry picked from commit 88aab08)
If there are direct memory op users of the newly scalarized load, their cost may have changed because there's no scalarization overhead for the operand. Update it. This ensures assigning consistent costs to scalarized memory instructions that themselves have scalarized memory instructions as operands. (cherry picked from commit 1a85027)
…I) (llvm#151487) Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to serve their users requiring a vector, instead of doing so when unrolling by VF. Now we only need to implicitly build vectors in VPTransformState::get for VPInstructions. Once they are also unrolled by VF we can remove the code-path altogether. PR: llvm#151487 (cherry picked from commit 7e99893)
Move handling of stores to single-scalar/uniform address from replicateByVF to narrowToSingleScalar. (cherry picked from commit 1efa997)
Add test cases for canonicalizing AddRecs that may wrap. (cherry picked from commit e894654)
Add more tests for follow-up to llvm#169576. (cherry picked from commit 41519b3)
…lvm#169576) Extend the {X,+,N}/C => {(X - X%N),+,N}/C canonicalization to handle AddRecs that may wrap, when X < N <= C and both N,C are powers of 2. The alignment and power-of-2 properties ensure division results remain equivalent for all offsets [(X - X%N), X). Alive2 Proof: https://alive2.llvm.org/ce/z/iu2tav Fixes llvm#168709 PR: llvm#169576 (cherry picked from commit 5d87609)
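An exhaustive 8-bit check of the arithmetic identity behind this canonicalization, as a complement to the linked Alive2 proof. It evaluates both AddRecs with unsigned wrap-around for all 256 iterations and compares the quotients; it tests only the identity, not ScalarEvolution's implementation, and the 8-bit width is chosen purely to keep the search exhaustive. `canonicalizationHolds` and `allPreconditionCasesHold` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Checks {X,+,N}/C == {(X - X%N),+,N}/C for every iteration i of an 8-bit
// AddRec, with unsigned wrap-around (mod 256) applied to both sides.
bool canonicalizationHolds(uint8_t X, uint8_t N, uint8_t C) {
  uint8_t Start = static_cast<uint8_t>(X - X % N);
  for (unsigned I = 0; I < 256; ++I) {
    uint8_t Orig = static_cast<uint8_t>(X + I * N);
    uint8_t Canon = static_cast<uint8_t>(Start + I * N);
    if (Orig / C != Canon / C)
      return false;
  }
  return true;
}

// Enumerates every case meeting the stated preconditions in 8 bits:
// N and C powers of 2 with N <= C, and X < N.
bool allPreconditionCasesHold() {
  for (unsigned N = 1; N <= 128; N *= 2)
    for (unsigned C = N; C <= 128; C *= 2)
      for (unsigned X = 0; X < N; ++X)
        if (!canonicalizationHolds(static_cast<uint8_t>(X),
                                   static_cast<uint8_t>(N),
                                   static_cast<uint8_t>(C)))
          return false;
  return true;
}
```

The non-power-of-2 case X = 2, N = 3, C = 4 already fails at i = 1 (5/4 = 1 but 3/4 = 0), which shows why the power-of-2 requirement matters.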
Always add pointers proved to be uniform via legal/SCEV to worklist. This extends the existing logic to handle a few more pointers known to be uniform. (cherry picked from commit 0c028bb)
…omputeCost. (llvm#160053)" (llvm#162157) This reverts commit f80c0ba and 94eade6. Recommit a small fix for targets using prefersVectorizedAddressing. Original message: Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model: 1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as a follow-up. 2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper. Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This again can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: llvm#160053 (cherry picked from commit 74af578)
…lvm#164487) Add a new getGEPExpr variant which is independent of GEPOperator*. To be used to construct SCEVs for VPlan recipes in llvm#161276. PR: llvm#164487 (cherry picked from commit a321ce3)
…lvm#161276) Update getSCEVExprForVPValue to handle more complex expressions, to use it in VPReplicateRecipe::computeCost. In particular, it supports constructing SCEV expressions for GetElementPtr VPReplicateRecipes, with operands that are VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we hit a sub-expression we don't support yet, we return SCEVCouldNotCompute. Note that the SCEV expression is valid for VF = 1: we only support constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec starting at 0 and stepping by 1. The returned SCEV expressions could be converted to VF-specific ones by rewriting the AddRecs to ones with the appropriate step. Note that the logic for constructing SCEVs for GetElementPtr was directly ported from ScalarEvolution.cpp. Another thing to note is that we construct SCEV expressions purely by looking at the operation of the recipe and its translated operands, w/o accessing the underlying IR (the exception being getting the source element type for GEPs). PR: llvm#161276
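A much-reduced model of this construction, assuming only affine AddRecs `{Start,+,Step}` and integer-valued bases: the canonical IV maps to `{0,+,1}`, constants map to step-0 AddRecs, and a GEP `Base + Index * ElemSize` folds into a single AddRec; any unsupported operand yields `std::nullopt`, playing the role of `SCEVCouldNotCompute`. `AddRec`, `OperandKind`, `Operand`, `toAddRec`, `gepAddRec` and `forVF` are hypothetical, and `forVF` assumes one vector iteration advances the canonical IV by VF (ignoring the unroll factor). Real SCEV construction for GEPs also handles pointer bases, struct offsets and wrap flags, all omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical affine AddRec {Start,+,Step}: value Start + Step * i at
// iteration i of the loop.
struct AddRec {
  int64_t Start;
  int64_t Step;
  bool operator==(const AddRec &O) const {
    return Start == O.Start && Step == O.Step;
  }
};

// Hypothetical operand kinds a recipe might have.
enum class OperandKind { CanonicalIV, Constant, Unsupported };
struct Operand {
  OperandKind Kind;
  int64_t Value; // only meaningful for constants
};

// Canonical IV is {0,+,1}; constants are step-0 AddRecs; anything else is
// "could not compute", modeled as std::nullopt.
std::optional<AddRec> toAddRec(const Operand &Op) {
  switch (Op.Kind) {
  case OperandKind::CanonicalIV:
    return AddRec{0, 1};
  case OperandKind::Constant:
    return AddRec{Op.Value, 0};
  case OperandKind::Unsupported:
    break;
  }
  return std::nullopt;
}

// GEP address Base + Index * ElemSize, folded into a single AddRec.
std::optional<AddRec> gepAddRec(const Operand &Base, const Operand &Index,
                                int64_t ElemSize) {
  std::optional<AddRec> B = toAddRec(Base);
  std::optional<AddRec> I = toAddRec(Index);
  if (!B || !I)
    return std::nullopt; // propagate "could not compute"
  return AddRec{B->Start + I->Start * ElemSize, B->Step + I->Step * ElemSize};
}

// The expression is valid for VF = 1; a VF-specific form scales the step.
AddRec forVF(const AddRec &R, int64_t VF) { return AddRec{R.Start, R.Step * VF}; }
```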
In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either: 1. Scalarised the operation, in which case we wouldn't be using a VPWidenRecipe, or 2. We've inserted a select for the second operand to ensure we don't fault through divide-by-zero. For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which *aren't* accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions. (cherry picked from commit d606eae)
Extend [Specific]Cmp_match to handle floating-point compares, and introduce m_Cmp that matches both integer and floating-point compares. Use it in simplifyRecipe to match and simplify the general case of compares. The change has necessitated a bugfix in VPReplicateRecipe::execute. (cherry picked from commit 66be00d)
…m#170278) In some cases, the lowering of a select depends on the predicate. If the condition of a select is a compare instruction, thread the predicate through to the TTI hook. PR: llvm#170278 (cherry picked from commit 50916a4)
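A sketch of threading a compare's predicate into a select cost query, combining this change with the `m_Cmp` idea from the previous commit (one matcher for both integer and floating-point compares). `Kind`, `Pred`, `Inst`, `matchCmp`, `selectCost` and `costOfSelect` are hypothetical, and the cost values are illustrative only; in LLVM the real query goes through TTI, with a placeholder predicate when none is known.

```cpp
#include <cassert>
#include <optional>

// Hypothetical instruction model: a kind plus an optional predicate.
enum class Kind { ICmp, FCmp, Add };
enum class Pred { None, ICMP_EQ, ICMP_SLT, FCMP_OLT, FCMP_UNO };
struct Inst {
  Kind K;
  Pred P; // ignored for non-compares
};

// Like a combined m_Cmp: matches integer and floating-point compares alike
// and yields the predicate.
std::optional<Pred> matchCmp(const Inst &I) {
  if (I.K == Kind::ICmp || I.K == Kind::FCmp)
    return I.P;
  return std::nullopt;
}

// Hypothetical target hook whose select cost depends on the predicate
// (toy numbers, not any real target).
int selectCost(Pred P) { return P == Pred::FCMP_UNO ? 3 : 1; }

// Thread the predicate through when the condition is a compare; otherwise
// pass the "no predicate" placeholder.
int costOfSelect(const Inst &Cond) {
  return selectCost(matchCmp(Cond).value_or(Pred::None));
}
```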
Author: @swift-ci please test
rdar://163931465