[LV] Pick VPlan cost improvements #11963

fhahn · 2025-12-09T13:57:38Z

rdar://163931465

Compute the cost of non-intrinsic, single-scalar calls directly in VPReplicateRecipe::computeCost. This starts moving call cost computations to VPlan, handling the simplest case first. (Cherry-picked from 79be94c)

…lvm#153361) A number of recipes compute costs for the same opcodes for scalars or vectors, depending on the recipe. Move the common logic out to a helper in VPRecipeWithIRFlags, that is then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction. This makes it easier to cover all relevant opcodes, without duplication. PR: llvm#153361 (cherry picked from commit 35be64a)

…(NFC) We iterate over InstsToScalarize when printing costs, and currently the iteration order is not deterministic. Currently no tests check the output with multiple instructions in InstsToScalarize, but those will come soon. (cherry picked from commit c300a99)

We iterate over the scalar costs of instructions when printing costs, and currently the iteration order is not deterministic. Currently no tests check the output with multiple instructions in the map, but those will come soon.
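As a generic illustration of why iteration order matters when printing costs, here is a minimal sketch that sorts the entries of a hash map before printing them. The `CostMap` type and `sortedCosts` helper are hypothetical stand-ins; the commits above may well achieve determinism differently (for example with an insertion-ordered container).

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a map from an instruction's name to its scalar cost.
using CostMap = std::unordered_map<std::string, int>;

// Copy the entries out of the hash map and sort them by key, so that any
// output produced from the result has a stable order across runs.
std::vector<std::pair<std::string, int>> sortedCosts(const CostMap &Costs) {
  std::vector<std::pair<std::string, int>> Entries(Costs.begin(), Costs.end());
  std::sort(Entries.begin(), Entries.end());
  return Entries;
}
```

Printing from `sortedCosts(Costs)` instead of iterating `Costs` directly gives test output that does not depend on hashing details.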

…lvm#151940) We need to reject plans that contain recipes with invalid costs. LICM can move recipes with invalid costs out of the loop region, which then get missed by the main cost computation. Extend the check for recipes with invalid costs, which currently covers only the middle block, to include all skeleton blocks. Fixes llvm#144358 Fixes llvm#151664 PR: llvm#151940 (cherry picked from commit 95c32bf)

…). (llvm#154126) Remove the ArrayRef<const Value*> Args operand from getOperandsScalarizationOverhead and require that the callers de-duplicate arguments and filter constant operands. Removing the Value * based Args argument enables callers where no Value * operands are available to use the function in a follow-up: computing the scalarization cost directly for a VPlan recipe. It also allows more accurate cost-estimates in the future: for example, when vectorizing a loop, we could also skip operands that are live-ins, as those also do not require scalarization. PR: llvm#154126 (cherry picked from commit 4e6c88b)
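The caller-side contract described above (de-duplicate arguments and drop constants before asking for the scalarization overhead) can be sketched as follows. The `Operand` struct and `operandsNeedingScalarization` helper are hypothetical and only model the filtering step, not the actual TTI interface.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical operand description; real callers work on IR values or
// VPlan values rather than names.
struct Operand {
  std::string Name;
  bool IsConstant;
};

// Keep one entry per distinct non-constant operand, preserving the order in
// which operands were first seen.
std::vector<std::string>
operandsNeedingScalarization(const std::vector<Operand> &Ops) {
  std::unordered_set<std::string> Seen;
  std::vector<std::string> Result;
  for (const Operand &Op : Ops) {
    if (Op.IsConstant)
      continue; // Constants are filtered out by the caller.
    if (Seen.insert(Op.Name).second)
      Result.push_back(Op.Name); // First occurrence of this operand.
  }
  return Result;
}
```

Moving this step to the caller means the overhead query no longer needs `Value *` operands, which is what lets a VPlan recipe use it.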

Implement computing the scalarization overhead for replicating calls in VPlan, matching the legacy cost model. Depends on llvm#154126. PR: llvm#154291 (cherry picked from commit c3470d1)

Refactor to prepare for llvm#154617. (cherry picked from commit 5e32f72)

…CI). (llvm#154617) Handle intrinsic calls in VPReplicateRecipe::computeCost. There are some pseudo-intrinsics for which the computed cost is known to be zero, so we handle those up front. Depends on llvm#154291. PR: llvm#154617 (cherry-picked from df09879)

Check for scalarized calls in needsExtract to fix a divergence between the legacy and VPlan-based cost models. The legacy cost model was missing a check for scalarized calls in needsExtract, which meant it incorrectly assumed the result of a scalarized call needs extracting. Exposed by llvm#154617. Fixes llvm#156091. (cherry picked from commit 0aac227)

The second operand when using a safe divisor will always be a select in the loop, so it won't be invariant; don't treat it as such. This fixes a divergence between the legacy and VPlan-based cost models. Fixes llvm#156066. (cherry picked from commit e0f00bd)

Extract the logic to compute the scalarization overhead to a helper for easy re-use in the future. (cherry-picked from commit 30e9cba)

…53218) Avoid calling getLegacyCost for single scalar loads and stores where the cost is trivial to calculate. (cherry picked from commit ba4ce60)

…Cost (NFCI)." This reverts commit 9490d58. Recommits de7e3a5 with a fix for an unhandled case, causing crashes in some configs. (cherry picked from commit 91d4c0d)

getCostForRecipeWithOpcode must only be called with supported opcodes. Directly return the cost, and add llvm_unreachable to catch unhandled cases. (cherry picked from commit fb60d03)
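A minimal sketch of the pattern described above: return the cost directly for supported opcodes and trap on anything else, rather than returning an optional. The `Opcode` enum, the cost values, and `costForSupportedOpcode` are made up for illustration, and `std::abort` stands in for `llvm_unreachable`.

```cpp
#include <cstdlib>

// Hypothetical opcode set; Load is deliberately left unsupported.
enum class Opcode { Add, Mul, UDiv, Load };

// Returns a plain cost instead of an optional. Callers must only pass
// supported opcodes; any other opcode is a programming error.
int costForSupportedOpcode(Opcode Op) {
  switch (Op) {
  case Opcode::Add:
    return 1; // Illustrative cost only.
  case Opcode::Mul:
    return 2; // Illustrative cost only.
  case Opcode::UDiv:
    return 20; // Illustrative cost only.
  case Opcode::Load:
    break; // Not handled by this helper.
  }
  std::abort(); // Stand-in for llvm_unreachable("unhandled opcode").
}
```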

Account for predicated UDiv,SDiv,URem,SRem in VPReplicateRecipe::computeCost: compute costs of extra phis and apply getPredBlockCostDivisor. Fixes llvm#158660 (cherry picked from commit 1858532)
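Roughly, the predicated case adds the cost of the extra phis to the scalar operation and then scales the total down by the predicated-block divisor. The sketch below is an assumption about the shape of that computation, not the recipe's actual code: `predicatedDivRemCost` and all of its parameters are hypothetical, and the real cost may include further terms such as insert/extract overhead.

```cpp
#include <cassert>

// Hypothetical model: each of the VF scalar copies pays for the operation
// and one phi, and the total is divided by a divisor reflecting how often
// the predicated block is expected to execute.
unsigned predicatedDivRemCost(unsigned ScalarOpCost, unsigned PhiCost,
                              unsigned VF, unsigned PredBlockCostDivisor) {
  assert(PredBlockCostDivisor != 0 && "divisor must be non-zero");
  unsigned Total = VF * (ScalarOpCost + PhiCost);
  return Total / PredBlockCostDivisor;
}
```

For example, with a scalar cost of 20, a phi cost of 1, VF = 4 and a divisor of 2, this model yields (4 * 21) / 2 = 42.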

Loads of addresses are scalarized and have their costs computed w/o scalarization overhead. Consistently apply this logic also to non-uniform loads that are already scalarized, to ensure their costs are consistent with other scalarized loads that are used as addresses. (cherry picked from commit 7dd9b3d)

For UDiv/SDiv with invariant divisors, the created selects will be hoisted out. Don't compute their cost for each iteration, to match the more accurate VPlan-based cost modeling. Fixes llvm#159402. (cherry picked from commit addfdb5)

This ensures each scalarized member has an accurate cost, matching the cost it would have had if it had not been considered for an interleave group. (cherry picked from commit 49605a4)

In some cases, safe-divisor selects can be hoisted out of the vector loop. Catching all cases in the legacy cost model isn't possible, in particular checking if all conditions guarding a division are loop invariant. Instead, check in planContainsAdditionalSimplifications if there are any hoisted safe-divisor selects. If so, don't compare to the more inaccurate legacy cost model. Fixes llvm#160354. Fixes llvm#160356. (cherry picked from commit 88aab08)

If there are direct memory op users of the newly scalarized load, their cost may have changed because there's no scalarization overhead for the operand. Update it. This ensures assigning consistent costs to scalarized memory instructions that themselves have scalarized memory instructions as operands. (cherry picked from commit 1a85027)

…I) (llvm#151487) Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to serve their users requiring a vector, instead of doing so when unrolling by VF. Now we only need to implicitly build vectors in VPTransformState::get for VPInstructions. Once they are also unrolled by VF we can remove the code path altogether. PR: llvm#151487 (cherry picked from commit 7e99893)

Move handling of stores to single-scalar/uniform address from replicateByVF to narrowToSingleScalar. (cherry picked from commit 1efa997)

Add test cases for canonicalizing AddRecs that may wrap. (cherry picked from commit e894654)

Add more tests for follow-up to llvm#169576. (cherry picked from commit 41519b3)

…lvm#169576) Extend the {X,+,N}/C => {(X - X%N),+,N}/C canonicalization to handle AddRecs that may wrap, when X < N <= C and both N,C are powers of 2. The alignment and power-of-2 properties ensure division results remain equivalent for all offsets [(X - X%N), X). Alive2 Proof: https://alive2.llvm.org/ce/z/iu2tav Fixes llvm#168709 PR: llvm#169576 (cherry picked from commit 5d87609)
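As a sanity check alongside the Alive2 proof, the claim can be brute-forced for 8-bit AddRecs. The sketch below is not part of the patch; `addRecAt`, `holdsFor`, and `holdsForAllPowerOfTwoCases` are hypothetical helper names, and the 8-bit width is chosen only to keep the search exhaustive.

```cpp
#include <cstdint>

// Evaluate the AddRec {Start,+,Step} at iteration I in wrapping 8-bit
// arithmetic.
static uint8_t addRecAt(uint8_t Start, uint8_t Step, unsigned I) {
  return static_cast<uint8_t>(Start + Step * I);
}

// True if {X,+,N}/C and {(X - X%N),+,N}/C agree at every iteration of a
// full 8-bit period. N and C must be non-zero.
bool holdsFor(uint8_t X, uint8_t N, uint8_t C) {
  uint8_t Aligned = static_cast<uint8_t>(X - X % N);
  for (unsigned I = 0; I < 256; ++I)
    if (addRecAt(X, N, I) / C != addRecAt(Aligned, N, I) / C)
      return false;
  return true;
}

// Check every combination with X < N <= C where N and C are powers of 2
// that fit in 8 bits.
bool holdsForAllPowerOfTwoCases() {
  for (unsigned N = 1; N <= 128; N *= 2)
    for (unsigned C = N; C <= 128; C *= 2)
      for (unsigned X = 0; X < N; ++X)
        if (!holdsFor(static_cast<uint8_t>(X), static_cast<uint8_t>(N),
                      static_cast<uint8_t>(C)))
          return false;
  return true;
}
```

The power-of-2 requirement matters: with X = 1, N = 3, C = 4 the two sides already differ at the second iteration (4/4 = 1 versus 3/4 = 0), so `holdsFor(1, 3, 4)` is false.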

Always add pointers proved to be uniform via legal/SCEV to worklist. This extends the existing logic to handle a few more pointers known to be uniform. (cherry picked from commit 0c028bb)

…omputeCost. (llvm#160053)" (llvm#162157) This reverts commits f80c0ba and 94eade6. Recommit with a small fix for targets using prefersVectorizedAddressing. Original message: Update VPReplicateRecipe::computeCost to compute costs of more replicating loads/stores. There are 2 cases that require extra checks to match the legacy cost model: 1. If the pointer is based on an induction, the legacy cost model passes its SCEV to getAddressComputationCost. In those cases, still fall back to the legacy cost. SCEV computations will be added as a follow-up. 2. If a load is used as part of an address of another load, the legacy cost model skips the scalarization overhead. Those cases are currently handled by a usedByLoadOrStore helper. Note that getScalarizationOverhead also needs updating, because when the legacy cost model computes the scalarization overhead, scalars have not been collected yet, so we can't check for replicating recipes to skip their cost, except other loads. This can be further improved by modeling inserts/extracts explicitly and consistently, and computing costs for those operations directly where needed. PR: llvm#160053 (cherry picked from commit 74af578)

…lvm#164487) Add a new getGEPExpr variant which is independent of GEPOperator*. To be used to construct SCEVs for VPlan recipes in llvm#161276. PR: llvm#164487 (cherry picked from commit a321ce3)

…lvm#161276) Update getSCEVExprForVPValue to handle more complex expressions, to use it in VPReplicateRecipe::computeCost. In particular, it supports constructing SCEV expressions for GetElementPtr VPReplicateRecipes, with operands that are VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we hit a sub-expression we don't support yet, we return SCEVCouldNotCompute. Note that the SCEV expression is valid for VF = 1: we only support constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec starting at 0 and stepping by 1. The returned SCEV expressions could be converted to VF-specific ones by rewriting the AddRecs to ones with the appropriate step. Note that the logic for constructing SCEVs for GetElementPtr was directly ported from ScalarEvolution.cpp. Another thing to note is that we construct SCEV expressions purely by looking at the operation of the recipe and its translated operands, w/o accessing the underlying IR (the exception being getting the source element type for GEPs). PR: llvm#161276
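To illustrate the VF = 1 note above, here is a toy model of an affine AddRec and of a single-index GEP built on top of it. `AffineAddRec`, `canonicalIV`, `gepOf`, `forVF`, and `evaluateAt` are all hypothetical names for this sketch; real SCEV expressions are far more general and this is not how ScalarEvolution represents them.

```cpp
#include <cstdint>

// Toy affine recurrence {Start,+,Step}: its value at iteration I is
// Start + Step * I.
struct AffineAddRec {
  int64_t Start;
  int64_t Step;
};

// The canonical IV as described above: starts at 0 and steps by 1 (VF = 1).
AffineAddRec canonicalIV() { return {0, 1}; }

// Address of Base[Index] for elements of ElemSize bytes, expressed as a new
// recurrence: Base + ElemSize * Index.
AffineAddRec gepOf(int64_t Base, int64_t ElemSize, AffineAddRec Index) {
  return {Base + ElemSize * Index.Start, ElemSize * Index.Step};
}

// Rewrite a VF = 1 recurrence into one that advances VF elements per
// iteration by scaling the step.
AffineAddRec forVF(AffineAddRec Rec, int64_t VF) {
  return {Rec.Start, Rec.Step * VF};
}

// Value of the recurrence at a given iteration.
int64_t evaluateAt(AffineAddRec Rec, int64_t Iter) {
  return Rec.Start + Rec.Step * Iter;
}
```

With a base of 1000 and 4-byte elements, `gepOf(1000, 4, canonicalIV())` is {1000,+,4}; rewriting it for VF = 8 gives {1000,+,32}.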

In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and srem we fall back on the legacy cost unnecessarily. At this point we know that the vplan must be functionally correct, i.e. if the divide/remainder is not safe to speculatively execute then we must have either: 1. Scalarised the operation, in which case we wouldn't be using a VPWidenRecipe, or 2. We've inserted a select for the second operand to ensure we don't fault through divide-by-zero. For 2) it's necessary to add the select operation to VPInstruction::computeCost so that we mirror the cost of the legacy cost model. The only problem with this is that we also generate selects in vplan for predicated loops with reductions, which *aren't* accounted for in the legacy cost model. In order to prevent asserts firing I've also added the selects to precomputeCosts to ensure the legacy costs match the vplan costs for reductions. (cherry picked from commit d606eae)

Extend [Specific]Cmp_match to handle floating-point compares, and introduce m_Cmp that matches both integer and floating-point compares. Use it in simplifyRecipe to match and simplify the general case of compares. The change has necessitated a bugfix in VPReplicateRecipe::execute. (cherry picked from commit 66be00d)

…m#170278) In some cases, the lowering of a select depends on the predicate. If the condition of a select is a compare instruction, thread the predicate through to the TTI hook. PR: llvm#170278 (cherry picked from commit 50916a4)

fhahn · 2025-12-09T13:57:48Z

@swift-ci please test

fhahn and others added 30 commits December 9, 2025 13:39

[VPlan] Compute cost single-scalar calls in computeCost. (NFC)

b826b60

Compute the cost of non-intrinsic, single-scalar calls directly in VPReplicateRecipe::computeCost. This starts moving call cost computations to VPlan, handling the simplest case first. (Cherry-picked from 79be94c)

[VPlan] Compute cost of replicating calls in VPlan. (NFCI) (llvm#154291)

c7069ab

Implement computing the scalarization overhead for replicating calls in VPlan, matching the legacy cost model. Depends on llvm#154126. PR: llvm#154291 (cherry picked from commit c3470d1)

[VPlan] Move logic to compute cost for intrinsic to helper (NFC).

dde0c0e

Refactor to prepare for llvm#154617. (cherry picked from commit 5e32f72)

[VPlan] Move logic to compute scalarization overhead to cost helper(NFC)

6c8e178

Extract the logic to compute the scalarization overhead to a helper for easy re-use in the future. (cherry-picked from commit 30e9cba)

[LV] Add scalar load/stores to VPReplicateRecipe::computeCost (llvm#1…

ccd9f3e

…53218) Avoid calling getLegacyCost for single scalar loads and stores where the cost is trivial to calculate. (cherry picked from commit ba4ce60)

Reapply "[VPlan] Compute cost of scalar (U|S)Div, (U|S)Rem in compute…

caef924

…Cost (NFCI)." This reverts commit 9490d58. Recommits de7e3a5 with a fix for an unhandled case, causing crashes in some configs. (cherry picked from commit 91d4c0d)

[VPlan] Return non-option cost from getCostForRecipeWithOpcode (NFC).

65d1d76

getCostForRecipeWithOpcode must only be called with supported opcodes. Directly return the cost, and add llvm_unreachable to catch unhandled cases. (cherry picked from commit fb60d03)

[VPlan] Handle predicated UDiv in VPReplicateRecipe::computeCost.

8eb41eb

Account for predicated UDiv,SDiv,URem,SRem in VPReplicateRecipe::computeCost: compute costs of extra phis and apply getPredBlockCostDivisor. Fixes llvm#158660 (cherry picked from commit 1858532)

[LV] Set correct costs for interleave group members.

16c0f7b

This ensures each scalarized member has an accurate cost, matching the cost it would have had if it had not been considered for an interleave group. (cherry picked from commit 49605a4)

[VPlan] Handle stores to single-scalar addr in narrowToSingleScalars.

80d8bee

Move handling of stores to single-scalar/uniform address from replicateByVF to narrowToSingleScalar. (cherry picked from commit 1efa997)

[SCEV] Add tests for UDiv canonicalization of AddRecs that may wrap.

422130e

Add test cases for canonicalizing AddRecs that may wrap. (cherry picked from commit e894654)

[SCEV] Add UDiv canonicalization tests with nested AddRecs.

980898d

Add more tests for follow-up to llvm#169576. (cherry picked from commit 41519b3)

[LV] Always add uniform pointers to uniforms list.

de924ba

Always add pointers proved to be uniform via legal/SCEV to worklist. This extends the existing logic to handle a few more pointers known to be uniform. (cherry picked from commit 0c028bb)

[SCEV] Expose getGEPExpr without needing to pass GEPOperator* (NFC) (l…

37af84e

…lvm#164487) Add a new getGEPExpr variant which is independent of GEPOperator*. To be used to construct SCEVs for VPlan recipes in llvm#161276. PR: llvm#164487 (cherry picked from commit a321ce3)

david-arm and others added 3 commits December 9, 2025 13:53

fhahn requested a review from a team as a code owner December 9, 2025 13:57

fhahn merged commit cff3418 into swiftlang:stable/21.x Dec 10, 2025
3 checks passed

fhahn deleted the pick-vplan-cost-improvements branch December 10, 2025 20:45


3 participants
