@fhahn fhahn commented Dec 9, 2025

rdar://163931465

fhahn and others added 30 commits December 9, 2025 13:39
Compute the cost of non-intrinsic, single-scalar calls directly in
VPReplicateRecipe::computeCost.

This starts moving call cost computations to VPlan, handling the
simplest case first.

(Cherry-picked from 79be94c)
…lvm#153361)

A number of recipes compute costs for the same opcodes for scalars or
vectors, depending on the recipe.

Move the common logic out to a helper in VPRecipeWithIRFlags, that is
then used by VPReplicateRecipe, VPWidenRecipe and VPInstruction.

This makes it easier to cover all relevant opcodes, without duplication.

PR: llvm#153361
(cherry picked from commit 35be64a)
…(NFC)

We iterate over InstsToScalarize when printing costs, and currently the
iteration order is not deterministic. Currently no tests check the
output with multiple instructions in InstsToScalarize, but those will
come soon.

(cherry picked from commit c300a99)
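
One common way to get a deterministic print order is to copy the map entries into a vector and sort them before printing. The sketch below illustrates that idea only; `CostMap` and `sortedCosts` are hypothetical names, and sorting by key is an assumption, not necessarily the ordering the patch uses.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a map such as InstsToScalarize:
// instruction name -> scalarization cost.
using CostMap = std::unordered_map<std::string, int>;

// Copy the entries into a vector and sort them by key, so that printing
// no longer depends on the hash map's iteration order.
std::vector<std::pair<std::string, int>> sortedCosts(const CostMap &Costs) {
  std::vector<std::pair<std::string, int>> Entries(Costs.begin(), Costs.end());
  std::sort(Entries.begin(), Entries.end());
  return Entries;
}
```

Printing from the sorted vector yields the same output on every run, which is what makes the output checkable in tests.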
We iterate over the scalar costs of instructions when printing costs, and
currently the iteration order is not deterministic. Currently no tests
check the output with multiple instructions in the map, but those will
come soon.
…lvm#151940)

We need to reject plans that contain recipes with invalid costs. LICM
can move recipes with invalid costs out of the loop region, which then
get missed by the main cost computation.

Extend the check for recipes with invalid costs, which currently
covers only the middle block, to include all skeleton blocks.

Fixes llvm#144358
Fixes llvm#151664

PR: llvm#151940
(cherry picked from commit 95c32bf)
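
A minimal sketch of the extended check, assuming a toy model in which a recipe cost is either a number or invalid. `Cost`, `Block` and `hasInvalidCostOutsideLoop` are hypothetical names and do not mirror the VPlan classes.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model: a recipe cost is either a number or invalid (nullopt).
using Cost = std::optional<int>;

// Hypothetical model of a block outside the loop region, e.g. the preheader
// or the middle block.
struct Block {
  std::vector<Cost> RecipeCosts;
};

// Returns true if any recipe in any skeleton block has an invalid cost,
// in which case the whole plan should be rejected.
bool hasInvalidCostOutsideLoop(const std::vector<Block> &SkeletonBlocks) {
  for (const Block &B : SkeletonBlocks)
    for (const Cost &C : B.RecipeCosts)
      if (!C)
        return true;
  return false;
}
```

Scanning every skeleton block catches a recipe that was hoisted into the preheader, which a middle-block-only scan would miss.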
…). (llvm#154126)

Remove the ArrayRef<const Value*> Args operand from
getOperandsScalarizationOverhead and require that the callers
de-duplicate arguments and filter constant operands.

Removing the Value * based Args argument enables callers where no Value
* operands are available to use the function in a follow-up: computing
the scalarization cost directly for a VPlan recipe.

It also allows more accurate cost-estimates in the future: for example,
when vectorizing a loop, we could also skip operands that are live-ins,
as those also do not require scalarization.

PR: llvm#154126
(cherry picked from commit 4e6c88b)
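
The caller-side work this change requires can be sketched as follows. `Operand` and `uniqueNonConstantOperands` are hypothetical; the real interface works on LLVM types and values, and this toy only shows the de-duplicate-and-filter step.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical operand model: an identity plus a constant flag.
struct Operand {
  int Id;
  bool IsConstant;
};

// Keep each non-constant operand once, preserving first-seen order.
std::vector<Operand>
uniqueNonConstantOperands(const std::vector<Operand> &Ops) {
  std::unordered_set<int> Seen;
  std::vector<Operand> Result;
  for (const Operand &Op : Ops) {
    if (Op.IsConstant)
      continue; // constant operands are filtered out by the caller
    if (Seen.insert(Op.Id).second)
      Result.push_back(Op); // first occurrence of this operand
  }
  return Result;
}
```

A caller with extra knowledge, such as which operands are live-ins, could apply a further filter at the same point.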
Implement computing the scalarization overhead for replicating calls in
VPlan, matching the legacy cost model.

Depends on llvm#154126.

PR: llvm#154291
(cherry picked from commit c3470d1)
Refactor to prepare for llvm#154617.

(cherry picked from commit 5e32f72)
…CI). (llvm#154617)

Handle intrinsic calls in VPReplicateRecipe::computeCost. There are some
pseudo intrinsics for which the computed cost is known to be zero, so we
handle those up front.

Depends on llvm#154291.

PR: llvm#154617

(cherry-picked from df09879)
Check for scalarized calls in needsExtract to fix a divergence between
legacy and VPlan-based cost model.

The legacy cost model was missing a check for scalarized calls in
needsExtract, which meant it incorrectly assumed the result of a
scalarized call needs extracting.

Exposed by llvm#154617.

Fixes llvm#156091.

(cherry picked from commit 0aac227)
The second operand when using a safe divisor will always be a select in
the loop, so won't be invariant; don't treat it as such.

This fixes a divergence between the legacy and VPlan-based cost models.

Fixes llvm#156066.

(cherry picked from commit e0f00bd)
Extract the logic to compute the scalarization overhead to a helper for
easy re-use in the future.

(cherry-picked from commit 30e9cba)
…53218)

Avoid calling getLegacyCost for single scalar loads and stores where the
cost is trivial to calculate.

(cherry picked from commit ba4ce60)
…Cost (NFCI)."

This reverts commit 9490d58.

Recommits de7e3a5 with a fix for an unhandled case, causing crashes
in some configs.

(cherry picked from commit 91d4c0d)
getCostForRecipeWithOpcode must only be called with supported opcodes.
Directly return the cost, and add llvm_unreachable to catch unhandled
cases.

(cherry picked from commit fb60d03)
Account for predicated UDiv, SDiv, URem and SRem in
VPReplicateRecipe::computeCost: compute costs of extra phis and apply
getPredBlockCostDivisor.

Fixes llvm#158660

(cherry picked from commit 1858532)
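
As a rough illustration of the arithmetic, the sketch below adds the cost of the extra phis to the replicated division cost and then scales by the predicated-block divisor. All parameters are hypothetical inputs (in LLVM they would come from TTI queries), and the order of operations is an assumption rather than a copy of the patch.

```cpp
#include <cassert>

// Toy cost of a predicated, replicated division for a fixed VF.
// Every parameter is a hypothetical input supplied by the caller.
unsigned predicatedDivCost(unsigned VF, unsigned ScalarDivCost,
                           unsigned ScalarizationOverhead, unsigned PhiCost,
                           unsigned PredBlockCostDivisor) {
  // One scalar division per lane, plus insert/extract overhead.
  unsigned Cost = VF * ScalarDivCost + ScalarizationOverhead;
  // One extra phi per lane to merge the predicated result.
  Cost += VF * PhiCost;
  // Scale by the assumed probability of executing the predicated block.
  return Cost / PredBlockCostDivisor;
}
```

For example, with VF = 4, a scalar division cost of 10, an overhead of 8, a phi cost of 1 and a divisor of 2, the result is (40 + 8 + 4) / 2 = 26.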
Loads of addresses are scalarized and have their costs computed w/o
scalarization overhead. Consistently apply this logic also to
non-uniform loads that are already scalarized, to ensure their costs are
consistent with other scalarized loads that are used as addresses.

(cherry picked from commit 7dd9b3d)
For UDiv/SDiv with invariant divisors, the created selects will be
hoisted out. Don't compute their cost for each iteration, to match the
more accurate VPlan-based cost modeling.

Fixes llvm#159402.

(cherry picked from commit addfdb5)
This ensures each scalarized member has an accurate cost, matching the
cost it would have had if it had not been considered for an
interleave group.

(cherry picked from commit 49605a4)
In some cases, safe-divisor selects can be hoisted out of the vector
loop. Catching all cases in the legacy cost model isn't possible, in
particular checking if all conditions guarding a division are loop
invariant.

Instead, check in planContainsAdditionalSimplifications if there are any
hoisted safe-divisor selects. If so, don't compare to the more
inaccurate legacy cost model.

Fixes llvm#160354.
Fixes llvm#160356.

(cherry picked from commit 88aab08)
If there are direct memory op users of the newly scalarized load,
their cost may have changed because there's no scalarization
overhead for the operand. Update it.

This ensures assigning consistent costs to scalarized memory
instructions that themselves have scalarized memory instructions as
operands.

(cherry picked from commit 1a85027)
…I) (llvm#151487)

Materialize Build(Struct)Vectors explicitly for VPReplicateRecipes, to
serve their users requiring a vector, instead of doing so when unrolling
by VF.

Now we only need to implicitly build vectors in VPTransformState::get
for VPInstructions. Once they are also unrolled by VF we can remove the
code-path altogether.

PR: llvm#151487
(cherry picked from commit 7e99893)
Move handling of stores to single-scalar/uniform address from
replicateByVF to narrowToSingleScalar.

(cherry picked from commit 1efa997)
Add test cases for canonicalizing AddRecs that may wrap.

(cherry picked from commit e894654)
Add more tests for follow-up to
llvm#169576.

(cherry picked from commit 41519b3)
…lvm#169576)

Extend the {X,+,N}/C => {(X - X%N),+,N}/C canonicalization to handle
AddRecs that may wrap, when X < N <= C and both N,C are powers of 2. The
alignment and power-of-2 properties ensure division results remain
equivalent for all offsets [(X - X%N), X).

Alive2 Proof: https://alive2.llvm.org/ce/z/iu2tav

Fixes llvm#168709

PR: llvm#169576
(cherry picked from commit 5d87609)
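
The equivalence can be sanity-checked by brute force on a small bit width. The sketch below uses 8-bit unsigned arithmetic so that the AddRec wraps within a few iterations; `divisionMatchesAfterCanonicalization` is a hypothetical helper written for this illustration and is not part of the patch or of the Alive2 proof.

```cpp
#include <cassert>
#include <cstdint>

// Compares {X,+,N}/C with {(X - X%N),+,N}/C over a full 8-bit wrap-around
// period. Requires N != 0 and C != 0. Returns true if every iteration
// produces the same quotient.
bool divisionMatchesAfterCanonicalization(uint8_t X, uint8_t N, uint8_t C) {
  const uint8_t Start = static_cast<uint8_t>(X - X % N);
  for (unsigned I = 0; I < 256; ++I) {
    // The casts truncate to 8 bits, modelling an AddRec that may wrap.
    const uint8_t Orig = static_cast<uint8_t>(X + I * N);
    const uint8_t Canon = static_cast<uint8_t>(Start + I * N);
    if (Orig / C != Canon / C)
      return false;
  }
  return true;
}
```

With X < N <= C and N, C powers of 2, the check holds for every 8-bit combination. It fails for X = 1, N = 3, C = 4, where the second iteration gives 4/4 = 1 for the original but 3/4 = 0 for the rewritten form, which shows why the power-of-2 condition matters.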
Always add pointers proved to be uniform via legal/SCEV to worklist.
This extends the existing logic to handle a few more pointers known to
be uniform.

(cherry picked from commit 0c028bb)
…omputeCost. (llvm#160053)" (llvm#162157)

This reverts commit f80c0ba and 94eade6.

Recommit with a small fix for targets using prefersVectorizedAddressing.

Original message:
Update VPReplicateRecipe::computeCost to compute costs of more
replicating loads/stores.

There are 2 cases that require extra checks to match the legacy cost
model:
1. If the pointer is based on an induction, the legacy cost model passes
its SCEV to getAddressComputationCost. In those cases, still fall back
to the legacy cost. SCEV computations will be added as a follow-up.
2. If a load is used as part of an address of another load, the legacy
cost model skips the scalarization overhead. Those cases are currently
handled by a usedByLoadOrStore helper.

Note that getScalarizationOverhead also needs updating, because when the
legacy cost model computes the scalarization overhead, scalars have not
been collected yet, so we can't check for replicating recipes to skip
their cost, except other loads. This again can be further improved by
modeling inserts/extracts explicitly and consistently, and computing costs
for those operations directly where needed.

PR: llvm#160053
(cherry picked from commit 74af578)
…lvm#164487)

Add a new getGEPExpr variant which is independent of GEPOperator*.

To be used to construct SCEVs for VPlan recipes in
llvm#161276.

PR: llvm#164487
(cherry picked from commit a321ce3)
…lvm#161276)

Update getSCEVExprForVPValue to handle more complex expressions, to use
it in VPReplicateRecipe::computeCost.

In particular, it supports constructing SCEV expressions for
GetElementPtr VPReplicateRecipes, with operands that are
VPScalarIVStepsRecipe, VPDerivedIVRecipe and VPCanonicalIVRecipe. If we
hit a sub-expression we don't support yet, we return
SCEVCouldNotCompute.

Note that the SCEV expression is valid for VF = 1: we only support
constructing AddRecs for VPCanonicalIVRecipe, which is an AddRec
starting at 0 and stepping by 1. The returned SCEV expressions could be
converted to VF-specific ones, by rewriting the AddRecs to ones with
the appropriate step.

Note that the logic for constructing SCEVs for GetElementPtr was
directly ported from ScalarEvolution.cpp.

Another thing to note is that we construct SCEV expressions purely by
looking at the operation of the recipe and its translated operands, w/o
accessing the underlying IR (the exception being getting the source
element type for GEPs).

PR: llvm#161276
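
The shape of this construction can be sketched with a toy affine recurrence. Everything below is hypothetical: `AddRec`, `Kind`, `buildExpr` and `rewriteStepForVF` are illustration-only names, `std::nullopt` plays the role of SCEVCouldNotCompute, and scaling the step by VF is an assumption about what "the appropriate step" means.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy affine recurrence {Start,+,Step}; a stand-in for a SCEV AddRec.
struct AddRec {
  int64_t Start;
  int64_t Step;
};

// Toy recipe kinds; anything not modelled is Unsupported.
enum class Kind { CanonicalIV, GEP, Unsupported };

// Build an expression valid for VF = 1. Unsupported inputs yield nullopt.
std::optional<AddRec> buildExpr(Kind K, int64_t Base = 0,
                                int64_t ElemSize = 1) {
  switch (K) {
  case Kind::CanonicalIV:
    return AddRec{0, 1}; // starts at 0, steps by 1
  case Kind::GEP:
    return AddRec{Base, ElemSize}; // Base + IV * ElemSize
  case Kind::Unsupported:
    return std::nullopt;
  }
  return std::nullopt;
}

// Assumed conversion to a VF-specific recurrence: scale the step by VF.
AddRec rewriteStepForVF(AddRec AR, int64_t VF) {
  return AddRec{AR.Start, AR.Step * VF};
}
```

A caller would test the optional before using the result, falling back to another cost path when no expression could be built.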
david-arm and others added 3 commits December 9, 2025 13:53
In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and
srem we fall back on the legacy cost unnecessarily. At this point we
know that the VPlan must be functionally correct, i.e. if the
divide/remainder is not safe to speculatively execute then we must have
either:

1. Scalarised the operation, in which case we wouldn't be using a
VPWidenRecipe, or
2. Inserted a select for the second operand to ensure we don't
fault through divide-by-zero.

For 2) it's necessary to add the select operation to
VPInstruction::computeCost so that we mirror the cost of the legacy cost
model. The only problem with this is that we also generate selects in
VPlan for predicated loops with reductions, which *aren't* accounted for
in the legacy cost model. To prevent asserts from firing I've also
added the selects to precomputeCosts to ensure the legacy costs match
the VPlan costs for reductions.

(cherry picked from commit d606eae)
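
The safe-divisor select from case 2 can be modelled lane by lane. This is a minimal sketch assuming that inactive lanes get a divisor of 1; `maskedUDiv` is a hypothetical helper and not code from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Lane-wise model of a widened udiv guarded by a mask. Inactive lanes
// divide by 1 instead of the original divisor, so no lane can fault.
template <std::size_t VF>
std::array<unsigned, VF> maskedUDiv(const std::array<unsigned, VF> &A,
                                    const std::array<unsigned, VF> &B,
                                    const std::array<bool, VF> &Mask) {
  std::array<unsigned, VF> R{};
  for (std::size_t L = 0; L < VF; ++L) {
    // The select: keep the real divisor only where the lane is active.
    const unsigned SafeDivisor = Mask[L] ? B[L] : 1u;
    R[L] = A[L] / SafeDivisor;
  }
  return R;
}
```

The select is the extra operation whose cost has to be counted once the legacy fallback is dropped; results in inactive lanes are computed but not meaningful.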
Extend [Specific]Cmp_match to handle floating-point compares, and
introduce m_Cmp that matches both integer and floating-point compares.
Use it in simplifyRecipe to match and simplify the general case of
compares. The change has necessitated a bugfix in
VPReplicateRecipe::execute.

(cherry picked from commit 66be00d)
…m#170278)

In some cases, the lowering of a select depends on the predicate. If the
condition of a select is a compare instruction, thread the predicate
through to the TTI hook.

PR: llvm#170278
(cherry picked from commit 50916a4)
@fhahn fhahn requested a review from a team as a code owner December 9, 2025 13:57

fhahn commented Dec 9, 2025

@swift-ci please test

@fhahn fhahn merged commit cff3418 into swiftlang:stable/21.x Dec 10, 2025
3 checks passed
@fhahn fhahn deleted the pick-vplan-cost-improvements branch December 10, 2025 20:45
3 participants