
Support resolving dispatch in apply_iterate for variable-length containers (Tuple{Vararg{T}} and Vector{T}) #57830

topolarity opened this issue Mar 19, 2025 · 8 comments

@topolarity
Member

topolarity commented Mar 19, 2025

Right now, --trim has a very hard time compiling code like this:

function splatme(@nospecialize(tup::NTuple{N, Int} where N)) 
    return [tup...]
end

As code_typed helpfully points out, this has a dynamic call even after optimizations:

julia> code_typed(splatme, (NTuple{N,Int} where N,))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = dynamic builtin Core._apply_iterate(Base.iterate, Base.vect, tup)::Union{Vector{Any}, Vector{Int64}}
└──      return %1
) => Union{Vector{Any}, Vector{Int64}}

This is a bit unfortunate, since inference is pretty good at exploring the calls that this _apply_iterate will actually perform. The problem is that the inlining pass requires the Tuple to have a known length before it will transform this into a "dispatch-resolved" form.
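For contrast, when the tuple length is statically known, inlining resolves the splat with no dynamic call left behind. A quick check (splatme_fixed is just an illustrative name, not from the issue):

```julia
# Hypothetical fixed-length variant of the example above: the length is part
# of the type, so inlining can resolve the _apply_iterate into direct calls.
splatme_fixed(tup::NTuple{2, Int}) = [tup...]

ci, rettype = only(code_typed(splatme_fixed, (NTuple{2, Int},)))
# With a known length, the return type narrows to the concrete Vector{Int64}
# (rather than the Union{Vector{Any}, Vector{Int64}} seen above).
```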

Similarly, we have a very hard time with:

julia> concat(a::Vector{Int}, b::Vector{Int}) = [a..., b...]
julia> code_typed(concat, (Vector{Int}, Vector{Int}))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = dynamic builtin Core._apply_iterate(Base.iterate, Base.vect, a, b)::Union{Vector{Any}, Vector{Int64}}
└──      return %1
) => Union{Vector{Any}, Vector{Int64}}

This should be a very simple operation, but the variable length again makes inlining inapplicable.

Finally, there are cases that we support just fine, but only because we "cheat" the dispatch:

julia> unsplatme(v::Vector{Int}) = (v...,)
julia> code_typed(unsplatme, (Vector{Int},))
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 =   builtin Core._apply_iterate(Base.iterate, Core.tuple, v)::Tuple{Vararg{Int64}}
└──      return %1
) => Tuple{Vararg{Int64}}

This only works in --trim because of somewhat embarrassing hard-coded cases in builtins.c that don't respect user overloads / dispatch semantics.

I suspect the solution might be to introduce a "dispatch-resolved" version of Core._apply_iterate. However, it might also be possible to relax the restrictions in the inlining pass, if a legal transform is possible with the existing "dispatch-resolved" primitives, i.e. Expr(:invoke, ...).

Either way, we'd like to find a way to have the optimizer resolve all these dispatches, instead of leaving them un-annotated and hard-coding some of their results.

@topolarity topolarity added trimming Issues with trimming functionality or PR's relevant to its performance/functionality compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) labels Mar 19, 2025
@serenity4
Contributor

Thanks for the detailed description! I'll take a look at this.

@topolarity topolarity changed the title Support inlining apply_iterate for variable-length containers (Tuple{Vararg{T}} and Vector{T}) Support resolving dispatch in apply_iterate for variable-length containers (Tuple{Vararg{T}} and Vector{T}) Mar 19, 2025
@serenity4
Contributor

serenity4 commented Mar 19, 2025

Why would we annotate a built-in function as a dynamic call in the first place? Doesn't it always call the same built-in function (albeit with a possibly unknown signature), since built-ins cannot be extended?

It seems to me that the dynamic annotation implies we may eventually have a dynamic dispatch at runtime (to iterate, or to the apply target function, say f), or that we may error at runtime if the arguments are incorrect, not that the dispatch to Core._apply_iterate itself is dynamic. Is this correct?

@topolarity
Member Author

Yes, that's all correct 👍

The dynamic part refers to the fact that the built-in may dynamically invoke user code. Core._apply / Core.invokelatest are two quintessential examples, since they immediately call user code by definition. Intrinsics may also do this, such as Core.Intrinsics.atomic_pointermodify.

Also FWIW, erroring at runtime does not count as dynamic, provided that the runtime can construct and throw the error without invoking user code.
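To illustrate the distinction (my own toy example, not from the thread): a built-in that merely throws is not dynamic, while invokelatest always is.

```julia
# Throws at runtime, but the runtime constructs and throws the error itself,
# without invoking any user code -- so this is *not* a dynamic call.
field_or_throw(x) = getfield(x, :nonexistent)

# By definition dispatches into user code at runtime: a quintessential
# dynamic call that --trim cannot resolve statically.
call_dynamic(f) = Base.invokelatest(f)
```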

@topolarity
Member Author

BTW, even if the dispatch to the built-in were dynamic, we wouldn't care, since the runtime is guaranteed to provide all built-in implementations. The issue is that we lose the dispatch edge into user code, which we try to aggressively trim.

dynamic should really be glossed as "unbounded invocation of user code".

@Keno
Member

Keno commented Mar 20, 2025

Could have Core._invoke_iterate that takes a CodeInstance.

@serenity4
Contributor

Observations

As I understand it, for SimpleVector, Tuple, NamedTuple, GenericMemory and Array iterables:

  • Iteration is static (happens hardcoded in the built-in).
  • The apply function dispatch is dynamic if the number of elements (or their type) is unknown.

while in the single-iterable case, specific apply-target/iterable-type combinations (involving only SimpleVector and Tuple apply targets) have iteration and the apply function call statically combined, also hardcoded in the built-in.

The fact that iteration is hardcoded in the built-in is a potential source of bugs:

julia> struct MyType end

julia> Base.iterate(::Vector{MyType}) = nothing # don't iterate the vector

julia> x = MyType()
MyType()

julia> args = MyType[]; for el in [x, x] push!(args, el) end; (args...,)
()

julia> f() = 1
f (generic function with 1 method)

julia> f(a, b) = 2
f (generic function with 2 methods)

julia> f([x, x]...) # should call `f()` but it calls `f(x, x)` instead
2

Conceptually we have two function calls per iterable argument: iterate(itr) to begin iteration, and iterate(itr, state) to advance it.

And a final function call: the apply target f itself, invoked on all of the collected elements.
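A rough Julia sketch of those call edges for a single iterable (how f(itr...) conceptually expands via the iteration protocol; not the actual builtin implementation):

```julia
function apply_iterate_sketch(f, itr)
    args = Any[]
    y = iterate(itr)                 # first call: start iteration
    while y !== nothing
        el, state = y
        push!(args, el)
        y = iterate(itr, state)      # second call: advance iteration
    end
    return f(args...)                # final call: the apply target `f`
end
```

(The final splat here is itself an _apply_iterate on a Vector, so the real builtin cannot literally be written this way; the sketch only shows which call edges exist.)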

Possible design

The most straightforward case is to have a single iterable argument. Following the argument convention of Core._apply_iterate we could provide a Core._invoke_apply_iterate(iterate1::CodeInstance, iterate2::CodeInstance, f::CodeInstance, itr) built-in that accepts the two iterate CodeInstances, the CodeInstance for the function to apply, and the iterable.

To support the multi-argument case, we may instead define it as Core._invoke_apply_iterate(f::CodeInstance, iterate1_x::CodeInstance, iterate2_x::CodeInstance, itr_x, iterate1_y::CodeInstance, iterate2_y::CodeInstance, itr_y, ...). We could also allow part of the function arguments to be unresolved as generic functions instead of a known CodeInstance.

That would allow us to remove all hardcoded cases in favor of explicit (and optionally resolved) callables. _apply_iterate could then retain its generic fallback as its only implementation where we would rely on inference to swap it out for _invoke_apply_iterate for resolved dispatches.

This would be more correct in the sense of respecting user overloads, and it would address the current issue; however, I'm not sure about the performance implications (notably in the interpreted case, if we remove the hardcoded cases where we can't resolve calls to a CodeInstance).

Any thoughts/ideas regarding this design (or alternative proposals)?

@topolarity
Member Author

That sounds OK to me, although it does restrict us to perfectly type-stable (and well-inferred) iterate implementations.

My only argument for an alternative is that if we "just inlined" this to an equivalent implementation (adding loops, etc.), we'd have more freedom to perform union-splitting, etc. for the various iterate calls, like we normally do.

@serenity4
Contributor

serenity4 commented Mar 20, 2025

There are two other possibilities also perhaps worth mentioning:

  • Keep iteration hardcoded (at least for now), focus on supporting a CodeInstance argument for the applied function only (when it can be statically resolved).
  • Attempt to rewrite _apply_iterate in Julia (so we rely on standard inference to resolve all function calls).
