Description
Right now, --trim has a very hard time compiling code like this:
function splatme(@nospecialize(tup::NTuple{N, Int} where N))
    return [tup...]
end
As code_typed helpfully points out, this has a dynamic call even after optimizations:
julia> code_typed(splatme, (NTuple{N,Int} where N,))
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = dynamic builtin Core._apply_iterate(Base.iterate, Base.vect, x)::Union{Vector{Any}, Vector{Int64}}
└── return %1
) => Union{Vector{Any}, Vector{Int64}}
This is a bit unfortunate, since inference is pretty good at exploring the calls that this _apply_iterate will actually perform. The problem is that inlining can only transform the call into a "dispatch-resolved" form when the Tuple has a known length.
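For comparison, here is a minimal sketch that places a fixed-length variant next to the variable-length one. The name splat3 is hypothetical and not part of the original report. With the length encoded in the type, the inliner should be able to rewrite the splat as an ordinary call with three arguments; the printed IR varies between Julia versions, so no output is reproduced here.

```julia
using InteractiveUtils  # provides code_typed outside the REPL

# Hypothetical fixed-length counterpart of `splatme`:
# the tuple length (3) is part of the argument type.
splat3(tup::NTuple{3, Int}) = [tup...]

# Variable-length version from the report, for side-by-side inspection.
function splatme(@nospecialize(tup::NTuple{N, Int} where N))
    return [tup...]
end

# Print the optimized IR of both so the call forms can be compared.
display(code_typed(splat3, (NTuple{3, Int},)))
display(code_typed(splatme, (NTuple{N, Int} where N,)))
```

Note that splatting an empty tuple calls Base.vect() with no arguments, which returns a Vector{Any}. That is where the Vector{Any} half of the inferred Union comes from.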
Similarly, we have a very hard time with:
julia> concat(a::Vector{Int}, b::Vector{Int}) = [a..., b...]
julia> code_typed(concat, (Vector{Int}, Vector{Int}))
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = dynamic builtin Core._apply_iterate(Base.iterate, Base.vect, a, b)::Union{Vector{Any}, Vector{Int64}}
└── return %1
) => Union{Vector{Any}, Vector{Int64}}
This should be a very simple operation, but the variable length again makes inlining inapplicable.
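Until the splat form is handled, one way to sidestep it in user code is to avoid the splat entirely. The sketch below is a workaround, not the fix this issue asks for, and concat_vcat is a hypothetical name. It assumes vcat on two Vector{Int} arguments is statically resolvable, which seems likely but is not verified here under --trim.

```julia
# Splatting version from the report: lowers to Core._apply_iterate.
concat(a::Vector{Int}, b::Vector{Int}) = [a..., b...]

# Hypothetical splat-free alternative: an ordinary two-argument call.
concat_vcat(a::Vector{Int}, b::Vector{Int}) = vcat(a, b)

a = [1, 2]
b = [3, 4, 5]
println(concat(a, b))
println(concat_vcat(a, b))
```

The two are not fully interchangeable: when both inputs are empty, the splatting version returns a Vector{Any} while vcat returns a Vector{Int}.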
Finally, there are cases that we support just fine, but only because we "cheat" the dispatch:
julia> unsplatme(v::Vector{Int}) = (v...,)
julia> code_typed(unsplatme, (Vector{Int},))
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = builtin Core._apply_iterate(Base.iterate, Core.tuple, v)::Tuple{Vararg{Int64}}
└── return %1
) => Tuple{Vararg{Int64}}
This only works in --trim because of somewhat embarrassing hard-coded cases in builtins.c that don't respect user overloads / dispatch semantics.
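To illustrate the dispatch semantics at stake: splatting is defined in terms of Base.iterate, so a user-defined iterate method decides what a splat produces. The Countdown type below is hypothetical and exists only for this sketch.

```julia
# Hypothetical iterable that counts down from n to 1.
struct Countdown
    n::Int
end

# User-defined iteration protocol: the state is the next value to yield.
Base.iterate(c::Countdown, state::Int = c.n) =
    state < 1 ? nothing : (state, state - 1)

# Both splats go through Core._apply_iterate, which calls the method above.
as_tuple = (Countdown(3)...,)
as_vector = [Countdown(2)...]

println(as_tuple)
println(as_vector)
```

A resolved form of the splat has to reach the same iterate method that ordinary dispatch would select for the argument type.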
I suspect the solution might be to introduce a "dispatch-resolved" version of Core._apply_iterate. However, it might also be possible to relax the restrictions in the inlining pass, if there is a legal transform possible with the existing "dispatch-resolved" primitives, i.e., Expr(:invoke, ...).
Either way, we'd like to find a way to have the optimizer resolve all these dispatches, instead of leaving them un-annotated and hard-coding some of their results.