adds the `nth` function for iterables #56580
base/iterators.jl (outdated)

    """
        nth(itr, n::Integer)

    Get the `n`th element of an iterable collection. Return `nothing` if not existing.

Returning `nothing` makes it impossible to distinguish between "the nth element was `nothing`" and "there was no nth element". Perhaps return `Union{Nothing, Some}`?
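
To make the `Some` suggestion concrete, here is a minimal sketch. `try_nth` is a hypothetical helper name, not part of Base or of this PR; it only relies on `iterate`, `Iterators.drop`, and `Some`, which exist in Base.

```julia
# Hypothetical helper (not Base API): wrap a found element in `Some`,
# return a bare `nothing` only when the iterator has fewer than `n` elements.
function try_nth(itr, n::Integer)
    y = iterate(Iterators.drop(itr, n - 1))
    return y === nothing ? nothing : Some(y[1])
end

found = try_nth([1, nothing, 3], 2)   # the element itself is `nothing`
absent = try_nth([1, nothing, 3], 5)  # there is no fifth element

# `found` is a `Some` wrapper, `absent` is a bare `nothing`,
# so the two cases stay distinguishable.
println(found isa Some, " ", absent === nothing)
```

The caller unwraps with `something(found)` once the `nothing` case has been handled.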
Fair point. Should it be `Union{Nothing, Some}` even in those cases where we know there can't be a `nothing` value in the iterator (for the sake of a uniform API)? E.g. a `Count` iterator, or `Repeated` (with an element other than `nothing`), or `AbstractRange`s.
I think it should, otherwise it would be too confusing.
I would just throw an error if there is no `n`th element. There could also be a `default` argument as in `get`, where a user can pass a value that should be returned if no `n`th element exists. I don't really follow the logic that the spirit of iterators is to return `nothing` in such cases?
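
A rough sketch of the two behaviours proposed in this comment, for illustration only. The names `nth_or_default`, `nth_or_throw`, and `NoElement` are hypothetical and do not appear in the PR.

```julia
# Hypothetical helper mirroring `get(collection, key, default)`:
# return `default` when the iterator has fewer than `n` elements.
function nth_or_default(itr, n::Integer, default)
    n > 0 || throw(ArgumentError("n must be positive"))
    y = iterate(Iterators.drop(itr, n - 1))
    return y === nothing ? default : y[1]
end

# Private sentinel, so a legitimate `nothing` element is never mistaken for "absent".
struct NoElement end

# Throwing variant built on top of the defaulting one.
function nth_or_throw(itr, n::Integer)
    x = nth_or_default(itr, n, NoElement())
    x isa NoElement && throw(BoundsError(itr, n))
    return x
end
```

With a caller-supplied default, the ambiguity around `nothing` goes away because the caller picks a value that cannot occur in the data.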
Agree `nothing` is weird, your iterator can produce that. `Some` seems a bit technical & unfriendly? An error seems fine. Matches what `first([])` does.
I suppose it can't literally be a method of `get` since it goes by enumeration not keys:
julia> first(Dict('a':'z' .=> 'A':'Z'), 3)
3-element Vector{Pair{Char, Char}}:
'n' => 'N'
'f' => 'F'
'w' => 'W'
julia> nth(Dict('a':'z' .=> 'A':'Z'), 3)
'w' => 'W'
How would this compare to a more naive implementation like … ?
Seems like a lot of code. I reproduced the above benchmark here: …

No strong position on whether this needs a name or not, but perhaps this first PR can focus on that, and let the implementation be just:

    nth(itr, n::Integer) = first(Iterators.drop(itr, n-1))
    nth(itr::AbstractArray, n::Integer) = itr[begin-1+n]
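
For anyone who wants to try the two-line version, here is a self-contained copy. It is renamed to `naive_nth` (a hypothetical name) so it does not collide with an `nth` that may already exist in newer Julia versions.

```julia
# Same two methods as suggested above, under a different name.
naive_nth(itr, n::Integer) = first(Iterators.drop(itr, n - 1))
naive_nth(itr::AbstractArray, n::Integer) = itr[begin - 1 + n]

println(naive_nth(10:10:50, 3))                       # range: uses the indexing method
println(naive_nth(Iterators.filter(isodd, 1:10), 3))  # lazy iterator: uses first(drop)
```

One thing to note with this version: as far as I can tell, an out-of-range `n` gives a `BoundsError` for arrays but an `ArgumentError` (from `first` on an empty iterator) for generic iterators, so the error type is not uniform.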
A lot of the code is for optimizing out-of-bounds checking. If we go with davidanthoff's suggestion of letting …

I disagree with throwing an error. In cases where you don't know if an nth element exists, that forces a try-catch which is both slow and brittle. I would imagine that most ordered iterators with a known length support indexing, so this would probably mostly be used precisely when the length is unknown.

I think another consideration here is consistency: the other functions we have that take an individual element from an iterator are … I agree with @jakobnissen that in some situations being able to handle this without an exception would be nice, but on the flip side, I can also see scenarios where an error seems much better, in particular in interactive sessions where I might be playing around with some data and this function could be very useful. And especially in an interactive scenario it would be super inconvenient if … Maybe the best design would be to allow for both scenarios, say something like `nth(itr, n, nothrow=false)`. So the default would be that an exception is thrown if the …
We could also opt for relying on the …
Although I see the similarity with …
the error in

    lastindex(a::AbstractArray) = (@inline; last(eachindex(IndexLinear(), a))) # equals to last(OneTo(0))

Similarly, both … From this comes my idea that in principle iterators are non-throwing by default; any throwing should be done one level higher and not at the iterator level itself (like how …
I have to admit, I think that is the option I like least of all of the proposed options so far :) It would make it very tricky to write generic code that uses the …
To me …
Agreed, but the whole difference between … I still think that my proposal with an argument like …
Is there any precedent for a … We could also follow …

I think it's already hard to write generic code that covers both generic collections and …
Not really, I don't have particularly hard opinions about it. In the original issue I had proposed something similar with …
My proposal for …
Throwing together some "PR literature review" for cross-reference, since I think this PR can depend on or interact with these: …
EDIT: …
Having thought about it, I do have some sympathy for the argument of @davidanthoff that it should behave like … I do see myself wanting to use it in code like:

    fourth_field = @something nth(eachsplit(line, '\t'), 4) throw(FormatError(
        lazy"Line $lineno does not contain four tab-separated fields"
    ))

Which would now instead be

    fourth_field = first(@something iterate(drop(eachsplit(line, '\t'), 3)) throw(FormatError(
        lazy"Line $lineno does not contain four tab-separated fields"
    )))

That's certainly doable (especially since, for iterators of unknown length, most of the clever tricks that …
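
A runnable version of the second pattern, as a sketch. `FormatError` is not a Base type, so a hypothetical definition is included here, and the wrapper function `fourth_field` is my own framing. This assumes Julia 1.8 or later for `eachsplit`.

```julia
using Base.Iterators: drop

# Hypothetical error type; the snippet above uses `FormatError` without defining it.
struct FormatError <: Exception
    msg::String
end

function fourth_field(line::AbstractString, lineno::Integer)
    # `iterate` returns `nothing` when fewer than four fields exist,
    # otherwise a `(value, state)` tuple. `@something` only evaluates the
    # `throw` when the first argument is `nothing`.
    item = @something iterate(drop(eachsplit(line, '\t'), 3)) throw(
        FormatError("Line $lineno does not contain four tab-separated fields"))
    return first(item)
end

println(fourth_field("a\tb\tc\td\te", 1))
```

The short-circuiting of `@something` is what keeps the error construction off the fast path.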
What is the semantic difference between this function and …

matching …

Yes, agreed! Having two distinct functions probably also helps with type stability. Another naming scheme I thought about is …

Julia has a bunch of patterns for handling this already, so one has some freedom to choose "consistent with what?" :)
I see 4 ways of handling errors in Julia: …
Personally I'd be happy with Base having both …

The 5th option is `Union{T,S}` where you supply `S`, like …
It takes the iteration count, not the index. (Same on Vector, different on Dict, or OffsetArray.)
That one's not as bad, as it's either an index or …
I see, thanks.
The problem with "just assume users do what you expect" is that (1) nobody ever documents what they expect and (2) even when documented, it increases the complexity of usage. No library function using …

Might be nice to accept an …

But I agree with …

Nonscalar indexing …

I presume the point of … How well …
I've changed the code to be decoupled from …

    function _nth(itr, n)
        # unrolled version of `first(drop)`
        n > 0 || throw(ArgumentError("n must be positive"))
        y = iterate(itr)
        for i in 1:n-1
            y === nothing && break
            y = iterate(itr, y[2])
        end
        y === nothing && throw(BoundsError(itr, n))
        y[1]
    end

some benefits are: slightly faster than … bench from the testset

[ Info: first(drop) / unrolled
[ Info: Int64
3.6 ns / 3.0 ns 1.19x n/N = 1//1
[ Info: Base.Generator{UnitRange{Int64}, var"#25#26"}
7.0 ns / 5.5 ns 1.28x n/N = 1//1
[ Info: SubArray{Int64, 2, Base.ReshapedArray{Int64, 2, UnitRange{Int64}, Tuple{}}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}
2.8 ns / 2.8 ns 1.00x n/N = 1//1
[ Info: Pair{Int64, Int64}
4.0 ns / 3.3 ns 1.20x n/N = 1//1
[ Info: Cycle{Vector{Int64}}
9208.0 ns / 3.0 ns 3026.96x n/N = 1//1
[ Info: Take{Repeated{Float64}}
4.0 ns / 4.3 ns 0.92x n/N = 1//1
[ Info: Vector{Int64}
2.1 ns / 2.1 ns 0.98x n/N = 1//1
[ Info: Base.Iterators.ProductIterator{Tuple{UnitRange{Int64}, UnitRange{Int64}}}
7.4 ns / 4.9 ns 1.50x n/N = 1//1
[ Info: @NamedTuple{a::Int64, b::Int64, c::Int64, d::Int64, e::Int64}
4.6 ns / 4.6 ns 1.00x n/N = 1//1
[ Info: NTuple{5, Int64}
4.0 ns / 3.3 ns 1.19x n/N = 1//1
[ Info: Char
3.7 ns / 3.0 ns 1.21x n/N = 1//1
[ Info: Base.ReshapedArray{Int64, 2, UnitRange{Int64}, Tuple{}}
2.4 ns / 2.4 ns 1.00x n/N = 1//1
[ Info: StepRange{Int64, Int64}
3.7 ns / 3.7 ns 1.00x n/N = 1//1
[ Info: Base.Pairs{Int64, Int64, LinearIndices{1, Tuple{Base.OneTo{Int64}}}, UnitRange{Int64}}
10.6 ns / 6.8 ns 1.56x n/N = 1//1
[ Info: SubArray{Int64, 0, Array{Int64, 0}, Tuple{}, true}
2.1 ns / 2.1 ns 1.00x n/N = 1//1
[ Info: Flatten{Take{Repeated{Vector{Int64}}}}
106.5 ns / 3.0 ns 35.01x n/N = 1//1
[ Info: Flatten{Tuple{UnitRange{Int64}, UnitRange{Int64}}}
20.7 ns / 17.6 ns 1.18x n/N = 1//1
[ Info: Base.Iterators.Zip{Tuple{UnitRange{Int64}, UnitRange{Int64}, UnitRange{Int64}}}
16.1 ns / 6.1 ns 2.62x n/N = 1//1
[ Info: String
10.8 ns / 6.1 ns 1.76x n/N = 1//1
[ Info: Bool
3.7 ns / 3.1 ns 1.19x n/N = 1//1
[ Info: Flatten{Take{Repeated{String}}}
68.8 ns / 11.4 ns 6.02x n/N = 1//1
[ Info: Cycle{Tuple{Tuple{}}}
3.0 ns / 3.3 ns 0.91x n/N = 1//1
[ Info: Base.Iterators.Filter{typeof(isodd), UnitRange{Int64}}
8.9 ns / 6.5 ns 1.38x n/N = 1//1

uniform errors: now everything is a … This also makes it consistent with the …
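
A quick way to check the "uniform errors" point is to run the unrolled helper against a few different iterator kinds. The code below is a copy of `_nth` as quoted in this comment, renamed to `unrolled_nth` (hypothetical name) so it can be pasted into any session.

```julia
# Copy of the unrolled helper from this comment, under a different name.
function unrolled_nth(itr, n)
    n > 0 || throw(ArgumentError("n must be positive"))
    y = iterate(itr)
    for _ in 1:n-1
        y === nothing && break       # iterator exhausted early
        y = iterate(itr, y[2])
    end
    y === nothing && throw(BoundsError(itr, n))
    return y[1]
end

# Out-of-range access on a range, a lazy filter, and a string.
for itr in (1:3, Iterators.filter(isodd, 1:5), "abc")
    err = try
        unrolled_nth(itr, 10)
    catch e
        e
    end
    println(typeof(itr), " => ", typeof(err))
end
```

Because the helper only uses `iterate`, the same exception path is taken no matter which iterator type is passed in.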
change negative indices to be boundserrors to match array's behaviours
fix docstrings
tests
I'm sorry to suggest yet another modification as I know this PR has seen a fair bit of back and forth, but I wonder if it wouldn't be a bit cleaner structured something like this …
to use dispatch and avoid that …

No problem, it's better to get these things right the first time around than afterwards. It's also my fault for not staying on top of it more, but too many things have been fighting for my free time lately. I remember trying the Holy trait style in one of the first iterations. Don't recall exactly why I moved away from it in the end. I'll try again here since the point you make about nested specializations is a very good one.
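
For readers unfamiliar with the Holy trait style mentioned here, a minimal sketch of what dispatching on `Base.IteratorSize` could look like. This is not the code proposed in the review (that snippet is not preserved above) nor the merged implementation; `trait_nth` and `_trait_nth` are hypothetical names.

```julia
# Entry point: compute the size trait once and dispatch on it.
trait_nth(itr, n::Integer) = _trait_nth(Base.IteratorSize(itr), itr, n)

# Known length: bounds can be checked up front without iterating.
function _trait_nth(::Union{Base.HasLength,Base.HasShape}, itr, n::Integer)
    1 <= n <= length(itr) || throw(BoundsError(itr, n))
    return first(Iterators.drop(itr, n - 1))
end

# Unknown or infinite length: walk the iterator and error if it runs out.
function _trait_nth(::Base.IteratorSize, itr, n::Integer)
    n > 0 || throw(BoundsError(itr, n))
    y = iterate(Iterators.drop(itr, n - 1))
    y === nothing && throw(BoundsError(itr, n))
    return y[1]
end

println(trait_nth(1:5, 2))                             # HasShape path
println(trait_nth(Iterators.cycle([1, 2, 3]), 7))      # IsInfinite path
println(trait_nth(Iterators.filter(isodd, 1:10), 4))   # SizeUnknown path
```

The appeal of this layout is that each size category gets one method, instead of branching on the trait inside a single function body.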
I've given it a go and it works quite well, without performance penalties.
since there is the issue that … The second specialization for finite cycles also currently returns … we could really use an …

Whoops, good point w.r.t. … The last point I'll make is that these methods … Granted, it's a bit exotic to be handed a cycle of a stateful iterator with length, so maybe that's OK, but just something to keep in mind. It's also an existing bug, see #43235, so IMO it's not PR-blocking. Ideally, something along the lines of #43388 would merge and then the fast paths would simply check this trait as well.

Yeah, I was about to say that currently cycles of stateful iterators do not work to begin with, but you raise a valid point nonetheless.
… union splitting.
Personally I think it's good as-is and I wouldn't add more complexity to paper over bugs caused elsewhere. I would just remove the link in the docstring here https://github.com/JuliaLang/julia/pull/56580/files#r2147386533 (otherwise seeing as …
I'll tag this to be merged once it passes CI (including the blocking NEWS label)
remove self reference in docstrings
Co-authored-by: Andy Dienes <[email protected]>
Co-authored-by: Andy Dienes <[email protected]>
Errors seem unrelated to this PR.
Hi,
I've turned the open-ended issue #54454 into an actual PR.
Tangentially related to #10092?
This PR introduces the `nth(itr, n)` function to iterators to give a `getindex` type of behaviour. I've tried my best to optimize as much as possible by specializing on different types of iterators.
In the spirit of iterators, any OOB access returns `nothing`. (edit: instead of throwing an error, i.e. `first(itr, n)` and `last(itr, n)`)
Here is the comparison of running the testsuite (~22 different iterators) using generic `nth` and specialized `nth`: