Skip to content

adds the nth function for iterables #56580

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open

Conversation

ghyatzo
Copy link
Contributor

@ghyatzo ghyatzo commented Nov 16, 2024

Hi,

I've turned the open ended issue #54454 into an actual PR.
Tangentially related to #10092 ?

This PR introduces the nth(itr, n) function to iterators to give a getindex type of behaviour.
I've tried my best to optimize as much as possible by specializing on different types of iterators.
In the spirit of iterators any OOB access returns nothing. (edit: instead of throwing an error, i.e. first(itr, n) and last(itr, n))

here is the comparison of running the testsuite (~22 different iterators) using generic nth and specialized nth:

@btime begin                                                                                                                                                                                                                     
    for (itr, n, _) in $testset                                                                                                                                                                                           
         _fallback_nth(itr, n)                                                                                                                                                                                                           
    end                                                                                                                                                                                                                          
end                                                                                                                                                                                                                              
117.750 μs (366 allocations: 17.88 KiB)

@btime begin                                                                                                                                                                                                                     
  for (itr, n, _) in $testset                                                                                                                                                                                           
    nth(itr, n)                                                                                                                                                                                                              
  end                                                                                                                                                                                                                          
end                                                                                                                                                                                                                              
24.250 μs (341 allocations: 16.70 KiB)

"""
nth(itr, n::Integer)

Get the `n`th element of an iterable collection. Return `nothing` if not existing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning nothing makes it impossible to distinguish between "the nth element was nothing", and "there was no nth element". Perhaps return Union{Nothing, Some}?

Copy link
Contributor Author

@ghyatzo ghyatzo Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point.
Should it be Union{nothing, Some} even in those cases where we know there can't be a nothing value in the iterator (for sake of uniform api)? I.e. Count Iterator or Repeated (with its element different than nothing) or AbstractRanges

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should, otherwise it would be too confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just throw an error if there is no nth element. There could also be a default argument as in get, where a user can pass a value that should be returned if no nth element exists.

I don't really follow the logic that the spirit of iterators is to return nothing in such cases?

Copy link
Contributor

@mcabbott mcabbott Nov 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree nothing is weird, your iterator can produce that. Some seems a bit technical & unfriendly? An error seems fine. Matches what first([]) does.

I suppose it can't literally be a method of get since it goes by enumeration not keys:

julia> first(Dict('a':'z' .=> 'A':'Z'), 3)
3-element Vector{Pair{Char, Char}}:
 'n' => 'N'
 'f' => 'F'
 'w' => 'W'

julia> nth(Dict('a':'z' .=> 'A':'Z'), 3)
'w' => 'W'

```
"""
nth(itr, n::Integer) = _nth(IteratorSize(itr), itr, n)
nth(itr::AbstractArray, n::Integer) = n > length(itr) ? nothing : itr[n]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes one-based indexing. Perhaps do itr[begin + n - 1].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are absolutely correct.
would something like getindex(itr, nth(eachindex(IndexLinear(), itr), n)) be too overkill?
and adding a specialization with nth(itr::AbstractRange, n::Integer) = getindex(itr, n)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the probably overkill approach, if it's too much i'll revert back to your suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractRanges are not always one-based either, so that approach runs into the same issue

Copy link
Contributor Author

@ghyatzo ghyatzo Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I could gather that is included in the getindex already, since it ends up calling

unsafe_getindex(v::AbstractRange{T}, i::Integer) where T = convert(T, first(v) + (i - oneunit(i))*step_hp(v))

which should pretty much be the same sa [begin + n -1]
unless I'm missing the point completely?

Copy link
Contributor

@mcabbott mcabbott Nov 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line nth(itr::AbstractRange, n) = getindex(itr, n) will for sure fail on the axes of an OffsetArray. (In fact, it will first be ambiguous, as n::Any is less specific.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was overthinking it, I'll just stick with [begin + n - 1]. Sorry.

@adienes
Copy link
Member

adienes commented Nov 16, 2024

how would this compare to a more naive implementation like

nth(itr, n) = first(Iterators.Rest(itr, n))

?

@LilithHafner LilithHafner added triage This should be discussed on a triage call iteration Involves iteration or the iteration protocol feature Indicates new feature / enhancement requests labels Nov 16, 2024
@ghyatzo
Copy link
Contributor Author

ghyatzo commented Nov 16, 2024

how would this compare to a more naive implementation like

nth(itr, n) = first(Iterators.Rest(itr, n))

?

Rest requires knowing the state used by the iterator, which is often considered an implementation detail and hard to pick automatically (unless i am missing something!)
If the state was known first(Rest(itr, n)) would probably be the fastest, since you alwasy do at most one iteration.
but knowing the correct n-1 state means that you most likely calculate n state directly.
In that case then a specialization would be even better!

@mcabbott
Copy link
Contributor

mcabbott commented Nov 17, 2024

Seems like a lot of code.

I reproduced the above benchmark here:
https://gist.github.com/mcabbott/fe2e0821e9bfe5cc7643bb15adf445d0
I get 75.625 μs for first(Iterators.drop(itr, n-1)) vs 1.558 μs for the PR. However, this is entirely driven by one case, Cycle{Vector{Int64}}. Some other cases are faster, some slower (String). Maybe they ought to be discussed in follow-up PRs?

No strong position on whether this needs a name or not, but perhaps this first PR can focus on that, and let the implementation be just:

nth(itr, n::Integer) = first(Iterators.drop(itr, n-1))
nth(itr::AbstractArray, n::Integer) = itr[begin-1+n]

@ghyatzo
Copy link
Contributor Author

ghyatzo commented Nov 17, 2024

A lot of the code is for optimizing out of bound checking. If we go with davidantoff suggestion of letting nth just error on oob n most of the actual code can be scrapped while retaining the speed.

@jakobnissen
Copy link
Member

I disagree with throwing an error. In cases where you don't know if an nth element exists, that forces a try-catch which is both slow and brittle. I would imagine that most ordered iterators with a known length support indexing, so this would probably mostly be used precisely when the length is unknown.

@davidanthoff
Copy link
Contributor

I think another consideration here is consistency: the other functions we have that take an individual element from an iterator are first and last, and both throw an error if you ask them for something that doesn't exist (i.e. when you call them on an empty source). In my mind, first, nth and last should have the same kind of design, so that also speaks in favor of throwing an error.

I agree with @jakobnissen that in some situations being able to handle this without an exception would be nice, but on the flip side, I can also see scenarios where an error seems much better, in particular in interactive sessions where I might be playing around with some data and this function could be very useful. And especially in an interactive scenario it would be super inconvenient if Some was used...

Maybe the best design would be to allow for both scenarios. Say something like

nth(itr, n, nothrow=false)

So the default would be that an exception is thrown if the nth element doesn't exist, but when nothrow=true then nothing is returned if the element doesn't exist, and on success things are wrapped in Some.

@ghyatzo
Copy link
Contributor Author

ghyatzo commented Nov 18, 2024

We could also opt for relying on the IteratorEltype trait to check if an iterator can contain nothing elements.
With ::HasEltype() we can dispatch over eltype(itr) <: Union{Nothing, T} where T.
In that case nth can return Some(nothing) while otherwise just return nothing since that would be unambiguous for those collections that do not have nothing in them. Of course wrapping with Some would be the default in case we have a ::EltypeUnknown Iterator.

Some is already expected in workflows with Union{Nothing, T} so that wouldn't introduce any extra complexity.

Although I see the similarity with first and last I'd be more akin to accomunate nth(itr, n) to the their siblings first(itr, n) and last(itr, n) and, I would argue, that first is a bit of an outlier in throwing on an empty collection, since for example,
according to documentation (and code) last never really throws an error:

"Return the end point of an AbstractRange even if it is empty."

the error in last([]) is from getindex that receives a 0 from

lastindex(a::AbstractArray) = (@inline; last(eachindex(IndexLinear(), a))) # equals to last(OneTo(0))

Similarly, both first(itr, n) and last(itr, n) rely on the take(itr, n) iterator which simply returns nothing when finishing the elements of the underlying iterator.

From this my idea that in principle iterators are non throwing by default, any throwing should be done one level higher and not at the iterator level itself (like how getindex and last interact). Iteration is a "low level" interface and I believe it should give the user the choice on how to handle "end states" of the iteration.

@davidanthoff
Copy link
Contributor

We could also opt for relying on the IteratorEltype trait

I have to admit, I think that is the option I like least of all of the proposed options so far :) It would make it very tricky to write generic code that uses the nth function, essentially now I would have to check the trait every time I call nth on something to be able to correctly interpret the return value from nth.

I'd be more akin to accomunate nth(itr, n) to the their siblings first(itr, n) and last(itr, n)

To me nth(itr, n) is conceptually way closer to first(itr) and last(itr) because all of these produce single values, rather than streams of values. Whenever a function produces a stream of values there is a simple, natural way to return no value: namely an empty stream. But that is exactly not possible for functions that are supposed to return just one value.

From this my idea that in principle iterators are non throwing by default

Agreed, but the whole difference between first(itr) and first(itr, n) is the one does not produce an iterator, while the other one does.

I still think that my proposal with an argument like nothrow would be the cleanest solution here :), are there things that you think are problematic about it?

@mcabbott
Copy link
Contributor

mcabbott commented Nov 18, 2024

Is there any precedent for a nothrow keyword?

We could also follow get(A, key, default), as you suggested earlier. It seems [edit: in this case!] a little confusing that this goes by enumeration not indexing, so maybe it shouldn't be called Iterators.get, but is there a good suggestive name? getnth(iter, number, default) or getcount or something? Somehow using nth(iter, n, default) seems a bit at odds with first, last, at least to me.

@ghyatzo
Copy link
Contributor Author

ghyatzo commented Nov 19, 2024

It would make it very tricky to write generic code that uses the nth function, essentially now I would have to check the trait every time I call nth on something to be able to correctly interpret the return value from nth.

I think it's already hard to write generic code that covers both generic collections and Union{Nothing, Some{T}} collections.
Most likely you'd need a specialized function to handle the Somes anyway.
So my reasoning was that since handling of collections of Union{Nothing, Some{T}} already requires a special context that uses Somes, nth in that context would just behave like you'd expect.
But everywhere else, is always the same.

Whenever a function produces a stream of values there is a simple, natural way to return no value: namely an empty stream. But that is exactly not possible for functions that are supposed to return just one value.

Agreed, but the whole difference between first(itr) and first(itr, n) is the one does not produce an iterator, while the other one does.

first(itr, n) and last(itr, n) are not iterators but just wrappers (like nth), they collect an iterator to produce a single value. The only difference is that this value is a collection of elements instead of just an element.

I still think that my proposal with an argument like nothrow would be the cleanest solution here :), are there things that you think are problematic about it?

Not really, I don't have particularly hard opinions about it. In the original issue I had proposed something similar with nth(itr, n, skip_checks=false). What I am most concerned about is forcing the use of Some.

It seems a little confusing that this goes by enumeration not indexing, so maybe it shouldn't be called Iterators.get, but is there a good suggestive name? getnth(iter, number, default) or getcount or something? Somehow using nth(iter, n, default) seems a bit at odds with first, last, at least to me.

My proposal for nth was an attempt to stay in the "cardinal numbering" semantic sphere like first or last.
Another inspiration was the iter.nth(n) from Rust, and in that case the n was actually used as an index so nth element == iter.nth(n-1) which i found very confusing but I digress.
I can get behind the default argument though!

@jakobnissen
Copy link
Member

Having thought about it, I do have some sympathy for the argument of @davidanthoff that it should behave like first and last.

I do see myself wanting to use it in code like:

fourth_field = @something nth(eachsplit(line, '\t'), 4) throw(FormatError(
    lazy"Line $lineno does not contain four tab-separated fields fields"
))

Which would now instead be

fourth_field = first(@something iterate(drop(eachsplit(line, '\t'), 3)) throw(FormatError(
    lazy"Line $lineno does not contain four tab-separated fields fields"
)))

That's certainly doable (especially since, for iterators of unknown length, most of the clever tricks that nth can do doesn't apply, so drop + iterate will do the same thing), but it's slightly less nice. But the consistency with first and last does perhaps weigh more heavy.

@jariji
Copy link
Contributor

jariji commented Nov 21, 2024

What is the semantic difference between this function and getindex or get?

@adienes
Copy link
Member

adienes commented Nov 21, 2024

nth and trynth seems a bit more julian than a nothrow kwarg

matching tryparse and trylock,

@davidanthoff
Copy link
Contributor

nth and trynth seems a bit more julian than a nothrow kwarg

Yes, agreed! Having two distinct functions probably also helps with type stability.

Another naming scheme I thought about is nth_or_nothing, or something along those lines, Rust has that kind of pattern a fair bit. But I think @adienes idea is actually better as there seems to be more precedence in Julia for that.

@aplavin
Copy link
Contributor

aplavin commented Nov 21, 2024

Julia has a bunch of patterns for handling this already, so one has some freedom to choose "consistent with what?" :)
For example, first and last error when there's no element. While findfirst returns nothing, exhibiting the same confusion between "not found" and "found nothing".

@jariji
Copy link
Contributor

jariji commented Nov 21, 2024

I see 4 ways of handling errors in Julia:

  • Union{T,Nothing}: These functions are convenient because they don't require unwrapping T, but imho more trouble than they're worth because in generic library programming you can't safely assume T and Nothing are disjoint, which means libraries have to document complex APIs or risk data corruption.

  • throw(IndexError): This is convenient because it doesn't require any unwrapping and is unambiguous. It's not noexcept, obviously, but that's often fine in many use cases.

  • Option{Ok{T},Err{E}}: With Moshi.jl, Julia has the foundations for proper option types; we just need a package with Option{Ok{T},Err{E}} and some Rust-style functions like and_then, or_else implemented on it. This style requires a bit more code but produces reliable and efficient systems because all branches can be covered with compile-time exhaustive pattern matching.

  • Union{Some{T},Nothing} is just a lightweight Option with less-informative error values.

Personally I'd be happy with Base having both throwing functions and Option-returning functions. I'd prefer to minimize the amount of Union{T,Nothing} functions spreading through the ecosystem because they complicate correct generic programming.

@mcabbott
Copy link
Contributor

mcabbott commented Nov 21, 2024

The 5th option is Union{T,S} where you supply S -- like get. That, 2 (error, like getindex) and 4 (Some/Nothing) are the competitors.

What is the semantic difference between this function and getindex or get?

It takes the iteration count, not the index. (Same on Vector, different on Dict, or OffsetArray.)

While findfirst returns nothing, exhibiting the same confusion between "not found" and "found nothing".

That's one's not as bad, as it's either an index or nothing. (You could make Dict(nothing=>2, missing=>3), but you could also name your team "who", "what" and "I don't know"...)

@jariji
Copy link
Contributor

jariji commented Nov 21, 2024

It takes the iteration count, not the index

I see, thanks.

That's one's not as bad, as it's either an index or nothing. (You could make Dict(nothing=>2, missing=>3), but you could also name your team "who", "what" and "I don't know"...)

The problem with "just assume users do what you expect" is that (1) nobody ever documents what they expect and (2) even documented it increases the complexity of usage. No library function using findfirst ever documents that passing a nothing key is undefined behavior. And if they did, the caller would have to increase the complexity of their code to work around the failure in the special case of nothing keys, instead of just using a simple procedure that always works. Imho this kind of API will increase the overall level of complexity and opportunity for mistakes.

The 5th option is Union{T,S} where you supply S -- like get. That, 2 (error, like getindex) and 4 (Some/Nothing) are the competitors.

get(c,k,d) is a good point. That's a good option where appropriate, though the 2-arg function still needs to do something, unless it's just not provided?

@stevengj
Copy link
Member

Might be nice to accept an AbstractVector{<:Integer}, similar to getindex, so that you could do nth(itr, 5:7) or nth(itr, [3, 17, 245]).

@stevengj
Copy link
Member

But I agree with nth(itr, n) throwing an error for out-of-bounds, and nth(itr, n, default) returning default if n is out-of-bounds. Similar to get.

@jariji
Copy link
Contributor

jariji commented Nov 27, 2024

@stevengj

Might be nice to accept an AbstractVector{<:Integer}, similar to getindex, so that you could do nth(itr, 5:7) or nth(itr, [3, 17, 245]).

Nonscalar indexing getindex(xs, ixs) diverges from modern julia practice of using explicit rather than implicit broadcasting when mapping a function over a collection. Discussed at length in #30845. Personally I'd rather this practice not be extended into new functions.

@mcabbott
Copy link
Contributor

I presume the point of nth(iter, [3, 17, 245]) vs. broadcasting nth.(Ref(iter), [3, 17, 245]) is that it would run the iterator only once.

How well nth fits with the existence of iterators which don't like to be run twice, or are expensive to run through, that seems a bit awkward.

@LilithHafner LilithHafner added triage This should be discussed on a triage call and removed triage This should be discussed on a triage call labels Dec 5, 2024
@ghyatzo
Copy link
Contributor Author

ghyatzo commented May 4, 2025

Bump.

@adienes
Copy link
Member

adienes commented May 6, 2025

the only non-explicitly-triage-approved part I can see is the inclusion of the curried 1-arg method, nth(n) = Fix2(nth, n)

personally, I don't have a particularly strong opinion in magnitude but I do agree in direction with @stevengj that it's a bit inconsistent given that there are basically no other 1-arg methods like this in Iterators (should we then add filter(pred), take(n), drop(n), etc.) but there is also some support upthread for adding it anyway

does it need another triage discussion? probably not --- maybe just a show of 👍👎 somewhere suffices?

end

# finite cycle: in reality a Flatten{Take{Repeated{O}}} iterator
function nth(itr::Flatten{Take{Repeated{O}}}, n::Integer) where {O}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a clear need for all these specialisations? They seem quite exotic... do they show up? Are they really almost always faster? Are they in fact all well-tested?

When I looked at the benchmarks earlier, the total time was driven by one case, so didn't answer these questions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Flatten{Take{Repeated{O}} is the actual type of a finite cycle, as in Iterators.cycle(itr, n).
In your benchmark the total time was driven by the infinite cycle, the second biggest contribution was the finite cycle by a 36x factor (72ns vs 2ns) and allocating vs non allocating.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more concrete, can you please post benchmarks for each of the methods in the present PR, comparing to the most basic implementation? Ideally at varying lengths etc. to see if small/large is different.

Flatten{Take{Repeated{O}} is the actual type of a finite cycle, as in Iterators.cycle(itr, n).

That is also what the comment should say, not repeat the signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you refer to the Count Repeated and Take{<:Repeated} specializations, at the time I had some small testing and the performance was always matching the naive or better.
I've rerun some tests and now only the Take{<:Repeated} specialization has somewhat of a slightly better performance. But yes, probably also that doesn't warrant a specialization anymore.

julia> @benchmark nth($repeateditr, 9235)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min  max):  3.042 ns  8.750 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     3.167 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.172 ns ± 0.097 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                    ▅         █         ▅        ▃          ▁
  ▃▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▇ █
  3.04 ns     Histogram: log(frequency) by time     3.29 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark _nth($repeateditr, 9235)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min  max):  3.333 ns  14.166 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     3.458 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.468 ns ±  0.331 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▂     █     █     ▃    ▂▃    ▂     ▁     ▁            ▂
  ▃▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁██▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▇▁▁▁▁▇ █
  3.33 ns      Histogram: log(frequency) by time     3.75 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Copy link
Contributor

@mcabbott mcabbott May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refer to every single specialisation. Each one needs to be individually justified. Again, on long cases, and short cases, and weird types (cycle of a string? cycle of a tuple?). And ideally in some file others can easily try out on other machines.

Those that survive also need to be individually tested. Again, with weird cases (cycle of an empty vector?) poking for weak points. if IteratorSize(I) isa Union{HasShape, HasLength} means you must invent cases taking both paths. Testing only-just out-of-bounds in both directions. And written in a way that makes it obvious to the reader of the test which specialisation is being tested, and where the true values are coming from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the specialization errors behave like the simple case? as in: throwing an ArgumentError("Drop length must be non-negative") on negative indices and ArgumentError("collection must be non-empty") when "out of bounds". It is a bit artificial since the non-empty error stems from the drop iterator returning an empty collection if called out out bounds. We're basically simulating what the error would be if the first(drop()) implementation was used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this looks great, will read slowly.

Haven't thought hard re errors, it does seem friendly to have something uniform & deliberate. But how much complexity it's worth I don't know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cycles of strings might need an extra specialization since the issue is length(::String), it would probably require doing some heuristic using ncodeunits(::String) to decide when starting to use the specialization.
But at least, we know already that a String HasLength so it's basically a "complexity swap"...
I am not sure if it is worth it given the relative niche usage of these iterators.
But... for longer strings the difference is pretty big (ms vs ns)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the performance could be matched by specializing first and/or drop instead of nth on these types from Iterators. What do you think?

Copy link
Contributor Author

@ghyatzo ghyatzo May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cycle index recalculations could indeed be specialized for drop directly. That's a good point.
Although cycles of strings would still need some extra specialization.

Upon further reflection, I don't think it can work. drop must return an iterator that generates all but the first n elements. So although a finite cycle iterates through the same shorter iterable, drop must still produce all elements after those we skip. The shortcircuit we do only works because we don't care about the rest of the drop iterator, only it's first element.

as an example, if we have it = cycle([1,2,3], 3) that is equivalent to [1,2,3,1,2,3,1,2,3] and although it[2] == it[5] we can't say the same about Drop(it, 1) and Drop(it, 4)

@ghyatzo
Copy link
Contributor Author

ghyatzo commented May 6, 2025

RFC: should this PR introduce the single argument version of nth(n) = Fix2(nth, n) allowing for usage like map(nth(n), myiterable)?
Some pro and cons are discussed further up the thread.
I'd say that in case of parity 👎 should win as the more flexible alternative.

👍 or 👎 this comment

@PatrickHaecker
Copy link
Contributor

PatrickHaecker commented May 9, 2025

should this PR introduce the single argument version of nth(n) = Fix2(nth, n)

I propose a compromise where only

nth(n::Integer) = Fix2(nth, n)

is defined (mind the type annotation). This would allow for most (all?) of the use cases of getting different elements of the same iterables while still leaving room for getting the nths element for most (all?) different iterables (at least for the iterables which are not a subtype of Integer).

@ghyatzo
Copy link
Contributor Author

ghyatzo commented May 9, 2025

I am not sure I understand what you are proposing. The single argument version is already implemented in this PR and it accept only integers.

at least for the iterables which are not a subtype of Integer

The single argument version would be useful to define a function that returns the nth element of whatever iterable you give it. So the n stands for the index to fix, not the iterable

But maybe I missed your point completely.

@PatrickHaecker
Copy link
Contributor

PatrickHaecker commented May 10, 2025

The single argument version is already implemented in this PR and it accept only integers.

Thanks, I hadn't seen this.

However, the point still stands: The question should not be whether to define nth(n) = Fix2(nth, n) in Base (as indicated here and here), but whether to define nth(n::Integer) = Fix2(nth, n).

There seem to be two useful definitions for a single argument nth method:

  1. nth(n::Integer) = Fix2(nth, n)
  2. nth(itr) = Fix1(nth, itr)

The problem of the second version is that we do not have an Iterable and so we can only define it (implicitly) with type constraint Any, i.e. no constraint at all.
As long as we define the first version with the type constraint Integer (and not with Any as indicated above, we can later define the second version whenever we want to as we do not block it, as it's still free for each itr not being an Integer.

The second version might be useful in cases where the iteration count is the data in your data flow and the iterator is the thing which this data operates on, like in this contrived example

function abs_sum_some_selected(itr, select::Function)
    select() |> filter(>(2)) .|> nth(itr) .|> abs |> sum
end
julia> abs_sum_some_selected(-10:-1:-15, ()->(2, 3, 4))
25

I am not proposing to add the second version to Base with this PR. I am proposing to add the first version constrained to Integer (i.e. supporting your PR) so that we do not block the possibility to add the second version later (or the user adding it locally if they accept the type piracy).

@ghyatzo
Copy link
Contributor Author

ghyatzo commented May 10, 2025

There was never the intention of adding the second version you mention. There is no ambiguity in nth(n)=Fix2(nth,n). The second argument can only be an integer in the context of this PR.

@PatrickHaecker
Copy link
Contributor

Although I seem to have difficulties to write down my point in an understandable way and the method discussed here still does not match the implemented method, I think the important point is that we agree about what is currently proposed in this PR as an implementation.
So please continue with this PR and implement

nth(n::Integer) = Fix2(nth, n)

and consider this part of the discussion as resolved.
Thanks for the PR. I appreciate this work and look forward to have an nth function included.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests iteration Involves iteration or the iteration protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.