Skip to content

sem does not accept iterables #342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
piever opened this issue Jan 19, 2018 · 5 comments
Closed

sem does not accept iterables #342

piever opened this issue Jan 19, 2018 · 5 comments

Comments

@piever
Copy link
Contributor

piever commented Jan 19, 2018

The typing of sem may be too strict:

julia> itr = (i for i in 1:3)
Base.Generator{UnitRange{Int64},##3#4}(#3, 1:3)

julia> mean(itr)
2.0

julia> var(itr)
1.0

julia> std(itr)
1.0

julia> sem(itr)
ERROR: MethodError: no method matching sem(::Base.Generator{UnitRange{Int64},##3#4})
Closest candidates are:
  sem(::AbstractArray{#s26,N} where N where #s26<:Real) at /home/pietro/.julia/v0.6/StatsBase/src/scalarstats.jl:237

Could we simply relax the definition here and allow it to accept anything?

@nalimilan
Copy link
Member

I think this problem is quite general, we often require arrays of Real when we actually only need iterables. It would make sense to go over functions to check which one could use a broader signature. This is particularly important for Array{Union{<:Real, Missing}} and for passing skipmissing(x).

@piever
Copy link
Contributor Author

piever commented Jan 19, 2018

I hadn't thought about it, but after #280 things don't work even with DataArrays:

julia> sem(@data [1,2,3])
ERROR: MethodError: no method matching sem(::DataArrays.DataArray{Int64,1})
Closest candidates are:
  sem(::AbstractArray{#s26,N} where N where #s26<:Real) at /home/pietro/.julia/v0.6/StatsBase/src/scalarstats.jl:237

EDIT: I had added an extra comment with a proposal but I've just realized it was a bad idea. I'll see if I can come up with a list of functions whose signature is too strict in the next few days.

@piever
Copy link
Contributor Author

piever commented Jan 24, 2018

I've tried going through scalarstats.jl to see how many functions could widen their signature (to work with skipmissing) but it's actually a bit tricky. There are at least 3 cases:

  1. It would simply work by widening the signature (sem, variation)
  2. It would need to be rewritten using the iteration protocol rather than length (skipmissing gives iterators that don't have length) and getindex. It is the case of geomean, harmmean and friends. Without dispatching on AbstractArray{<:Real} we can't control whether the user gives an array of say complex numbers - meaning it's be bad to initialize the summation with a Float64): that case should probably be addressed by sth like this where perhaps the empty case should return missing rather than error.
  3. it uses the type of the AbstractArray{T} where {T<:Real} signature (i.e. mad): unsure what to suggest: maybe a simple fallback mad(v) = mad(collect(v)) being careful that there are no cases where it keeps calling itself recursively

Maybe the easiest is to start by widening the signatures for case 1?

@nalimilan
Copy link
Member

Thanks for checking. Indeed we can already widen the signatures where possible, it won't hurt.

Then for 2) it looks like it wouldn't be too hard to stop relying on length and getindex. As you suggest, the result type can be chosen as for mean, i.e. first_value + zero(first_value). For empty collections, better raise an error just like mean for consistency (though it would also be possible to use zero(eltype(x)) for HasEltype iterators, but this should be changed in Base too if we do that).

For 3), calling collect looks OK given that we need to allocate a copy for the computation anyway.

(BTW, I've realized the various mean functions should promote the element type before computing the sum, to avoid overflows with small types. That's also the case in Base, see JuliaLang/julia#25739.)

@oxinabox
Copy link
Contributor

overlap with #449

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants