[Python] pyarrow.compute.skew(skip_nulls=True) still counts NULL as an observation?
#45733
Comments
No, it's computing the biased skew. Would the unbiased skew be more or less useful? Or should we just add an option to make both variants available?
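For reference, here are the two estimators being compared, written for the n non-null values x_1, …, x_n with mean x̄. The correction factor below is the one documented for `scipy.stats.skew` with `bias=False`. That pandas' `Series.skew` uses the same adjusted estimator is an assumption based on its documentation describing the result as unbiased.

```latex
m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad
g_1 = \frac{m_3}{m_2^{3/2}} \;\; \text{(biased)}, \qquad
G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1 \;\; \text{(bias-corrected, } n > 2\text{)}
```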
No, it's always the same value regardless of the number of nulls:
>>> import pyarrow.compute as pc
>>> pc.skew([1.0, 2.0, 3.0, 40.0], skip_nulls=True)
<pyarrow.DoubleScalar: 1.14831951332278>
>>> pc.skew([1.0, 2.0, 3.0, 40.0, None], skip_nulls=True)
<pyarrow.DoubleScalar: 1.14831951332278>
>>> pc.skew([1.0, 2.0, 3.0, 40.0, None, None], skip_nulls=True)
<pyarrow.DoubleScalar: 1.14831951332278>
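The invariance above is what you would expect if nulls are dropped before n is counted. The following pure-Python sketch is not pyarrow's implementation. It computes the biased estimator under the assumption that `skip_nulls=True` simply filters out `None` first, and it prints the same value for all three inputs, approximately the 1.14831951332278 shown by pyarrow. The function name `biased_skew` is hypothetical.

```python
import math
from typing import Optional, Sequence


def biased_skew(values: Sequence[Optional[float]]) -> float:
    """Hypothetical helper: biased skewness g1 = m3 / m2**1.5, ignoring None.

    Nulls are dropped before counting observations, so n is the number
    of non-null values (assumed to mirror skip_nulls=True).
    """
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n == 0:
        return math.nan
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        return math.nan  # constant input: skewness is undefined
    return m3 / m2 ** 1.5


for data in (
    [1.0, 2.0, 3.0, 40.0],
    [1.0, 2.0, 3.0, 40.0, None],
    [1.0, 2.0, 3.0, 40.0, None, None],
):
    print(biased_skew(data))
```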
cc @icexelloss
This would be a nice option, and it aligns with what scipy does (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html). It would also allow us over in pandas to use an unbiased …
Would you be willing to try to contribute it? This should be relatively easy if you know a bit of C++ and have already touched the Arrow C++ codebase.
I can attempt it at a later date, but I cannot guarantee I will get to it before Arrow 20 is released.
I think supporting both the biased and unbiased estimators makes sense. I am not 100% sure which one should be the default, but I am leaning towards consistency with pandas' behavior.
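One way such an option could look is sketched below in pure Python, borrowing the `bias` flag name from `scipy.stats.skew`. The `skew` function here is hypothetical and is not a pyarrow API. Defaulting to `bias=True` is an assumption chosen to preserve the current output; following pandas would mean defaulting to `False` instead. Returning `None` when a null is present and `skip_nulls=False` is a stand-in for a null result.

```python
import math
from typing import Optional, Sequence


def skew(values: Sequence[Optional[float]], *, skip_nulls: bool = True,
         bias: bool = True) -> Optional[float]:
    """Hypothetical skew with a scipy-style ``bias`` flag.

    bias=True  -> biased estimator g1 (the current behaviour).
    bias=False -> adjusted Fisher-Pearson estimator G1 (scipy bias=False).
    """
    if not skip_nulls and any(v is None for v in values):
        return None  # stand-in for a null result when nulls are not skipped
    xs = [v for v in values if v is not None]
    n = len(xs)  # only non-null values count as observations
    if n == 0:
        return math.nan
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return math.nan
    g1 = m3 / m2 ** 1.5
    if bias:
        return g1
    if n < 3:
        return math.nan  # correction factor is undefined for n <= 2
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


print(skew([1.0, 2.0, 3.0, 40.0, None]))              # biased
print(skew([1.0, 2.0, 3.0, 40.0, None], bias=False))  # bias-corrected
```

Keeping the correction as a final multiplication means both variants share one pass over the data, which is roughly why exposing both behind a flag is cheap.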
Describe the bug, including details regarding any error messages, version, and platform.
If skew is always calculating the unbiased skew, then, since pyarrow's value is lower than pandas', it appears pyarrow might be counting None as an observation, while pandas does not count its missing values as observations.
cc @pitrou xref #45677
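A lower pyarrow value is also what the biased/bias-corrected difference alone would produce. For n > 2 the factor sqrt(n(n-1))/(n-2) is greater than 1, so for right-skewed data the corrected estimate is always the larger one. The pure-Python sketch below drops the null first (n = 4) and compares both estimators. It assumes pandas applies this adjusted factor to the non-null count, which is not verified here against pandas' source.

```python
import math

data = [1.0, 2.0, 3.0, 40.0, None]
xs = [v for v in data if v is not None]  # nulls dropped first, so n == 4
n = len(xs)
mean = sum(xs) / n
m2 = sum((x - mean) ** 2 for x in xs) / n
m3 = sum((x - mean) ** 3 for x in xs) / n

g1 = m3 / m2 ** 1.5                        # biased estimator
factor = math.sqrt(n * (n - 1)) / (n - 2)  # adjusted Fisher-Pearson factor
G1 = factor * g1                           # bias-corrected estimator

print(f"n={n} biased={g1:.6f} corrected={G1:.6f} factor={factor:.6f}")

# The factor shrinks towards 1 as n grows but stays above 1 for n > 2.
for k in (3, 4, 10, 100):
    print(k, math.sqrt(k * (k - 1)) / (k - 2))
```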
Component(s)
Python