BUG: Subclassed DataFrame doesn't persist _metadata properties across binary operations #34177
Comments
DataFrame binary ops (like pandas/pandas/tests/generic/test_finalize.py Lines 127 to 135 in 085af07
I don't think there would be any objection to calling finalize here. The primary API question is what to do with metadata / attrs when @clausmith are you interested in working on this? |
xref #28283 for the general issue. This can be specific to binops. |
@TomAugspurger ah that's what I thought. I saw #28283 and figured this might be related. Unfortunately I'm a little out of my depth here (I'm a PM more so than an engineer 😬). |
Hi @TomAugspurger, I'm working on a side project with Pandas and need this problem to be fixed. I would love to help. Can you please explain to me how I should proceed to first understand the issue and then fix it? Thank you. |
Great, thanks.
I would start with the `Series.op(Series)` case, and then work up to Series
& Frame and Frame & Frame.
Series._binop is probably the place this should go. We'll need a call to
`NDFrame.__finalize__` somewhere in
https://github.com/pandas-dev/pandas/blob/42a5c1c1aac401735a9e06e21fece93f58a4b4ec/pandas/core/series.py#L2593-L2627.
Perhaps it could be done as part of `_construct_result`, not sure.
As I mentioned earlier, it's not 100% clear how we'll propagate metadata /
`.attrs` when the values differ. I think just ignore this for now. We can
define and document the behavior we want once we figure that out.
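As a rough illustration of what such a call would do, the sketch below applies `__finalize__` by hand to the result of a `Series.op(Series)` operation. `TaggedSeries` and its `tag` field are hypothetical names used only for this example; this is a user-side sketch of the mechanism, not the change proposed for `Series._binop`.

```python
import pandas as pd


class TaggedSeries(pd.Series):
    # Hypothetical subclass: keep the built-in metadata names and add "tag".
    _metadata = pd.Series._metadata + ["tag"]

    @property
    def _constructor(self):
        return TaggedSeries


left = TaggedSeries([1, 2, 3])
left.tag = "left operand"
right = TaggedSeries([10, 20, 30])

# __finalize__ copies every name listed in _metadata from `left` onto the
# result and returns the result, so the call can be chained.
result = (left + right).__finalize__(left)

print(type(result).__name__)
print(result.tag)
```

Copying from the left operand only sidesteps the open question of what to do when the two operands carry different metadata.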
(In reply to Jonathan Besomi's comment of Fri, May 22, 2020, 2:16 AM.)
|
Great, thank you. Will start from Series._binop then. "As I mentioned earlier, it's not 100% clear how we'll propagate metadata / …" A related but different problem is assigning a pandas Series that carries metadata to a new or existing column of a pandas DataFrame. In this case too, the metadata is lost. Do you think it will be possible to solve this problem as well, or are there reasons to keep things as they are? |
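The column-assignment case can be checked with a sketch like the following. It assumes the metadata lives in `.attrs` (the comment does not say which mechanism was meant), and the column names and values are made up for illustration.

```python
import pandas as pd

# A Series carrying metadata in .attrs (assumed mechanism).
s = pd.Series([1.0, 2.0, 3.0])
s.attrs["unit"] = "metres"

df = pd.DataFrame({"a": [1, 2, 3]})
df["length"] = s  # assign the Series as a new column

# Compare the metadata before and after the round trip through the frame.
print("original Series attrs:", s.attrs)
print("column attrs:         ", df["length"].attrs)
```

The frame stores only the column's values, so the Series read back from `df["length"]` is a new object rather than the one that was assigned.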
In general I'd recommend moving away from … By differing attrs I mean different keys, or perhaps the same keys but different values. I think eventually we'll want a system to decide how to propagate the attrs in that case, but I'm not sure yet what that should look like. |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
When subclassing a DataFrame, fields added to the `_metadata` property are only persisted across some operations (such as slicing) and not others (such as any arithmetic operation). I would expect any properties defined on the subclass to persist whenever the result of an operation is an instance of the subclass.
The following is the example taken from the "Extending Pandas" docs: https://pandas.pydata.org/pandas-docs/stable/development/extending.html
With the above setup, here's how to reproduce the problem:
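A self-contained sketch of the setup and the reproduction, assuming the `SubclassedDataFrame2` class and `added_property` field named in this report. The frame contents are illustrative assumptions, and `getattr` with a default is used so the script runs to completion whether or not the property survives.

```python
import pandas as pd


class SubclassedDataFrame2(pd.DataFrame):
    # Names listed in _metadata are meant to be carried over to results.
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return SubclassedDataFrame2


df = SubclassedDataFrame2({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
df.added_property = "property"

sliced = df[["A", "B"]]  # slicing
doubled = df * 2         # arithmetic (binary) operation

print(type(doubled).__name__)
print("after slicing:   ", getattr(sliced, "added_property", "<missing>"))
print("after arithmetic:", getattr(doubled, "added_property", "<missing>"))
```

On a pandas version affected by this bug, the last line reports `<missing>` even though `doubled` is still a `SubclassedDataFrame2`.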
Problem description
The current behavior means that you can almost never rely on custom properties to persist on a subclassed DataFrame. This substantially reduces the utility of these custom properties.
Expected Output
I would expect the `added_property` property in the example above to persist after performing the arithmetic operation on the DataFrame, especially because the result of `(df * 2)` is still an instance of `SubclassedDataFrame2`.
Output of `pd.show_versions()`