[ENH] Method for adding functionality to GroupBy #587

Open
zbarry opened this issue Oct 12, 2019 · 3 comments · May be fixed by #1462
Labels
available for hacking: This issue has not been claimed by any individual.
enhancement: New feature or request
good advanced issue: Issues that would require Python trickery to get an elegant implementation
good intermediate issue: Issues that are good for seasoned programmers to make a contribution

Comments

@zbarry
Collaborator

zbarry commented Oct 12, 2019

It would be nice to be able to add functionality to the pandas GroupBy objects: GroupBy, DataFrameGroupBy, SeriesGroupBy. There's no convenient accessor interface for this, but maybe there's a way to reliably monkeypatch them. That would let us create nifty aggregation / apply functions and avoid the .groupby(...).apply() route for tasks we encounter routinely. It could also open up opportunities to speed up such operations, since .groupby().apply() can often be slow for large numbers of groups.
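To make the monkeypatching idea concrete, here is a minimal sketch, not a proposed implementation. The decorator name `add_groupby_method` and the example method `value_range` are made up for illustration, and the import path `pandas.core.groupby` is an assumption about pandas internals that could change between versions.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy  # internal path; an assumption


def add_groupby_method(func):
    """Hypothetical helper: attach `func` to DataFrameGroupBy as a method."""
    setattr(DataFrameGroupBy, func.__name__, func)
    return func


@add_groupby_method
def value_range(grouped, column):
    """Per-group max minus min of `column`.

    Uses the built-in max/min aggregations instead of .apply().
    """
    col = grouped[column]
    return col.max() - col.min()


df = pd.DataFrame({"key": ["a", "a", "b", "b"], "x": [1, 4, 10, 12]})
result = df.groupby("key").value_range("x")
```

Patching the class directly affects every DataFrameGroupBy in the process, which is part of why doing this "reliably" is the open question.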

@zbarry added the available for hacking, enhancement, good advanced issue, and good intermediate issue labels on Oct 12, 2019
@zbarry
Collaborator Author

zbarry commented Oct 18, 2019

@Zsailer - what do you think about such a capability in PF?

@samukweku
Collaborator

@zbarry @ericmjl @pyjanitor-devs/core-devs How can we make this possible? Is this even possible?

@samukweku
Collaborator

samukweku commented Nov 29, 2022

One way to do this is with a summarise function that has a by parameter, so that all the grouping magic happens inside that function. This is inspired by the update to the summarise feature coming in dplyr 1.1, and by the use of by in rdatatable and pydatatable.

Crude API example:

df.summarise(col_name = func or arg name, by = func or kwargs)

We could even make it possible to filter within a groupby effectively (maybe?).
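A rough sketch of what a `summarise` with a `by` parameter could look like, written as a plain function over pandas named aggregation. The signature and the keyword names (`x_mean`, `x_max`) are assumptions for illustration only, not an agreed API, and the in-group filtering idea is not covered here.

```python
import pandas as pd


def summarise(df, by, **named_aggs):
    """Hypothetical: group `df` by `by` and compute one output column per keyword.

    Each keyword maps an output column name to a
    (source_column, aggregation) pair, as in pandas named aggregation.
    """
    return df.groupby(by).agg(**named_aggs)


df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 3]})
out = summarise(df, by="key", x_mean=("x", "mean"), x_max=("x", "max"))
```

Because the grouping happens inside the function, the caller never touches a GroupBy object, which sidesteps the monkeypatching question entirely.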

2 participants