[R] Add Cumsum and duplicated bindings to datasets in R #44673

Open
larry77 opened this issue Nov 7, 2024 · 3 comments

Comments

@larry77

larry77 commented Nov 7, 2024

Describe the enhancement requested

I often use the open_dataset() interface in R to avoid loading everything into RAM, but I see that the cumsum (#35180) and duplicated functions are not available for datasets. Is there any hope of seeing them implemented in the future, or is there a workaround? I cannot call collect() and then apply them because the data does not fit in RAM.
Thanks!

Component(s)

R

@amoeba amoeba changed the title Add Cumsum and duplicated bindings to datasets in R [R] Add Cumsum and duplicated bindings to datasets in R Nov 7, 2024
@thisisnic
Member

@jonkeane I was trying to take a look at this but wasn't sure whether it would be possible to implement on Datasets (see the comments on #35180). What do you think?

@jonkeane
Member

It might be possible; I haven't looked deeply. But one thing that would make it less useful by default is that Acero isn't ordered by default. So if you wanted the result to be the same from run to run, you would need to sort first and operate on the sorted data, which itself requires at least the sort keys to be pulled into memory.
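
To illustrate the ordering point, here is a minimal base-R sketch (not Arrow code, and not how a binding would necessarily be implemented). It assumes the data arrives as a sequence of chunks, which stand in for record batches. `chunked_cumsum` is a hypothetical helper that carries a running total from one chunk to the next, so only one chunk needs to be in memory at a time.

```r
# Hypothetical helper (not part of arrow): cumulative sum over a list of
# numeric chunks, keeping only one chunk plus a running total in memory.
chunked_cumsum <- function(chunks) {
  carry <- 0
  lapply(chunks, function(x) {
    out <- carry + cumsum(x)
    # Carry the last running total into the next chunk.
    if (length(x) > 0) carry <<- out[length(out)]
    out
  })
}

# Stand-ins for record batches arriving from a dataset scan.
batches <- list(c(1, 2, 3), c(10, 20), c(100))

in_order  <- unlist(chunked_cumsum(batches))
reordered <- unlist(chunked_cumsum(batches[c(2, 1, 3)]))

# For a fixed batch order, the result matches cumsum() on the full vector.
stopifnot(identical(in_order, cumsum(unlist(batches))))

# Same values, different batch order: the running totals differ,
# although the grand total (the last element) is the same.
print(in_order)
print(reordered)
```

The carry itself is cheap; what the sketch cannot provide is a stable order. If batches are delivered in a different order between runs, the intermediate totals change, which is why a sort would have to come first.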

@larry77
Author

larry77 commented Nov 12, 2024

I see. I am pushing for this because I really need it, and I also think it is of general interest (sorting and cumulative sums are the bread and butter of many analyses).
