I often use the open_dataset() interface in R to avoid loading everything into RAM, but I see that the cumsum (#35180) and duplicated functions are not available in this case. Is there any hope of seeing them implemented in the future, or is there a workaround? (I cannot call collect() and then apply them because of RAM limits.)
Thanks!
Component(s)
R
It might be possible; I haven't looked deeply. One thing that would make it less useful by default is that Acero isn't ordered by default. So if you wanted the result to be the same from run to run, you would need to sort first and operate on the sorted data, which itself requires at least the sort keys to be pulled into memory.
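To illustrate the ordering point, here is a minimal, language-agnostic sketch in plain Python. It does not use Arrow or Acero; `streaming_cumsum` and `streaming_duplicated` are hypothetical helper names, and each inner list stands in for one batch of a column. It shows that a cumulative sum can be computed batch by batch by carrying the running total forward, but that the result depends on the order in which the batches arrive.

```python
from itertools import accumulate


def streaming_cumsum(batches):
    """Yield cumulative sums one batch at a time, carrying the running total."""
    total = 0
    for batch in batches:
        if not batch:
            yield []
            continue
        # accumulate(..., initial=total) emits the carried total first; drop it.
        out = list(accumulate(batch, initial=total))[1:]
        total = out[-1]
        yield out


def streaming_duplicated(batches):
    """Flag values already seen in earlier rows (same idea as R's duplicated())."""
    seen = set()  # assumption: the set of distinct values fits in memory
    for batch in batches:
        flags = []
        for value in batch:
            flags.append(value in seen)
            seen.add(value)
        yield flags


# The same rows delivered in two different batch orders.
ordered = [[1, 2], [3, 4]]
shuffled = [[3, 4], [1, 2]]

cumsum_ordered = [x for b in streaming_cumsum(ordered) for x in b]
cumsum_shuffled = [x for b in streaming_cumsum(shuffled) for x in b]
dup_flags = [f for b in streaming_duplicated([[1, 2], [2, 1, 3]]) for f in b]

print(cumsum_ordered)   # [1, 3, 6, 10]
print(cumsum_shuffled)  # [3, 7, 8, 10]
print(dup_flags)        # [False, False, True, True, False]
```

Only the running total (or the set of values seen so far) has to stay in memory between batches, but without a fixed row order the two runs above disagree, which is why a sort would be needed first.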
I see. I'm pushing for this because I really need it, and I also think it is of general interest (sorting and cumulative sums are the bread and butter of many analyses).