Describe the enhancement requested
In pandas / Polars, I can do (the method is `groupby` in pandas and `group_by` in Polars):
dict(df.group_by(['a', 'b', 'c']).__iter__())
There doesn't seem to be a built-in way to do this in PyArrow, so I'm opening this as a feature request.
Concretely, if I have
import pyarrow as pa
tbl = pa.table({'a': [1,1,3], 'b': [4, 4, 4], 'c': [1, 3, 2]})
then I'd like a way to end up with the following (here grouping by 'a' and 'b'):
{(3, 4): pyarrow.Table
a: int64
b: int64
c: int64
----
a: [[3]]
b: [[4]]
c: [[2]],
 (1, 4): pyarrow.Table
a: int64
b: int64
c: int64
----
a: [[1,1]]
b: [[4,4]]
c: [[1,3]]}
For context, this would be for use in Narwhals. We have tried to come up with a workaround, but it slows down noticeably as the number of grouping keys grows: 3 keys is enough for it to be slower than pandas.
Component(s)
Python