When you use Concatenator in combination with preprocessors that create vector feature columns (such as OneHotEncoder or MultiHotEncoder), the output of Concatenator is not flattened. This is arguably correct behavior, but it isn't documented. However, the goal of Concatenator is typically to provide tensor inputs to models, which in many cases expect flat tensors of floats.
Based on offline discussions in the Ray Slack, I'd like to propose supporting a flatten flag for the Concatenator that optionally flattens any vector columns in place within the output vector. I will follow up soon with an implementation and tests in a PR.
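A minimal sketch of what in-place flattening could look like, written against plain pandas/NumPy rather than Ray. The helper name `concatenate_flattened` is hypothetical (it is not part of Ray's API), and it assumes vector columns arrive as object-dtype columns whose cells are lists or 1-D NumPy arrays of equal width, as one-hot encodings are. Scalar columns become one output column each, and vector columns are expanded where they sit, so the original column order is preserved.

```python
import numpy as np
import pandas as pd


def concatenate_flattened(df, columns, dtype=np.float32):
    """Hypothetical helper: join `columns` into one flat 2-D array.

    Scalar columns contribute one value per row. Vector columns (object-dtype
    cells holding lists or 1-D arrays, e.g. one-hot encodings) are expanded
    in place, so the output follows the order of `columns`.
    """
    blocks = []
    for col in columns:
        values = df[col].to_numpy()
        if values.dtype == object:
            # Vector column: stack the per-row arrays into (n_rows, width).
            block = np.stack([np.asarray(v, dtype=dtype).ravel() for v in values])
        else:
            # Scalar column: cast and reshape to (n_rows, 1).
            block = values.astype(dtype).reshape(-1, 1)
        blocks.append(block)
    # Join all blocks side by side into (n_rows, total_width).
    return np.concatenate(blocks, axis=1)


# Illustrative data: "age" is scalar, "color" is an assumed one-hot column.
df = pd.DataFrame({
    "age": [25, 40],
    "color": [np.array([1, 0, 0]), np.array([0, 0, 1])],
})
out = concatenate_flattened(df, ["age", "color"])
print(out.shape)     # (2, 4)
print(out.tolist())  # [[25.0, 1.0, 0.0, 0.0], [40.0, 0.0, 0.0, 1.0]]
```

Expanding each vector column at its own position (rather than appending all vectors at the end) keeps feature indices stable relative to the `columns` list passed in.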
Use case
When using encoder preprocessors that output a vector column, we want to flatten the columns in the final concatenate step for input to the model.
richardliaw changed the title from "[Ray Data: Preprocessors] Support flattening vector features in concatenator" to "[data/proprocessors] Support flattening vector features in concatenator" on Mar 28, 2025
The Concatenator now errors out when used with a OneHotEncoder-encoded column:
00:25:04.629 concatenated = df[self.columns].to_numpy(dtype=self.dtype)
00:25:04.629 File "/home/ray/.venv/lib/python3.10/site-packages/pandas/core/frame.py", line 1993, in to_numpy
00:25:04.629 result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)
00:25:04.629 File "/home/ray/.venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1694, in as_array
00:25:04.629 arr = self._interleave(dtype=dtype, na_value=na_value)
00:25:04.629 File "/home/ray/.venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1753, in _interleave
00:25:04.629 result[rl.indexer] = arr
00:25:04.629 ValueError: setting an array element with a sequence.
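The failure can be reproduced with pandas alone. This is a minimal sketch, assuming the encoded column holds one 1-D NumPy array per row; the column names and values are made up for illustration. `to_numpy(dtype=...)` has to write every cell into a single float32 slot, and an array-valued cell cannot fit in one slot, which is what raises the ValueError in the traceback above.

```python
import numpy as np
import pandas as pd

# Stand-in for a batch after one-hot encoding (assumed layout):
# "age" is a scalar column, "color" holds one 1-D NumPy array per row.
df = pd.DataFrame({
    "age": [25, 40],
    "color": [np.array([1, 0, 0]), np.array([0, 0, 1])],
})

error = None
try:
    # Same call that appears in the traceback: every cell must become one
    # float32 value, but each "color" cell is a length-3 array.
    df[["age", "color"]].to_numpy(dtype=np.float32)
except ValueError as exc:
    error = exc

print(type(error).__name__, error)
```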
Thanks for the flag; updated this to a bug. cc: @richardliaw
jcotant1 added the bug label (Something that is supposed to be working; but isn't) and removed the enhancement label (Request for new feature and/or capability) on Mar 31, 2025