Description
Right now, there is no way to represent a non-nullable, "degenerate" `FixedSizeListArray` with arbitrary length. By degenerate, I mean that the size of each list scalar is 0, so the `FixedSizeList` does not actually store any data.
The `try_new` constructor uses the length of the null buffer to determine the length of the array in the degenerate case, but if the null buffer is `None` (indicating a non-nullable array), the code incorrectly defaults to 0.
arrow-rs/arrow-array/src/array/fixed_size_list_array.rs, lines 155 to 156 in 751b082:

```rust
let len = match s {
    0 => nulls.as_ref().map(|x| x.len()).unwrap_or_default(),
    // ... (remaining match arms not shown in this excerpt)
```
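To make the failure concrete, here is a minimal standalone sketch that mirrors the quoted length inference using plain `usize` and `Option<usize>` values instead of arrow-rs types, so it runs without the crate. The function name `inferred_len` and the non-degenerate arm (`values_len / size`) are assumptions for illustration, not the library's actual code.

```rust
/// Hypothetical mirror of the quoted length inference.
/// `nulls_len` stands in for `nulls.as_ref().map(|x| x.len())`.
fn inferred_len(size: usize, values_len: usize, nulls_len: Option<usize>) -> usize {
    match size {
        // Degenerate case: a zero-sized list stores no child values, so the
        // values array says nothing about the length. The null buffer is the
        // only source, and `None` silently falls back to 0.
        0 => nulls_len.unwrap_or_default(),
        // Non-degenerate case (assumed for this sketch): one list per `size` values.
        s => values_len / s,
    }
}
```

With `size == 0` and no null buffer, `inferred_len` returns 0 no matter how many lists the caller intended, which is exactly the non-nullable degenerate case that cannot be expressed today.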
Possible Solutions
You cannot handle this case correctly by relying on the length of the null buffer, so I believe there are only two sane solutions. The first is to fix the `try_new` constructor and require it to take a `len: usize`.
Since a breaking change like this might not be acceptable, the second is to add a `try_new_with_length` constructor that does take a `len` parameter. The existing `try_new` can then call into `try_new_with_length`, with documentation stating that users who need this specific degenerate, non-nullable case must use `try_new_with_length`.
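A rough sketch of how the second option could fit together, assuming a pared-down stand-in type: `FixedSizeListSketch` is hypothetical and replaces the real buffers with their lengths so the example runs without arrow-rs. The validation rules shown (null buffer length must equal `len`, at least `len * size` child values) are assumptions for illustration, not the crate's actual checks.

```rust
/// Hypothetical, pared-down stand-in for `FixedSizeListArray`.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct FixedSizeListSketch {
    size: usize,
    len: usize,
    values_len: usize,
    nulls_len: Option<usize>,
}

impl FixedSizeListSketch {
    /// Hypothetical `try_new_with_length`: the caller supplies `len`, so a
    /// degenerate (size == 0), non-nullable array can have any length.
    fn try_new_with_length(
        size: usize,
        values_len: usize,
        nulls_len: Option<usize>,
        len: usize,
    ) -> Result<Self, String> {
        // A null buffer, if present, must have exactly one slot per list.
        if let Some(n) = nulls_len {
            if n != len {
                return Err(format!("null buffer length {n} does not match len {len}"));
            }
        }
        // This sketch only requires enough child values for `len` lists
        // (trivially satisfied when size == 0); it does not reject extras.
        let needed = len.checked_mul(size).ok_or("len * size overflows usize")?;
        if values_len < needed {
            return Err(format!("need {needed} values, got {values_len}"));
        }
        Ok(Self { size, len, values_len, nulls_len })
    }

    /// `try_new` keeps its current signature, infers `len` the way it does
    /// today, and delegates to `try_new_with_length`.
    fn try_new(size: usize, values_len: usize, nulls_len: Option<usize>) -> Result<Self, String> {
        let len = match size {
            0 => nulls_len.unwrap_or_default(),
            s => values_len / s,
        };
        Self::try_new_with_length(size, values_len, nulls_len, len)
    }
}
```

Here `try_new_with_length(0, 0, None, 5)` yields a non-nullable array of five empty lists, while `try_new(0, 0, None)` still infers a length of 0, which is why its documentation would need to point degenerate, non-nullable callers at the new constructor.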
Edit: I've made a first step in #8624
Edit: I also just realized that if a caller passes a `values` array whose length is not an exact multiple of `size` to `try_new`, we immediately get an incorrect `FixedSizeList`. For example, if I pass in a values array of length 6, specify a list size of 4, and provide a null buffer with a single null, that creates an invalid(?) array (though technically you can never access those last two values?).
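One way to rule out the leftover values is to require the child length to be exactly `len * size`. The helper below is a hypothetical sketch of that check (`check_exact_values` is not an arrow-rs function), operating on plain lengths. In the example above, `len` would be inferred as `6 / 4 = 1`, so 4 values are expected and the 2 trailing values cause a rejection.

```rust
/// Hypothetical check: the values array must contain exactly `len * size`
/// elements, so no child values are left past the end of the last list.
fn check_exact_values(size: usize, len: usize, values_len: usize) -> Result<(), String> {
    let expected = len
        .checked_mul(size)
        .ok_or_else(|| format!("len ({len}) * size ({size}) overflows usize"))?;
    if values_len != expected {
        return Err(format!(
            "values has length {values_len}, expected exactly {expected} ({len} lists of size {size})"
        ));
    }
    Ok(())
}
```

Whether such a mismatch should be a hard error or merely tolerated as unreachable trailing data is the open question raised above; the sketch takes the strict view.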