Fix/type preservation empty dataframes #301

Open
wants to merge 1 commit into
base: main
15 changes: 12 additions & 3 deletions lib/red_amber/data_frame_variable_operation.rb
@@ -675,9 +675,18 @@ def update_fields_and_arrays(updater)
       raise DataFrameArgumentError, "Data size mismatch (#{data.size} != #{size})"
     end

-    a = Arrow::Array.new(data.is_a?(Vector) ? data.to_a : data)
-    fields[i] = Arrow::Field.new(key, a.value_data_type)
-    arrays[i] = Arrow::ChunkedArray.new([a])
+    if data.respond_to?(:to_arrow_chunked_array)
+      chunked_array = data.to_arrow_chunked_array
+    else
+      if data.respond_to?(:to_arrow_array)
+        a = data.to_arrow_array
+      else
+        a = Arrow::Array.new(data)
+      end
+      chunked_array = Arrow::ChunkedArray.new([a])
+    end
+    fields[i] = Arrow::Field.new(key, chunked_array.value_data_type)
+    arrays[i] = chunked_array
   end
   [fields, arrays]
 end
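
A minimal sketch of the conversion order this hunk uses, for readers who want to try the dispatch without Arrow installed. `FakeArray`, `FakeChunkedArray`, `FakeVector`, and `to_chunked` are hypothetical stand-ins invented for illustration; they are not RedAmber or Red Arrow classes, and the `:inferred` marker only signals that a real builder would have to guess the type from the elements.

```ruby
# Hypothetical stand-in for a typed array: values plus a declared element type.
FakeArray = Struct.new(:values, :value_data_type)

# Hypothetical stand-in for a chunked array: the type lives on its chunks.
FakeChunkedArray = Struct.new(:chunks) do
  def value_data_type
    chunks.first.value_data_type
  end
end

# Hypothetical vector-like wrapper that already owns typed storage.
class FakeVector
  def initialize(chunked_array)
    @data = chunked_array
  end

  def to_arrow_chunked_array
    @data
  end
end

# Same three-way preference as the diff, flattened with elsif:
# chunked array first, then a single array, then plain Ruby values.
def to_chunked(data)
  if data.respond_to?(:to_arrow_chunked_array)
    data.to_arrow_chunked_array
  elsif data.respond_to?(:to_arrow_array)
    FakeChunkedArray.new([data.to_arrow_array])
  else
    # Plain values carry no declared type, so it can only be inferred.
    FakeChunkedArray.new([FakeArray.new(data.to_a, :inferred)])
  end
end

empty_double = FakeVector.new(FakeChunkedArray.new([FakeArray.new([], :double)]))
p to_chunked(empty_double).value_data_type # the declared type survives with zero rows
p to_chunked([]).value_data_type           # a bare empty Array has nothing to declare
```

The point of the ordering is that an object which already holds typed storage is never round-tripped through a plain Ruby Array, so a zero-row column keeps its declared type.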
16 changes: 16 additions & 0 deletions lib/red_amber/vector.rb
@@ -198,6 +198,22 @@ def to_ary
 alias_method :values, :to_ary
 alias_method :entries, :to_ary

+# Convert to an Arrow::Array.
+#
+# @return [Arrow::Array]
+#   Apache Arrow array representation.
+def to_arrow_array
+  @data.to_arrow_array
+end
+
+# Convert to an Arrow::ChunkedArray.
+#
+# @return [Arrow::ChunkedArray]
+#   Apache Arrow chunked array representation.
+def to_arrow_chunked_array
+  @data.to_arrow_chunked_array
+end
+
 # Indeces from 0 to size-1 by Array.
 #
 # @return [Array]
15 changes: 15 additions & 0 deletions test/test_data_frame_variable_operation.rb
@@ -439,6 +439,21 @@ class DataFrameVariableOperationTest < Test::Unit::TestCase
   assert_equal str2, @df2.assign { assigner2.to_a }.tdr_str
 end

+sub_test_case 'Dataframe with zero n_records' do
+  test 'assign by block' do
+    str = <<~STR
+      RedAmber::DataFrame : 0 x 4 Vectors
+      Vectors : 2 numeric, 1 string, 1 boolean
+      # key type level data_preview
+      0 :a uint8 0 []
+      1 :b double 0 []
+      2 :c string 0 []
+      3 :d boolean 0 []
+    STR
+    assert_equal str, @df.filter(@df.c == "nonexistent").assign(:b) { b.multiply(1) }.tdr_str
+  end
+end
+
 test 'assign by both args and block' do
   assert_raise(DataFrameArgumentError) { @df2.assign(:key) {} } # rubocop:disable Lint/EmptyBlock

2 changes: 1 addition & 1 deletion test/test_group.rb
@@ -250,7 +250,7 @@ class GroupTest < Test::Unit::TestCase
     Vectors : 3 numeric
     # key type level data_preview
     0 :i uint8 4 [0, 1, 2, nil], 1 nil
-    1 :count uint8 3 [2, 1, 2, 0]
+    1 :count int64 3 [2, 1, 2, 0]
Author

@kou FYI, I had to update this test after the change: the expected type of `:count` is now int64 rather than uint8.

     2 :"sum(f)" double 4 [1.1, 2.2, NaN, nil], 1 NaN, 1 nil
   STR
   assert_equal str, @df.group(:i) { [count(:i, :f, :b), sum] }.tdr_str(tally: 0)
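
One plausible reading of the `uint8` to `int64` change in this expectation, sketched under a stated assumption: if a column is rebuilt from plain Ruby values, its type has to be inferred from the elements, and an inference rule that picks the narrowest fitting integer type would report `uint8` for `[2, 1, 2, 0]` and have nothing to go on for `[]`. `infer_type` below is a hypothetical simplification written for this note; it is not Red Arrow's actual builder logic, and the PR does not state this mechanism.

```ruby
# Exclusive upper bounds for each hypothetical integer type.
UNSIGNED_BOUNDS = { uint8: 2**8, uint16: 2**16, uint32: 2**32, uint64: 2**64 }.freeze
SIGNED_BOUNDS = { int8: 2**7, int16: 2**15, int32: 2**31, int64: 2**63 }.freeze

# Hypothetical narrowest-type inference over plain Ruby values.
def infer_type(values)
  present = values.compact
  return :null if present.empty?                  # no elements, no evidence
  return :other unless present.all?(Integer)      # only integers are modelled here

  min, max = present.minmax
  if min >= 0
    UNSIGNED_BOUNDS.find { |_, bound| max < bound }&.first || :other
  else
    SIGNED_BOUNDS.find { |_, bound| min >= -bound && max < bound }&.first || :other
  end
end

p infer_type([2, 1, 2, 0]) # => :uint8
p infer_type([])           # => :null
p infer_type([-1, 200])    # => :int16
```

Under that assumption, a result that already carries a declared type (such as an int64 count) keeps it once the round trip through a Ruby Array is skipped, which matches the updated expectation.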