Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44555: [C++][Compute] Allow casting struct to bigger nullable struct #44587

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

Tom-Newton
Copy link
Contributor

@Tom-Newton Tom-Newton commented Oct 30, 2024

Rationale for this change

Sometimes its useful to add a column full of nulls in a cast. Currently this is supported in top level columns but not inside structs. Example where this is important: delta-io/delta-rs#1610

What changes are included in this PR?

Add support for filling in columns full of null for nullable struct fields.
I've gone for a fairly minimal change that achieves what I needed but I wonder if there should be a more significant change so that this casting is done by field name and ignore the field order.

Are these changes tested?

Yes. The expected behaviour in several existing tests has been altered and a couple of new tests have been added.

I also rolled out a custom build with this change internally because it suddenly became a critical problem.

Are there any user-facing changes?

Yes. There are scenarios that previously failed with struct fields don't match or are in the wrong order but now succeed after filling in nulls.

Copy link

⚠️ GitHub issue #44555 has been automatically assigned in GitHub to PR creator.

@Tom-Newton
Copy link
Contributor Author

I'm not confident that I've taken the right approach here but I think its ready for review.

@Tom-Newton Tom-Newton marked this pull request as ready for review October 30, 2024 18:50
@Tom-Newton
Copy link
Contributor Author

Ok, it looks like there is at least one more tests I need to update. I can probably sort that tomorrow.

@@ -2328,7 +2328,7 @@ DatasetAndBatches MakeNestedDataset() {
field("b", boolean()),
field("c", struct_({
field("d", int64()),
field("e", float64()),
field("e", int64()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this was not supposed to be different from in the physical schema. This data is used to test if a virtual column will be materialised so I guess it shouldn't be testing int64 -> float64 casting at the same time.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Nov 1, 2024
@Tom-Newton
Copy link
Contributor Author

Sorry for the direct ping but @lidavidm please could you review when you get a chance. I assume you are the right person to review this?

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me, thanks. CC @bkietz for some more eyes?

// is definitely no in_field with matching name.

// -2 is a sentinel value indicating fill with null.
fields_to_select[out_field_index++] = -2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add something like constexpr int kFillNullSentinel = -2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done but I wasn't really confident on the best place to define the constant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be placed inside the method definition itself!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I moved it inside. As you can probably tell I am a C++ noob.

cpp/src/arrow/compute/kernels/scalar_cast_nested.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Nov 14, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 14, 2024
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Nov 14, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants