Remove CastColumnExpr and custom_file_casts example; unify on field-aware CastExpr#21563
Remove CastColumnExpr and custom_file_casts example; unify on field-aware CastExpr#21563kosiew wants to merge 8 commits intoapache:mainfrom
Conversation
Remove active CastColumnExpr from mod.rs and delete cast_column.rs. Migrate important CastColumnExpr regression coverage to field-aware CastExpr tests in cast.rs. Eliminate custom_file_casts wiring and documentation from main.rs and README.md, along with custom_file_casts.rs. Add SQL struct cast coverage in struct.slt and provide migration guidance in 54.0.0.md. Clean up adapter test helper name in schema_rewriter.rs.
Eliminate field_aware_cast_primitive_array and field_aware_cast_nested_struct_array as their behaviors are adequately tested by existing generic cast tests and SQL-level struct cast validations. This streamlines the test suite and reduces redundancy.
Removed redundant checks for explicit scalar struct field reordering, missing nullable target fields, and nested explicit struct cast reordering. These were previously addressed by the explicit struct cast and field-reordering sections. Retained the compact array explicit-cast case for valuable end-to-end coverage of unifying partially overlapping struct schemas.
Introduce field_aware_cast_nested_struct_array coverage in cast.rs to handle recursive nested struct casting and null-filled missing nested fields. Update the upgrade guide in 54.0.0.md to recommend using the public CastExpr::new_with_target_field(...) API. Also, add adapter_serialization to the custom_data_source usage string in main.rs for improved clarity.
Combine three field-aware cast tests into a single table-style test to enhance readability and maintainability. The new test, field_aware_cast_preserves_target_field_semantics, covers all necessary aspects including target field name, data type, metadata, target nullability, and effective expression nullability for both child-nullable and target-nullable cases. Retain certain tests separately to ensure distinct CastColumnExpr migration coverage.
Add private test helper for struct cast evaluation, reuse make_struct_array, switch Boolean downcast to as_boolean_array, and inline one-use field setup. Tighten wording in the CastColumnExpr migration.
adriangb
left a comment
There was a problem hiding this comment.
I think we should keep the example, it seems like it would be a 1 line change to make it work with CastExpr.
| // Example showing how to implement custom casting rules to adapt file schemas. | ||
| // This example enforces that casts must be strictly widening: if the file type is Int64 and the table type is Int32, it will error |
There was a problem hiding this comment.
Is this example not valuable anymore? Can't we just change the expression used?
There was a problem hiding this comment.
Yes, that is a fair point. The intent of deleting it was to avoid preserving an example that taught downstream users to branch on both CastExpr and the now-removed CastColumnExpr. But the example itself is still useful because it demonstrates a custom PhysicalExprAdapter policy for file-schema casts.
I restored the example and updated it to use the unified cast representation.
- Restored example `custom_file_casts` that demonstrates how to implement custom casting rules in DataFusion. - Updated the README to include the example. - Modified `main.rs` to include the `custom_file_casts` module in the execution options.
…stsPhysicalExprAdapter This commit modifies the method used to check for valid casts within the `CustomCastsPhysicalExprAdapter` implementation. It replaces `CastExpr::check_bigger_cast` with `cast.is_bigger_cast` for improved clarity and maintainability.
|
@kosiew one thought that occurs to me - is there some sense in leaving the symbol around for another release marked as deprecated? Thinking through it there's basically two uses of this:
So with that said I think it makes sense to just remove this completely as this PR does. But I do want to maybe get an okay from @comphead before we merge knowing that they've dealt with a lot of issues w.r.t. |
|
Thanks @adriangb for pinging me, in Comet we would need to adapt to the changes, if I understood the changes we would need to use |
Yes I just checked and it seems like you’re actually using neither. Searching the codebase for You do use So in summary I think this change shouldn’t be a problem for you. |
|
Added the |
Which issue does this PR close?
Rationale for this change
This PR completes the cast unification effort by removing the remaining public and example surface of
CastColumnExpr. Keeping bothCastExprandCastColumnExprintroduced duplication in casting logic, increased maintenance overhead, and made it easier for downstream code to reintroduce divergent behavior.With the introduction of field-aware
CastExpr(vianew_with_target_field), all functionality previously covered byCastColumnExprcan now be expressed through a single, consistent abstraction. This simplifies the mental model for casting, ensures consistent behavior across execution paths, and aligns documentation and examples with the unified design.Additionally, this change ensures that test coverage and examples reflect the final architecture, reducing confusion for contributors and users.
What changes are included in this PR?
Removal of
CastColumnExprdatafusion/physical-expr/src/expressions/cast_column.rs.expressions/mod.rs.Migration to field-aware
CastExprCastExpr.data_type(&schema)withtarget_field().data_type()where appropriate.Example updates
custom_file_castsexample to only handleCastExpr.CastExprandCastColumnExpr.Test consolidation and improvements
Migrated struct, nested struct, and scalar struct cast tests into
cast.rs.Added comprehensive field-aware cast tests covering:
Refactoring and cleanup
assert_cast_column→assert_cast_input_column).Documentation updates
CastColumnExprand migration guidance.Are these changes tested?
Yes.
Existing tests were updated to reflect the unified cast model.
Coverage from
cast_column.rswas preserved and migrated intocast.rs.New tests validate:
CastExprsemanticsAdapter and schema rewriting tests were updated and remain green.
Are there any user-facing changes?
Yes.
Breaking change:
CastColumnExprhas been removed from the public API.Users must migrate to
CastExpr:CastExpr::new_with_target_field(...)for field-aware casting.target_field().expr().Documentation has been updated with migration guidance.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.