
@EeshanBembi
Contributor

Fixes #18020

Summary

Enables the concat function to concatenate arrays the way array_concat does, while
preserving all existing string concatenation behavior.

Before:

  SELECT concat([1, 2, 3], [4, 5]);
  -- Result: [1, 2, 3][4, 5]  ❌

After:

  SELECT concat([1, 2, 3], [4, 5]);
  -- Result: [1, 2, 3, 4, 5]  ✅

Implementation

  • Extended concat function signature to accept array types
  • Added type detection in invoke_with_args() to delegate array operations to Arrow compute functions
  • Enhanced type coercion to handle mixed array types and empty arrays
  • Maintains full backward compatibility with string concatenation
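To make the coercion bullet concrete, here is a minimal, self-contained sketch of one plausible set of rules. It is not the PR's code: `DataType` below is a hypothetical, trimmed-down stand-in for Arrow's type enum, `coerce_concat_types` is an illustrative name, and the rules themselves (widen FixedSizeList to List, prefer LargeList when present, let a Null element type such as that of an empty array adopt the other arguments' element type, reject list/non-list mixes, send everything else down the Utf8 path) are assumptions.

```rust
/// Hypothetical, trimmed-down stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
    Utf8,
    List(Box<DataType>),
    LargeList(Box<DataType>),
    FixedSizeList(Box<DataType>, i32),
}

/// Element type of a list-like type, or `None` for anything else.
fn element_type(dt: &DataType) -> Option<&DataType> {
    match dt {
        DataType::List(inner)
        | DataType::LargeList(inner)
        | DataType::FixedSizeList(inner, _) => Some(&**inner),
        _ => None,
    }
}

/// Illustrative coercion for `concat` (assumed rules, not DataFusion's code).
fn coerce_concat_types(arg_types: &[DataType]) -> Result<Vec<DataType>, String> {
    // No list arguments: everything takes the existing string path.
    if arg_types.iter().all(|dt| element_type(dt).is_none()) {
        return Ok(vec![DataType::Utf8; arg_types.len()]);
    }
    let mut element = DataType::Null;
    let mut any_large = false;
    for dt in arg_types {
        // A bare NULL argument simply adopts the common list type.
        if *dt == DataType::Null {
            continue;
        }
        let inner = element_type(dt)
            .ok_or_else(|| format!("concat cannot mix lists with {dt:?}"))?;
        any_large |= matches!(dt, DataType::LargeList(_));
        if element == DataType::Null {
            // Adopt this argument's element type (still Null for an empty array).
            element = inner.clone();
        } else if *inner != DataType::Null && *inner != element {
            return Err(format!("mismatched element types {element:?} and {inner:?}"));
        }
    }
    // FixedSizeList is widened to List; LargeList wins if any argument uses it.
    let target = if any_large {
        DataType::LargeList(Box::new(element))
    } else {
        DataType::List(Box::new(element))
    };
    Ok(vec![target; arg_types.len()])
}
```

In this sketch an all-Null element type stays Null instead of defaulting to a concrete type such as Int32.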

Test Coverage

  • ✅ Array concatenation: [1,2] + [3,4] → [1,2,3,4]
  • ✅ Empty arrays: [1,2] + [] → [1,2]
  • ✅ Nested arrays: [[1,2]] + [[3,4]] → [[1,2],[3,4]]
  • ✅ String concatenation unchanged: 'hello' + 'world' → 'helloworld'
  • ✅ Mixed type coercion: true + 42 + 'test' → 'true42test'
  • ✅ Error handling: [1,2] + 'string' → Error
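The listed cases can be reproduced with a toy value-level model. This is an illustration under stated assumptions, not DataFusion's implementation: `Value`, `concat`, and `ints` are hypothetical, real `concat` operates on Arrow arrays, and skipping NULL arguments on the list path is an assumption.

```rust
/// Hypothetical value model; real `concat` operates on Arrow arrays.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    List(Vec<Value>),
}

/// Hypothetical helper: build a list of integers.
fn ints(xs: &[i64]) -> Value {
    Value::List(xs.iter().map(|&x| Value::Int(x)).collect())
}

/// Toy `concat`: list path if any argument is a list, string path otherwise.
fn concat(args: &[Value]) -> Result<Value, String> {
    if args.iter().any(|v| matches!(v, Value::List(_))) {
        let mut out = Vec::new();
        for v in args {
            match v {
                Value::List(items) => out.extend(items.iter().cloned()),
                Value::Null => {} // assumption: NULL arguments are skipped
                other => return Err(format!("cannot concat a list with {other:?}")),
            }
        }
        return Ok(Value::List(out));
    }
    // String path: NULLs are skipped, other scalars are rendered as text.
    let mut s = String::new();
    for v in args {
        match v {
            Value::Null => {}
            Value::Bool(b) => s.push_str(&b.to_string()),
            Value::Int(i) => s.push_str(&i.to_string()),
            Value::Str(x) => s.push_str(x),
            Value::List(_) => unreachable!("lists take the branch above"),
        }
    }
    Ok(Value::Str(s))
}
```

Nested arrays need no special handling here: the outer elements are themselves lists and are copied through unchanged.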

Approach Benefits

Function-level implementation vs planner replacement:

  • Cleaner architecture (single responsibility)
  • No planner complexity
  • Better performance
  • Easier testing and maintenance

@github-actions github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels Oct 17, 2025
Contributor

@Jefffrey Jefffrey left a comment

I hate to ask this upfront, but how much of this code is LLM generated? Do you have a full understanding of what it does? I find a lot of this code quite baffling and not written in a Rust-like way.

For example, in coerce_types the comments are too verbose and state what is happening (a lot of the time providing no benefit, as the code is straightforward enough in what it does), but there are no comments explaining why choices were made. There are also odd choices, like defaulting to the Int32 type if all inner list types are null.

Not to mention the CI checks aren't passing.

Comment on lines 192 to 198
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
    use DataType::*;
    let mut dt = &Utf8;
    arg_types.iter().for_each(|data_type| {
        if data_type == &Utf8View {
            dt = data_type;

    // Check if any argument is an array type
    let has_array = arg_types.iter().any(|dt| {
        matches!(dt, List(_) | LargeList(_) | FixedSizeList(_, _))
    });
Contributor

Is this mirroring what's done in coerce_types? We shouldn't duplicate the logic, as the argument inputs to return_type are already coerced.
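A sketch of the reviewer's point, under assumptions: if `coerce_types` has already unified the argument types, `return_type` only needs to read them, not re-detect arrays. `DataType` here is a hypothetical simplified enum; the string branch mirrors the Utf8View check visible in the excerpt above, and the LargeUtf8 priority is an assumption.

```rust
/// Hypothetical, trimmed-down stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int64,
    List(Box<DataType>),
    LargeList(Box<DataType>),
}

/// `arg_types` are assumed to be the output of `coerce_types`, so every
/// argument already carries its final type and nothing is re-derived here.
fn return_type(arg_types: &[DataType]) -> Result<DataType, String> {
    match arg_types.first() {
        // Coercion made all list arguments identical; reuse the first one.
        Some(dt @ (DataType::List(_) | DataType::LargeList(_))) => Ok(dt.clone()),
        // String path (assumed priority): Utf8View > LargeUtf8 > Utf8.
        _ => {
            if arg_types.contains(&DataType::Utf8View) {
                Ok(DataType::Utf8View)
            } else if arg_types.contains(&DataType::LargeUtf8) {
                Ok(DataType::LargeUtf8)
            } else {
                Ok(DataType::Utf8)
            }
        }
    }
}
```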

Comment on lines 290 to 297
let num_rows = args
    .iter()
    .filter_map(|arg| match arg {
        ColumnarValue::Array(array) => Some(array.len()),
        _ => None,
    })
    .next()
    .unwrap_or(1);
Contributor

This should be available via number_rows in ScalarFunctionArgs.
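A toy illustration of why the provided row count is safer than re-deriving it. `ColumnarValue`, `ScalarFunctionArgs`, and `to_array` below are hypothetical, simplified stand-ins that only borrow DataFusion's names; the real types carry more fields. In this model, when every argument is a scalar, the inferred count falls back to 1 even though the batch holds more rows.

```rust
/// Hypothetical stand-in: a column of optional strings, or one repeated value.
#[derive(Debug, Clone)]
enum ColumnarValue {
    Array(Vec<Option<String>>),
    Scalar(Option<String>),
}

/// Hypothetical stand-in carrying only the fields used below.
struct ScalarFunctionArgs {
    args: Vec<ColumnarValue>,
    number_rows: usize,
}

impl ColumnarValue {
    /// Expand to `n` rows; a scalar is repeated `n` times.
    fn to_array(&self, n: usize) -> Vec<Option<String>> {
        match self {
            ColumnarValue::Array(a) => a.clone(),
            ColumnarValue::Scalar(s) => vec![s.clone(); n],
        }
    }
}

/// The pattern questioned in review (`find_map` is the shorter spelling of
/// `filter_map(..).next()`): derive the row count from the first array.
fn inferred_rows(args: &ScalarFunctionArgs) -> usize {
    args.args
        .iter()
        .find_map(|arg| match arg {
            ColumnarValue::Array(a) => Some(a.len()),
            ColumnarValue::Scalar(_) => None,
        })
        .unwrap_or(1)
}

/// Suggested alternative: expand every argument to the batch size passed in.
fn expand_args(args: &ScalarFunctionArgs) -> Vec<Vec<Option<String>>> {
    args.args.iter().map(|a| a.to_array(args.number_rows)).collect()
}
```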

    .next()
    .unwrap_or(1);

// Convert to ArrayRef and delegate to array_concat_inner
Contributor

Can we remove these LLM-like comments that don't provide much benefit but just add verbosity?

Actually in this case it's wrong because it isn't delegating to array_concat_inner 🤔

@github-actions github-actions bot added the documentation (Improvements or additions to documentation) and core (Core DataFusion crate) labels Oct 19, 2025
@EeshanBembi EeshanBembi marked this pull request as draft October 19, 2025 20:48
@EeshanBembi
Contributor Author

Thanks for the honest review, and sorry, this should have been a draft PR. I was trying out some ideas around concat and list coercion related to issue #18020, and I did use some AI help for boilerplate while experimenting, but I do understand the code and take responsibility for it. I agree the comments explain what rather than why, and the Int32 fallback for all-null inner list types was a quick experiment. I will convert this to a draft now, remove the noisy and misleading comments (including the one that says it delegates to array_concat_inner), stop duplicating the coerce_types logic in return_type since its inputs are already coerced, use ScalarFunctionArgs::number_rows instead of inferring num_rows, refactor toward idiomatic Rust, and ask for another review once everything is cleaned up and passing. Thanks again for the direct feedback.

@EeshanBembi EeshanBembi marked this pull request as ready for review October 19, 2025 22:00
@EeshanBembi EeshanBembi marked this pull request as draft October 19, 2025 22:05
Enable concat() to handle arrays like array_concat, returning actual array
concatenation instead of string representation. For example:
- concat([1, 2], [3, 4]) now returns [1, 2, 3, 4]
- concat('abc', 123, NULL, 456) returns 'abc123456'

Implementation:
- Updated signature to variadic_any() to accept mixed types
- Added simple runtime array detection (7 lines of core logic)
- Enhanced scalar handling for non-string types
- Full backward compatibility for all string concatenation
- Comprehensive test coverage for arrays and mixed types

Fixes apache#18020
- Use direct format string interpolation
- Remove unnecessary string references
@EeshanBembi EeshanBembi force-pushed the feature/concat-array-support branch from 0ccd138 to 05fe9fd Compare October 20, 2025 14:51
@github-actions github-actions bot removed the documentation (Improvements or additions to documentation), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) labels Oct 20, 2025
- Implement array concatenation for concat builtin function
- Support List, LargeList, and FixedSizeList types
- Use user_defined signature for optimal performance
- Maintain string concatenation performance characteristics
- Update optimizer test expectation for new coercion behavior
- Update information schema test for new signature

Fixes apache#18020
@github-actions github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) label Oct 20, 2025
Resolves timeout issues in cooperative execution tests by optimizing
array concatenation performance and reducing blocking operations.

Key improvements:
- Fast path for single-row array concatenation
- Efficient multi-row processing with reduced complexity
- Better memory management and reduced allocations
- Cooperative-friendly design that avoids long-running sync operations

Fixes failing tests:
- execution::coop::agg_grouped_topk_yields
- execution::coop::sort_merge_join_yields

All functionality preserved:
- Array concatenation: concat(make_array(1,2,3), make_array(4,5)) → [1,2,3,4,5]
- String concatenation: original performance maintained
- Multi-row, null handling, and type safety preserved
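Row-wise list concatenation over an Arrow-style offsets/values layout can be done in one pass per batch, which is one way to read the "efficient multi-row processing" point above. The sketch is hypothetical: `ListColumn` and `concat_rows` are illustrative names over plain `Vec`s rather than Arrow's `ListArray`, and the NULL handling (skip NULL rows; the output row is NULL only if every input row is NULL) is an assumption.

```rust
/// Hypothetical Arrow-style list column of i64 values: row `i` spans
/// `values[offsets[i]..offsets[i + 1]]`; `valid[i] == false` means NULL.
#[derive(Debug, PartialEq)]
struct ListColumn {
    offsets: Vec<usize>,
    values: Vec<i64>,
    valid: Vec<bool>,
}

impl ListColumn {
    /// Build a column from per-row optional lists (`None` is a NULL row).
    fn from_rows(rows: &[Option<Vec<i64>>]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        let mut valid = Vec::new();
        for row in rows {
            if let Some(items) = row {
                values.extend_from_slice(items);
            }
            offsets.push(values.len());
            valid.push(row.is_some());
        }
        ListColumn { offsets, values, valid }
    }

    /// Elements of row `i` (empty for a NULL row).
    fn row(&self, i: usize) -> &[i64] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Concatenate row `i` of every input column into row `i` of the output.
fn concat_rows(cols: &[ListColumn], num_rows: usize) -> ListColumn {
    let mut offsets = Vec::with_capacity(num_rows + 1);
    offsets.push(0);
    let mut values = Vec::new();
    let mut valid = Vec::with_capacity(num_rows);
    for i in 0..num_rows {
        let mut any_valid = false;
        for col in cols {
            if col.valid[i] {
                values.extend_from_slice(col.row(i));
                any_valid = true;
            }
        }
        offsets.push(values.len());
        valid.push(any_valid);
    }
    ListColumn { offsets, values, valid }
}
```

Building the output offsets directly avoids materializing one intermediate list per row.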
- Fix clippy::uninlined_format_args warning in concat function tests
- Fix clippy::clone_on_ref_ptr warnings by using Arc::clone explicitly
- Update configs.md documentation with latest configuration settings
@github-actions github-actions bot added the documentation (Improvements or additions to documentation) label Oct 23, 2025

Labels

documentation (Improvements or additions to documentation), functions (Changes to functions implementation), sqllogictest (SQL Logic Tests (.slt))

Development

Successfully merging this pull request may close these issues.

unexpected output for concat for arrays

2 participants