perf: Optimize string_to_array for scalar args #21131

Open
neilconway wants to merge 3 commits into apache:main from neilconway:neilc/optimize-string-to-array
@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

When the delimiter (and the null string, if supplied) are scalars, we can implement string_to_array more efficiently. In particular, we can construct a memmem::Finder up front and use it to search for the delimiter in each row.
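
To make the shape of the scalar fast path concrete, here is a minimal std-only sketch. It is not the code in this PR: `DelimSearcher` and `split_rows` are hypothetical names, and plain `str::find` stands in for `memmem::Finder` so the example has no external dependencies. The point it illustrates is only that the searcher is built once from the scalar delimiter and then reused for every row.

```rust
/// Hypothetical stand-in for a prebuilt searcher such as `memmem::Finder`:
/// built once from the scalar delimiter, then reused for every row.
struct DelimSearcher<'d> {
    delim: &'d str,
}

impl<'d> DelimSearcher<'d> {
    fn new(delim: &'d str) -> Self {
        Self { delim }
    }

    /// Split `row` on the delimiter, pushing borrowed slices into `out`.
    fn split_into<'r>(&self, row: &'r str, out: &mut Vec<&'r str>) {
        if self.delim.is_empty() {
            // Assumption for this sketch: an empty delimiter yields the
            // whole string as a single element.
            out.push(row);
            return;
        }
        let mut start = 0;
        while let Some(pos) = row[start..].find(self.delim) {
            out.push(&row[start..start + pos]);
            start += pos + self.delim.len();
        }
        // Whatever follows the last delimiter is the final element.
        out.push(&row[start..]);
    }
}

/// Split every row on the same scalar delimiter.
fn split_rows<'r>(rows: &[&'r str], delim: &str) -> Vec<Vec<&'r str>> {
    // Constructed once, outside the per-row loop.
    let searcher = DelimSearcher::new(delim);
    rows.iter()
        .map(|&row| {
            let mut parts = Vec::new();
            searcher.split_into(row, &mut parts);
            parts
        })
        .collect()
}
```

With a columnar delimiter, the searcher would have to be rebuilt for each row, which is presumably why that case stays on the fallback path.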

This PR implements this optimization; it also fixes a place where we were allocating an intermediate String for every character when the delimiter is NULL. (This isn't a common case but worth fixing.)
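
As an illustration of the NULL-delimiter fix, the sketch below yields each character as a borrowed slice of the input instead of allocating a `String` per character. `char_slices` is a hypothetical helper name, and the actual change in the PR may be structured differently.

```rust
/// Yield each character of `s` as a borrowed `&str` slice, avoiding the
/// per-character `String` that `c.to_string()` would allocate.
fn char_slices<'a>(s: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    s.char_indices()
        // `len_utf8` gives the byte width of `c`, so the slice always
        // ends on a character boundary.
        .map(move |(i, c)| &s[i..i + c.len_utf8()])
}
```

Each slice can then be appended to the list builder directly, so the per-character heap allocation goes away.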

Benchmarks (M4 Max):

  single_char_delim/5:    34.8 µs  (was  61.1 µs)  -43%
  single_char_delim/20:  145.1 µs  (was 220.7 µs)  -34%
  single_char_delim/100: 679.4 µs  (was   1.04 ms) -35%

  multi_char_delim/5:    41.7 µs  (was  56.7 µs)  -27%
  multi_char_delim/20:  158.9 µs  (was 185.1 µs)  -14%
  multi_char_delim/100: 731.4 µs  (was 858.3 µs)  -15%

  with_null_str/5:    43.1 µs  (was  68.7 µs)  -37%
  with_null_str/20:  179.3 µs  (was 244.3 µs)  -27%
  with_null_str/100: 895.8 µs  (was   1.16 ms) -23%

  null_delim/5:    17.4 µs  (was  64.1 µs)  -73%
  null_delim/20:   63.0 µs  (was 233.4 µs)  -73%
  null_delim/100: 280.2 µs  (was   1.12 ms) -75%

  columnar_delim/5:    65.2 µs  (was  60.2 µs)  +8%
  columnar_delim/20:  217.2 µs  (was 224.1 µs)  -3%
  columnar_delim/100:   1.02 ms  (was   1.05 ms) -3%

What changes are included in this PR?

  • Add benchmark for string_to_array
  • Implement optimizations described above
  • Refactor columnar (fallback) path to get rid of a lot of type dispatch boilerplate
  • Improve SLT test coverage for the "columnar string, scalar other-args" case

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions (bot) added the "sqllogictest" (SQL Logic Tests (.slt)) and "functions" (Changes to functions implementation) labels on Mar 23, 2026
Ok(())
}

/// String_to_array SQL function
Contributor Author

I had to move some code around to group the string_to_array functions together in the file, sorry for the noisy diff.

        list_builder.append(true);
    }
    (Some(string), None) => {
        string.chars().map(|c| c.to_string()).for_each(|c| {
Contributor Author

This was the inefficient NULL delimiter handling mentioned in the PR summary.

Development

Successfully merging this pull request may close these issues.

Optimize string_to_array for scalar args