perf: Optimize `split_part`, support `Utf8View` by neilconway · Pull Request #21119 · apache/datafusion

neilconway · 2026-03-23T15:55:09Z

Which issue does this PR close?

Rationale for this change

`split_part` currently accepts `Utf8View` but always returns `Utf8`. When given `Utf8View` input, it should instead return `Utf8View` output.

While we're at it, optimize `split_part` for single-character delimiters (the common case): `str::split(&str)` is significantly slower than `str::split(char)` for single-character ASCII delimiters, because the former uses a general string matching algorithm whereas the latter uses `memchr::memchr`.
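A minimal sketch of the single-character fast path described above, not the PR's actual code. The function name, the `Option` return for position 0, and the choice to take the fast path for any single `char` (rather than only ASCII) are assumptions made for illustration; it operates on plain `&str` values rather than Arrow arrays.

```rust
/// Illustrative only: returns the `n`-th field of `s` split on `delim`
/// (1-based; a negative `n` counts from the end). Returns `None` for
/// `n == 0` and an empty string when there is no such field.
fn split_part<'a>(s: &'a str, delim: &str, n: i64) -> Option<&'a str> {
    if n == 0 {
        return None;
    }
    // Zero-based index of the requested field, from the front or the back.
    let idx = (n.unsigned_abs() - 1) as usize;

    let mut chars = delim.chars();
    let part = match (chars.next(), chars.next()) {
        // Fast path: the delimiter is exactly one `char`, so use the
        // `char` pattern instead of the general `&str` pattern.
        (Some(c), None) => {
            if n > 0 {
                s.split(c).nth(idx)
            } else {
                s.rsplit(c).nth(idx)
            }
        }
        // General path: multi-character delimiter. An empty delimiter also
        // lands here and simply follows `str::split("")` semantics; the
        // real function may treat that case differently.
        _ => {
            if n > 0 {
                s.split(delim).nth(idx)
            } else {
                s.rsplit(delim).nth(idx)
            }
        }
    };
    Some(part.unwrap_or(""))
}
```

Both paths return the same results; only the pattern type passed to `split`/`rsplit` differs, which is where the speedup claimed above would come from.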

Benchmark results (M4 Max):

- utf8_single_char/pos_first: 141 µs → 106 µs (-25%)
- utf8_single_char/pos_middle: 392 µs → 368 µs (-6%)
- utf8_single_char/pos_negative: 151 µs → 116 µs (-24%)
- utf8_multi_char/pos_middle: 375 µs → 362 µs (-2%, noise; this path is unchanged)
- utf8view_single_char/pos_first: 143 µs → 111 µs (-22%)
- utf8view_long_parts/pos_middle: 1022 µs → 469 µs (-54%)
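For readers checking the percentages, the relative change can be recomputed from the before/after timings as `(after - before) / before`. The helper below is a hypothetical illustration, not part of the PR or its benchmark harness.

```rust
/// Relative change from `before_us` to `after_us`, in percent.
/// Negative values mean the new code is faster.
fn pct_change(before_us: f64, after_us: f64) -> f64 {
    (after_us - before_us) / before_us * 100.0
}
```

For example, `pct_change(1022.0, 469.0)` is about -54.1 and `pct_change(141.0, 106.0)` is about -24.8. Values recomputed from the rounded timings can differ from the reported figures by a point or two, presumably because the reported percentages come from unrounded measurements.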

What changes are included in this PR?

- Revise `split_part` benchmarks to reduce redundancy and improve `Utf8View` coverage
- Support `Utf8View` -> `Utf8View` in `split_part`
- Refactor `split_part` to clean up some redundant code
- Optimize `split_part` for single-character delimiters
- Add SLT test coverage for `split_part` with `Utf8View` input

Are these changes tested?

Yes. New tests and benchmarks added.

Are there any user-facing changes?

No.

neilconway · 2026-03-23T15:58:17Z

split_part can be optimized further; probably scalar specialization would be a nice win. But I'd like to get this PR in first to make it easier to review.
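The comment does not say what scalar specialization would look like. One possible reading, sketched here under that assumption, is that when the delimiter and position are the same for every row, the delimiter can be classified once outside the per-row loop. The function name and the slice-based signature are hypothetical; real code would work on Arrow arrays.

```rust
/// Hypothetical helper: applies `split_part` to every row when the
/// delimiter and position are scalars (identical for all rows).
/// Panics if `n == 0`, purely to keep the sketch short.
fn split_part_scalar_args<'a>(rows: &[&'a str], delim: &str, n: i64) -> Vec<&'a str> {
    assert!(n != 0, "position must be non-zero in this sketch");
    let idx = (n.unsigned_abs() - 1) as usize;

    // Classify the delimiter once, instead of once per row.
    let mut chars = delim.chars();
    let single = match (chars.next(), chars.next()) {
        (Some(c), None) => Some(c),
        _ => None,
    };

    rows.iter()
        .map(|&s| {
            let part = match (single, n > 0) {
                (Some(c), true) => s.split(c).nth(idx),
                (Some(c), false) => s.rsplit(c).nth(idx),
                (None, true) => s.split(delim).nth(idx),
                (None, false) => s.rsplit(delim).nth(idx),
            };
            // Rows without an `n`-th field yield an empty string.
            part.unwrap_or("")
        })
        .collect()
}
```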

neilconway added 3 commits March 23, 2026 11:44

Revise benchmarks for split_part

5467531

split_part: Optimize, cleanup, support utf8view

d044aff

Fix clippy

bad1a74

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels on Mar 23, 2026

neilconway wants to merge 3 commits into apache:main from neilconway:neilc/optimize-split-part


1 participant
