perf: Optimize split_part, support Utf8View #21119
Open
neilconway wants to merge 3 commits into apache:main
Which issue does this PR close?
- `split_part` should preserve `Utf8View` input #21117.
- `split_part` for single-character delimiters #21118.

Rationale for this change

`split_part` currently accepts `Utf8View` but always returns `Utf8`. When given `Utf8View` input, it should instead return `Utf8View` output.

While we're at it, optimize `split_part` for single-character delimiters (the common case): `str::split(&str)` is significantly slower than `str::split(char)` for single-character ASCII delimiters, because the former uses a general string matching algorithm but the latter uses `memchr::memchr`.

Benchmark results (M4 Max):

- utf8_single_char/pos_first: 141 µs → 106 µs (-25%)
- utf8_single_char/pos_middle: 392 µs → 368 µs (-6%)
- utf8_single_char/pos_negative: 151 µs → 116 µs (-24%)
- utf8_multi_char/pos_middle: 375 µs → 362 µs (-2%, noise; this path is unchanged)
- utf8view_single_char/pos_first: 143 µs → 111 µs (-22%)
- utf8view_long_parts/pos_middle: 1022 µs → 469 µs (-54%)

What changes are included in this PR?

- `split_part` benchmarks to reduce redundancy and improve `Utf8View` coverage
- `Utf8View` -> `Utf8View` in `split_part`
- `split_part` to clean up some redundant code
- `split_part` for single-character delimiters
- `split_part` with `Utf8View` input

Are these changes tested?
Yes. New tests and benchmarks added.
Are there any user-facing changes?
No.
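
For reviewers who want to see the idea behind the single-character fast path in isolation, here is a minimal standalone sketch. It is not the code in this PR: the function name, signature, and `Option` return type are hypothetical, it works on plain `&str` rather than Arrow arrays, and the handling of position `0`, out-of-range positions, and empty delimiters is assumed for illustration only. It also dispatches on any single `char`, not only ASCII ones.

```rust
/// Hypothetical standalone helper (not the PR's implementation).
/// Returns the `n`-th part of `s` split on `delim`, where `n` is 1-based
/// and a negative `n` counts from the end.
/// Assumptions made for this sketch:
///   - `n == 0` is rejected by returning `None`
///   - an out-of-range `n` yields the empty string
///   - an empty delimiter treats the whole input as a single part
fn split_part<'a>(s: &'a str, delim: &str, n: i64) -> Option<&'a str> {
    if n == 0 {
        return None;
    }
    if delim.is_empty() {
        return Some(if n == 1 || n == -1 { s } else { "" });
    }

    // Zero-based index from the chosen end of the string.
    let idx = (n.unsigned_abs() - 1) as usize;

    let mut chars = delim.chars();
    let part = match (chars.next(), chars.next()) {
        // Exactly one character: use the `char` pattern.
        (Some(c), None) => {
            if n > 0 {
                s.split(c).nth(idx)
            } else {
                s.rsplit(c).nth(idx)
            }
        }
        // Two or more characters: fall back to the `&str` pattern.
        _ => {
            if n > 0 {
                s.split(delim).nth(idx)
            } else {
                s.rsplit(delim).nth(idx)
            }
        }
    };

    Some(part.unwrap_or(""))
}
```

Under these assumptions, `split_part("a,b,c", ",", 2)` takes the `char` branch and `split_part("a<>b<>c", "<>", 3)` takes the `&str` branch. Negative positions use `rsplit` in both branches because `split` with a `&str` pattern is not a double-ended iterator.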