crc32: optimize CRC32C computation on arm64 using 6-way parallel processing by zwtao40 · Pull Request #79047 · golang/go

zwtao40 · 2026-04-30T03:50:43Z

This PR speeds up CRC32C on arm64 by splitting large buffers across six independent CRC32C lanes. On a Huawei Kunpeng 920 this raises throughput roughly 2.3x to 2.5x for 4K and 8K buffers.

Background

The current CRC32C implementation on arm64 processes data in a single lane. Each CRC32C instruction depends on the result of the previous one. A sequential loop is therefore limited by the instruction's latency (several cycles) rather than by how many CRC32C instructions the core could issue, and much of the pipeline parallelism of modern arm64 cores goes unused.

Implementation

The optimization extends the original single-lane CRC32 computation to six parallel lanes:

- 6-way parallel lanes: each lane operates independently, with no data dependencies on the other lanes, so the processor can schedule several CRC32C instructions concurrently.
- Carry-less multiplication for merging: after the six lanes finish, their intermediate results are merged using VPMULL (carry-less multiplication) instructions. The merged value feeds the next iteration, and this repeats until the termination condition is reached.
- Threshold-based dispatch: the parallel path is used only for buffers of at least 1024 bytes. Smaller buffers keep the original sequential path and never pay the merge overhead.
- Loop unrolling: the inner loop is unrolled four times and consumes 64 bytes per lane per loop cycle, or 384 bytes per cycle across all six lanes.

Technical Details

- Uses registers R9 and R1-R5 as the 6 parallel CRC32C accumulators
- Uses pre-computed constants (R1-R5) for the carry-less multiplication merge
- Uses the ARM64 NEON VPMULL instruction to combine the lane results
- Processes 1024 bytes per large_loop iteration before merging
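VPMULL performs a carry-less (GF(2) polynomial) multiply on a pair of 64-bit lanes and produces a 128-bit result. The Go function below is a bit-serial software model of that operation, for illustration only. `clmul64` is a hypothetical name. This description does not show how the PR's assembly arranges operands or reduces the product modulo the CRC polynomial.

```go
package main

import "fmt"

// clmul64 models a 64x64 -> 128-bit carry-less multiply, the operation
// VPMULL performs on one pair of 64-bit lanes. Partial products are
// combined with XOR instead of addition, so no carries propagate.
func clmul64(a, b uint64) (hi, lo uint64) {
	for i := uint(0); i < 64; i++ {
		if b&(1<<i) != 0 {
			lo ^= a << i
			// Bits shifted out of lo land in hi; a >> 64 is 0 in Go,
			// so i == 0 contributes nothing to hi.
			hi ^= a >> (64 - i)
		}
	}
	return hi, lo
}

func main() {
	// (x+1)*(x+1) = x^2 + 1 over GF(2): no carry from the middle term.
	fmt.Println(clmul64(3, 3)) // 0 5
}
```

Multiplying a 32-bit lane CRC by a 32-bit constant this way yields at most a 63-bit product, which must still be reduced modulo the CRC polynomial before it can be XORed with the other lanes.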

Performance Benchmark

Tested on Huawei Kunpeng 920 (ARMv8.2-A):

4K: 15 GB/s -> 34 GB/s (+126%)
8K: 14 GB/s -> 35 GB/s (+150%)

For these two sizes the optimization improves throughput by roughly 2.3x (4K) to 2.5x (8K).
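Each quoted percentage is the ratio of new to old throughput, minus one:

```latex
\frac{34~\text{GB/s}}{15~\text{GB/s}} \approx 2.27 \;\Rightarrow\; +126.7\%,
\qquad
\frac{35~\text{GB/s}}{14~\text{GB/s}} = 2.5 \;\Rightarrow\; +150\%
```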

Compatibility

- Requires ARM64 architecture with the CRC32 and NEON extensions (the 64-bit form of VPMULL additionally requires the PMULL extension)
- Falls back to the original sequential path for buffers smaller than 1024 bytes
- No changes to API or behavior; fully backward compatible
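The fallback rules can be summarized as a small dispatch function. This is a sketch only. `cpuFeatures` and `pickPath` are hypothetical, and real code would query runtime CPU feature detection instead. The PMULL check is an assumption based on the merge step using VPMULL. The "software" path stands for a table-driven implementation on CPUs without CRC32 instructions.

```go
package main

import "fmt"

// parallelThreshold is the 1024-byte cutoff described in the PR.
const parallelThreshold = 1024

// cpuFeatures is a hypothetical stand-in for runtime CPU feature detection.
type cpuFeatures struct {
	HasCRC32 bool // CRC32/CRC32C instructions
	HasPMULL bool // 64-bit polynomial multiply (assumed needed for the merge)
}

// pickPath reports which code path would handle an n-byte buffer.
func pickPath(f cpuFeatures, n int) string {
	switch {
	case !f.HasCRC32:
		return "software" // no CRC32C instructions: table-driven fallback
	case n < parallelThreshold || !f.HasPMULL:
		return "sequential" // original single-lane loop
	default:
		return "six-lane" // parallel lanes plus VPMULL merge
	}
}

func main() {
	f := cpuFeatures{HasCRC32: true, HasPMULL: true}
	for _, n := range []int{64, 1023, 1024, 8192} {
		fmt.Println(n, pickPath(f, n))
	}
}
```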

Testing

- All existing crc32 tests pass
- Benchmark results are reproducible across multiple runs
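The throughput numbers can be re-measured with a small standalone program. This sketch uses the standard library's `testing.Benchmark` and is not the benchmark the PR author ran. It times whatever `hash/crc32` the local toolchain provides, so a before/after comparison means running it with toolchains built with and without the change. The buffer contents are arbitrary, and "4K"/"8K" are assumed to mean 4096 and 8192 bytes.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"testing"
)

// castagnoliTab is the CRC32C table; crc32.Checksum uses the
// architecture-specific path for it when one is available.
var castagnoliTab = crc32.MakeTable(crc32.Castagnoli)

// sink keeps results observable so the checksum calls are not elided.
var sink uint32

// benchCRC32C measures CRC32C throughput on a buffer of the given size.
func benchCRC32C(size int) testing.BenchmarkResult {
	buf := make([]byte, size)
	for i := range buf {
		buf[i] = byte(i)
	}
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(size)) // makes the result report MB/s
		for i := 0; i < b.N; i++ {
			sink ^= crc32.Checksum(buf, castagnoliTab)
		}
	})
}

func main() {
	testing.Init() // registers testing flags; needed outside "go test"
	for _, size := range []int{4 << 10, 8 << 10} {
		r := benchCRC32C(size)
		fmt.Printf("%5d bytes: %s\n", size, r.String())
	}
}
```

Run-to-run variance is usually checked by repeating the measurement several times, for example with `go test -bench` and `-count`, then comparing the runs with benchstat.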

Updates #79052

…essing

This change optimizes CRC32C calculation on arm64 by extending the original single-lane CRC32 computation to six parallel lanes. Each lane operates independently, with no data dependencies on the others. After all six lanes complete their computations, the intermediate results are merged using carry-less multiplication instructions, and the merged value feeds subsequent iterations until the termination condition is reached. This makes better use of the available CRC32C units and improves instruction-level parallelism.

Performance benchmark on Huawei Kunpeng 920:
4K: 15GB/s -> 34GB/s (126% improvement)
8K: 14GB/s -> 35GB/s (150% improvement)

Signed-off-by: zhuwentao <1357420890@qq.com>

gopherbot · 2026-04-30T03:59:14Z

This PR (HEAD: de67d11) has been imported to Gerrit for code review.

Please visit Gerrit at https://go-review.googlesource.com/c/go/+/772322.

Important tips:

Don't comment on this PR. All discussion takes place in Gerrit.
You need a Gmail or other Google account to log in to Gerrit.
To change your code in response to feedback:
- Push a new commit to the branch used by your GitHub PR.
- A new "patch set" will then appear in Gerrit.
- Respond to each comment by marking as Done in Gerrit if implemented as suggested. You can alternatively write a reply.
- Critical: you must click the blue Reply button near the top to publish your Gerrit responses.
- Multiple commits in the PR will be squashed by GerritBot.
The title and description of the GitHub PR are used to construct the final commit message.
- Edit these as needed via the GitHub web interface (not via Gerrit or git).
- You should word wrap the PR description at ~76 characters unless you need longer lines (e.g., for tables or URLs).
See the Sending a change via GitHub and Reviews sections of the Contribution Guide as well as the FAQ for details.

gopherbot · 2026-04-30T04:11:44Z

Message from Gopher Robot:

Patch Set 1:

(1 comment)

Please don’t reply on this GitHub thread. Visit golang.org/cl/772322.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2026-04-30T04:24:27Z

Message from Gopher Robot:

Patch Set 1:

Congratulations on opening your first change. Thank you for your contribution!

Next steps:
A maintainer will review your change and provide feedback. See
https://go.dev/doc/contribute#review for more info and tips to get your
patch through code review.

Most changes in the Go project go through a few rounds of revision. This can be
surprising to people new to the project. The careful, iterative review process
is our way of helping mentor contributors and ensuring that their contributions
have a lasting impact.

During May-July and Nov-Jan the Go project is in a code freeze, during which
little code gets reviewed or merged. If a reviewer responds with a comment like
R=go1.11 or adds a tag like "wait-release", it means that this CL will be
reviewed as part of the next development cycle. See https://go.dev/s/release
for more details.

Please don’t reply on this GitHub thread. Visit golang.org/cl/772322.
After addressing review feedback, remember to publish your drafts!

gopherbot · 2026-04-30T07:15:14Z

Message from 祝文涛:

Patch Set 1:

(2 comments)

Please don’t reply on this GitHub thread. Visit golang.org/cl/772322.
After addressing review feedback, remember to publish your drafts!

zwtao40 wants to merge 1 commit into golang:master from zwtao40:dev_aarch64_crc32c_optimize
