Description
The runtime has a great and fast function for UTF-8 validation: Utf8Utility.GetPointerToFirstInvalidByte. But we might be able to do better.
We implemented, in C#, the 'lookup' UTF-8 validation algorithm from
- Validating UTF-8 In Less Than One Instruction Per Byte, Software: Practice and Experience 51 (5), 2021
The algorithm is used by Oracle GraalVM and by the Node.js and Bun JavaScript runtimes. For example, Node.js can validate Arabic or Chinese strings at 17 GB/s on a 2 GHz Intel server (from JavaScript).
We adapted it so that we can match exactly the functionality of Utf8Utility.GetPointerToFirstInvalidByte with a function called SimdUnicode.UTF8.GetPointerToFirstInvalidByte. It is available on GitHub at simdutf/SimdUnicode. We have good tests, and decent benchmarks. We use .NET's excellent runtime dispatching functionality to select the best function (SSE4.2, AVX2, AVX-512, fallback, NEON). We used @EgorBo's Disasmo to help tune the code, although we make no claim that it is optimal (it probably is not).
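To make the dispatching idea concrete, here is a minimal sketch. It is not the SimdUnicode code. The class `Utf8DispatchSketch`, the method names, and the simplified signature (a span in, an index out) are assumptions made for illustration, and they do not match the signature of either `GetPointerToFirstInvalidByte` function. The sketch shows the `IsSupported` checks that .NET exposes for each instruction set (the AVX-512 check assumes .NET 8 or later) and a plain scalar fallback that reports where the first ill-formed sequence starts.

```csharp
using System;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical illustration only: names and signatures are assumptions,
// not the SimdUnicode or .NET runtime API.
public static class Utf8DispatchSketch
{
    // Reports which kernel a dispatcher could select on this machine.
    // The JIT treats IsSupported as a constant, so untaken branches are dropped.
    public static string SelectKernel()
    {
        if (Avx512BW.IsSupported) return "AVX-512"; // requires .NET 8 or later
        if (Avx2.IsSupported) return "AVX2";
        if (Sse42.IsSupported) return "SSE4.2";
        if (AdvSimd.Arm64.IsSupported) return "NEON";
        return "fallback";
    }

    // Scalar fallback: returns the index of the first byte of the first
    // ill-formed sequence, or data.Length when the whole input is valid UTF-8.
    public static int IndexOfFirstInvalidByte(ReadOnlySpan<byte> data)
    {
        int i = 0;
        while (i < data.Length)
        {
            byte b = data[i];
            if (b < 0x80) { i++; continue; } // ASCII

            int need;  // number of continuation bytes expected
            uint min;  // smallest code point allowed for this length
            uint cp;   // code point being decoded
            if ((b & 0xE0) == 0xC0) { need = 1; min = 0x80; cp = (uint)(b & 0x1F); }
            else if ((b & 0xF0) == 0xE0) { need = 2; min = 0x800; cp = (uint)(b & 0x0F); }
            else if ((b & 0xF8) == 0xF0) { need = 3; min = 0x10000; cp = (uint)(b & 0x07); }
            else return i; // stray continuation byte or invalid leading byte

            if (i + need >= data.Length) return i; // truncated sequence

            for (int k = 1; k <= need; k++)
            {
                byte c = data[i + k];
                if ((c & 0xC0) != 0x80) return i; // not a continuation byte
                cp = (cp << 6) | (uint)(c & 0x3F);
            }

            // Reject overlong encodings, values above U+10FFFF, and surrogates.
            if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return i;

            i += need + 1;
        }
        return data.Length;
    }
}
```

A real implementation would call a vectorized kernel in each branch instead of returning a name, and would keep the scalar routine only for short inputs and for hardware without SIMD support.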
Intel Ice Lake results:
| data set | SimdUnicode AVX-512 (GB/s) | .NET speed (GB/s) | speed up |
|---|---|---|---|
| Twitter.json | 29 | 12 | 2.4 x |
| Arabic-Lipsum | 12 | 2.3 | 5.2 x |
| Chinese-Lipsum | 12 | 3.9 | 3.0 x |
| Emoji-Lipsum | 12 | 0.9 | 13 x |
| Hebrew-Lipsum | 12 | 2.3 | 5.2 x |
| Hindi-Lipsum | 12 | 2.1 | 5.7 x |
| Japanese-Lipsum | 10 | 3.5 | 2.9 x |
| Korean-Lipsum | 10 | 1.3 | 7.7 x |
| Latin-Lipsum | 76 | 76 | --- |
| Russian-Lipsum | 12 | 1.2 | 10 x |
Twitter.json
SimdUnicode ▏ 29 GB/s █████████████████████████
.NET Runtime ▏ 12 GB/s ██████████▎
Arabic-Lipsum
SimdUnicode ▏ 12 GB/s █████████████████████████
.NET Runtime ▏ 2.3 GB/s ████▊
Chinese-Lipsum
SimdUnicode ▏ 12 GB/s █████████████████████████
.NET Runtime ▏ 3.9 GB/s ████████▏
Emoji-Lipsum
SimdUnicode ▏ 12 GB/s █████████████████████████
.NET Runtime ▏ 0.9 GB/s █▉
Japanese-Lipsum
SimdUnicode ▏ 10 GB/s █████████████████████████
.NET Runtime ▏ 3.5 GB/s ████████▊
Apple M2 results:
| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) | speed up |
|---|---|---|---|
| Twitter.json | 25 | 14 | 1.8 x |
| Arabic-Lipsum | 7.4 | 3.5 | 2.1 x |
| Chinese-Lipsum | 7.4 | 4.8 | 1.5 x |
| Emoji-Lipsum | 7.4 | 2.5 | 3.0 x |
| Hebrew-Lipsum | 7.4 | 3.5 | 2.1 x |
| Hindi-Lipsum | 7.3 | 3.0 | 2.4 x |
| Japanese-Lipsum | 7.3 | 4.6 | 1.6 x |
| Korean-Lipsum | 7.4 | 1.8 | 4.1 x |
| Latin-Lipsum | 87 | 38 | 2.3 x |
| Russian-Lipsum | 7.4 | 2.7 | 2.7 x |
On a Neoverse V1 (Graviton 3), our validation function is 1.3 to over five times faster than the standard library.
| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) | speed up |
|---|---|---|---|
| Twitter.json | 14 | 8.7 | 1.4 x |
| Arabic-Lipsum | 4.2 | 2.0 | 2.1 x |
| Chinese-Lipsum | 4.2 | 2.6 | 1.6 x |
| Emoji-Lipsum | 4.2 | 0.8 | 5.3 x |
| Hebrew-Lipsum | 4.2 | 2.0 | 2.1 x |
| Hindi-Lipsum | 4.2 | 1.6 | 2.6 x |
| Japanese-Lipsum | 4.2 | 2.4 | 1.8 x |
| Korean-Lipsum | 4.2 | 1.3 | 3.2 x |
| Latin-Lipsum | 42 | 17 | 2.5 x |
| Russian-Lipsum | 4.2 | 0.95 | 4.4 x |
On a Qualcomm 8cx gen3 (Windows Dev Kit 2023), we get roughly the same relative performance boost as on the Neoverse V1.
| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) | speed up |
|---|---|---|---|
| Twitter.json | 17 | 10 | 1.7 x |
| Arabic-Lipsum | 5.0 | 2.3 | 2.2 x |
| Chinese-Lipsum | 5.0 | 2.9 | 1.7 x |
| Emoji-Lipsum | 5.0 | 0.9 | 5.5 x |
| Hebrew-Lipsum | 5.0 | 2.3 | 2.2 x |
| Hindi-Lipsum | 5.0 | 1.9 | 2.6 x |
| Japanese-Lipsum | 5.0 | 2.7 | 1.9 x |
| Korean-Lipsum | 5.0 | 1.5 | 3.3 x |
| Latin-Lipsum | 50 | 20 | 2.5 x |
| Russian-Lipsum | 5.0 | 1.2 | 5.2 x |
On a Neoverse N1 (Graviton 2), our validation function is up to 3.6 times faster than the standard library.
| data set | SimdUnicode speed (GB/s) | .NET speed (GB/s) | speed up |
|---|---|---|---|
| Twitter.json | 7.8 | 5.7 | 1.4 x |
| Arabic-Lipsum | 2.5 | 0.9 | 2.8 x |
| Chinese-Lipsum | 2.5 | 1.8 | 1.4 x |
| Emoji-Lipsum | 2.5 | 0.7 | 3.6 x |
| Hebrew-Lipsum | 2.5 | 0.9 | 2.7 x |
| Hindi-Lipsum | 2.3 | 1.0 | 2.3 x |
| Japanese-Lipsum | 2.4 | 1.7 | 1.4 x |
| Korean-Lipsum | 2.5 | 1.0 | 2.5 x |
| Latin-Lipsum | 23 | 13 | 1.8 x |
| Russian-Lipsum | 2.3 | 0.7 | 3.3 x |
Importantly, there is no patent involved, and no licensing issue. We are eager for reviews, feedback and so forth.
Note that we have other fast Unicode algorithms that could be implemented in C#, including fast transcoding functions. UTF-8 validation is simply the simplest non-trivial case.
This is joint work with @Nick-Nuon.
Further reading: Validating gigabytes of Unicode strings per second… in C#? (blog post)