Always generate fast-mode data for NFD and NFKD tries #7222

hsivonen · 2025-11-06T12:12:29Z

The intent is to allow the normalizer to assume that these tries are fast in the baked data mode.

This changeset removes the ability to generate serde-loadable data with small-mode tries for NFD and NFKD. Per previous discussion, we can add back the ability to generate small tries for these if ICU4X users turn out to really need that mode.

Fixes #6836
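For readers skimming the diff, the selection rule can be sketched in isolation. This is a minimal standalone sketch, not the provider code itself: the local `TrieType` enum, `effective_trie_type`, and `norm_toml_path` are hypothetical stand-ins, and the `"small"` directory name is an assumption (only `norm/fast/` appears in this PR).

```rust
use std::fmt;

/// Hypothetical local stand-in for the provider's trie type setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TrieType {
    Fast,
    Small,
}

impl fmt::Display for TrieType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // "fast" matches the test data path in this PR; "small" is assumed.
        f.write_str(match self {
            TrieType::Fast => "fast",
            TrieType::Small => "small",
        })
    }
}

/// NFD and NFKD always use fast tries; every other file follows `configured`.
pub fn effective_trie_type(file_name: &str, configured: TrieType) -> TrieType {
    if file_name == "nfd" || file_name == "nfkd" {
        TrieType::Fast
    } else {
        configured
    }
}

/// Builds the icuexport path that would be read for `file_name`.
pub fn norm_toml_path(file_name: &str, configured: TrieType) -> String {
    format!(
        "norm/{}/{}.toml",
        effective_trie_type(file_name, configured),
        file_name
    )
}
```

With this rule, requesting small tries still reads `norm/fast/nfd.toml` and `norm/fast/nfkd.toml`, while any other file name keeps following the configured trie type.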


robertbastian · 2025-11-06T12:42:59Z

provider/source/src/normalizer/mod.rs

+                        if $file_name == "nfd" || $file_name == "nfkd" {
+                            TrieType::Fast
+                        } else {
+                            self.trie_type()


suggestion: document that the trie_type does not apply to these.

https://unicode-org.github.io/icu4x/rustdoc/icu_provider_source/struct.SourceDataProvider.html#method.with_fast_tries

robertbastian · 2025-11-06T12:43:21Z

provider/source/tests/data/icuexport/norm/fast/nfd.toml

why is there no change here?

Perhaps icuexportdata has a bug?

ah CI is failing, these files are wrong

Ah, well.

I bet the new data is generated off of the real files, it's just the test data that didn't get updated correctly.

robertbastian · 2025-11-06T12:45:42Z

provider/data/normalizer/fingerprints.csv

observation: 23% and 18% size increase
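As a rough check of the NFD figure, the relative increase can be recomputed from the sizes in the `fingerprints.csv` diff in this PR. The helper name `percent_increase` is hypothetical; this is only a sketch of the arithmetic.

```rust
/// Relative size increase in percent, going from `old` bytes to `new` bytes.
pub fn percent_increase(old: u64, new: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}
```

For `normalizer/nfd/data/v1`, the two size columns go from 28208B to 34912B and from 28144B to 34848B. Both are an increase of 6704 bytes, or roughly 23.8%. The sizes behind the 18% figure are not shown in this thread.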

Manishearth · 2025-11-06T15:53:46Z

If you have perf numbers could you include them in the PR?

Manishearth · 2025-11-06T21:01:19Z

provider/source/src/normalizer/mod.rs

                    self.icuexport()?.read_and_parse_toml(&format!(
                        "norm/{}/{}.toml",
-                        self.trie_type(),
+                        if $file_name == "nfd" || $file_name == "nfkd" {


nit: link to issue or otherwise document why

Manishearth · 2025-11-06T21:04:11Z

provider/source/tests/data/icuexport/norm/fast/nfd.toml

Perhaps icuexportdata has a bug?

Manishearth · 2025-11-06T21:04:35Z

provider/data/normalizer/fingerprints.csv

@@ -1,7 +1,7 @@
 normalizer/nfc/v1, <singleton>, 5332B, 5310B, dd048e68b718a993
-normalizer/nfd/data/v1, <singleton>, 28208B, 28144B, a37d3c274c030323
+normalizer/nfd/data/v1, <singleton>, 34912B, 34848B, 933447e5c056e49c


I'm surprised that the toml files didn't change but the generated data did.

hsivonen requested review from a team, Manishearth, robertbastian and sffc as code owners November 6, 2025 12:12

hsivonen added 2 commits November 6, 2025 14:18

Fix mappings in provider/source/src/tests/data.rs

76cf7d3

Update testdata globs

5a65b06

robertbastian reviewed Nov 6, 2025


Manishearth reviewed Nov 6, 2025

3 participants

Always generate fast-mode data for NFD and NFKD tries #7222

Are you sure you want to change the base?

Always generate fast-mode data for NFD and NFKD tries #7222

Uh oh!

Conversation

hsivonen commented Nov 6, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Manishearth commented Nov 6, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants