Skip to content

Conversation

@hsivonen
Copy link
Member

@hsivonen hsivonen commented Nov 6, 2025

The intent is to allow the normalizer to assume that these tries are fast in the baked data mode.

This changeset removes the ability to generate serde-loadable data with small-mode tries for NFD and NFKD. Per previous discussion, we can add the ability to generate small tries for these back if there are users of ICU4X who really need that mode for these tries.

Fixes #6836

The intent is to allow the normalizer to assume that these tries are
fast in the baked data mode.

This changeset removes the ability to generate serde-loadable data
with small-mode tries for NFD and NFKD. Per previous discussion, we
can add the ability to generate small tries for these back if there
are users of ICU4X who really need that mode for these tries.

Fixes unicode-org#6836
if $file_name == "nfd" || $file_name == "nfkd" {
TrieType::Fast
} else {
self.trie_type()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there no change here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps icuexportdata has a bug?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah CI is failing, these files are wrong

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, well.

I bet the new data is generated off of the real files, it's just the test data that didn't get updated correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

observation: 23% and 18% size increase

@Manishearth
Copy link
Member

If you have perf numbers could you include them in the PR?

self.icuexport()?.read_and_parse_toml(&format!(
"norm/{}/{}.toml",
self.trie_type(),
if $file_name == "nfd" || $file_name == "nfkd" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: link to issue or otherwise document why

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps icuexportdata has a bug?

@@ -1,7 +1,7 @@
normalizer/nfc/v1, <singleton>, 5332B, 5310B, dd048e68b718a993
normalizer/nfd/data/v1, <singleton>, 28208B, 28144B, a37d3c274c030323
normalizer/nfd/data/v1, <singleton>, 34912B, 34848B, 933447e5c056e49c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that the toml files didn't change but the generated data did.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ship fast tries in icu_normalizer_data by default

3 participants