Always generate fast-mode data for NFD and NFKD tries #7222
The intent is to allow the normalizer to assume that these tries are fast in the baked data mode.
This changeset removes the ability to generate serde-loadable data with small-mode tries for NFD and NFKD. Per previous discussion, we can add the ability to generate small tries for these back if there are users of ICU4X who really need that mode for these tries.
Fixes unicode-org#6836
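As a rough illustration of the behavior described above, here is a minimal, self-contained sketch. It is not the ICU4X source: `effective_trie_type` and `toml_path` are hypothetical helper names, the local `TrieType` enum is a stand-in, the `fast`/`small` directory names are an assumption about how the type is rendered into the path, and `"other"` is a placeholder for any non-NFD/NFKD file.

```rust
use std::fmt;

/// Stand-in for the generator's trie type option (assumption: two modes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TrieType {
    Fast,
    Small,
}

impl fmt::Display for TrieType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumed directory names; labeled as an assumption in the lead-in.
        f.write_str(match self {
            TrieType::Fast => "fast",
            TrieType::Small => "small",
        })
    }
}

/// Hypothetical helper: NFD and NFKD always use fast tries,
/// everything else honors the configured trie type.
fn effective_trie_type(file_name: &str, configured: TrieType) -> TrieType {
    if file_name == "nfd" || file_name == "nfkd" {
        TrieType::Fast
    } else {
        configured
    }
}

/// Hypothetical helper: build the TOML path from the effective trie type.
fn toml_path(file_name: &str, configured: TrieType) -> String {
    format!(
        "norm/{}/{}.toml",
        effective_trie_type(file_name, configured),
        file_name
    )
}

fn main() {
    // Even with small tries configured, nfd/nfkd resolve to the fast directory.
    for name in ["nfd", "nfkd", "other"] {
        println!("{}", toml_path(name, TrieType::Small));
    }
}
```

The point of routing the decision through one helper is that the configured trie type is ignored in exactly one place, which is also the natural spot to document the exception.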
if $file_name == "nfd" || $file_name == "nfkd" {
    TrieType::Fast
} else {
    self.trie_type()
suggestion: document that the trie_type does not apply to these.
why is there no change here?
Perhaps icuexportdata has a bug?
ah CI is failing, these files are wrong
Ah, well.
I bet the new data is generated off of the real files, it's just the test data that didn't get updated correctly.
observation: 23% and 18% size increase
If you have perf numbers, could you include them in the PR?
self.icuexport()?.read_and_parse_toml(&format!(
    "norm/{}/{}.toml",
    self.trie_type(),
    if $file_name == "nfd" || $file_name == "nfkd" {
nit: link to issue or otherwise document why
Perhaps icuexportdata has a bug?
@@ -1,7 +1,7 @@
 normalizer/nfc/v1, <singleton>, 5332B, 5310B, dd048e68b718a993
-normalizer/nfd/data/v1, <singleton>, 28208B, 28144B, a37d3c274c030323
+normalizer/nfd/data/v1, <singleton>, 34912B, 34848B, 933447e5c056e49c
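For reference, the relative growth of the NFD entry can be checked from the two size values in the fingerprint lines above. This is a small sketch with a hypothetical helper, `percent_increase`; it only uses the first size column (28208B before, 34912B after) and makes no claim about what each column measures.

```rust
/// Hypothetical helper: relative growth from `old` to `new`, in percent.
fn percent_increase(old: u64, new: u64) -> f64 {
    (new as f64 - old as f64) / old as f64 * 100.0
}

fn main() {
    // Sizes taken from the first size column of the nfd fingerprint lines.
    let old_bytes: u64 = 28208;
    let new_bytes: u64 = 34912;
    println!(
        "nfd grew by {} bytes ({:.1}%)",
        new_bytes - old_bytes,
        percent_increase(old_bytes, new_bytes)
    );
}
```

The result falls between 23% and 24%, consistent with the 23% figure quoted in the review.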
I'm surprised that the toml files didn't change but the generated data did.