feat(hpu): normality & expected value checks for HPU parameters #2441
base: main
Conversation
tfhe/src/core_crypto/algorithms/test/noise_distribution/lwe_hpu_noise.rs
max_msg_val: u64,
nb_tests: usize,
nb_pbs_per_test: usize,
check_variance: bool,
Instead of a boolean, I would have used an enum like:
enum HpuNoiseMode {
    Variance,
    Normality,
}
And replaced the if statement with a match.
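For illustration, a minimal sketch of what that could look like at the call site; the function name and bodies below are made up, not the actual test code:

enum HpuNoiseMode {
    Variance,
    Normality,
}

// Hypothetical check entry point: the boolean flag becomes an exhaustive match,
// so adding a third mode later forces every call site to handle it.
fn run_noise_check(mode: HpuNoiseMode, samples: &[f64]) {
    match mode {
        HpuNoiseMode::Variance => {
            // expected-value / variance check on `samples` would go here
            println!("variance check on {} samples", samples.len());
        }
        HpuNoiseMode::Normality => {
            // normality check on `samples` would go here
            println!("normality check on {} samples", samples.len());
        }
    }
}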
It has not been easy 😓 but I think it is done
To be clear, I had occurrences of failures of the expected value score check with ~7% probability. But most of the time it works well and stays below 6.25%.
When do those tests run? If they fail with such a high probability we can't have them like this in CI. I know they are statistical tests, but we can't have very flaky tests.
At this point, these tests fail very rarely on my local machine but a bit more often in the CI runs.
Which one is failing? Normality check or noise measurement?
Noise measurements are ok and stable.
Well, there is at least something to discuss with R&D, and it is this: the normality is degraded after the 21-bit keyswitch, and we know quantization on low bit widths is a topic. If we expect 5% and can get +50%, it is something we need to discuss theoretically.
Normality apparently being unachievable after the mod switch because of normalization means we need normality before the mod switch, so if we are seeing trouble there as well it’s something worth verifying. If it’s after the PBS it’s even more concerning.
Both checks are done after KS (and *nu before):
Ok, still a bit surprising; I thought the mean issue was after the MS. There may be something I'm missing on the average value stuff.
These results are expected random fluctuations. Indeed, we expect on average a 5% failure rate. If I remember correctly, the test is performed 100 times and samples.len() = 100. In this setting, for a perfect Gaussian distribution with zero mean, the probability of observing a failure rate larger than 7% is roughly 1/4. We could set a bound at a 12% failure rate for the test; the probability of a perfect zero-mean Gaussian exceeding this 12% threshold would then be 0.4%, which seems acceptable for a CI failure rate. We could go up to a 13% threshold, and then the probability of a false positive would be 0.15%. Sorry, I should have been more precise about the expected results of the test in the first place.
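For reference, these numbers come from a binomial tail: each of the 100 repetitions of the test fails independently with probability 0.05 under the null hypothesis, so the number of failures follows Binomial(100, 0.05). A small standalone sketch (not part of the test code) that reproduces the quoted probabilities:

// Standalone sketch: P(X >= k) for X ~ Binomial(n, p).
fn binomial_tail(n: u64, p: f64, k: u64) -> f64 {
    // Compute P(X = i) iteratively via the ratio P(i+1)/P(i), accumulate P(X < k).
    let mut pmf = (1.0 - p).powi(n as i32); // P(X = 0)
    let mut cdf = 0.0;
    for i in 0..k {
        cdf += pmf;
        pmf *= (n - i) as f64 / (i + 1) as f64 * p / (1.0 - p);
    }
    1.0 - cdf
}

fn main() {
    // 100 repetitions of a 5%-level test on a perfect zero-mean Gaussian:
    println!("P(>= 7 failures in 100)  ~ {:.3}", binomial_tail(100, 0.05, 7));  // ~0.24
    println!("P(>= 12 failures in 100) ~ {:.4}", binomial_tail(100, 0.05, 12)); // ~0.004
    println!("P(>= 13 failures in 100) ~ {:.4}", binomial_tail(100, 0.05, 13)); // ~0.0015
}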
Just to add that the test will still be powerful. In this same setting (100 tests with 100 samples each), if instead of zero the mean of the Gaussian is std_dev/10 (which would affect the p_fail by less than a bit), the failure rate of the test will be 17%, which will almost always be caught with the threshold set to 13%.
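To make the 17% figure concrete: with 100 samples, shifting the true mean by std_dev/10 moves the standardized sample mean by sqrt(100) * 0.1 = 1, so a two-sided 5% test on the mean rejects with probability around 17%. A rough standalone sketch, under the assumption that each run behaves like such a z-test (an approximation, not the actual test implementation):

// Rough sketch: rejection probability of a two-sided 5% z-test when the true mean is shifted.
fn std_normal_cdf(x: f64) -> f64 {
    // Midpoint-rule integration of the standard normal density from -10 to x.
    let steps = 200_000;
    let lo = -10.0_f64;
    let h = (x - lo) / steps as f64;
    let norm = (2.0 * std::f64::consts::PI).sqrt();
    let mut area = 0.0;
    for i in 0..steps {
        let t = lo + (i as f64 + 0.5) * h;
        area += (-0.5 * t * t).exp() / norm * h;
    }
    area
}

fn main() {
    let n_samples = 100.0_f64;
    let shift = 0.1; // true mean shifted by std_dev / 10
    let noncentrality = shift * n_samples.sqrt(); // = 1.0
    let z_crit = 1.96; // two-sided 5% level
    // Reject when |Z + noncentrality| > z_crit, with Z ~ N(0, 1).
    let power =
        std_normal_cdf(-z_crit - noncentrality) + 1.0 - std_normal_cdf(z_crit - noncentrality);
    println!("per-run rejection probability ~ {power:.3}"); // ~0.17
}

Feeding that ~0.17 per-run rate into the binomial tail sketch above then gives the probability that at least 13 of the 100 repetitions fail, i.e. that the shift is caught.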
IceTDrinker left a comment
Thanks, a few nits, otherwise this looks to be in line with what Mathieu proposed for the tests. To check: should the 8% bound be changed to 13% given his recommendations? I'm guessing @mballandras that the test for the average value is the one we will want to add for other primitives as well; is there a write-up about the specs of the test? It will be of interest when we update the older core noise tests for the CPU part.
Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 4 unresolved discussions (waiting on @Bapt-Roux)
tfhe/src/core_crypto/algorithms/test/noise_distribution/lwe_hpu_noise.rs line 128 at r4 (raw file):
ct_width: 64, ksk_width: 21, //norm2: 8,
Comment to remove?
tfhe/src/core_crypto/algorithms/test/noise_distribution/lwe_hpu_noise.rs line 735 at r4 (raw file):
create_parameterized_test_hpu!( hpu_noise_distribution { //HPU_TEST_PARAMS_4_BITS_NATIVE_U64,
Commented param sets could be removed?
tfhe/src/core_crypto/algorithms/test/noise_distribution/lwe_hpu_noise.rs line 752 at r4 (raw file):
create_parameterized_test_hpu!( hpu_noise_distribution { //HPU_TEST_PARAMS_4_BITS_HPU_64_KS_21_132_GAUSSIAN,
same here
So far the write-up about the specs is just a Slack thread in research-fpga, I will transfer it to you. I can do a more precise spec detailing the 13% bound.
…ormality & expected value checks
…e of both normality and expected value score and raise expected value score check acceptable failure rate to 8%
Force-pushed from 88ab60b to 198c626
Removed useless commented lines, rebased & reduced to 2 commits.
Normality check failed, this is too flaky to merge as is
The normality test is similar to the zero-mean test: it is a null hypothesis test at a 5% level of significance. If it is performed 100 times and we set the limit at 13% failures, we should see only 0.15% false positives.
Agreed, what is really strange is that I get a failure rate on the normality check below 5% on my laptop and at the same time 7% on the CI. I have the impression the failure rate is always worse on CI... Could it be because I do not re-encrypt the msg at each run of iter B?
@mballandras you mean I should check that both failure probabilities are below 13%?
@pgardratzama do you run strictly the same tests between CI and your laptop? You could try to seed the test to compare both CI and laptop runs; given it is the NTT PBS, the polynomial mul algorithm should be the same on both.
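In case it helps with reproducing CI runs locally: a generic way to seed a statistical test is to derive all randomness from one fixed, overridable seed. The sketch below uses the rand crate purely as an illustration; the environment variable name and the sampling are made up, not the actual HPU test plumbing:

// Illustrative only: seed an RNG once so CI and laptop draw identical inputs.
use rand::{rngs::StdRng, Rng, SeedableRng};

fn main() {
    // "HPU_NOISE_TEST_SEED" is a hypothetical variable name for this illustration;
    // it allows replaying a CI failure locally with the same seed.
    let seed: u64 = std::env::var("HPU_NOISE_TEST_SEED")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(0xDEAD_BEEF);
    let mut rng = StdRng::seed_from_u64(seed);

    // Every message / noise draw in the test would then go through `rng`,
    // so two runs with the same seed see the same inputs.
    let msg: u64 = rng.gen_range(0..16);
    println!("seed = {seed:#x}, first sampled msg = {msg}");
}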
also adds noise measurement tests for 2M128 HPU parameter set