Conversation

@pgardratzama (Contributor) commented Jun 13, 2025

Also adds noise measurement tests for the 2M128 HPU parameter set.



@pgardratzama pgardratzama requested a review from Bapt-Roux June 13, 2025 16:04
@cla-bot cla-bot bot added the cla-signed label Jun 13, 2025
```rust
max_msg_val: u64,
nb_tests: usize,
nb_pbs_per_test: usize,
check_variance: bool,
```
Contributor:

Instead of a boolean, I would have used an enum like:

```rust
enum HpuNoiseMode {
    Variance,
    Normality,
}
```

And replace the if statement with a match.
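
For illustration, a minimal sketch of the suggested match-based dispatch (the function name and branch bodies are hypothetical, not taken from the PR):

```rust
fn run_noise_checks(mode: HpuNoiseMode) {
    match mode {
        // Variance path: the noise measurement checks
        HpuNoiseMode::Variance => { /* compare measured variance to the expected bound */ }
        // Normality path: normality test plus expected value score
        HpuNoiseMode::Normality => { /* run the normality and zero-mean checks */ }
    }
}
```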

Contributor Author:

It has not been easy 😓 but I think it is done

@pgardratzama (Contributor Author):

To be clear, I had occurrences of failure on the expected value score check with ~7% probability. But most of the time it works well and stays below 6.25%.

@IceTDrinker (Member):

> To be clear, I had occurrences of failure on the expected value score check with ~7% probability. But most of the time it works well and stays below 6.25%.

When do those tests run? If they fail with such high probability we can't have them like this in CI. I know they are statistical tests, but we can't have very flaky tests.

@pgardratzama (Contributor Author):

At this point, these tests fail very rarely on my local machine but a bit more often in the CI runs.
I do not know how to define a test that is not always run.
I have been told that a 7% failure rate on the expected value is kind of OK, so there are 2 possible ways forward:

  1. increase the acceptable failure rate from 6.25% to 7.5%
  2. make these tests optional or comment them out

@IceTDrinker (Member):

Which one is failing? The normality check or the noise measurement?

@pgardratzama (Contributor Author):

The noise measurements are OK and stable.
The normality tests check 2 things: normality (stable, under the 6.25% failure rate) and an expected value near enough to 0 according to a score (failure rate between 3% and 7%).
I will just raise the expected failure rate of the expected value score and we will continue like that.

@IceTDrinker (Member):

Well, there is at least something to discuss with R&D: the normality is degraded after the 21-bit keyswitch, and we know quantization on low bit widths is a topic. If we expect 5% and can get +50%, it is something we need to discuss theoretically.

@IceTDrinker (Member):

The normality being unachievable after the mod switch because of normalization apparently requires us to have normality before the mod switch, so if we are seeing trouble there as well it's something worth verifying.

If it's after the PBS it's even more concerning.

@pgardratzama (Contributor Author):

Both checks are done after the KS (and *nu before):
normality_test_f64() is fine, with failure probability < 5%.
What is not always < 5% is the expected value score: mean(samples) / std_dev(samples) * sqrt(samples.len()) ∈ [-1.96, 1.96].
@mballandras did not seem surprised by this failure probability.
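
For reference, a minimal sketch of that score as stated above (a z-test on the sample mean at the 5% significance level; the function names and the use of the unbiased variance estimator are assumptions, not taken from the PR's actual code):

```rust
/// Score = mean / std_dev * sqrt(n); under a zero-mean Gaussian it is
/// approximately standard normal, so it should land in [-1.96, 1.96]
/// about 95% of the time.
fn expected_value_score(samples: &[f64]) -> f64 {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    mean / var.sqrt() * n.sqrt()
}

fn expected_value_check(samples: &[f64]) -> bool {
    expected_value_score(samples).abs() <= 1.96
}
```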

@IceTDrinker (Member):

> Both checks are done after the KS (and *nu before): normality_test_f64() is fine, with failure probability < 5%. What is not always < 5% is the expected value score: mean(samples) / std_dev(samples) * sqrt(samples.len()) ∈ [-1.96, 1.96]. @mballandras did not seem surprised by this failure probability.

OK, still a bit surprising; I thought the mean issue was after the MS. There may be something I'm missing on the average value stuff.

@mballandras:

These results are expected random fluctuations. Indeed, we expect on average a 5% failure rate. If I remember correctly the test is performed 100 times and samples.len() = 100. In this setting, for a perfect Gaussian distribution with zero mean, the probability of observing a failure rate of 7% or more is roughly 1/4. We could set a bound at a 12% failure rate for the test; then the probability that a perfect zero-mean Gaussian reaches this 12% threshold would be 0.4%, which seems acceptable for a CI failure rate. We could go up to a 13% threshold, and then the probability of a false positive would be 0.15%.

Sorry, I should have been more precise about the expected results of the test in the first place.
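
These tail probabilities can be sanity-checked directly from the binomial distribution; a small self-contained sketch (the helper below is illustrative, not part of the PR):

```rust
/// P(X >= k) for X ~ Bin(n, p), building the PMF iteratively via
/// pmf(i + 1) = pmf(i) * (n - i) / (i + 1) * p / (1 - p).
fn binom_tail_ge(n: u32, k: u32, p: f64) -> f64 {
    let mut pmf = (1.0 - p).powi(n as i32); // pmf(0)
    let mut tail = 0.0;
    for i in 0..=n {
        if i >= k {
            tail += pmf;
        }
        if i < n {
            pmf *= (n - i) as f64 / (i + 1) as f64 * p / (1.0 - p);
        }
    }
    tail
}

fn main() {
    // 100 repetitions of a test that fails with probability 5%
    println!("{:.3}", binom_tail_ge(100, 7, 0.05)); // ~0.234, i.e. roughly 1/4
    println!("{:.4}", binom_tail_ge(100, 12, 0.05)); // ~0.0042, i.e. the ~0.4% quoted
    println!("{:.4}", binom_tail_ge(100, 13, 0.05)); // ~0.0014, i.e. the ~0.15% quoted
}
```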

@mballandras:

Just to add that the test will still be powerful. In this same setting (100 tests with 100 samples each), if instead of zero the mean of the Gaussian is std_dev/10 (which would affect the p_fail by less than a bit), the failure rate of the test will be 17%, which will almost always be caught having set the threshold to 13%.
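
To see where the 17% comes from: with a true mean of std_dev/10 and 100 samples, the score above is distributed roughly as N(1, 1), so the per-run failure probability is about P(|N(1, 1)| > 1.96) ≈ 0.17. The detection probability at the 13% threshold can then be checked with the same iterative-PMF trick as in the previous sketch (again just an illustrative sanity check):

```rust
fn main() {
    // Number of failures over 100 repetitions ~ Bin(100, 0.17);
    // probability of reaching the 13% threshold:
    let (n, p, k) = (100u32, 0.17f64, 13u32);
    let mut pmf = (1.0 - p).powi(n as i32); // pmf(0)
    let mut tail = 0.0;
    for i in 0..=n {
        if i >= k {
            tail += pmf;
        }
        if i < n {
            pmf *= (n - i) as f64 / (i + 1) as f64 * p / (1.0 - p);
        }
    }
    println!("detection probability at 13%: {tail:.3}"); // ~0.89
}
```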

@IceTDrinker (Member) left a comment:

Thanks, a few nits; otherwise this looks to be in line with what Mathieu proposed for the tests. To check: should the 8% bound be changed to 13% given his recommendations? I'm guessing, @mballandras, that the test for the average value is the one we will want to add for other primitives as well; is there a write-up about the specs of the test? It will be of interest when we update older core noise tests for the CPU part.

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 4 unresolved discussions (waiting on @Bapt-Roux)


tfhe/src/core_crypto/algorithms/test/noise_distribution/lwe_hpu_noise.rs line 128 at r4 (raw file):

```rust
    ct_width: 64,
    ksk_width: 21,
    //norm2: 8,
```

Comment to remove?


tfhe/src/core_crypto/algorithms/test/noise_distribution/lwe_hpu_noise.rs line 735 at r4 (raw file):

```rust
create_parameterized_test_hpu!(
    hpu_noise_distribution {
        //HPU_TEST_PARAMS_4_BITS_NATIVE_U64,
```

Commented param sets could be removed?


tfhe/src/core_crypto/algorithms/test/noise_distribution/lwe_hpu_noise.rs line 752 at r4 (raw file):

```rust
create_parameterized_test_hpu!(
    hpu_noise_distribution {
        //HPU_TEST_PARAMS_4_BITS_HPU_64_KS_21_132_GAUSSIAN,
```

Same here.

@mballandras:

> Thanks, a few nits; otherwise this looks to be in line with what Mathieu proposed for the tests. To check: should the 8% bound be changed to 13% given his recommendations? I'm guessing, @mballandras, that the test for the average value is the one we will want to add for other primitives as well; is there a write-up about the specs of the test? It will be of interest when we update older core noise tests for the CPU part.

So far the write-up about the specs is just a Slack thread in research-fpga; I will transfer it to you. I can do a more precise spec detailing the 13% bound.

…e of both normality and expected value score

and raise expected value score check acceptable failure rate to 8%

@pgardratzama pgardratzama force-pushed the hw-team/pg/normality_expvalue_tests branch from 88ab60b to 198c626 on June 19, 2025 at 20:11
@pgardratzama (Contributor Author):

Removed useless commented lines, rebased, and reduced to 2 commits.

@IceTDrinker (Member) left a comment:

Normality check failed; this is too flaky to merge as is.

@mballandras:

The normality test is similar to the zero-mean test: it is a null hypothesis test at a 5% level of significance. If it is performed 100 times and we set the limit at 13% failures, we should see only 0.15% false positives.

@pgardratzama (Contributor Author):

Agreed. What is really strange is that I get a failure rate on the normality check below 5% on my laptop and at the same time 7% on the CI. I have the impression the failure rate is always worse on CI...
Is this because of what the test does:

```rust
// iter A: 100 outer iterations
for _ in 0..100 {
    // generate keys
    // encrypt msg
    // iter B: 160 inner iterations
    for _ in 0..160 {
        // * nu
        // KS
        // sample noise into vec
        // PBS
    }
    // check normality & expected value
}
```

Could it be because I do not re-encrypt the msg at each run of iter B?

@pgardratzama (Contributor Author):

@mballandras you mean I should check that both failure probabilities are below 13%?

@IceTDrinker (Member):

@pgardratzama do you run strictly the same tests on CI and on your laptop?

You could try to seed the test to compare both CI and laptop runs; given it is the NTT PBS, the polynomial mul algo should be the same on both.
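
One lightweight way to do that, sketched below, is to pull the seed from an environment variable and log it, so a CI failure can be replayed locally with the same randomness (the variable name and helper are hypothetical, not an existing tfhe-rs API):

```rust
use std::env;
use std::time::{SystemTime, UNIX_EPOCH};

/// Returns the seed to use for the noise test: taken from the environment when
/// set (to replay a CI run locally), otherwise derived from the clock and logged.
fn hpu_noise_test_seed() -> u128 {
    match env::var("HPU_NOISE_TEST_SEED") {
        Ok(s) => s.parse().expect("HPU_NOISE_TEST_SEED must be a u128"),
        Err(_) => {
            let seed = SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .unwrap()
                .as_nanos();
            eprintln!("HPU_NOISE_TEST_SEED not set, using {seed}");
            seed
        }
    }
}
```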
