Optimize bloomfilter issue 7346 #7494

Open: 2 commits to be merged into base branch master

Romain-E
Context

This PR optimizes the calculation of the number of hash functions and bits used in BloomFilter, reducing redundant computations and improving readability.

Changes

Implemented a more efficient algorithm for optimalNumOfBits
Replaced the (n, m)-based calculation in optimalNumOfHashFunctions with one based on the false positive probability p
Updated tests to reflect these changes

Impact

Improves performance when using BloomFilter.

Tests affected

Tests have been modified to validate the new calculation methods.

Romain-E and others added 2 commits October 27, 2024 13:48
Refactored optimalNumOfBits and optimalNumOfHashFunctions to improve efficiency and clarity.
- static int optimalNumOfHashFunctions(long n, long m) {
-   // (m / n) * log(2), but avoid truncation due to division!
-   return max(1, (int) Math.round((double) m / n * Math.log(2)));
+ static int optimalNumOfHashFunctions(double p) {

The change here replaces the optimalNumOfHashFunctions method, which originally took parameters n (expected insertions) and m (total number of bits), with a new version that instead uses p (desired false positive probability). While this new approach aligns with cases where p is known, it reduces flexibility for use cases where m and n are more readily available or preferred for direct configuration.

To maintain backward compatibility and allow both methods of calculation, it would be more effective to overload optimalNumOfHashFunctions rather than replacing it. Overloading the method would allow users to specify either n and m or p as needed, based on their use case, without introducing breaking changes.
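
A minimal sketch of what the overloaded pair could look like. The (n, m) body is taken from the removed lines in the diff; the p-based body is an assumption, since the hunk shows only its signature, and uses the standard result k = -log2(p). The holder class `BloomFilterSizing` and the `optimalNumOfBits` helper shown here are illustrative, not the PR's code.

```java
// Illustrative sketch only; BloomFilterSizing is a hypothetical holder class.
final class BloomFilterSizing {
  private BloomFilterSizing() {}

  /** Form removed by the diff: k = max(1, round(m / n * ln 2)). */
  static int optimalNumOfHashFunctions(long n, long m) {
    // Cast to double first so m / n is not truncated by integer division.
    return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
  }

  /** Assumed p-based overload: k = max(1, round(-ln p / ln 2)). */
  static int optimalNumOfHashFunctions(double p) {
    return Math.max(1, (int) Math.round(-Math.log(p) / Math.log(2)));
  }

  /** Standard sizing formula, assumed here: m = -n * ln p / (ln 2)^2. */
  static long optimalNumOfBits(long n, double p) {
    return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
  }

  public static void main(String[] args) {
    long n = 1_000;      // expected insertions
    double p = 0.03;     // desired false positive probability
    long m = optimalNumOfBits(n, p);

    // Both overloads coexist; the argument list selects which one runs.
    System.out.println("k from p:      " + optimalNumOfHashFunctions(p));
    System.out.println("k from (n, m): " + optimalNumOfHashFunctions(n, m));
  }
}
```

Because the two signatures differ in arity, existing callers that pass n and m keep compiling unchanged, while new callers that only know p can use the one-argument form.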

Labels: package=hash, type=performance