Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[COLLECTIONS-843] Implement Layered Bloom filter #402

Merged
merged 54 commits into from
Dec 22, 2023

Conversation

Claudenw
Copy link
Contributor

@Claudenw Claudenw commented Jun 29, 2023

Implementation to fix COLLECTIONS-843.

This is a simple Layered Bloom filter implementation with complex configuration options.

A static method to create a fixed level implementation that does not merge is provided.
An example of a complex time based Bloom filter is provided in the testing code where filters are added to until they are saturated or one second has elapsed and are removed after they are 4 seconds old.

this change is in support of Kafka KIP-936
https://cwiki.apache.org/confluence/display/KAFKA/KIP-936%3A+Throttle+number+of+active+PIDs

@Claudenw Claudenw self-assigned this Jun 29, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jun 29, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1df5606) 81.29% compared to head (500a252) 81.57%.
Report is 24 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #402      +/-   ##
============================================
+ Coverage     81.29%   81.57%   +0.28%     
- Complexity     4640     4742     +102     
============================================
  Files           290      295       +5     
  Lines         13545    13751     +206     
  Branches       2003     2022      +19     
============================================
+ Hits          11011    11217     +206     
  Misses         1933     1933              
  Partials        601      601              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@garydgregory
Copy link
Member

Hi @Claudenw
I took the liberty to fix the Javadocs that were failing the build. Make sure you pull this branch before editing.

PS: You can detect all failures before pushing by running the default Maven goal on the command line with a simple mvn invocation ;-)

@Claudenw
Copy link
Contributor Author

mvn doesn't catch all the issues :( I just spend a number of hours working out new tests for coverage changes.

@aherbert aherbert changed the title COMMONS-843: Implement Layered Bloom filter COLLECTIONS-843: Implement Layered Bloom filter Jul 4, 2023
@Claudenw
Copy link
Contributor Author

Claudenw commented Sep 3, 2023

@garydgregory @aherbert

Gentlemen,

Can I get eyes on the new code please. I think it is correct and fixes all the previously identified issues.

@Claudenw
Copy link
Contributor Author

@garydgregory @aherbert nudge

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK here. @aherbert ?

Copy link
Contributor

@aherbert aherbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted a few changes. I did not look at the test code. I will do so in the next few days.

@Claudenw
Copy link
Contributor Author

Claudenw commented Oct 1, 2023

@aherbert thank you for your review. Changes have been made to the base code. One of these days I will figure out how you see all those things that I seem to miss.

@garydgregory
Copy link
Member

Let's make sure we let GitHub squash the commits once this is approved.

@Claudenw
Copy link
Contributor Author

Claudenw commented Oct 8, 2023

@aherbert Have you had a chance to look at the test code? I have a series of blog posts about Bloom filters coming out "soon" (depending on when my company gets around to finalizing it). The second one in the series talks about commons-collections Bloom filters. I am hoping we have this in the 4.5-SNAPSHOT before then.

Thanks to you and @garydgregory for all your assistance in making the entire package a better submission.

Copy link
Contributor

@aherbert aherbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Claudenw There are some minor things to clean up here but no major issues. Can you check the LayerManagerTest for testAdvanceOnCalculatedFull and testAdvanceOnSaturation are using the correct maxN.

assertTrue(cp.forEachRemaining());

// test last item not checked
cp = new CountingPredicate<>(ary, (x, y) -> y == Integer.valueOf(2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not testing what you want. You never call cp.test. So when you call cp.forEachRemaining your predicate will be called with (1, null) and then (2, null). You should check for this. But you are checking y == 2 which is not targeting any functionality of the CountingPredicate. A similar argument for the test below where you check y == 1. Since y will be null in forEachRemaining this is not testing a completion of the forEachRemaining loop.

You should add a test case that when you call cp.test(Z) that the constructor BiPredicate receives (1, Z) then (2, Z).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


SparseDefaultBloomFilter(final Shape shape) {
public SparseDefaultBloomFilter(final Shape shape) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These do not need to be public since they are used only in the package. If I revert this change the test suite still functions.

Are you using them externally by importing collections test-jar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is exactly the usage. In testing a new BloomFilter implementation is is handy to have a clean implementation of a sparse filter as used in the testing to verify correct functioning of the new Bloom filter.

/**
* A default implementation of a non-sparse Bloom filter.
*/
public static class NonSparseDefaultBloomFilter extends AbstractDefaultBloomFilter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to be public

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -41,4 +41,16 @@ public void testMergeShortBitMapProducer() {
assertTrue(filter.merge(producer));
assertEquals(1, filter.cardinality());
}

@Test
public void testCardinalityAndIsEmpty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just doing what the parent class does. The extra test should be in its own fixture, e.g.

@Test
public void testEmptyAfterMergeWithNothing() {
    // test the case where is empty after merge
    // in this case the internal cardinality == -1
    BloomFilter bf = createEmptyFilter(getTestShape());
    bf.merge(IndexProducer.fromIndexArray());
    assertTrue(bf.isEmpty());
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as fixture in abstract test class.

@garydgregory
Copy link
Member

Hi All. Is this PR ready to be merged?

@aherbert
Copy link
Contributor

There are some changes to make to the test code. I only reviewed this yesterday.

@garydgregory
Copy link
Member

There are some changes to make to the test code. I only reviewed this yesterday.

👍 TY for the update

@Claudenw
Copy link
Contributor Author

@aherbert just a friendly ping. I'd like to get this closed.

Copy link
Contributor

@aherbert aherbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only reviewed the changes since the last review. It looks OK. I noted a change to make the test able to support concurrent test execution.

import org.junit.jupiter.api.Test;

public class CountingPredicateTest {
private List<Pair<Integer, Integer>> expected = new ArrayList<>();
private List<Pair<Integer, Integer>> result = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be an argument to makeFunc. The expected can be created for each test that uses it.

Currently using a class level object works as JUnit is only executing one test at a time on an instance. But it is cleaner and more thread safe to pass in result to the makeFunc helper (which can be static).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aherbert, thanks for the suggestion. I made the changes.

@@ -28,11 +28,11 @@
import org.junit.jupiter.api.Test;

public class CountingPredicateTest {
private List<Pair<Integer, Integer>> expected = new ArrayList<>();
private List<Pair<Integer, Integer>> result = new ArrayList<>();
//private List<Pair<Integer, Integer>> expected = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Claudenw I don't think we need to keep these.

@garydgregory
Copy link
Member

@aherbert @Claudenw
Is this PR ready to merge? If not, what remains? TY!

@aherbert
Copy link
Contributor

I previously requested Claude to remove some commented out code in a test class. Otherwise this is fine.

@garydgregory garydgregory changed the title COLLECTIONS-843: Implement Layered Bloom filter [COLLECTIONS-843] Implement Layered Bloom filter Dec 22, 2023
@garydgregory garydgregory merged commit 0438ede into apache:master Dec 22, 2023
7 checks passed
garydgregory added a commit that referenced this pull request Dec 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants