[Collections-857] update bloom filter documentation (#508)

* clarification and links
* updated documentation
apache · Jul 3, 2024 · 8125391
1 parent 2894cd4
commit 8125391

Showing 5 changed files with 230 additions and 55 deletions.
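
Reviewer note: the updated Javadoc in this commit describes extractors as iterators with an early termination switch, where the predicate returns `true` to continue and `false` to stop. The sketch below illustrates that pattern without depending on the library. `IndexSource`, `BitMapSource`, `fromIndices`, `toBitMaps`, and `collectUntil` are hypothetical names invented for this illustration, not the Commons Collections API, and the bit layout (index `i` stored in word `i / 64` at bit `i % 64`) is an assumption about the mapping that the `BitMaps` Javadoc documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.LongPredicate;

class ExtractorSketch {

    /** Hypothetical stand-in for an index extractor: feeds each enabled index to the predicate. */
    @FunctionalInterface
    interface IndexSource {
        boolean processIndices(IntPredicate predicate);
    }

    /** Hypothetical stand-in for a bit map extractor: feeds each 64-bit word to the predicate. */
    @FunctionalInterface
    interface BitMapSource {
        boolean processBitMaps(LongPredicate predicate);
    }

    /** Builds an index source that stops as soon as the predicate returns false. */
    static IndexSource fromIndices(int... indices) {
        return predicate -> {
            for (int index : indices) {
                if (!predicate.test(index)) {
                    return false; // early termination requested by the caller
                }
            }
            return true; // every index was processed
        };
    }

    /** Converts indices to bit maps, assuming index i lives in word i / 64 at bit i % 64. */
    static BitMapSource toBitMaps(IndexSource source, int numberOfBits) {
        return predicate -> {
            long[] words = new long[(numberOfBits + 63) / 64];
            source.processIndices(i -> {
                words[i >> 6] |= 1L << i; // long shifts use only the low 6 bits of i
                return true;
            });
            for (long word : words) {
                if (!predicate.test(word)) {
                    return false;
                }
            }
            return true;
        };
    }

    /** Collects indices in order, stopping right after stopAt has been seen. */
    static List<Integer> collectUntil(IndexSource source, int stopAt) {
        List<Integer> seen = new ArrayList<>();
        source.processIndices(i -> {
            seen.add(i);
            return i != stopAt; // false stops the iteration
        });
        return seen;
    }

    public static void main(String[] args) {
        IndexSource indices = fromIndices(1, 65, 70, 3);
        System.out.println(collectUntil(indices, 70));
        toBitMaps(indices, 128).processBitMaps(word -> {
            System.out.println(Long.toBinaryString(word));
            return true;
        });
    }
}
```

In this sketch the boolean returned by `processIndices` and `processBitMaps` reports whether every element was processed, so a caller can tell a complete pass from one that was cut short by the predicate.
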
diff --git a/src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java b/src/main/java/org/apache/commons/collections4/bloomfilter/package-info.java
@@ -32,7 +32,7 @@
  * list. There are lots of other uses, and in most cases the reason is to perform a fast check as a gateway for a longer
  * operation.</p>
  *
- * <p>Some Bloom filters (e.g. {@link CountingBloomFilter}) use counters rather than bits. In this case each counter
+ * <p>Some Bloom filters (e.g. {@link org.apache.commons.collections4.bloomfilter.CountingBloomFilter}) use counters rather than bits. In this case each counter
  * is called a <em>cell</em>.</p>
  *
  * <h3>BloomFilter</h3>
@@ -46,19 +46,25 @@
  * <ul>
  *     <li><em>bit map</em> - In the {@code bloomfilter} package a <em>bit map</em> is not a structure but a logical construct.  It is conceptualized
  *     as an ordered collection of {@code long} values each of which is interpreted as the enabled true/false state of 64 contiguous indices.  The mapping of
- *     bits into the {@code long} values is described in the {@link BitMaps} Javadoc.</li>
+ *     bits into the {@code long} values is described in the {@link org.apache.commons.collections4.bloomfilter.BitMaps} Javadoc.</li>
  *
  *     <li><em>index</em> - In the {@code bloomfilter} package an Index is a logical collection of {@code int}s specifying the enabled
  *     bits in the bit map.</li>
  *
- *     <li><em>cell</em> - Some Bloom filters (e.g. {@link CountingBloomFilter}) use counters rather than bits.  In the {@code bloomfilter} package
- *     Cells are pairs of ints representing an index and a value.  They are not {@code Pair} objects.  </li>
+ *     <li><em>cell</em> - Some Bloom filters (e.g. {@link org.apache.commons.collections4.bloomfilter.CountingBloomFilter}) use counters rather than bits.  In the {@code bloomfilter} package
+ *     Cells are pairs of ints representing an index and a value.  They are not the standard Java {@code Pair} objects,
+ *     nor the Apache Commons Lang version.</li>
  *
- *     <li><em>extractor</em> - The extractors are {@link FunctionalInterface}s that are conceptually iterators on a bit map, an <em>index</em>, or a
+ *     <li><em>extractor</em> - The extractors are {@link java.lang.FunctionalInterface}s that are conceptually iterators on a bit map, an <em>index</em>, or a
  *     collection of <em>cells</em>, with an early termination switch.  Extractors have
- *     names like {@link org.apache.commons.collections4.bloomfilter.BitMapExtractor} or {@link org.apache.commons.collections4.bloomfilter.IndexExtractor} and have a {@code processXs} methods that take a
- *     {@code Predicate<X>} argument (e.g. {@link org.apache.commons.collections4.bloomfilter.BitMapExtractor#processBitMaps(java.util.function.LongPredicate)} or {@code processIndicies(IntPredicate)}).
- *     That predicate is expected to process each of the Xs in turn and return {@code true} if the processing should continue
+ *     names like {@link org.apache.commons.collections4.bloomfilter.BitMapExtractor} or
+ *     {@link org.apache.commons.collections4.bloomfilter.IndexExtractor} and have {@code processXs} methods that take a
+ *     type specialization of {@link java.util.function.Predicate} as an argument
+ *     (e.g. {@link org.apache.commons.collections4.bloomfilter.BitMapExtractor#processBitMaps(java.util.function.LongPredicate)},
+ *     {@link org.apache.commons.collections4.bloomfilter.IndexExtractor#processIndices(java.util.function.IntPredicate)},
+ *     and {@link org.apache.commons.collections4.bloomfilter.CellExtractor#processCells(org.apache.commons.collections4.bloomfilter.CellExtractor.CellPredicate)}).
+ *     The predicate is expected to process each of the Xs in turn and return {@code true} if the processing should continue
  *     or {@code false} to stop it. </li>
  * </ul>
  *
@@ -69,66 +75,72 @@
  * <h4>Implementation Notes</h4>
  *
  * <p>The architecture is designed so that the implementation of the storage of bits is abstracted. Rather than specifying a
- * specific state representation we require that all Bloom filters implement the {@link org.apache.commons.collections4.bloomfilter.BitMapExtractor} and {@link org.apache.commons.collections4.bloomfilter.IndexExtractor} interfaces,
+ * specific state representation we require that all Bloom filters implement the {@link org.apache.commons.collections4.bloomfilter.BitMapExtractor}
+ * and {@link org.apache.commons.collections4.bloomfilter.IndexExtractor} interfaces.
  * Counting-based Bloom filters implement {@link org.apache.commons.collections4.bloomfilter.CellExtractor} as well.  There are static
  * methods in the various Extractor interfaces to convert from one type to another.</p>
  *
- * <p>Programs that utilize the Bloom filters may use the {@link org.apache.commons.collections4.bloomfilter.BitMapExtractor} or {@link org.apache.commons.collections4.bloomfilter.IndexExtractor} to retrieve
+ * <p>Programs that utilize the Bloom filters may use the {@link org.apache.commons.collections4.bloomfilter.BitMapExtractor}
+ * or {@link org.apache.commons.collections4.bloomfilter.IndexExtractor} to retrieve
  * or process a representation of the internal structure.
- * Additional methods are available in the {@link org.apache.commons.collections4.bloomfilter.BitMaps} class to assist in manipulation of bit map representations.</p>
+ * Additional methods are available in the {@link org.apache.commons.collections4.bloomfilter.BitMaps} class to assist in
+ * manipulation of bit map representations.</p>
  *
  * <p>The Bloom filter is an interface that requires implementation of the following methods:</p>
  * <ul>
- * <li>{@link BloomFilter#cardinality()} returns the number of bits enabled in the Bloom filter.</li>
+ * <li>{@link org.apache.commons.collections4.bloomfilter.BloomFilter#cardinality()} returns the number of bits enabled in the Bloom filter.</li>
  *
- * <li>{@link BloomFilter#characteristics()} which returns an integer of characteristics flags.</li>
+ * <li>{@link org.apache.commons.collections4.bloomfilter.BloomFilter#characteristics()} which returns an integer of characteristics flags.</li>
  *
- * <li>{@link BloomFilter#clear()} which resets the Bloomfilter to its initial empty state.</li>
+ * <li>{@link org.apache.commons.collections4.bloomfilter.BloomFilter#clear()} which resets the Bloom filter to its initial empty state.</li>
  *
- * <li>{@link BloomFilter#contains(IndexExtractor)} which returns true if the bits specified by the indices generated by
- * IndexExtractor are enabled in the Bloom filter.</li>
+ * <li>{@link org.apache.commons.collections4.bloomfilter.BloomFilter#contains(IndexExtractor)} which returns true if the bits specified
+ * by the indices generated by IndexExtractor are enabled in the Bloom filter.</li>
  *
- * <li>{@link BloomFilter#copy()} which returns a fresh copy of the bitmap.</li>
+ * <li>{@link org.apache.commons.collections4.bloomfilter.BloomFilter#copy()} which returns a fresh copy of the Bloom filter.</li>
  *
- * <li>{@link BloomFilter#getShape()} which returns the shape the Bloom filter was created with.</li>
+ * <li>{@link org.apache.commons.collections4.bloomfilter.BloomFilter#getShape()} which returns the shape the Bloom filter was created with.</li>
  *
- * <li>{@link BloomFilter#merge(BitMapExtractor)} which merges the BitMaps from the BitMapExtractor into the internal
- * representation of the Bloom filter.</li>
+ * <li>{@link org.apache.commons.collections4.bloomfilter.BloomFilter#merge(BitMapExtractor)} which merges the BitMaps from the BitMapExtractor
+ * into the internal representation of the Bloom filter.</li>
  *
- * <li>{@link BloomFilter#merge(IndexExtractor)} which merges the indices from the IndexExtractor into the internal
- * representation of the Bloom filter.</li>
+ * <li>{@link org.apache.commons.collections4.bloomfilter.BloomFilter#merge(IndexExtractor)} which merges the indices from the IndexExtractor
+ * into the internal representation of the Bloom filter.</li>
  * </ul>
  *
  * <p>Other methods should be implemented where they can be done so more efficiently than the default implementations.</p>
  *
  * <h3>CountingBloomFilter</h3>
  *
- * <p>The {@link org.apache.commons.collections4.bloomfilter.CountingBloomFilter} extends the Bloom filter by counting the number of times a specific bit has been
+ * <p>The {@link org.apache.commons.collections4.bloomfilter.CountingBloomFilter} extends the Bloom filter by counting the number
+ * of times a specific bit has been
  * enabled or disabled. This allows the removal (opposite of merge) of Bloom filters at the expense of additional
  * overhead.</p>
  *
  * <h3>LayeredBloomFilter</h3>
  *
- * <p>The {@link org.apache.commons.collections4.bloomfilter.LayeredBloomFilter} extends the Bloom filter by creating layers of Bloom filters that can be queried as a single
+ * <p>The {@link org.apache.commons.collections4.bloomfilter.LayeredBloomFilter} extends the Bloom filter by creating layers of Bloom
+ * filters that can be queried as a single
  * Filter or as a set of filters. This adds the ability to perform windowing on streams of data.</p>
  *
  * <h3>Shape</h3>
  *
- * <p>The {@link org.apache.commons.collections4.bloomfilter.Shape} describes the Bloom filter using the number of bits and the number of hash functions.  It can be specified
+ * <p>The {@link org.apache.commons.collections4.bloomfilter.Shape} describes the Bloom filter using the number of bits and the number
+ * of hash functions.  It can be specified
  * by the number of expected items and desired false positive rate.</p>
  *
  * <h3>Hasher</h3>
  *
- * <p>A {@link org.apache.commons.collections4.bloomfilter.Hasher} converts bytes into a series of integers based on a Shape. Each hasher represents one item being added
+ * <p>A {@link org.apache.commons.collections4.bloomfilter.Hasher} converts bytes into a series of integers based on a Shape.
+ * Each hasher represents one item being added
  * to the Bloom filter.</p>
  *
- * <p>The {@link org.apache.commons.collections4.bloomfilter.EnhancedDoubleHasher} uses a combinatorial generation technique to create the integers. It is easily
+ * <p>The {@link org.apache.commons.collections4.bloomfilter.EnhancedDoubleHasher} uses a combinatorial generation technique to
+ * create the integers. It is easily
  * initialized by using a byte array returned by the standard {@link java.security.MessageDigest} or other hash function to
  * initialize the Hasher. Alternatively, a pair of long values may be used.</p>
  *
- * <p>Other implementations of the {@link org.apache.commons.collections4.bloomfilter.Hasher} are easy to implement, and should make use of the {@code Hasher.Filter}
- * and/or {@code Hasher.FileredIntConsumer} classes to filter out duplicate indices when implementing
- * {@code Hasher.uniqueIndices(Shape)}.</p>
+ * <p>Other implementations of the {@link org.apache.commons.collections4.bloomfilter.Hasher} are easy to implement.</p>
  *
  * <h2>References</h2>
  *